My GSOC Journey #1: So it begins…
The coding period of Google Summer of Code 2021 (GSOC) began on the 7th of June! It isn’t some sort of hard line, however. I had been writing code for almost two weeks beforehand. In a discussion with my mentors, we decided there really isn’t any point in waiting. At worst, I need the time. At best, I do more than I planned. Win-win.
I should mention, I’m contributing to Agents.jl, an agent-based modeling library written in Julia. My mentors are George Datseris (Datseris) and Tim DuBois (Libbum). My proposal is to add model serialization capability, as well as extend path-finding functionality. Model serialization basically means entire models can be saved to disk as a file, and loaded back. Pretty useful in a bunch of scenarios. Path-finding refers to enabling agents to find their way around obstacles.
In a discussion with my mentors, we decided it would be appropriate to also resolve a related issue as part of my GSOC. For this, I added the functionality to read/write agent data from/to CSV files. Agent data can be dumped to a CSV file, and CSV files containing agent data can be loaded into the model. This comes in handy for initializing models, since it’s fairly easy to generate a CSV with the required data.
The idea seemed pretty easy to implement. Then again, aren’t those famous last words? I started off by experimenting with CSV.jl.
I quickly wrote up the basic functionality, and a CSV file to test it. Unsurprisingly, the first error was that CSV.jl can’t parse columns containing Tuples. CSV.jl allows specifying what type a column is, and uses Base.parse to convert it to that type. It doesn’t allow specifying any other parsing function. Since Base.parse isn’t defined for parsing Tuples, it threw an error. Being able to parse tuples would be a useful feature since, for example, agent positions are specified that way. To attempt this, I defined a new method for Base.parse (a function I don’t own) on Tuple{Int,Int} (a type I don’t own).
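The method looked something like the sketch below. This is a reconstruction for illustration, not the exact code, and the string format it assumes is only an example:

```julia
# Type piracy: adding a method to a function I don't own (Base.parse)
# for a type I also don't own (Tuple{Int,Int}).
function Base.parse(::Type{Tuple{Int,Int}}, s::AbstractString)
    # e.g. "(3, 4)" -> (3, 4); the parsing logic here is only illustrative
    parts = split(strip(s, ['(', ')']), ',')
    return (parse(Int, parts[1]), parse(Int, parts[2]))
end
```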
Some of you might already have noticed what I’ve done, and are staring with horrified expressions. I can’t describe the gravity of the situation better than George did.

I didn’t know what type piracy was at the time. In short, I overloaded a function I don’t own for a type I also don’t own. Since I don’t own the type (and it’s a feature of the language), it’s highly likely (read: guaranteed) that someone using the library will use the type. It’s also highly likely someone will use the function too. Like me, they might accidentally use it on that (Tuple) type. If I hadn’t defined the new method, they would get the same error I did. However, with my new method, the function runs somewhere it shouldn’t, and returns a possibly erroneous output. The resulting failure could be anything from a nice error message with a stack trace to crashing Julia. Either way, the debugging process likely wouldn’t be fun.
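Given the sketch above, the hazard is easy to demonstrate: any code in the same session that calls parse with this type now hits my method instead of throwing a MethodError, and malformed input can produce silently wrong results:

```julia
parse(Tuple{Int,Int}, "(1, 2)")     # (1, 2) -- appears to "just work"
parse(Tuple{Int,Int}, "(1, 2, 3)")  # (1, 2) -- silently drops the third element instead of erroring
```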
Over a video call with George and Tim, we decided it was better to follow the standard convention of splitting such non-scalar values over multiple columns (such as pos_1 and pos_2 for a field pos::Tuple{Int,Int}). Since the CSV files could have their columns in an arbitrary order, there needs to be a way to map each field of the agent struct to any column. To this end, the function accepts col_map, a dictionary mapping keyword arguments to column numbers. Through this, the user can specify any column number for a keyword argument to the agent constructor. It’s also likely that the columns will be in the same order as the corresponding arguments to the constructor, without needing to be passed as keywords, so if col_map isn’t specified, the entire row is provided as positional arguments. Another convenience is the ability to let the row number correspond to the ID field of the agent, so the CSV file may not need an explicit ID column.
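To make this concrete, here’s a rough sketch of how loading such a file might look. The function name populate_from_csv!, the row_number_is_id keyword, and the way split tuple columns appear in col_map are illustrative assumptions; only col_map itself is described above:

```julia
using Agents

mutable struct Wolf <: AbstractAgent
    id::Int
    pos::Tuple{Int,Int}
    energy::Float64
end

model = ABM(Wolf, GridSpace((10, 10)))

# wolves.csv (columns deliberately out of constructor order, no id column):
#
#   energy, pos_1, pos_2
#   10.5,   3,     4
#   7.0,    1,     9

# Hypothetical call: col_map maps constructor keywords to column numbers,
# and the row number stands in for each agent's id.
populate_from_csv!(model, "wolves.csv", Wolf;
    col_map = Dict(:energy => 1, :pos_1 => 2, :pos_2 => 3),
    row_number_is_id = true,
)
```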
The final bit of flexibility happens through the magic that is metaprogramming. Wouldn’t it be annoying to have to specify types for all your columns, if they’re named the same as the fields in your struct anyway? To solve this, if the types aren’t explicitly provided, the function automatically generates a mapping from field name to its type. This takes into account splitting tuples across multiple columns, so a field foo::Tuple{String,Int} maps foo_1 to String and foo_2 to Int. Personally, I think this is the coolest feature of the three.
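The gist of that automatic mapping can be sketched with Julia’s reflection utilities. This is a simplified reconstruction of the idea, not the code that was actually merged:

```julia
# Build a Dict of CSV column name => column type from an agent struct,
# splitting any Tuple-typed field into one column per element (foo_1, foo_2, ...).
function column_types(::Type{A}) where {A}
    types = Dict{Symbol,Type}()
    for (name, T) in zip(fieldnames(A), fieldtypes(A))
        if T <: Tuple
            for (i, Ti) in enumerate(T.parameters)
                types[Symbol(name, :_, i)] = Ti
            end
        else
            types[name] = T
        end
    end
    return types
end

struct Example
    id::Int
    foo::Tuple{String,Int}
end

column_types(Example)
# => Dict with :id => Int64, :foo_1 => String, :foo_2 => Int64
```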
The merging of this PR concluded the first two weeks of coding which, ironically enough, ended right on time for the coding period to officially begin. I’ve done a lot since then, but that deserves its own post.