I forked the master GEDCOM::Parser Ruby library on GitHub and spent some time working on it. I made changes to the API that are not backward compatible, so we’ll see whether the owner of the original repo decides to pull in my changes…
The main changes I made:
- Changed API for defining callbacks to a more ‘modern’ approach
- Supports reading from a filename or any IO instance
- Supports GEDCOM files with different line-endings (\r, \n, and \r\n), detected automatically
- Added specs for GEDCOM::Parser class (previously only had specs for date-parsing code)
The old API didn’t match modern Ruby style, so I gave it a facelift. But the changes are more than just cosmetic. Consider this example (from samples/count.rb):
    class IndividualCounter < GEDCOM::Parser
      attr_reader :individuals
      attr_reader :families

      def initialize
        super
        @individuals = 0
        @families = 0
        setPreHandler [ "INDI" ], method( :countPerson )
        setPreHandler [ "FAM" ], method( :countFamily )
      end

      def countPerson( data, state, parm )
        @individuals += 1
      end

      def countFamily( data, state, parm )
        @families += 1
      end
    end

    parser = IndividualCounter.new

With the new API, the same counter becomes:

    individuals = 0
    families = 0
    parser = GEDCOM::Parser.new do
      before "INDI" do
        individuals += 1
      end
      before "FAM" do
        families += 1
      end
    end
IMO, the new version reads a lot better and is more concise. See the other samples and specs for more examples.
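For readers curious how a block-based callback API like this can work, here is a minimal sketch. It is not the code from the fork: MiniParser, its parse method, and the sample lines are all made up for illustration, and it assumes the block is run with instance_eval, which is one common way to build this kind of DSL in Ruby.

```ruby
# Hypothetical stand-in for the parser, for illustration only.
# It is NOT the GEDCOM::Parser implementation.
class MiniParser
  def initialize(&block)
    # Map each tag to the list of callbacks registered for it.
    @before = Hash.new { |hash, tag| hash[tag] = [] }
    # Run the block with self set to this parser, so a bare `before`
    # call resolves to the method below. The block is still a closure,
    # so local variables from the caller's scope stay visible.
    instance_eval(&block) if block
  end

  # Register a callback to run when a record with this tag is seen.
  def before(tag, &callback)
    @before[tag] << callback
  end

  # Toy "parse": takes lines that are already split and fires the
  # callbacks registered for each line's tag.
  def parse(lines)
    lines.each do |line|
      tokens = line.split(" ")
      # A line looks like "LEVEL TAG ..." or "LEVEL @XREF@ TAG ...".
      tag = tokens[1].to_s.start_with?("@") ? tokens[2] : tokens[1]
      @before[tag].each { |callback| callback.call(line) }
    end
  end
end

individuals = 0
families = 0

parser = MiniParser.new do
  before("INDI") { individuals += 1 }
  before("FAM")  { families += 1 }
end

parser.parse([
  "0 HEAD",
  "0 @I1@ INDI",
  "1 NAME John /Doe/",
  "0 @I2@ INDI",
  "0 @F1@ FAM",
  "0 TRLR"
])

puts "#{individuals} individuals, #{families} families"
```

The point of the sketch is that the callbacks are ordinary closures, so the counters can live in plain local variables instead of instance variables on a subclass.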
GEDCOM line endings
As part of adding specs for the GEDCOM::Parser class, I added the GEDCOM files from the GEDCOM torture-test suite. The suite includes files with different line endings, which the old parser did not recognize correctly.
The new version auto-detects the line ending used in the file by scanning ahead in the IO stream. It then uses that as the record separator for splitting the file into lines.
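The detection step can be sketched roughly like this. This is a simplified illustration rather than the code in the fork: the helper names detect_line_ending and read_records and the 4096-byte probe size are my own choices here, and it assumes a seekable IO (a File or StringIO) whose first line ending falls inside the probed chunk.

```ruby
require "stringio"

# Hypothetical helper: peek at the start of the stream and return the
# first line terminator found ("\r\n", "\r", or "\n").
def detect_line_ending(io, probe_size = 4096)
  start = io.pos
  chunk = io.read(probe_size) || ""
  io.seek(start) # rewind so parsing still starts at the beginning
  # "\r\n" is listed first so it is not mistaken for a bare "\r".
  # Fall back to "\n" when no terminator is found at all.
  chunk[/\r\n|\r|\n/] || "\n"
end

# Hypothetical helper: split the stream into records using the
# detected terminator as the record separator.
def read_records(io)
  separator = detect_line_ending(io)
  io.each_line(separator).map { |line| line.chomp(separator) }
end

# Old Mac-style file that uses a bare "\r" between records.
io = StringIO.new("0 HEAD\r1 SOUR demo\r0 TRLR\r")
p read_records(io)
```

Passing the separator to each_line is what lets a file with bare "\r" endings be read record by record; with the default separator the whole file would come back as a single line.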
Specs for GEDCOM::Parser
The original code contained a number of RSpec-based specs, but only for the date-parsing code. There were no tests or specs at all for the parser itself, which seemed like a pretty significant oversight. I ended up adding a number of specs covering the new API and the various sample GEDCOM files. These are still just minimal tests at this point, but it is at least something.
Now I’m ready to use this to start extracting some test data for some interface mockups. Hopefully those will be coming in the next few days.