I forked the master GEDCOM::Parser Ruby library on GitHub and spent some time working on it. My changes to the API are not backward compatible, so we’ll see whether the owner of the original repo decides to pull them in…
The main changes I made:
- Changed the API for defining callbacks to a more ‘modern’ approach
- Added support for reading from a filename or any IO instance
- Added support for GEDCOM files with different line endings (\r, \n, and \r\n), detected automatically
- Added specs for the GEDCOM::Parser class (previously there were specs only for the date-parsing code)
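As a rough illustration of the "filename or any IO instance" point, here is a minimal sketch of how such a dual-purpose entry point could be written. The helper name `with_gedcom_io` is hypothetical and is not taken from the library; the sketch simply assumes that anything responding to `read` should be treated as an IO.

```ruby
require "stringio"

# Hypothetical helper (not the library's actual code): yield an IO-like
# object whether the caller passed an open stream or a path to a file.
def with_gedcom_io(source)
  if source.respond_to?(:read)
    # Already IO-like (File, StringIO, ...): use it as-is and leave
    # closing it to the caller.
    yield source
  else
    # Otherwise treat it as a path; open in binary mode so the raw
    # line endings in the file are left untouched.
    File.open(source.to_s, "rb") { |file| yield file }
  end
end

# Usage with an in-memory stream:
first_line = with_gedcom_io(StringIO.new("0 HEAD\n0 TRLR\n")) { |io| io.gets }
```

Duck-typing on `read` rather than checking for the `IO` class means `StringIO` and similar objects are accepted too.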
API Changes
The old API didn’t match modern Ruby style, so I gave it a facelift. But the changes are more than just cosmetic. Consider this example (from samples/count.rb):
class IndividualCounter < GEDCOM::Parser
  attr_reader :individuals
  attr_reader :families

  def initialize
    super
    @individuals = 0
    @families = 0
    setPreHandler [ "INDI" ], method( :countPerson )
    setPreHandler [ "FAM" ], method( :countFamily )
  end

  def countPerson( data, state, parm )
    @individuals += 1
  end

  def countFamily( data, state, parm )
    @families += 1
  end
end

parser = IndividualCounter.new
With the new API, the same counter becomes:

individuals = 0
families = 0

parser = GEDCOM::Parser.new do
  before "INDI" do
    individuals += 1
  end

  before "FAM" do
    families += 1
  end
end
IMO, the new version reads a lot better and is more concise. See the other samples and specs for more examples.
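For readers curious how a block-based callback API of this shape can work, here is a toy sketch. `TinyTagParser` is a hypothetical stand-in written for illustration only; it is not the library's implementation and handles far less of GEDCOM. It assumes the block passed to `new` is evaluated with `instance_eval`, which lets `before` be called without a receiver while the handler blocks still close over local variables such as `individuals`.

```ruby
# Toy stand-in for illustration only; not GEDCOM::Parser's real code.
class TinyTagParser
  def initialize(&block)
    @before = {}
    # Evaluate the configuration block with self set to this parser, so
    # `before` can be called bare. Local variables from the caller's
    # scope remain visible because blocks are closures.
    instance_eval(&block) if block
  end

  # Register a handler to run when a record with the given tag is seen.
  def before(tag, &handler)
    (@before[tag] ||= []) << handler
  end

  # Walk the text line by line and fire the handlers for each tag.
  def parse(text)
    text.each_line do |line|
      _level, second, rest = line.chomp.split(" ", 3)
      # Lines like "0 @I1@ INDI" carry a cross-reference id before the tag.
      tag = second.to_s.start_with?("@") ? rest.to_s.split(" ", 2).first : second
      @before.fetch(tag, []).each(&:call)
    end
  end
end

individuals = 0
families = 0

parser = TinyTagParser.new do
  before("INDI") { individuals += 1 }
  before("FAM") { families += 1 }
end

parser.parse("0 HEAD\n0 @I1@ INDI\n1 NAME John /Doe/\n0 @I2@ INDI\n0 @F1@ FAM\n1 HUSB @I1@\n0 TRLR\n")
```

The counters live in the caller's scope rather than in a subclass, which is why the block-based style needs no custom class at all.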
GEDCOM line endings
As part of adding specs for the GEDCOM::Parser class, I added the GEDCOM files from the GEDCOM torture-test suite. The suite includes files with different line endings, which the old parser did not recognize correctly.
The new version auto-detects the line ending used in the file by scanning ahead in the IO stream. It then uses that as the record separator for splitting the file into lines.
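The scan-ahead idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the parser's actual code: the helper names `detect_line_ending` and `each_gedcom_line` and the 4096-byte sample size are made up for the example, and the stream is assumed to be seekable (a File or StringIO, not a pipe).

```ruby
require "stringio"

# Hypothetical helper: peek at the start of a seekable IO and report the
# first line ending found ("\r\n", "\r", or "\n"), then rewind.
def detect_line_ending(io, sample_size = 4096)
  start = io.pos
  sample = io.read(sample_size).to_s
  # If the sample stops on a bare "\r", read one more byte so that a
  # "\r\n" pair straddling the sample boundary is not mistaken for "\r".
  sample += io.read(1).to_s if sample.end_with?("\r")
  io.seek(start) # rewind so parsing starts where the caller left the stream

  case sample[/\r\n|\r|\n/]
  when "\r\n" then "\r\n"
  when "\r"   then "\r"
  else "\n" # also the fallback when no line ending is found at all
  end
end

# Hypothetical helper: yield each line with the detected separator removed.
def each_gedcom_line(io)
  separator = detect_line_ending(io)
  io.each_line(separator) { |line| yield line.chomp(separator) }
end

# Usage: an old-Mac-style file that uses "\r" only.
lines = []
each_gedcom_line(StringIO.new("0 HEAD\r1 SOUR TEST\r0 TRLR\r")) { |l| lines << l }
```

Passing the detected separator to `each_line` is what makes "\r"-only files work, since Ruby's default separator would otherwise return the whole file as a single line.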
Specs for GEDCOM::Parser
The original code contained a number of RSpec-based specs, but only for the date-parsing code. There were no tests or specs at all for the parser itself, which seemed like a pretty significant oversight. I ended up adding a number of specs that exercise the new API and the various sample GEDCOM files. The coverage is still minimal at this point, but it is at least something.
Now I’m ready to use this to start extracting some test data for some interface mockups. Hopefully those will be coming in the next few days.
3 comments ↓
Very cool. I’ll have to try it out. I had used the old one for some projects and it was quite good, but I like this new interface.
Have you also looked at Hans Fugal’s GEDCOM parser? Last time I asked Jamis Buck if he was planning on making updates to his parser (the one you forked), he referred me to Hans’ parser.
http://hans.fugal.net/tmp/redcom/
It hasn’t seen much action lately, and I don’t know its stability or license. Here’s the conversation that took place on the Utah Ruby User Group list:
http://groups.google.com/group/urug/browse_thread/thread/4f269619703f4b81/85ed3cf6fc9a0b06?hl=en&lnk=gst&q=gedcom#85ed3cf6fc9a0b06
I did search for other GEDCOM parser implementations in Ruby, but didn’t find anything (beyond the Ruby Quiz results). Even directly googling Hans Fugal for a GEDCOM parser doesn’t reveal anything. Looks like his ‘redcom’ parser didn’t go anywhere…
I like this parser implementation specifically because it is a stream parser. I don’t want the parser to populate an in-memory model; that is for the client to decide.