Updates to ruby GEDCOM::Parser

I forked the master GEDCOM::Parser ruby library at GitHub, and spent some time working on it.  I made changes to the API that are not backward compatible, so we’ll see if the owner of the original repo decides to pull in my changes or not…

The main changes I made:

  • Changed API for defining callbacks to a more ‘modern’ approach
  • Supports reading from a filename or any IO instance
  • Supports GEDCOM files with different line-endings (\r, \n, and \r\n), detected automatically
  • Added specs for GEDCOM::Parser class (previously only had specs for date-parsing code)

API Changes

The old API didn’t match modern ruby style, so I gave it a facelift.  But the changes are more than just cosmetic.   Consider this example (from samples/count.rb) –

old version:

class IndividualCounter < GEDCOM::Parser
  attr_reader :individuals
  attr_reader :families

  def initialize
    super

    @individuals = 0
    @families = 0

    setPreHandler [ "INDI" ], method( :countPerson )
    setPreHandler [ "FAM" ],  method( :countFamily )
  end

  def countPerson( data, state, parm )
    @individuals += 1
  end

  def countFamily( data, state, parm )
    @families += 1
  end
end
parser = IndividualCounter.new

new version:

individuals = 0
families = 0
parser = GEDCOM::Parser.new do
  before "INDI" do
    individuals += 1
  end

  before "FAM" do
    families += 1
  end
end

IMO, the new version reads a lot better and is more concise.  See the other samples and specs for more examples.
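For anyone curious how a block-based API like this can work, here is a minimal sketch. MiniParser and its simplified parse method are hypothetical stand-ins written for illustration, not the code in GEDCOM::Parser. The sketch assumes the block is run with instance_eval, so that before resolves against the parser while the block still sees the surrounding local variables.

```ruby
# Hypothetical stand-in for illustration only; not GEDCOM::Parser's code.
class MiniParser
  def initialize(&block)
    @before = {}
    # instance_eval makes `before` resolve against this object, while the
    # block keeps access to local variables from where it was written.
    instance_eval(&block) if block
  end

  # Register a callback to run when a record with this tag begins.
  def before(tag, &callback)
    @before[tag] = callback
  end

  # Simplified dispatch: only level-0 lines are considered, and the last
  # field on the line is treated as the tag (e.g. "0 @I1@ INDI").
  def parse(text)
    text.each_line do |line|
      level, *rest = line.split
      next unless level == "0"
      callback = @before[rest.last]
      callback.call if callback
    end
  end
end

individuals = 0
families = 0
parser = MiniParser.new do
  before("INDI") { individuals += 1 }
  before("FAM")  { families += 1 }
end

parser.parse("0 HEAD\n0 @I1@ INDI\n1 NAME A /B/\n0 @I2@ INDI\n0 @F1@ FAM\n0 TRLR\n")
# individuals is now 2 and families is now 1
```

The counters live outside the parser as plain local variables, which is why no subclass or attr_reader is needed in the new style.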

GEDCOM line endings

As part of adding specs for the GEDCOM::Parser class, I added the gedcoms from the GEDCOM torture-test suite.  These had files with different line endings, which the old parser did not recognize correctly.

The new version auto-detects the line ending used in the file by scanning ahead in the IO stream.  It then uses that as the record separator for splitting the file into lines.
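The idea can be sketched roughly as follows. The detect_line_ending helper and its sample_size parameter are hypothetical names chosen for this example and are not taken from the library. The sketch also assumes a seekable IO such as a File or StringIO, so it would not work as written on a pipe or socket.

```ruby
require "stringio"

# Hypothetical helper, not the library's actual code: peek at the start of
# the stream and return the first line terminator found ("\r\n", "\r" or "\n").
def detect_line_ending(io, sample_size = 4096)
  start = io.pos
  sample = io.read(sample_size).to_s
  # A sample ending in "\r" might be the first half of a "\r\n" pair,
  # so read one more character before deciding.
  sample += io.read(1).to_s if sample.end_with?("\r")
  io.seek(start) # rewind so parsing starts from the original position

  # The alternation tries "\r\n" first at the leftmost terminator.
  # Fall back to "\n" when no terminator is found at all.
  sample[/\r\n|\r|\n/] || "\n"
end

# Usage: pass the detected terminator to each_line as the record separator.
io = StringIO.new("0 HEAD\r1 SOUR TEST\r0 TRLR\r")
separator = detect_line_ending(io)
lines = io.each_line(separator).map { |line| line.chomp(separator) }
# lines is now ["0 HEAD", "1 SOUR TEST", "0 TRLR"]
```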

Specs for GEDCOM::Parser

The original code contained a number of RSpec-based specs, but only for the date-parsing code.  There were no tests or specs at all for the parser itself, which seemed like a pretty significant oversight.  I ended up adding a number of specs, testing the new API and the various sample GEDCOM files.  These are still just minimal tests at this point, but it is at least something.

Now I’m ready to use this to start extracting some test data for some interface mockups.  Hopefully those will be coming in the next few days.

3 comments ↓

#1 Jimmy Zimmerman on 12.11.08 at 2:54 pm

Very cool. I’ll have to try it out. I had used the old one for some projects and it was quite good, but I like this new interface.

#2 Jimmy Zimmerman on 12.11.08 at 3:00 pm

Have you also looked at Hans Fugal’s GEDCOM parser? Last time I asked Jamis Buck if he was planning on making updates to his parser (the one you forked), he referred me to Hans’ parser.

http://hans.fugal.net/tmp/redcom/

It hasn’t seen much action lately, and I don’t know its stability or license. Here’s the conversation that took place on the Utah Ruby User Group list:

http://groups.google.com/group/urug/browse_thread/thread/4f269619703f4b81/85ed3cf6fc9a0b06?hl=en&lnk=gst&q=gedcom#85ed3cf6fc9a0b06

#3 jeremy on 12.11.08 at 4:10 pm

I did search for other GEDCOM parser implementations in ruby, but didn’t find anything (beyond the Ruby Quiz results). Even directly googling Hans Fugal for a gedcom parser doesn’t reveal anything. Looks like his ‘redcom’ parser didn’t go anywhere…

I like this parser implementation specifically because it is a stream parser. I don’t want the parser to populate an in-memory model — that is for the client to decide.
