Which database(s) to support

I’ll be getting into some real design and coding activities soon.  I’ve been thinking about the database schema, and I got a lot of good thoughts from Leif Erikson’s excellent essays.  He strongly advocates the use of postgresql over mysql — “mysql is crap”.

I don’t have any experience with postgresql, and only a bit with mysql and sqlite.  So I don’t have any basis for a comparison on technical merits.  I do know, however, that many hosting providers, including mine, do not support postgresql.  Mysql is the standard, crap or no.

yology.org is supposed to become a project that supports self-hosting by individuals as well as (potentially) scaling up to much larger deployments.  In theory, that would cover a range of a few 100′s of individuals to 100′s of millions or more.  Those scenarios have very different constraints and sweet-spots, so I don’t think it’s reasonable to think one-size-fits-all will work.

So, I’m hoping to make the database layer flexible enough to handle multiple RDBMSs.  I really don’t know what I’m getting into yet… Rails does abstract that to some degree through ActiveRecord, but abstraction involves trade-offs.  The farther away from the concrete implementation one gets, the more specific advantages of one solution may be lost.

One point that Leif Erikson makes is the benefit of procedures / views implemented directly in the database.  The idea of pushing some logic and processing into the database layer makes sense — especially to avoid multiple round trips to fetch some data, combine it with other data, fetch some more, etc.

Postgresql evidently has the strongest, most mature support for that.  More recent mysql versions add some capabilities, and perhaps sqlite does as well.  My knowledge in this area is extremely thin. But I’m pretty sure the support varies across the different databases.  I want to be able to take advantage of those optimizations where possible, so perhaps it will be a flexible (thus complicated, unfortunately) ORM layer that pushes functionality down the stack where possible, and emulates in higher levels where necessary.

On the other hand, I’m primarily thinking about optimizations to help scaling — I’m probably getting ahead of myself, as that is the least among my problems to solve at this point.  So I guess it makes the most sense to start of with a vanilla ORM layer based on ActiveRecord, using the tools and conventions as they are.  That’s the way to get the fastest start, which is probably the most important consideration at this point…

2 comments ↓

#1 Dave Lester on 12.04.08 at 7:19 pm

I’m in the middle of getting a similar project off the ground, so perhaps we could collaborate. I currently work on an open source web publishing system for digital archives (omeka.org), which has a built-in theme and plugin architecture — it’d be fairly straight-forward to develop a tool like this that extends the software, and would allow not only GEDCOM data to be managed, stored, and shared, but digital scans and photos to be associated w that data, google map mashups, timelines, etc.

Another approach would be to use something like BackPress, which is the core code of WordPress abstracted into classes that can be reused by other software.

This all assumes that you’re using PHP. What is your language of choice?

#2 jeremy on 12.04.08 at 10:27 pm

I have been planning to use Ruby on Rails as the primary platform. I expect the front-end to be very javascript-heavy, talking to a REST API.

All the things you mention are definitely in my vision — photos and other media, timelines and maps and other views…

I look forward to further discussions and possible collaboration with you.

Leave a Comment