Friday, March 29, 2013

Theory vs. Practice: Data Stored Must Be Useable

I heard there's a move to drop a table from a database I designed.  That's fine, I'm long gone and needs change, but thinking about why makes a convenient springboard for a "Theory vs. Practice" post.

The table in question is hard to love: most of the data is otherwise derivable and the data that isn't probably should be elsewhere.  There's also a couple of poor design choices, but I suspect the real reason the table is in the cross hairs is standard relational practice says the table shouldn't exist.  Pure speculation, but let's go with that.  Ironically, the table not only exists, it is key to day to day operational support.

The initial schema drafts didn't include this table.  I did the thing you do, gathered up all the data points, worked out the relations.  A couple of surprises, but mostly "people" had zero to many "addresses", etc.  This phase of the design is all about minimizing redundancy and efficient storage, efficient inserts. The easy part and when complete, the place where most people stop.  But

there's no reason to put data into a database unless you're going to get it back out again

and, astonishingly, few people think of that.  They don't do the next step which is to figure out how data is going to be used on a daily basis, what kinds of questions are going to be asked.

This can be hard, because you have to anticipate what people are going to ask before they ask it, though in this case it was pretty straightforward.  The basic operational questions were fairly obvious (though the people who would eventually ask the questions didn't find them obvious at first).  The problem was this.  Given the design I had, I could certainly answer the questions, but not without a pickup truck full of SQL that would be hard to write and probably wouldn't perform very well.

The answer was this questionable table.  Data creation added no friction.  Data update required some attention but was actually a lot simpler than the design without and vastly easier to comprehend as well.  And most importantly, those operational questions I was expecting to see asked a zillion times a day became incredibly simple and inexpensive to answer.  Everybody's happy except for somebody tutting over dubious relational design.

Practice trumps theory.  A pretty ERD isn't as important as timely delivery of data to the people who need it.