If You Have A Hammer, It’s Not Always Wrong To Use It
June 7th, 2009
If you asked my girlfriend, and others close to me, what my biggest personality flaw is, you would be told that I’m a perfectionist. The last few days have been a shining example of that.
In my last post, I documented my source tree for the Reviewer project – four directories of planned features, and one directory called ‘Database’ which is the subject of today’s post.
I’ve been thinking a lot about how to implement the database interop for this project, since it’s my first time doing something like this and I really wanted to get it right. I came up with two (fairly obvious) ideas for this:
- Build the database lookups directly into my core classes.
The first idea I came up with was to have a few methods in each class which would talk directly to the database. For example, the review class would include a method like getReviewsByType(ReviewType rt) which would translate the ReviewType object into a literal stored in the ReviewTypes table in the (currently nonexistant) database, then perform a lookup on the Reviews table and translate the results into a List and return that.I didn’t like this idea because I didn’t want to mix my business logic with my database logic. - Build an abstract base class to handle all the database connectivity stuff, and create subclasses for each of my business objects that would handle the nuts and bolts of translating the objects to/from the database.
I didn’t like this idea because I’d be essentially storing details of the same business object in two separate classes. The Review class, for example, would be strongly coupled to the ReviewDB class (pending some better name for it).
In the end I decided that I liked option two best. I could probably decouple the two in such a way that the Review class would be reusable, even if I didn’t want to bring the ReviewDB class with it.
With that decision made, I turned to StackOverflow yet again (those guys will be sick of me and my noobish questions soon enough, I’m sure) to ask if this sort of implementation was standard or fundamentally flawed. I didn’t get any negative feedback on the design question I intended to ask, but I did get one answer which asked me why I wasn’t using Object/Relational Mapping (O/RM).
I’m forced to plead ignorance here. The two examples I was given were LinqToSQL and nHibernate.
For LinqToSQL my reasoning was that I’m using the Jet database engine for the short term on this project. The reason for this have a lot to do with the deployment options at my work since they’re the only customer for the product at this point. LinqToSQL is simply not designed to work with Jet. It can be tricked into working, I’m told, but you really don’t get the complete experience.
As for nHibernate, well, I really didn’t know what that was until today. I had heard the name before, especially relating to Java, but had never looked into what it was.
I weighed the pros and cons of this approach, and have decided to completely ignore both of these options at least until the project makes it to the 1.0 release. My goal for right now is to ship the product soon, so that my colleagues at work can benefit from it, and also to teach myself C# and development in general. Whether I’m learning how to use OleDB connections, or nHibernate, I’m still learning something useful. For now, I’ve decided that taking an extra week to figure out nHibernate really isn’t worth it. For v1.1 (or 2.0), I’ll take another look and start refactoring my design and my code, and that’ll also be good practice. I’m also planning on adding support for multiple database engines (SQL Server being the main target here), without losing the existing support for Jet/Access. It should be exciting, if most likely horribly frustrating at first.
I know this seems like a bad case of “if all you have is a hammer, everything looks like a nail,” but I’m thinking instead that if my hammer does a good enough job now, why not use it?
June 7th, 2009 at 7:08 am
Option #2 is better because it is a fairly well-known design pattern – “Data Access Objects”. http://en.wikipedia.org/wiki/Data_Access_Object
Instead of putting the common database lookup routines in the base class, treat them as a separate “Database abstraction layer”. This way, you won’t have to modify any of the DAO classes if you ever change out your database. You just write a new Database abstraction layer with the same API as the old one and you’re done.
http://en.wikipedia.org/wiki/Database_abstraction_layer
Yes, this splits your application classes into 3 layers that sit on top of the database, but the extra flexibility and modularity are worth it.
June 7th, 2009 at 7:38 pm
I agree the 2nd option is better. I like to make an adapter class that implements database specific functionality (for .NET, provider specific stuff), and then have the database layer classes call the adapter methods. Also, you should look into auto-generating the database layer, either with one of the OR/Ms listed above or just roll your own that meets your needs. You can go way overboard with OR/M for small projects. It should be automatic and easy to re-run after you make a database change. I personally like to use a scripting language for this purpose, but there is some value in using the main language of your project.
June 7th, 2009 at 8:16 pm
No, this seems like “I have no hammer, therefore I’ll create my own”.
Not using an ORM mapper, or getting to learn one, is going to waste your time and prevent you from focusing on leaning C#. Without an ORM mapper your going to have to write every single query that accesses your database by hand and then debug those queries when you inevitably make a typo.
The real “everything is a hammer” problem is probably your choice to use Access as a backend DB “because you know it”. This is what the saying is used for. Because you know Access every persistence problem your have your going to try and fix with Access.
Try branching out into something like Firebird or SqlLite instead of what you know if you really can’t get connected to your company’s sql servers.
June 7th, 2009 at 8:31 pm
@John Farrell:
I’m not specifically using Access/Jet because *I* know it. I’m using it partially because people at work know it (in case you haven’t read my other posts, I do consulting on a legacy, partially cobol-based system, with people who may or may not be comfortable with other systems) and I’m not planning on being at this job forever. I’m doing this project largely as a springboard to leave, actually.
Other than that, you’re mostly right. I didn’t look at Firebird because it was simply off my radar, but I did take a look at SqlLite. I didn’t actually use it, but from what I read on their site, it just didn’t seem suitable:
“Situations Where Another RDBMS May Work Better
Client/Server Applications
A good rule of thumb is that you should avoid using SQLite in situations where the same database will be accessed simultaneously from many computers over a network filesystem.
High Concurrency
SQLite uses reader/writer locks on the entire database file. That means if any process is reading from any part of the database, all other processes are prevented from writing any other part of the database. Similarly, if any one process is writing to the database, all other processes are prevented from reading any other part of the database.”
Source: http://www.sqlite.org/whentouse.html
That said, after all the feedback I’ve received from this post, I’m definitely going to take a closer look at nHibernate. I also like SQL Server Compact, but I haven’t looked close enough at it to see if it’s suitable.
June 7th, 2009 at 10:50 pm
@John Farrell:
I’ve become sceptical of ORMs and abstraction layers. Great in theory but I’ve not yet seen an implementation that doesn’t suck either for maintenance coders or for performance, or both.
I used to be big on following principles when coding. After several years of contracting, working on existing code bases, I’m now in favour of practicality. I rate everything now based on the two criteria I mentioned above:
1) Is it easy to maintain?
2) Is the performance adequate?
Almost all the code I’ve come across fails one or both of these two criteria.
The code-base I’m working on at the moment is a case in point. I need to change the way the data is stored in the database. First step, reverse engineering the existing code to find out how it saves the data. 10 – 12 nested methods between clicking a button on the UI and something happening in the database (not to mention side calls to read properties or get specific data). And slow as a dog. All the business logic in the middleware layer. Great idea but it involves 3 database writes and 4 reads to add an item to an order. Maximum of 99 items can be added at a time. I make that up to 693 database accesses to perform a single operation (adding a bunch of items to an order). It took two days to rewrite the code to perform a single database update for all items. Still using a data access layer but moving the business logic to a stored procedure in the database. No good from a purist’s point of view but all the business logic is in one place (the stored proc) and we got a several hundred-fold improvement in performance, on average.
June 8th, 2009 at 3:11 am
Jesse,
I’d encourage you to write at least one or two database applications before you start using an ORM. While it can be a bit more tedious writing the queries by hand and debugging them, it is a worthwhile learning experience. Once the learning is done and you’re just repeating the same busywork you’ve already done a number of times before, then the ORM makes a lot more sense.
June 10th, 2009 at 1:18 am
@simonTwebi, I’m not going to call you out with specifics here but your comment really sounds like your problems are from shoddy code and not an ORM mapper in general.
I’ve used two different ORM mappers in sites with thousands of queries executing every second and both ORMs had almost no noticeable effect on performance.
Take a look at this post about the Subsonic ORM:
http://blog.wekeroad.com/blog/subsonic-scaling/
“I’ve just pumped a million orders (and items – 2 million records total) into my SQLExpress Northwind database and it took just about 22 minutes (which is about 1500 records a second):”
Here’s a link to a performance test of NHibernate:
http://www.iamnotmyself.com/2008/07/02/NHibernateTestingThePerformanceUrbanLegend.aspx
1000 inserts via Stored Procedure was only 175ms faster than NHibernate. That means that NHibernate added .17 ms to the overhead of inserting a record.
So when you say “several hundred-fold improvement in performance, on average.” for adding items into the database the problem likely due to a specific problem with your code.
Overall, popular ORM packages have a negligible effect on performance and its unfair to say all ORM performance is inadequate based on these numbers.