Most developers worth their salt would agree that source control is a good, if not critical thing. One thing I’ve noticed is that oftentimes whilst a developer will absolutely agree that source control is required for the code base, they will ignore it for the sql database schema.
I’ve always found that strange given that the database determines the structure of all the tables, any constraints between those tables, probably some sql stored procedures that have logic within them – surely this is important enough to be source controlled!! Its also surprising as oftentimes, the code is built against a specific version of a database – so rolling back the code would not work as the database version could not be rolled back. All in all, version controlling your databases is something you SHOULD BE DOING!!
So why do some many developers / development teams not do it?
I’ve mulled this one over, and come to the following conclusions:
- Its too complicated: Lets face it, version controlling a sql database is not that simple. The usual option is to script out the structures and source control them. But that only gives you a way to create the database, it provides no upgrade option. It also means, if you want to update a database object (like adding a column), you have to amend the scripts directly. Which then renders the tooling (e.g. sql management studio) and all its features defunct. This is hard to overcome, but there are ways.
- The database already exists: More often than not the database already exists – in this case it can be a hassle to go through all the database objects, extracting sql scripts. In my mind, the benefits of of source control far outweigh the time taken to do this activity – furthermore tooling is improving, and there are ways to get this done in one fell swoop.
- Its the DBA’s responsibility: This one is a headache – but as a developer, even if there is a DBA in the mix you should take the responsibility to move this forward – just because its not your responsibility does not mean it does not impact your work.
- We didn’t even consider it: If it hasn’t been done before, then nobody has thought to bring it up. Well, now’s the time – I’m bringing it up and your can’t pretend its not there anymore!!!
So how can we do it?
Ok, so you’ve decided to start source controlling your database(s) – what are the next steps, what options do you have.
- SQL Script Files: Source controlling a bunch of sql files is very easy – most developers have done that at some point. Practically everything in your sql database can be constructed using sql scripts, so using that as a basis they’re a good place to start. You can even use tools like the visual studio database projects to initially extract all the database objects into script files – these projects will even help you redeploy your changes to the database.
- Code First DB Design: Another option is to use the code first approach (as in the Entity Framework code first). This allows you to indirectly source control your database as the logic for a lot of this is contained within your managed code.
- 3rd Party Tools: There are a few 3rd party tools (like the Red Gate tools), that have support for sql source control.
What are the pitfalls?
Using any of the methods I mentioned, creating a database from scratch is a piece of cake – it couldn’t be easier! This makes getting a developer workstation up and running that much easier, and allows each developer to work on their own independent copy of the database. The problems start to flood in after that initial database creation, which the problem fraught area of upgrading.
Upgrading a database is a difficult and risky subject at the best of times, and when combined with automated upgrade scripts things don’t get easier!
How do you even do the upgrade?
The first problem is how do you even do the upgrade in the first place! If you’ve gone with the scripts option, they will only create the database objects, not alter them. I have worked on systems whereby 2 sets of scripts are maintained, one set which creates everything from scratch, and a second that upgrades the database from a point in time. This works ok, but how do you attribute that point in time? You can also end up in a situation where you’re having to string together several sets of upgrade steps – this feels uncomfortable and unwieldy – but for most its the only option.
If you’ve gone down the code first option you have a database migrations facility – this is ok, but you end up having to write a lot of code to determine what change you want to apply, give the databases current schema. You also introduce a risk of bugs as 2 versions of the same database that (because they took a different root through your code) end up with different schemas – this just feels wrong!
The second big problem is when upgrading a production database with live data in it. I’ve worked on projects where databases have been upgraded only to find that important data has been deleted due to some referential integrity issue introduced. Or where rows in a table have become orphaned. In my mind, this is the biggest risk – sure you can take your database back-ups and restore them, but what if you don’t realise straight away….OUCH!
You can create data migration scripts, but getting it right is difficult – executing the right ones at the right times is tough.