Code First Database Evolution (aka Migrations)

Back in July we released the latest preview of Code First and we’ve been receiving a lot of feedback that is helping us shape this feature. One of the top asks we are hearing is for a solution that will evolve the database schema as your object model changes over time, often referred to as migrations. 

Our team is looking at a few different options in this area and we really want to get your feedback on one of the options we are considering that takes a somewhat different approach to most existing migration solutions. 

We’ve put together a screen cast to walk you through a storyboard style mock-up we built that shows this solution being used to build and deploy an application. We really want your feedback to help us shape and evaluate this option, you can provide feedback using the comment functionality on this post. 

This is the first time our team has used a storyboard screencast to share proposed designs so we’d also love to hear your feedback on whether this is an effective way to share our ideas. 

ADO.NET Entity Framework Team

 

(HD option is available in lower right corner of player while the screencast is playing)

Comments

  • Anonymous
    October 25, 2010
    First comment is that the video can be helpful, but would be good if it could be downloaded in wmv format for use in other than Windows media player as it is difficult to move backwards and forwards when it's streaming The ability to specify things like indexes is very useful, but it would be nice if it was specifiable via attributes as well. I assume that the database schema changes can be run via C# code? I don't want the database changes part of the build process, but in my application. It would be nice if there was a way for a class to handle its own migration, for example if you have one string field which you split into two, you might have to run some C# code to do that work.   Overall, the concept seems good, look forward to being able to try it out! ...Stefan

  • Anonymous
    October 25, 2010
    I think this is a very clever idea.  I'm glad to see you trying to push the boundaries and come up with some new ideas for migrations. Before I could judge whether this approach would work for me I'd need to get some more information on a couple things:

  1. How would you migrate data?  For example, say you're converting a varchar column to foreign key reference to another table.  Where would the code for that migration go?  How would this framework know when to run it, or if it had already been run?
  2. The VS integration is all well and good, but could this solution be used in an automated build server environment?  For example, could I cause the tool to run AND commit from a rake script that I call from a CI server like Team City?  Or could this only be automated by running the publish in VS and saving out a script (aka, manually). Thanks, Kevin kevin-berridge.blogspot.com
  • Anonymous
    October 25, 2010
    The comment has been removed

  • Anonymous
    October 25, 2010
    Would this also work with SQL Server CE?

  • Anonymous
    October 25, 2010
    I have to say I'm not a fan of this approach.  We've already got something similar in the Database Projects in Visual Studio 2010, in which you "diff" a set of SQL scripts with an existing database. This approach has not worked well at all for me, and I fear that the functionality in this demo would pose the same problems.  Much of the time, things like column renames aren't interpreted correctly by the engine creating the diff, and running the resulting script would result in data loss or simply a flat-out wrong resultant schema.   I realize that the Deployment<T> class was put in place to mitigate that problem, but it seems like a way to patch up an approach that doesn't work very well rather than creating a way that works well from the beginning.  For one thing, since you're keeping all those "patches" in one file, that file would quickly grow unmanageably large on anything other than a trivially short project.   I think this is a case where a declarative approach is the wrong way to go.  I would much rather see an imperative API for making schema changes, with the code split across multiple files (one file per schema version) and an extra table added to the schema to keep track of what version that particular database instance is on. Also, I think database migration should be available as a discrete activity, without necessarily being tied to running or publishing the application.  Sure, make the option available to run the migrations at run and publish time, but I can see many scenarios where I would only want to make the database changes and not run or publish.   In short, we know that the way Rails manages schema migrations works pretty well.  I would much rather see something similar to what has been proven to work well in other environments than have you guys come up with an "innovative" approach simply to be different and have it not work very well.

  • Anonymous
    October 25, 2010
    What does this effort mean for the expected first release of EF with Code First?   This approach looks nice for the scenario where an application can specify its database schema, but we're using Code First (and finding it useful) to work with an existing database, whose schema is fixed. Is there a plan to release Code First without the migration, e.g. in something like a service pack?  Or should I expect a CTP for migrations before a release?

  • Anonymous
    October 26, 2010
    I like it, but Orchard Project have a better way of manipulate this, i think

  • Anonymous
    October 26, 2010
    Brian makes a very good point about multiple versions. Maybe each database should have a version indicator and a separate deployment file for each major version? ...Stefan

  • Anonymous
    October 26, 2010
    Also, what about the situation where the database is not fully defined until runtime. I'm looking to develop an application using code first where other DLLs can plug-in using mef and provide additional classes which would be stored in the database - how would that be processed from a change calculation perspective? ...Stefan

  • Anonymous
    October 26, 2010
    I can see how you might want to do a whole lot of dev work then capture the db diffs at the end of it in a deployment situation, ie do all your diffing in one hit. However I can also imagine on a dev machine having the v1 version of my testdb and wanting to catch the iterative changes in my source code repository with each change. Eg if I add the Extract Property then have the ALTER TABLE Blogs ADD COLUMN Extract... in a script in my project so that it gets into the source code control at the same time. This would give you more flexiblity:

  1. use the scripts from source control or partial patches thereof. or
  2. ignore the script and do the full diff as you have suggested. I guess I am suggesting having an easy way to build the incrementat sql script built in rather than do it manually. You db can then (if suitable) mirror the source code control versions and have a script to take it from v1 to your current source code version. Just a thought.
  • Anonymous
    October 26, 2010
    Like it, great to have migration support baked in. How does this work for embedded databases?  What if a user installs an version that is two or three versions ahead of their current schema - is there support for that scenario?  

  • Anonymous
    October 26, 2010
    It's really great to see how this is developing. Like J Jones I would be very keen to see a beta of code first with a go live licence, even without finished work on db migrations. I would be sad if we could not make use of the existing impressive code first achievements until this further functionality is completed.

  • Anonymous
    October 27, 2010
    Thanks for the feedback everyone.  Please keep it coming. Here are a few quick responses to questions that seem to be coming up:

  1. If we go this route, it's our intention that the ability to perform these kinds of migrations (compute and/or run the scripts) would be available through multiple routes.  By default it would be turned on so that just hitting F5 to build and debug a project would get you going with auto-deployment/migration against a local sandbox DB, but of course this would be configurable.  In addition you would have the ability to trigger any of this process through build actions or from a commandline so that it would be fully continuous integration friendly.  We may also support the ability to trigger all of this from an API so that it can be done at runtime, but there are some slightly tricky parts to getting that how we would like it, so it's not clear if that capability would be part of the very first release of this capability.
  2. Another key part of this is the ability to take "snapshots" of a schema either manually or through any of the mechanisms described above and then to create alter scripts that will migrate a database between any of its current state, the target described by your model, or any snapshot.  So it would be easy to do things like creating a snapshot at each release to production and then each time you do a CI build compute a script which will allow deployment to a clean machine, one that will migrate from the last major release to the current version, and one that will migrate from the schema compatible with the last build to the current schema, etc.  If we add to this the ability to supply whatever manual migration steps you need, but based on a core system which automatically handles the 90% cases, then it's our hope this will make you very productive while still being flexible.
  3. We have heard loud and clear (here and in other places) the feedback that folks are anxious to get CodeFirst into production.  We're working hard to make that possible as soon as possible, and while we are also working on this feature, we won't add arbitrary delays to a codefirst release for the sake of migrations.
  • Danny
  • Anonymous
    October 29, 2010
    Hi, The approach looks good. I also happen to see the PDC on 29 oct 2010. I would like to have LINQ to entities to be some thing like linq to SQL. In Linq to SQL we do not have any such 'magic' happening. Please let the developer decide what he/she wants to do with the database. In real-world the database is something that may not be so easy to change. In some projects I work I see have to actually get client approval for making even a small index change. I would prefer that the approach be some thing like this
  1. Define the POCO entities
  2. Reverse engineer table class ( a POCO class may be or a XML)
  3. Define the mapping
  4. In mapping I would say each attribute of POCO going to which tables which column. You can tell me that it is possible through edmx today. But the issue if I have write a class that encapsulates data from to table then I think we need to do some sort of inheritance. But in case of code first approach I would like to have that control with me that is define a class and tell that this column maps to this table and this class.
  5. I should have all facilities to be made attribute free and 100% configurable through XML or through fluent API. This is because I may even pass this POCO over WCF services
  • Anonymous
    October 31, 2010
    I'm building apps that are sold to customers. Therefore, whenever we upgrade our app the databases must be migrated too. This migration must take place either during installation time (making it a custom installer task), or during initial app launch. Here's a couple of requirements for this process:
  1. It has to be automatic, as there is no DBA for the customer database.
  2. It must be atomic, e.g., either all changes done during the migration must succeed, or the whole database must be rolled back.
  3. Customers might have different version of the app installed. Therefore, several migrations must be run sequentially to lift the database to the correct version.
  4. Sometimes there are not only schema changes, but data must be migrated as well. Each migration should have some sub-steps: a) Compare the schema with the "official" schema to ensure nobody has messed with the database. b) Do schema changes as necessary c) Do data transformations as necessary d) Clean up schema changes (e.g. drop tables you no longer need) e) Check schema with intended target schema f) Repeat a)-e) until we're at current version... So please keep in mind that EF is not only used for server apps, where each DB must be deployed once, but for normal applications where hundreds or thousands of databases must be migrated. I'd love to see migrations as being a part of EF! Keep up the good work.
  • Anonymous
    November 01, 2010
    The comment has been removed

  • Anonymous
    November 04, 2010
    We migrate customer databases 'live' on their servers during app upgrade. As Microsoft haven't yet provided any migration framework, I made one that works as follows: Every time we want to make an alteration to the database, we write an SQL script and add it to a collection. Scripts are numbered. The migration tool looks at the existing database's version and at the scripts available. If any scripts need running to take the database up to the current version, they are executed. This is all done in a transaction, to avoid the possibility of a failed migration. We also try to get the user to run a backup beforehand, just in case. Migrating a database is not simply altering its schema. In fact, only about 1% of our upgrade scripts handle schema changes. The other code does stuff like alterations to stored procedures and user defined functions (we drop and re-add all of these, with the UDFs being done first, in the correct order so inter-dependencies don't cause a problem), data inserting/deleting/updating and larger data transformations involving more than one table. T-SQL and C# are Turing-complete so it's literally impossible to make a framework which can somehow encapsulate all the operations one might want to perform in migrations, though I'm sure some helpers can be provided. What's essential, however, is that the framework allows user code free reign during the migration process, whether that's by having it run the show and giving it some helper code to assist in its tasks, or by putting enough hooks in between all the stuff the framework does to allow it to do its job. I'm looking forward to the EF migrations stuff and hoping it fulfils this essential criterion!

  • Anonymous
    November 08, 2010
    All in all, it's a good effort and something that needs attention. So all thumbs up there! My experience with different migration strategies is that it's essential to embrace the fact that a database evolves organically and not in big, schema changing leaps. Schema comparison as a migration mechanism simply doesn't work, progressive changes trough coded migrations do. The low uptake of the database projects in Visual Studio, compared to the proven success of migrations in Rails is maybe not proof of this, but it should at least be a good indication. That's not saying that schema comparisons are not useful, but I'd say that they should be used more for pre/post migration validation and to provide valuable help when creating the coded migrations (T4-generated migrations based on schema comparison is one approach I've been working with).

  • Anonymous
    November 08, 2010
    I left my comments with the team while I was at PDC.  I think most of the feedback I had was captured by them and by others commenting here.  Looking forward to seeing the evolution of the migrations capability.  It is definitely very important to us!

  • Anonymous
    November 08, 2010
    The comment has been removed

  • Anonymous
    November 09, 2010
    I agree with some of the other posters here that user defined data migrations (migrations that update existing records) are essential. The ability to migrate the schema is worthless without the ability to run data migrations. Also, versioning of migrations is going to be essential to anyone doing staged deployment (i.e. from local to test to production environments).

  • Anonymous
    November 09, 2010
    I'd agree that some kind of snapshot feature is essential.  It's tracking and managing the migrations that appears difficult in the current solution.  I wonder if it would actually be easier to create the migrations on a per object basis.  So if I have a user object, as I make changes to that object over time, each migration is created for me automatically using the diff tooling you already have. UsersMigration0001   Up() { Create new table ("Users") Add Column("UserId") Add Column("UserName") } Down { Remove table ("Users") At a later point I edit the object to add a address field, the framework creates a new migration for this object. UsersMigration0002 Up() { Add Column ("address") } Down() { Remove Column ("address") The snapshot can then bundle up all object migrations at  that period of time, ie UsersMigration0002 RolesMigration0004 I can then rollback to a previous snapshot if something has gone wrong. Then it's fairly easy to understand what version everything is at, if you know what snapshot number the db is running at, plus it makes it easy for people to create migrations manually to add indexes etc that would be too laborious or difficult to do with data annotations. I'd also agree that it is essential that this can be run by CI tools, but as it appears to hook into MSDeploy that appears to be taken care of :-)

  • Anonymous
    November 11, 2010
    I like the ability to specify indexes and such in the *Deployment.cs files.

  1. Can we control column ordering for our resulting tables via this mechanism? For example, my DBA wants to make sure the ID column for a table always appears first.
  2. Can we specify a TPH or TPC inheritance somehow using a code-first approach? I like the idea of both the model-first and code-first approaches in EF but found that while I could use the DB Generation Power Pack and muck around in the T4 templates, it wasn't a very pleasant experience. I may be in the minority but I do care about what the database looks like at the end of the day in terms of naming conventions, indexes, structure,etc. At the same time, I find the code-first and model-first perspectives to be compelling. I guess I want to have my cake and to eat it too.
  • Anonymous
    November 11, 2010
    I agree with the commenters questioning the viability of using schema-comparisons to handle code-first migrations. My experience has been that database migration involves a lot of data transformation, around the how, where, and what of existing data. As a result, the imperative bits (code inside the DeclareSchema method, et al) will quickly grow unwieldy. I think it would be more effective to recognize that and place the primary focus on transformations, rather than simple schema changes (a la Rails Migrations.) If you could combine the two, say by auto generating simple schema transformations like Iain Hunter suggested, it may offer the best of both worlds. Further, I wouldn't spend much (if any) time developing snapshot capabilities.

  • Anonymous
    November 11, 2010
    I agree with the commenters questioning the viability of using schema-comparisons to handle code-first migrations. My experience has been that database migration involves a lot of data transformation, around the how, where, and what of existing data. As a result, the imperative bits (code inside the DeclareSchema method, et al) will quickly grow unwieldy. I think it would be more effective to recognize that and place the primary focus on transformations, rather than simple schema changes (a la Rails Migrations.) If you could combine the two, say by auto generating simple schema transformations like Iain Hunter suggested, it may offer the best of both worlds. Further, I wouldn't spend much (if any) time developing snapshot capabilities.

  • Anonymous
    November 11, 2010
    I agree with the commenters questioning the viability of using schema-comparisons to handle code-first migrations. My experience has been that database migration involves a lot of data transformation, around the how, where, and what of existing data. As a result, the imperative bits (code inside the DeclareSchema method, et al) will quickly grow unwieldy. I think it would be more effective to recognize that and place the primary focus on transformations, rather than simple schema changes (a la Rails Migrations.) If you could combine the two, say by auto generating simple schema transformations like Iain Hunter suggested, it may offer the best of both worlds. Further, I wouldn't spend much (if any) time developing snapshot capabilities.

  • Anonymous
    November 11, 2010
    (Sorry for the dup's! Apparently IE9 beta doesn't play well with this comment system.)

  • Anonymous
    November 11, 2010
    The comment has been removed

  • Anonymous
    November 11, 2010
    You're certainly on to something great. I think the code first model is very promising and developer friendly, and migrations support fills an important gap in EF. Some random thoughts:

  • Make it script friendly. Deployment should be automated by scripts, and the scripts should be versioned. Scriptable tools, PowerShell modules and/or Web Deploy integration would be nice to have.
  • Make scripts and snapshots source control friendly, so that it's easy to compare and merge scripts at the text file level. The commit button may be sweet for simple projects, but for any serious development, you want everything go into source control first, and then deployed by scripts (a repeatable process).
  • Wizards are certainly nice to have sometimes, but they often require manual steps that could be completely automated, both to speed up development, and to reduce the risk of doing something wrong. At least enable storing and sharing tool settings with the project files, in order to facilitate developer and team productivity. If a wizard is used, a team developer shouldn't need to repeat selecting options that could be a preset for all team members.
  • Anonymous
    November 11, 2010
    I think one of the nice things about the Rails and ASP.NET MVC implementation of Routes is that all of the information needed to be specified is contained in one place: Routes file. In fact, one drawback to RoR is that the DB relationships are only defined in the Models and the columns are defined in the Migrations. Something like Hibernate Mapping Annotations (Attributes) or the JPA allow both the relationships and the columns to be defined in the same file, in the same location as the relationships. When developing, I don't really want to look in one place for one piece of information about the DB and in another place for another piece of information. Either combine all the information that the DB needs into 1 class, or split it completely into the model class. Jason

  • Anonymous
    November 11, 2010
    This feature may already be in EF Code FIrst, but let me ask sin I don't know for sure... Is there a way  generate the initial set of model classes by scraping an existing database? I don't want to manually type in all that code or dozens of tables if the tables already exist. Id like to code-gen what is there to start with, and then migrate from that point forward.

  • Anonymous
    November 11, 2010
    I think generating the initial set of model classes can be done by creating an edmx file and dropping all the tables onto it, then using the POCO extension, then copying the generated poco files, then deleting the edmx and using just the code first feature.

  • Anonymous
    November 12, 2010
    The comment has been removed

  • Anonymous
    November 18, 2010
    This may be a bit unrelated, but are you going to add the support for names starting with underscores in CTP5?

  • Anonymous
    November 23, 2010
    The comment has been removed

  • Anonymous
    December 01, 2010
    I'd like to see more ideas about how Code-First implementations can operate in a mixed db schema scenario, from a code consistency point of view. Specifically, if I have two web apps running EF Code-First communicating with SQL Azure, and I perform a staged rolling upgrade, how can the old EF web role still operate with the modified SQL Azure schema during the upgrade process without introducing downtime? Telerik has some concept of this, in that their framework can inject properties on the fly to allow the ORM to continue to operate with a modified schema. Would be nice if Code-First could somehow allow us to flag specific fields as transitional, or version dependent, so the model can operate against two schemas at once. Ideally, you want to detect schema changes between versions, and then perform a rolling upgrade smoothly - but there's still a lot to figure out on the Code-First front on how it will react with a live schema change vs. it's "edmx" representation of the model.  Having these technologies work hand in hand would be nice. For example... Create a Separate Staging Area on Azure Make a Copy of SQL Azure Version 1 from live version Create a duplicate of Web Role Version 1 (which can talk to V1 schema) Test (verify) Upgrade Web Role to Version 1.5 (can talk to both V1 and V2 schemas)  [ How to make this work ] Test Upgrade SQL Azure to Version 2 Test Upgrade Web Role to Version 2 (can talk to V2 schema) Test Go to Production, perform rolling upgrade UD0 - Web Role 1 - Upgrades to Version 1.5 against SQL Azure Version 1 Test UD1 - Web Role 2 - Upgrades to Version 1.5 against SQL Azure Version 1 Complete 1.5 upgrade step Upgrade SQL Azure to Version 2 (using migration approach) Test - UD0 and UD1 should continue to work UD0 - Web Role 1 - Upgrades to Version 2 against SQL Azure Version 2 Test UD1 - Web Role 2 - Upgrades to Version 2 against SQL Azure Version 2 Rolling Upgrade Finished

  • Anonymous
    December 02, 2010
    Do not chase after an "in place" schema migration approach. Treat the schema as immutable and then create transformation scripts that map the data from the old schema to the new schema. Code first is probably not being not being used on extremely large datatabase. In that case so long as the data can be migrated in reasonable time you are OK. If and only if you get that approach working, then investigate having an "optimisation" to do in-place schema migrations.

  • Anonymous
    December 19, 2010
    I tend to think that simplicity is aways a good design strategy. And I also tend to think that the developer should be in control of the development process as much as possible. EF code first ctp5 acomplish both, your proposed migration DOESN'T. Here is how I see a migration process must have: #1 - There must be a way to track the database version. Rails uses a special database table, but it can be anything. There should be ways to go back or advance to any schema version. #2 - We're talking about SCHEMA migrations, not data. So, a database snapshot is unecessary. Use a schema file instead. This file would contain the current database schema from where the diffs would be generated. #3 - Every build must create a migration file and apply them, but there must also be another away to generate and apply migrations apart from the build process. (Maybe the migration files would be put in the Migrations folder) #4 - There must be a way to track the apllyed migrations and rollback and reapply migrations. Keep it simple, but functional and the developer in control of the process.

  • Anonymous
    December 22, 2010
    Regarding the use of a screencast - our corporate IT policy blocks YouTube so unfortunately this is useless to me.

  • Anonymous
    January 27, 2011
    First off, I'm glad you're working on this, and it looks like you're off to a good start.  I also want to second what Hartmut Kocher and Rik Hemsley have already stated.  They did a fine job of summarizing some of my concerns.

  • Anonymous
    February 04, 2011
    Hartmut Kocher has listed pretty much my exact requirements which this doesn't seem to cover. I need the database to auto upgrade when my app server initialises. This can be tricky, when for example, adding a new not null field as the schema needs changing to add the column as nullable, a query needs to be run to populate that field, then the schema needs to be adjusted to not allow nulls.

  • Anonymous
    February 05, 2011
    The comment has been removed

  • Anonymous
    March 18, 2011
    +1 to Jack's comment.  I'm going to repost it because I think it is the simplest and most robust approach: Do not chase after an "in place" schema migration approach. Treat the schema as immutable and then create transformation scripts that map the data from the old schema to the new schema. Code first is probably not being not being used on extremely large datatabase. In that case so long as the data can be migrated in reasonable time you are OK. If and only if you get that approach working, then investigate having an "optimisation" to do in-place schema migrations.

  • Anonymous
    April 29, 2011
    I currently use Model First approach. Then for the Database Generation Workflow I use "Sync Database Project.xaml (VS)"  And this will update my DB Project schema.  From there I can deploy the schema changes to the actual database because the DB project will schema compare/diff what is needed to change. If data loss occurs I can create pre and post migration scripts to overcome the issue.  The works great for me at the moment. What I don't understand is how Code First can sync to a database project?  If it can do that then there wouldn't be any migration issue at the moment. Thanks

  • Anonymous
    February 20, 2012
    The comment has been removed