Introduction To Database Refactoring

Original link: by Scott W. Ambler; http://www.tdan.com/view-articles/5010/

Published: July 1, 2006
Published in TDAN.com July 2006

Material for this article was modified from Refactoring Databases: Evolutionary Database Design by Scott Ambler and Pramod Sadalage (Addison Wesley 2006). www.ambysoft.com/books/refactoringDatabases.html

Abstract:

A database refactoring is a small change to a database schema which improves its design without changing, at a practical level, the semantics of the database. In other words, it is a simple database transformation which neither adds nor breaks anything. The process of database refactoring defines how to safely evolve a database schema in small steps. Database refactoring enables data professionals to work in an evolutionary manner, just as modern application developers do. It also provides a coherent strategy for organizations to dig their way out of the legacy database hole.

1. What is Database Refactoring?

In the seminal text Refactoring, Martin Fowler [1] describes the programming technique called refactoring, which is a disciplined way to restructure code in small steps. Refactoring enables you to evolve your code slowly over time, to take an evolutionary (iterative and incremental) approach to programming. A critical aspect of a refactoring is that it retains the behavioral semantics of your code. You do not add functionality when you are refactoring, nor do you take it away. A refactoring merely improves the design of your code - nothing more and nothing less.

A database refactoring [2, 3] is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics - in other words, you cannot add new functionality or break existing functionality, you cannot add new data, and you cannot change the meaning of existing data. A database schema includes both structural aspects, such as table and view definitions, and functional aspects, such as stored procedures and triggers. I use the terms code refactoring to refer to traditional refactoring as described by Martin Fowler and database refactoring to refer to the refactoring of database schemas. The process of database refactoring is the act of making these simple changes to your database schema.

2. Why Database Refactoring?

There are two fundamental reasons why you want to adopt database refactoring:

  1. To repair existing legacy databases. Database refactoring enables you to safely evolve your database design in small steps, making it an important technique for improving the legacy assets within your organization. This is clearly much less risky than a "big bang" approach where you rewrite all of your applications and rework your database schema and release them all into production at once. Furthermore, it is much better than the "let's try not to allow things to get any worse" strategy currently employed by the vast majority of data management groups which I've run into, a strategy which has no hope of success because all it takes is one development team to go around the data management group and do an imperfect database design.
  2. To support evolutionary software development. Modern software development processes, including the Rational Unified Process (RUP), Extreme Programming (XP), Agile Unified Process (AUP), Scrum, and Dynamic Systems Development Method (DSDM), are all evolutionary in nature. Craig Larman [4] summarizes the research evidence, as well as the overwhelming support among the thought leaders within the IT community, in support of evolutionary approaches. Unfortunately, most data-oriented techniques are serial in nature, relying on specialists performing relatively narrow tasks, such as logical data modeling or physical data modeling. Therein lies the rub - the two groups need to work together, but each wants to do so in a different manner. I believe that data professionals need to adopt evolutionary techniques, such as database refactoring, which enable them to be relevant to modern development teams. Luckily these techniques exist [3], and they work quite well; it is now up to data professionals to choose to adopt them.

3. Implementing a Database Refactoring

Sometimes a project team finds itself in a relatively simple, "single-application database" situation, and if so they should consider themselves lucky. With this simple architecture database refactoring is fairly simple - you merely change your database schema and update your application to use the new version of the schema. What is more typical is to have many external programs interacting with your database, some of which are beyond the scope of your control. In this situation you cannot assume that all the external programs will be deployed at once, and must therefore support a transition period (also referred to as a deprecation period) during which both the old schema and the new schema are supported in parallel. For the rest of this article I will assume that you're in this situation.

To put database refactoring into context, let's step through a quick example. You have been working on a banking application for a few weeks and have noticed something strange about the Customer table depicted in Figure 1 - one of the column names isn't easy to understand. You decide to apply the Rename Column refactoring to the FName column to rename it to FirstName.

Figure 1. The initial database schema for Customer.

Agilists typically work together in pairs; one person should have application programming skills and the other data skills, and ideally both people have both sets of skills. The pair begins by determining whether the database schema needs to be refactored, and if so, how best to go about it. Perhaps the programmer is mistaken about the need to evolve the schema. The refactoring is first developed and tested within the developer's sandbox. When it is finished, the changes are promoted into the project-integration environment, and the system is rebuilt, tested, and fixed as needed.

To apply the Rename Column refactoring in the development sandbox, the pair first runs all the tests to see that they pass. Next, they write a test because they are taking a Test-Driven Design (TDD) approach [5, 6, 7]. A likely test is to access a value in the FirstName column. After running the tests and seeing them fail, they implement the actual refactoring. To do this they introduce the FirstName column and the SynchronizeFirstName trigger as you see in Figure 2.

Figure 2. The database schema during the transition period.
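The failing test described above might look something like the following minimal sketch. It uses Python's built-in sqlite3 module purely for illustration; the article does not name a database product, and the two-column Customer table, the sample row, and the helper names (create_initial_schema, read_first_name, first_name_test_passes) are all assumptions made for this example.

```python
import sqlite3


def create_initial_schema(conn):
    # Assumed stand-in for Figure 1: only the columns this example needs.
    conn.execute(
        "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, FName TEXT)"
    )
    conn.execute("INSERT INTO Customer (CustomerID, FName) VALUES (1, 'Sally')")


def read_first_name(conn, customer_id):
    # The new test reads the value through the new column name.
    row = conn.execute(
        "SELECT FirstName FROM Customer WHERE CustomerID = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


def first_name_test_passes(conn):
    # Returns False instead of raising so the failing state is easy to observe.
    try:
        return read_first_name(conn, 1) == "Sally"
    except sqlite3.OperationalError:  # raised while FirstName does not exist yet
        return False
```

Against the initial schema the check fails because the FirstName column does not exist yet; it should pass once the column has been introduced and populated.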

The trigger is required to keep the values in the columns synchronized - each external program accessing the Customer table will work with at most one of the two columns, never both. At first, all production applications will work with FName, but over time they will be reworked to access FirstName instead. There are other options to do this, such as views or synchronization after the fact, but I find that triggers work best.
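As a rough sketch of the synchronization idea, the following uses SQLite through Python's sqlite3 module; this is an assumption for illustration, since the article does not specify a DBMS, and trigger syntax differs between products. Because a SQLite trigger is bound to a single event, the one SynchronizeFirstName trigger of the article is split here into three hypothetical triggers sharing that prefix. The CustomerID key column and the begin_transition function are also assumed names.

```python
import sqlite3

# Assumes a table Customer(CustomerID INTEGER PRIMARY KEY, FName TEXT) exists.
TRANSITION_SCHEMA = """
ALTER TABLE Customer ADD COLUMN FirstName TEXT;

-- New rows: copy whichever column the caller filled in to the other one.
CREATE TRIGGER SynchronizeFirstName_Insert
AFTER INSERT ON Customer
WHEN NEW.FName IS NOT NEW.FirstName
BEGIN
    UPDATE Customer
       SET FName = COALESCE(NEW.FirstName, NEW.FName),
           FirstName = COALESCE(NEW.FirstName, NEW.FName)
     WHERE CustomerID = NEW.CustomerID;
END;

-- Old applications update FName; mirror the value into FirstName.
CREATE TRIGGER SynchronizeFirstName_FromFName
AFTER UPDATE OF FName ON Customer
WHEN NEW.FName IS NOT NEW.FirstName
BEGIN
    UPDATE Customer SET FirstName = NEW.FName
     WHERE CustomerID = NEW.CustomerID;
END;

-- Reworked applications update FirstName; mirror the value into FName.
CREATE TRIGGER SynchronizeFirstName_FromFirstName
AFTER UPDATE OF FirstName ON Customer
WHEN NEW.FirstName IS NOT NEW.FName
BEGIN
    UPDATE Customer SET FName = NEW.FirstName
     WHERE CustomerID = NEW.CustomerID;
END;
"""


def begin_transition(conn):
    # Adds the new column and installs the synchronization triggers.
    conn.executescript(TRANSITION_SCHEMA)
```

The WHEN clauses stop each trigger from firing once the two columns already agree, which keeps the mirrored updates from bouncing back and forth. The sketch assumes, as the text does, that a given program writes only one of the two columns.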

The FirstName column must be populated with values from the FName column. You then need to run both columns in parallel during a "transition period" of sufficient length to give the development teams time to update and redeploy all of their applications. This transition period could be several years in length, depending on the ability of your project teams to get new releases into production. In this case we've decided that the transition period will run to November 14, 2007.
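The one-time population step and the transition deadline could be sketched as follows, again assuming SQLite via Python's sqlite3 module. The function names are hypothetical, the table is assumed to already have both columns, and treating November 14, 2007 as the last day of the transition period (inclusive) is an interpretation of the text rather than something it states.

```python
import sqlite3
from datetime import date

# End of the transition period as given in the text; assumed to be inclusive.
TRANSITION_END = date(2007, 11, 14)


def backfill_first_name(conn):
    # Copy existing values into the new column, touching only rows that need it.
    cursor = conn.execute(
        "UPDATE Customer SET FirstName = FName "
        "WHERE FirstName IS NULL AND FName IS NOT NULL"
    )
    return cursor.rowcount  # number of rows that were populated


def transition_period_is_over(today):
    # The old column may only be considered for removal after the deadline.
    return today > TRANSITION_END
```

Returning the row count gives a simple figure to compare against the number of customers with a first name, which is one way to confirm that the backfill covered every row.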

The pair reruns the tests and sees that they now pass. They then refactor the existing tests to work with the FirstName column rather than the FName column. Once the database refactoring is completed in their development work environment, the pair promotes their work into the team's integration sandbox where they rebuild and rerun the tests, fixing any problems which they find. To update the database schema, the pair runs the appropriate change and migration scripts in the appropriate order.
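One way to run change scripts "in the appropriate order" is to number them and record which ones have already been applied. The sketch below is only an illustration of that idea, not the mechanism the article prescribes: the SchemaChangeLog table, the apply_pending function, and the numbering scheme are all hypothetical, and SQLite via Python's sqlite3 module is assumed.

```python
import sqlite3


def apply_pending(conn, scripts):
    """Apply (number, sql) change scripts in ascending order, skipping any
    that were already recorded as applied. Returns the numbers that ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS SchemaChangeLog "
        "(ChangeNumber INTEGER PRIMARY KEY)"
    )
    applied = {
        row[0] for row in conn.execute("SELECT ChangeNumber FROM SchemaChangeLog")
    }
    ran = []
    for number, sql in sorted(scripts, key=lambda script: script[0]):
        if number in applied:
            continue  # this environment already has the change
        conn.executescript(sql)
        conn.execute(
            "INSERT INTO SchemaChangeLog (ChangeNumber) VALUES (?)", (number,)
        )
        conn.commit()
        ran.append(number)
    return ran
```

Because the log lives in the database itself, the same list of scripts can be run against the development sandbox, the integration sandbox, and later environments, and each one picks up only the changes it is missing.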

This promotion strategy continues into your pre-production integration testing environment and then eventually into production. Depending on your need, you could implement and then deploy the refactoring within a single day, although more realistically you would deploy it, along with any other updates that you've made, in the next major release of your application, which could be several months away.

After the transition period, you remove the original column plus the trigger(s), resulting in the final database schema of Figure 3. You remove these things only after sufficient testing to ensure that it is safe to do so. At this point, your refactoring is complete.

Figure 3. The final database schema for Customer.
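The final cleanup step might be sketched like this, assuming SQLite via Python's sqlite3 module and a Customer table that holds only CustomerID, FName, and FirstName (the key column and the finish_refactoring name are assumptions). The sketch rebuilds the table rather than dropping the column directly, because only newer SQLite releases (3.35.0 and later) support ALTER TABLE ... DROP COLUMN; many other database products can drop the column in a single statement.

```python
import sqlite3


def finish_refactoring(conn):
    """Drop the synchronization trigger(s) and the old FName column.
    Returns the names of the triggers that were removed."""
    triggers = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'trigger' AND tbl_name = 'Customer' "
            "AND name LIKE 'SynchronizeFirstName%'"
        )
    ]
    statements = ["BEGIN;"]
    # Triggers go first: they reference the column that is about to disappear.
    statements += ['DROP TRIGGER "{}";'.format(name) for name in triggers]
    statements += [
        # Portable rebuild: copy the surviving columns into a new table.
        "CREATE TABLE Customer_new "
        "(CustomerID INTEGER PRIMARY KEY, FirstName TEXT);",
        "INSERT INTO Customer_new (CustomerID, FirstName) "
        "SELECT CustomerID, FirstName FROM Customer;",
        "DROP TABLE Customer;",
        "ALTER TABLE Customer_new RENAME TO Customer;",
        "COMMIT;",
    ]
    conn.executescript("\n".join(statements))
    return triggers
```

Running the whole cleanup inside one transaction means other connections never observe a state in which the Customer table is missing.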

There is a little more to successfully implementing a database refactoring than what I've described. You need a way to coordinate the refactoring efforts of all the development teams within your organization, clearly something that may prove quite difficult. You also need to get good at deploying refactorings in production, once again coordinating the efforts of several teams. In Refactoring Databases [3], my co-author Pramod Sadalage and I discuss several strategies for doing each of these things.

4. Why Not Just Get it Right to Begin With?

I am often told by existing data professionals that the real solution is to model everything up front, and then you would not need to refactor your database schema. Although that is an interesting vision, and I have seen it work in a few situations, experience from the past three decades has shown that this approach does not seem to be working well in practice for the overall IT community. The traditional approach to data modeling does not reflect the evolutionary approach of modern methods such as the RUP and XP, nor does it reflect the fact that business customers are demanding new features and changes to existing functionality at an accelerating rate. The old ways simply aren't sufficient any more, if they ever were [8].

I suggest that you take an Agile Model-Driven Development (AMDD) approach [9, 10], in which you do some high-level modeling to identify the overall "landscape" of your system, and then model storm the details on a just-in-time (JIT) basis. Take advantage of the benefits of modeling without suffering from the costs of over-modeling, over-documentation, and the resulting bureaucracy of trying to keep too many artifacts up-to-date and synchronized with one another. Your application code and your database schema evolve as your understanding of the problem domain evolves, and you maintain quality through refactoring both.

5. In Conclusion

Database refactoring is a database implementation technique, just like code refactoring is an application implementation technique. You refactor your database schema to ease additions to it. You often find that you have to add a new feature to a database, such as a new column or stored procedure, but the existing design is not the best one possible to easily support that new feature. You start by refactoring your database schema to make it easier to add the feature, and after the refactoring has been successfully applied, you then add the feature. The advantage of this approach is that you are slowly, but constantly, improving the quality of your database design. This process not only makes your database easier to understand and use, it also makes it easier to evolve over time; in other words, you improve your overall development productivity.

My experience is that data professionals can benefit from adopting modern evolutionary techniques similar to those of developers, and that database refactoring is one of several important skills that data professionals require. Evolutionary development has arguably become the norm within the IT community, and agile software development approaches extend evolutionary methods to become more effective. My advice to data professionals is to take evolutionary and agile concepts and techniques seriously: they're real, they work, and they're here to stay.

6. References and Recommended Reading

    1. Fowler, M. (1999). Refactoring: Improving the Design of Existing Code. Menlo Park, California: Addison Wesley Longman, Inc.
    2. Ambler, S.W. (2003). Agile Database Techniques: Effective Strategies for the Agile Software Developer. New York: John Wiley & Sons. www.ambysoft.com/books/agileDatabaseTechniques.html
    3. Ambler, S.W. and Sadalage, P.J. (2006). Refactoring Databases: Evolutionary Database Design. Boston: Addison Wesley. www.ambysoft.com/books/refactoringDatabases.html
    4. Larman, C. (2004). Agile and Iterative Development: A Manager's Guide. Boston: Addison-Wesley.
    5. Astels, D. (2003). Test Driven Development: A Practical Guide. Upper Saddle River, NJ: Prentice Hall.
    6. Beck, K. (2003). Test Driven Development: By Example. Boston, MA: Addison Wesley.
    7. Ambler, S.W. (2004). Introduction to Test Driven Development (TDD). www.agiledata.org/essays/tdd.html
    8. Ambler, S.W. (2004). The Agile Data Home Page. www.agiledata.org.
    9. Ambler, S.W. (2002). Agile Modeling: Best Practices for the Unified Process and Extreme Programming. New York: John Wiley & Sons. www.ambysoft.com/books/agileModeling.html
    10. Ambler, S.W. Agile Model Driven Development (AMDD). www.agilemodeling.com/essays/amdd.htm
