Reference: http://blogs.msdn.com/b/felixmar/archive/2011/02/14/partitioning-amp-archiving-tables-in-sql-server-part-1-the-basics.aspx

Database partitioning is a feature available in SQL Server(version 2005 and Up) which lets you split a table among multiple files which can be beneficial for large tables, especially for those which hold historical data. Using partitioning You can also simplify data management  ( like storage size and placement, backups strategy etc.) as well as index management, making queries run faster when working with smaller ranges of data. Data partitioning is really easy to implement if you clearly understand how it works and if you plan your strategy carefully. You can also benefit from this feature to archive old data in a very easy way.

You can find much information about data partitioning on the Internet including the examples from MSDN and BOL, however there are important considerations you must care about and experiment with different data scenarios before implementing data partitioning in a production environment. This article focuses on these topics as well as partition management including split and merge partitions and the switch partition option.

Filegroups and Data Files

As you know, a database contains a data file (.MDF) and a transaction log file (.LDF), however you can add additional files to the database,these files are called secondary files (.NDF) and can be also use to store data rows. The files are assigned to filegroups in the same way the files are assigned to folders in the a file system, however, when assign several files to the same filegroup, data is distributed evenly  between all of them, in a round-robin way.To begin the example, let´s create a sample database called PartitionDB with 3 filegroups and 3 additional files (.NDF).

In SQL Server Management Studio, right click on Databases –> New Database. You can add the filegroups in the Filegroups page or you can add the files (.NDF) in the Files page and add the filegroup at the same time in the Files page. Just go to Files Page, click the Add Button, in the Logical Name column, type a name for the File, click on Filegroup column, select <new Filegroup> and type a name for the filegroup:

Define partitions

Now, we need decide how to partition our data. Is different the scenario where you want to partition a table which already contains data from a scenario where you are creating a new table. In the first case, you must be aware of the primary key in terms of how is build, which column (s) participate and the data types, you must also care about the unique  index which the Primary key builds automatically . In the case you are building a new table from scratch, you can decide how to create the table, the primary key and the partitioning strategy, in the first case you will be limited to what the table design allows.

In this examples we are going to create  2 new tables, the tables will simulate order information. The first example will implement partitioning by using OrderID as the partition column while the second example will use OrderDate. The partition column is called Partition Key and represents the criteria used to partition the information.

Partition Function

The partition function creates the ranges of data (partitions) according to values defined inside its body. This values are the boundaries which in turn produce the ranges. The function takes a data type as a parameter according to the column you selected as the partition key. When you define the boundaries for the partitions, you can use RIGHT or LEFT to define the limits of each partition.

LEFT means each value represents the upper limit for each range, in other words, each range goes from each boundary value to the LEFT.

RIGHT means each value represent the lower limit for each range but the first range will start to the right of the first value, so there will be an additional range  (the first range) and the first boundary value in the function defines where the second range begins.

To better understand,  suppose you have 10 000 records and want to create 3 ranges, the first range goes from 1-1000, the second from 1001 to 5000 and the third from 5001 to 10000. Since the partitions are based on numbers, you should pass a numeric datatype as an argument to the function.

Let´s create a function named pfRecordsRange in order to do this:

Note that the function uses LEFT for defining ranges, as I said before when using left each value represents the upper limit. 1000 is the upper limit for the first range, 5000 for the second and 10000 for the third.

If the first record starts in 1, this would be the lower limit for the first range (1-1000) , 10001 would be the lower limit for the second range (1001 – 5000)  and 5001 for the 3rd (5001 – 10 000)

But look, what if more records are inserted outside the limits? for example where would the record 10001 be placed?

well, in fact another range is automatically generated for all the records that are greater than the  last value defined in the function (>10000) and this will be the last range, so the record 10001 will be placed here.

Another question: If the value –1 would be inserted, where would it be placed?. The answer is: In the first range.

In fact the first and the last ranges have no limits, both have just one boundary, so consider  to add a check constraint in your table to restrict the values that can be inserted in these ranges.

Another important consideration: You will need an additional filegroup to hold the data for the additional range.  You will always need to consider 1 additional filegroup apart from those required by the function values. If your function includes n values, you will need n+1 filegroups.

This requirement is just to be sure the function will be able to place each partition in an available filegroup but it does not really mean there will be data for the last filegroup, so you could use the PRIMARY file group as the target for the last partition.

Returning to our example, If I use RIGHT instead of LEFT the first boundary value (1000) in the function would be the lower limit for the second range and the first range would include all records which are less than 1000, the last range would include the  number 10000, that is mean >=10000.

The following image shows the difference when using LEFT and RIGHT in terms of the ranges they produce:

If LEFT is used, the first value in the function will be part of the first partition. If RIGHT is used, it will be part of the second partition, also look how the ranges are defined in each case. So, the only difference is the partition each value belongs to. The symblos <,<=,>,>=, define this boundaries.

Right or Left?

This is a common question, and the answer is: it all depends how you want to define your partitions,  In my case, when I work with numeric ranges I prefer to use LEFT because usually I´ll want to include the upper limit in each range (1 to 10,, 11 to 20, etc.) however when I work with dates I prefer to use RIGHT because I prefer to define ranges by the start date not by the end date, for example If I want 3 partitions for each year (one for 2009, another for 2010 and the last for  2011) is easier to consider the start value for each range: 2009/01/01, 2010/01/01, etc. instead of the end range. Look at the script for this partition function:

Partition Schema

Once you have created the partition function, the next step is creating the partition schema. The partition schema uses the partition function to redirect the partitions defined by the function to the appropriate filegroups. Remember, if you have 3 values defined in the function it will produce 4 ranges (partitions).

In this case, assume we want a partition schema for the function pfOrderDateRange.

The schema would looks like this:

Note that I used PRIMARY for the last filegroup since no data will be stored here.

Partitioned Table

The last step is creating the table using the partition schema to store data in a partitioned fashion. When you create a table, you can define where to store the table (CREATE TABLE <name> (…) ON ). If you omit the ON part of the statement, the table will be created on the default filegroup which is usually the PRIMARY filegroup. In this case, we will tell the table to store the data in several partitions which is the job of the partition schema, so simply put the partition schema name after the ON clause and add the column name as the argument.

Before showing the script let´s talk about the primary key column.

When you add a primary key constraint to a table an index is also built on this column. The index may be clustered or non clustered (you can choose) but it must be unique. This unique index is used is to reinforce rows uniqueness. We will not discuss index structures here but remember this: if the Primary Key contains a clustered index, the table will be sorted using the index column (or columns). I f the Primary Key contains a non clustered index, an additional sorted structure will be created and the table data will remain as is.

Index and Data Alignment

The best practice when creating a partitioned table is to partition the index the same way you partition the data, this is called aligned index. But to be able to align index to data rows you must include the partition key (the column by which you are partitioning) in the index definition. If you do this, each index portion will be stored in the same data file (.ndf) where the corresponding data portion is stored, this way, if you query a table using a range expression (WHERE order date BETWEEN…), SQL Server will only use the corresponding index data for the query. The execution plan will be more efficient. Also, you can backup filegroups and create a more efficient recovery strategy because index are contained in the same filegroup. When indexes are aligned, You can move a partition to another table without moving rows just the metadata in just seconds!. It is highly recommended align indexes in order to benefit from this features.

Since this is the best scenario possible, lets create a primary Key for our table (orders)  using OrderID as well as OrderDate which is the partition key. Look at the script:

That´s all, now when you insert data into your table, each row is evaluated using the partition function to identify the corresponding partition, Partition schema will store the data to the corresponding filegroup using the partition function and the index will be stored in the same filegroups as the corresponding data.

Populate and Verify Data

Now let´s insert some rows in our table using this simple script:

I´ll run the script tree times, each using the values ‘2008/01/01’, ‘2009/01/01’ and ‘2010/01/01’ for the@OrderDate variable in order to insert 900 rows.

Finally, use the following script which uses the transact $PARTITION to verify how many rows are inserted in each partition as well as the data boundaries:

You can also view the table properties to verify data partitioning:

And look at the index properties:

If you wish, you can create the following view (taken from SQL Server 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, and Conor Cunningham):

and then query the view to obtain the following results:

This view is similar to the other query but is very useful because you can also see the name of the filegroup and the name of the table where each partitions is stored,  you will appreciate this information when merging and splitting data, without this info, is difficult to see where the data was moved.

This is the simplest procedure for data partitioning, in the next post I´ll talk about split and merge data,  and what to do when you can’t align indexes as well as other scenarios.

Partitioning & Archiving tables in SQL Server (Part 1: The basics)的更多相关文章

  1. Partitioning & Archiving tables in SQL Server (Part 2: Split, Merge and Switch partitions)

    Reference: http://blogs.msdn.com/b/felixmar/archive/2011/08/29/partitioning-amp-archiving-tables-in- ...

  2. Part 17 Temporary tables in SQL Server

    Temporary tables in SQL Server

  3. sql server:compare data from two tables

    --Comparing data between two tables in SQL Server --Create two Tables-- CREATE TABLE TableA(ID Int, ...

  4. Microsoft SQL Server Trace Flags

    Complete list of Microsoft SQL Server trace flags (585 trace flags) REMEMBER: Be extremely careful w ...

  5. 人人都是 DBA(II)SQL Server 元数据

    SQL Server 中维护了一组表用于存储 SQL Server 中所有的对象.数据类型.约束条件.配置选项.可用资源等信息,这些信息称为元数据信息(Metadata),而这些表称为系统基础表(Sy ...

  6. SQL Server主要系统视图说明

    SELECT * FROM sys.all_columns --显示属于用户定义对象和系统对象的所有列的联合--https://docs.microsoft.com/zh-cn/sql/relatio ...

  7. SQL Server:INFORMATION_SCHEMA.columns 与sys.columns 与 syscolumns对比

    sys.columns视图 sys.columns是SQL Server从2005版本起引入的新的系统级视图.相关链接如下: Mapping SQL Server 2000 System Tables ...

  8. Understanding how SQL Server executes a query

    https://www.codeproject.com/Articles/630346/Understanding-how-SQL-Server-executes-a-query https://ww ...

  9. Describe in brief Databases and SQL Server Databases Architecture.

    Databases- A database is a structured collection of data.- Database can be thought as simple data fi ...

随机推荐

  1. (转)COM组件里的AddRef()

    D3D是 COM组件,它在服务进程中运行,而不在当前的客户进程中.在DX组件运行过程中,要创建一系列接口对象,如CreateDevice()返回接口指针,这些接口及其占用内存什么时候释放,要通过“引用 ...

  2. 今天第一次接触到typescript,看了第一个知识点就是变量的声明,来回忆回忆,做做笔记

    以前只用过JavaScript原生写网站特效,今天还是第一次听说typescript的,然后看了一下它的基本知识,感觉很像Java,真的太像了,但是又有不同点.很让我惊奇看到的第一个知识点就和以前不同 ...

  3. split,slice,splice,replace的用法

    split()方法用于把一个字符串分割成字符串数组 str.split("字符串/正则表达式从该参数制定额地方分割str",可选,可指定返回数组的最大长度,如果没设置参数,整个字符 ...

  4. Java框架重量级,轻量级的问题?

    一般认为,SSH为重量级.SSI为轻量级. 但轻重的概念怎么界定?

  5. eclipse连接mysql,插入数据时乱码

    问题:如果eclipse中项目的编码方式为utf-8 插入数据后,在数据库中查看后,汉字出现乱码情况 解决方法: 1.在获取连接的时候将conn = DriverManager.getConnecti ...

  6. 高效能人士必知铁律--note

    偶然看到了<高效能人士 必知铁律>这本书,我比较少看成功学,但是这本书把很多著名的成功学书籍整理出来,有时会让你耳目一新,有些观点尽管是常识,但是却加深了你对它们的理解,比如: 只要在积极 ...

  7. grep命令学习

    grep(Globally search a Regular Expression and Print), 全面搜索正则表达式并把行打印出来,是一种强大的文本搜索工具,它能使用正则表达式搜索文本,并把 ...

  8. 2016HUAS_ACM暑假集训4B - 递推

    这种数学推理题目题意极其明显,在做的时候,可以多写几组,这样找起规律来会容易些.概括起来就是:题意简单暴力,案例毫无价值. 一个三角形最多可以把一个平面分成两部分,两个三角形最多是8(2+6)部分,而 ...

  9. IntelliJ IDEA 的 20 个代码自动完成的特性

    http://www.oschina.net/question/12_70799 在这篇文章中,我想向您展示 IntelliJ IDEA 中最棒的 20 个代码自动完成的特性,可让 Java 编码变得 ...

  10. 性能测试知多少---系统架构分析 转自https://yq.aliyun.com/articles/35147?spm=5176.100239.blogcont24251.8.lS96At

    摘要: 有些事儿一旦放一放就难再拾起来,突然发现<性能测试知多少>这个系列两月没更新,关键时我都不知道啥时候放下的,总容易被各种技术所吸引走,如饥似渴的想学更多的东西,这几天一直有朋友问我 ...