Question: Which column type to use?
Answer: It depends ….
Explanation: SQL Server stores fixed width and variable length columns different. When storing a VARCHAR, there can be 2-byte overhead for each column plus an additional 2-byte overhead for the row. SQL Server does not record the length of the data saved in a VARCHAR field. That is when saving someone’s first name or last name in a VARCHAR(50) field it doesn’t actually keep track of if it was 10 characters or 20 characters; what it records is the ending offset for the column. So if the field value is NULL it points to the end offset points to start of the next columns value. But when we store CHAR field it takes up the space whether we have any value in the field or not. That is with CHAR even NULL fields take up space. But VARCHAR depending on where they are in the field list may or may not take up space.
I created the following table with three VARCHAR columns:
CREATE TABLE FixVarWidthOnly
(Col1 INT NULL,
Col2 VARCHAR(10) NULL,
Col3 VARCHAR(10) NULL,
Col4 VARCHAR(10) NULL )
GO
And inserted following value into it:
INSERT INTO FixVarWidthOnly VALUES (1, 'AAAAAAAAAA', 'BBBBBBBBBB', 'CCCCCCCCCC')
Now if I look at the page I get the following information:
00000000: 30000800 01000000 0400f003 001d0027 †0…………..’
00000010: 00310041 41414141 41414141 41424242 †.1.AAAAAAAAAABBB
00000020: 42424242 42424243 43434343 43434343 †BBBBBBBCCCCCCCCC
00000030: 43†††††††††††††††††††††††††††††††††††C
This page dump shows number of columns, my data, and tells me which columns are null. Let’s look at the bolded section …
01000000 | 0400 | F0 | 0300 | 1D00 | 2700 | 3100 |
This starts with the integer value, number of columns, NULL Bitmap, Array size of variable length columns ending offset; followed by the three offset values indicating where the column value finished. So to store all three columns it took (2 Bytes+(# VARCHAR Columns*2) ) = 8 Bytes of overhead to store this information.
Question: So we had value in each of the variable length columns; what if they were null then do we still have the 2-byte over head for the column?
Answer: It Depends! If the column that was null was at the end of the list of three columns then we don’t have the two byte over head.
Explanation: So with same table above I inserted following rows and looked at the information stored in the page:
INSERT INTO FixVarWidthOnly VALUES (2 , 'AAAAAAAAAA', 'BBBBBBBBBB', NULL )
INSERT INTO FixVarWidthOnly VALUES (3 , 'AAAAAAAAAA', NULL, NULL )
INSERT INTO FixVarWidthOnly VALUES (4 , NULL, NULL, NULL )
INSERT INTO FixVarWidthOnly VALUES (5 , NULL, 'BBBBBBBBBB', 'CCCCCCCCCC' )
INSERT INTO FixVarWidthOnly VALUES (6 , 'AAAAAAAAAA', NULL, 'CCCCCCCCCC' )
INSERT INTO FixVarWidthOnly VALUES (7 , NULL, NULL, 'CCCCCCCCCC' )
Row 1 (Original Row):
00000000: 30000800 01000000 0400f003 001d0027 †0…………..’
00000010: 00310041 41414141 41414141 41424242 †.1.AAAAAAAAAABBB
00000020: 42424242 42424243 43434343 43434343 †BBBBBBBCCCCCCCCC
00000030: 43†††††††††††††††††††††††††††††††††††C
Row 2 (Last value NULL):
00000000: 30000800 02000000 0400f802 001b0025 †0…………..%
00000010: 00414141 41414141 41414142 42424242 †.AAAAAAAAAABBBBB
00000020: 42424242 42††††††††††††††††††††††††††BBBBB
Row 3 (Last two values NULL):
00000000: 30000800 03000000 0400fc01 00190041 †0…………..A
00000010: 41414141 41414141 41†††††††††††††††††AAAAAAAAA
Row 4 (All VARCHAR columns NULL):
00000000: 10000800 04000000 0400fe†††††††††††††………..
Row 5 (First column NULL):
00000000: 30000800 05000000 0400f203 0013001d †0……………
00000010: 00270042 42424242 42424242 42434343 †.’.BBBBBBBBBBCCC
00000020: 43434343 434343††††††††††††††††††††††CCCCCCC
Row 6 (Middle column NULL):
00000000: 30000800 06000000 0400f403 001d001d †0……………
00000010: 00270041 41414141 41414141 41434343 †.’.AAAAAAAAAACCC
00000020: 43434343 434343††††††††††††††††††††††CCCCCCC
Row 7 (First two VARCHAR columns NULL):
00000000: 30000800 07000000 0400f603 00130013 †0……………
00000010: 001d0043 43434343 43434343 43††††††††…CCCCCCCCCC
I highlighted all the key values before VARCHAR data information starts, below is summary:
Row Number | Column Count | NULL Bitmap | Array Size | 1st Offset | 2nd Offset | 3rd Offset | Overhead |
01000000 | 0400 | F0 | 0300 | 1D00 | 2700 | 3100 | 8 |
02000000 | 0400 | F8 | 0200 | 1B00 | 2500 | 6 | |
03000000 | 0400 | FC | 0100 | 1900 | 4 | ||
04000000 | 0400 | FE | 0 | ||||
05000000 | 0400 | F2 | 0300 | 1300 | 1D00 | 2700 | 8 |
06000000 | 0400 | F4 | 0300 | 1D00 | 1D00 | 2700 | 8 |
07000000 | 0400 | F6 | 0300 | 1300 | 1300 | 1D00 | 8 |
As you see in above example if the NULL VARCHAR columns are at not at the end in the table it takes the maximum number of bytes over head as if there was actual data in the column.
Question: So how is CHAR field stored in the SQL Server?
Answer: They are stored like integer field, because they are fixed width; they take fixed space whether the value is null or not.
This time I create a table with FIXED width CHAR type:
CREATE TABLE FixWidthOnly
(Col1 INT NULL ,
Col2 CHAR(10) NULL,
Col3 CHAR(10) NULL,
Col4 CHAR(10) NULL )
GO
And I create similar set of rows in the new table:
INSERT INTO FixWidthOnly VALUES (1 , 'AAAAAAAAAA', 'BBBBBBBBBB', 'CCCCCCCCCC' )
INSERT INTO FixWidthOnly VALUES (2 , 'AAAAAAAAAA', 'BBBBBBBBBB', NULL )
INSERT INTO FixWidthOnly VALUES (3 , 'AAAAAAAAAA', NULL, NULL )
INSERT INTO FixWidthOnly VALUES (4 , NULL, NULL, NULL )
INSERT INTO FixWidthOnly VALUES (5 , NULL, 'BBBBBBBBBB', 'CCCCCCCCCC' )
INSERT INTO FixWidthOnly VALUES (6 , 'AAAAAAAAAA', NULL, 'CCCCCCCCCC' )
INSERT INTO FixWidthOnly VALUES (7 , NULL, NULL, 'CCCCCCCCCC' )
I take the output of the page:
Row 1 (Nothing Null):
00000000: 10002600 01000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f0†††††††††††††††††CCCCCC…
Row 2 (Last value NULL):
00000000: 10002600 02000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f8†††††††††††††††††CCCCCC…
Row 3 (Last two values NULL):
00000000: 10002600 03000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 fc†††††††††††††††††CCCCCC…
Row 4 (All CHAR columns NULL):
00000000: 10002600 04000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 fe†††††††††††††††††CCCCCC…
Row 5 (First column NULL):
00000000: 10002600 05000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f2†††††††††††††††††CCCCCC…
Row 6 (Middle column NULL):
00000000: 10002600 06000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f4†††††††††††††††††CCCCCC…
Row 7 (First two CHAR columns NULL):
00000000: 10002600 07000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f6†††††††††††††††††CCCCCC…
If you notice they all look exactly the same; so how does SQL know which field is null it’s by the NULL Bitmap value. The value that is bolded for each of the 7 rows above; for example a value F6 translated to Binary is 11110110; indicating first and forth row are not null. But the actual page is taking the full 30 bytes each time for CHAR whether it is empty or not.
Final notes:
- Fix-width character fields use space whether you have anything saved field or not; so if there is a percentage of rows that might be null consider using VARCHAR. For example if 50% of the rows are going to be null then the varchar(2) is equivalent to char(2); you can see the minimum field length required before varchar(2) becomes as good as char by using FS = (2/%Null) – 2. (** Note this is based on just column size not row size **).
- Variable length character fields take at minimum two bytes of overhead to store.
- Variable fields should be ordered in the table by least nullable to most nullable; i.e. fields that have higher null value should come later in the table.
References:
Inside Microsoft SQL Server 2005: The Storage Engine, Kalen Delaney.
How to use DBCC Page, Microsoft SQL Server Storage Engine Internals Blog, Link.
How Does SQL Server Engine find Variable Length Columns, Paul S. Randel, Link.