Archive

Archive for the ‘SQL Internals’ Category

VARCHAR or CHAR?

April 13, 2009 Leave a comment

Question: Which column type to use?
Answer: It depends ….

Explanation: SQL Server stores fixed width and variable length columns different. When storing a VARCHAR, there can be 2-byte overhead for each column plus an additional 2-byte overhead for the row. SQL Server does not record the length of the data saved in a VARCHAR field. That is when saving someone’s first name or last name in a VARCHAR(50) field it doesn’t actually keep track of if it was 10 characters or 20 characters; what it records is the ending offset for the column. So if the field value is NULL it points to the end offset points to start of the next columns value. But when we store CHAR field it takes up the space whether we have any value in the field or not. That is with CHAR even NULL fields take up space. But VARCHAR depending on where they are in the field list may or may not take up space.

I created the following table with three VARCHAR columns:

CREATE TABLE FixVarWidthOnly
(Col1 INT NULL,
Col2 VARCHAR(10) NULL,
Col3 VARCHAR(10) NULL,
Col4 VARCHAR(10) NULL )
GO

And inserted following value into it:

INSERT INTO FixVarWidthOnly VALUES (1, 'AAAAAAAAAA', 'BBBBBBBBBB', 'CCCCCCCCCC')

Now if I look at the page I get the following information:

00000000: 30000800 01000000 0400f003 001d0027 †0…………..’
00000010: 00310041 41414141 41414141 41424242 †.1.AAAAAAAAAABBB
00000020: 42424242 42424243 43434343 43434343 †BBBBBBBCCCCCCCCC
00000030: 43†††††††††††††††††††††††††††††††††††C

This page dump shows number of columns, my data, and tells me which columns are null. Let’s look at the bolded section …

01000000 0400 F0 0300 1D00 2700 3100

This starts with the integer value, number of columns, NULL Bitmap, Array size of variable length columns ending offset; followed by the three offset values indicating where the column value finished. So to store all three columns it took (2 Bytes+(# VARCHAR Columns*2) ) = 8 Bytes of overhead to store this information.

Question: So we had value in each of the variable length columns; what if they were null then do we still have the 2-byte over head for the column?
Answer: It Depends! If the column that was null was at the end of the list of three columns then we don’t have the two byte over head.

Explanation: So with same table above I inserted following rows and looked at the information stored in the page:

INSERT INTO FixVarWidthOnly VALUES (2 , 'AAAAAAAAAA', 'BBBBBBBBBB', NULL )
INSERT INTO FixVarWidthOnly VALUES (3 , 'AAAAAAAAAA', NULL, NULL )
INSERT INTO FixVarWidthOnly VALUES (4 , NULL, NULL, NULL )
INSERT INTO FixVarWidthOnly VALUES (5 , NULL, 'BBBBBBBBBB', 'CCCCCCCCCC' )
INSERT INTO FixVarWidthOnly VALUES (6 , 'AAAAAAAAAA', NULL, 'CCCCCCCCCC' )
INSERT INTO FixVarWidthOnly VALUES (7 , NULL, NULL, 'CCCCCCCCCC' )

Row 1 (Original Row):
00000000: 30000800 01000000 0400f003 001d0027 †0…………..’
00000010: 00310041 41414141 41414141 41424242 †.1.AAAAAAAAAABBB
00000020: 42424242 42424243 43434343 43434343 †BBBBBBBCCCCCCCCC
00000030: 43†††††††††††††††††††††††††††††††††††C

Row 2 (Last value NULL):
00000000: 30000800 02000000 0400f802 001b0025 †0…………..%
00000010: 00414141 41414141 41414142 42424242 †.AAAAAAAAAABBBBB
00000020: 42424242 42††††††††††††††††††††††††††BBBBB

Row 3 (Last two values NULL):
00000000: 30000800 03000000 0400fc01 00190041 †0…………..A
00000010: 41414141 41414141 41†††††††††††††††††AAAAAAAAA

Row 4 (All VARCHAR columns NULL):
00000000: 10000800 04000000 0400fe†††††††††††††………..

Row 5 (First column NULL):
00000000: 30000800 05000000 0400f203 0013001d †0……………
00000010: 00270042 42424242 42424242 42434343 †.’.BBBBBBBBBBCCC
00000020: 43434343 434343††††††††††††††††††††††CCCCCCC

Row 6 (Middle column NULL):
00000000: 30000800 06000000 0400f403 001d001d †0……………
00000010: 00270041 41414141 41414141 41434343 †.’.AAAAAAAAAACCC
00000020: 43434343 434343††††††††††††††††††††††CCCCCCC

Row 7 (First two VARCHAR columns NULL):
00000000: 30000800 07000000 0400f603 00130013 †0……………
00000010: 001d0043 43434343 43434343 43††††††††…CCCCCCCCCC

I highlighted all the key values before VARCHAR data information starts, below is summary:

Row Number Column Count NULL Bitmap Array Size 1st Offset 2nd Offset 3rd Offset Overhead
01000000 0400 F0 0300 1D00 2700 3100 8
02000000 0400 F8 0200 1B00 2500 6
03000000 0400 FC 0100 1900 4
04000000 0400 FE 0
05000000 0400 F2 0300 1300 1D00 2700 8
06000000 0400 F4 0300 1D00 1D00 2700 8
07000000 0400 F6 0300 1300 1300 1D00 8

As you see in above example if the NULL VARCHAR columns are at not at the end in the table it takes the maximum number of bytes over head as if there was actual data in the column.

Question: So how is CHAR field stored in the SQL Server?
Answer: They are stored like integer field, because they are fixed width; they take fixed space whether the value is null or not.

This time I create a table with FIXED width CHAR type:

CREATE TABLE FixWidthOnly
(Col1 INT NULL ,
Col2 CHAR(10) NULL,
Col3 CHAR(10) NULL,
Col4 CHAR(10) NULL )
GO

And I create similar set of rows in the new table:

INSERT INTO FixWidthOnly VALUES (1 , 'AAAAAAAAAA', 'BBBBBBBBBB', 'CCCCCCCCCC' )
INSERT INTO FixWidthOnly VALUES (2 , 'AAAAAAAAAA', 'BBBBBBBBBB', NULL )
INSERT INTO FixWidthOnly VALUES (3 , 'AAAAAAAAAA', NULL, NULL )
INSERT INTO FixWidthOnly VALUES (4 , NULL, NULL, NULL )
INSERT INTO FixWidthOnly VALUES (5 , NULL, 'BBBBBBBBBB', 'CCCCCCCCCC' )
INSERT INTO FixWidthOnly VALUES (6 , 'AAAAAAAAAA', NULL, 'CCCCCCCCCC' )
INSERT INTO FixWidthOnly VALUES (7 , NULL, NULL, 'CCCCCCCCCC' )

I take the output of the page:

Row 1 (Nothing Null):
00000000: 10002600 01000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f0†††††††††††††††††CCCCCC…

Row 2 (Last value NULL):
00000000: 10002600 02000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f8†††††††††††††††††CCCCCC…

Row 3 (Last two values NULL):
00000000: 10002600 03000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 fc†††††††††††††††††CCCCCC…

Row 4 (All CHAR columns NULL):
00000000: 10002600 04000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 fe†††††††††††††††††CCCCCC…

Row 5 (First column NULL):
00000000: 10002600 05000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f2†††††††††††††††††CCCCCC…

Row 6 (Middle column NULL):
00000000: 10002600 06000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f4†††††††††††††††††CCCCCC…

Row 7 (First two CHAR columns NULL):
00000000: 10002600 07000000 41414141 41414141 †..&…..AAAAAAAA
00000010: 41414242 42424242 42424242 43434343 †AABBBBBBBBBBCCCC
00000020: 43434343 43430400 f6†††††††††††††††††CCCCCC…

If you notice they all look exactly the same; so how does SQL know which field is null it’s by the NULL Bitmap value. The value that is bolded for each of the 7 rows above; for example a value F6 translated to Binary is 11110110; indicating first and forth row are not null. But the actual page is taking the full 30 bytes each time for CHAR whether it is empty or not.

Final notes:

  • Fix-width character fields use space whether you have anything saved field or not; so if there is a percentage of rows that might be null consider using VARCHAR. For example if 50% of the rows are going to be null then the varchar(2) is equivalent to char(2); you can see the minimum field length required before varchar(2) becomes as good as char by using FS = (2/%Null) – 2. (** Note this is based on just column size not row size **).
  • Variable length character fields take at minimum two bytes of overhead to store.
  • Variable fields should be ordered in the table by least nullable to most nullable; i.e. fields that have higher null value should come later in the table.

References:

Inside Microsoft SQL Server 2005: The Storage Engine, Kalen Delaney.
How to use DBCC Page, Microsoft SQL Server Storage Engine Internals Blog, Link.
How Does SQL Server Engine find Variable Length Columns, Paul S. Randel, Link.

INSERTED and DELETED Logical Tables

February 23, 2009 Leave a comment

The INSERTED and DELETED logical tables that exist in SQL Server and allow for handling the data when information is inserted, updated and deleted in DML Triggers only:

DML trigger statements use two special tables: the deleted table and the inserted tables. SQL Server automatically creates and manages these tables. You can use these temporary, memory-resident tables to test the effects of certain data modifications and to set conditions for DML trigger actions. You cannot directly modify the data in the tables or perform data definition language (DDL) operations on the tables, such as CREATE INDEX. (Books Online, SQL Server 2008).

Below is summary of what special tables get modified with each DML statement.

DML Statement INSERTED DELETED
INSERT X
UPDATE X X
DELETE X

I wanted to know if there anything special happens when I work with single row versus batch. So I test follow cases …

For testing I Created a new table called ‘IDST_Testing’ (IDST = Inserted Deleted Special Table). Below is summary of test cases and records inserted in each of the special tables.

Test Case INSERTED DELETED
Single Insert 1 0
Double Insert – Two Statements 1/per statement 1/per statement
Single Update 1 1
Double Update – Two Statements 1/per statement 1/per statement
Single Delete 0 1
Double Delete – Two Statements 1/per statement 1/per statement
Batch Insert – Two Records 2 0
Batch Update – Two Records 2 2
Batch Delete – Two Records 0 2

All those results are normal; but what was surprising was when I was doing batch INSERTED and DELETED the records were in reverse order.

For example:

I inserted following two records:

John
Mary

But when I looked at INSERTED table it showed:

Mary
John

In my actual table it was in order what I entered in but when processing the INSERTED and DELETED table they are revered. It was same in tables with IDENTITY columns and without.

Another interesting information on these tables:

  1. In SQL Server 2000, these logical tables internally refer to database transaction log to provide data when user queries them.
  2. In SQL Server 2005, these logical tables are maintained in tempdb and they are maintained using the new technology Row versioning.
  3. Accessing of logical tables is much faster in SQL Server 2005 when compared to SQL Server 2000 as the load is removed from transaction log and moved to tempdb.
  4. Logical tables are never indexed. So, if you are going to loop through each and every record available in these tables, then consider copying data of logical tables to temporary tables and index them before looping through.

Ref: http://blog.techdreams.org/2007/01/logical-tables-of-sql-server-inserted.html

Ref: http://www.sqlmag.com/Article/ArticleID/93465/sql_server_93465.html

When creating trigger it is important to keep in mind that sometimes a Batch Import, Update or Delete might happen against the table; if you are referring the logical tables INSERTED and DELETED the data will be in reverse order so you don’t want to cause issues when traversing through these tables in triggers.

Note always keep in mind RBAR when designing triggers; its very easy set up RBAR scenario when working with triggers. For more information on that please read, http://www.simple-talk.com/sql/t-sql-programming/rbar–row-by-agonizing-row/.

%d bloggers like this: