Advertisements

Archive

Archive for the ‘SQL 2012’ Category

Changing @@SERVERNAME causes SQL Server Backups to Fail for AlwaysOn Availability Group

January 13, 2016 2 comments

One of customers changed the value returned from @@SERVERNAME.  SQL Server works no problem, however an unexpected behavior appeared.  Changing the value for @@SERVERNAME, caused the backups to fail.

Looking at the maintenance jobs, found all jobs completed successfully and without issues.  However, upon looking at the database’s statics it states no backups completed.

image

Because the database in question is part of AlwaysOn Availability Group (AG); SQL Server executes sys.fn_hadr_backup_is_preferred_replica to determine if the backup should take place on the current node.  However, it returns value of 0 for all databases, if the preferred replica is set.  Because, the script makes a check that is running on the server that is preferred.  It does this by comparing the value to @@SERVERNAME to value of replica_server_name in sys.availability_replicas.  Because value will never match, it skips the database on both primary and secondary replica.

I have created a Microsoft Connect article (link); asking this little bit of information to be added to Books Online article (link).  There was a request submitted by Ola Hallengren (Blog | Twitter), which was closed as Won’t Fix (link).   Please vote!

This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.

Advertisements

SQL Server 2012 NUMA Node Imbalance–Conclusion

October 4, 2015 Leave a comment

Final post in the series, I hope to summarize my findings for everyone.  Thanks for all the comments, feedback and comments I have received to-date on these posts.

Before providing the summary of all the testing, research and reading, I want to provide some internals for folks.

NUAM Memory Allocation

If a server has NUMA configuration, when we define the max memory setting.  All memory gets allocated evenly from all NUMA nodes.  For example, if I allocate 8GB from a server that has 4 NUMA nodes, it will allocate 2GB from each NUMA node for the instance in question. So be careful, if you are using CPU affinity.  As even using CPU affinity, lets say you only want to use NUMA 1 for an instance; this can lead to foreign memory access. NUMA allocation still happens across all NUMA nodes (at least in SQL Server 2014).  So even though you might only have subset of schedulers online, memory will still be accessed across all NUMA Nodes.  Even though with new hardware foreign memory access does not have major impact on performance.  Still recommend avoiding it.  If you must use CPU affinity, then allocate sub-set of the processor from each NUMA node.

Lazy Writer Responsibility

Lazy Writer is back ground process and a thread is created per physical NUMA node (for Soft-NUMA, we only get one thread, you can read details here by Jonathan Kehayias (Blog|Twitter)). Lazy Writer responsibility is to make sure sufficient number of free pages, about 640 pages, are always available for new data to be loaded.  If Lazy Writer thread is spinning, that is usually a clear signal of internal memory pressure.

Note: Internal Memory pressure does not mean server doesn’t have enough memory.  There are multiple factors that can contribute to this, one fact is low Max Server Memory, which is what has been explored in these articles.

Original Scenario: SQL Server 2012 Latest Build + Windows 2008 R2 Latest Build + Locked Pages In Memory + Low Max Server Memory + AMD Processors + NUMA Configuration

In this build, I can consistently reproduce the issue.  If I set the Max Memory low enough, it caused the Lazy Writer on NUMA node 0 to spin non-stop, as I documented in SQL Server 2012 NUMA Node Imbalance – Cont’d.  To further to this I understand some additional internals around this thanks to colleague who helped here.  As I was not true sure what is happening.

If we looked at the DBCC MEMORYSTATUS, in particular we notice following for Node 0:

Memory node Id = 0                       KB
—————————————- ———–
VM Reserved                              127086340
VM Committed                             642452
Locked Pages Allocated                   55996
Pages Allocated                          54008
Pages Free                               0
Target Committed                         512000
Current Committed                        698448
Foreign Committed                        0
Away Committed                           0
Taken Away Committed                     0

Review the two numbers high-lighted above, Target memory is what this NUMA node is allocated and Current is what is NUMA consuming, therefore the Page Free count is 0.  Thus Lazy Writer spinning to free up memory.  However it is unable, thus spinning indefinitely causing CPU to pin.  Question to answer, why can’t Lazy Writer free up memory from NUMA Node 0?

Looking further into MEMORYSTATUS, we also notice, that most of the memory allocated into NUMA Node 0 is for buffer cache:

MEMORYCLERK_SQLBUFFERPOOL (node 0)       KB
—————————————- ———–
VM Reserved                              103971820
VM Committed                             524288
Locked Pages Allocated                   2524
SM Reserved                              0
SM Committed                             0
Pages Allocated                          8

Test Scenario #1: Windows 10 + SQL Server 2012 RTM + Low Max Server Memory + Intel Processors + No-NUMA (Physical)

In this, I was just trying to reproduce the issue on my desktop, with no luck.  Memory allocation or such were nominal.

Test Scenario #2: Windows 2012 + SQL Server 2014 RTM + Low Max Server Memory + Intel Processors + No-NUMA (Hyper-V)

Again no issue, as per the title of the series, NUMA Imbalance.  It was obvious and expected this two scenarios don’t show any issues.  But I wanted to confirm it.

Test Scenario #3: SQL Server 2008 R2 + Windows 2008 R2 + Low Max Server Memory + AMD Processors

These test was on same servers as original scenario, however we did not notice any issue.

*** During testing, we realized an interesting fact about NUMA configuration in SQL Server 2008 R2.  SQL Server swaps NUMA 0 and NUMA 1 CPU mapping when creating a logical mapping for schedulers.  For example, if you have 2 NUMA Nodes each with 4 cores; SQL Server binds NUMA 0 – Processor 0 – 3 to Scheduler 4 – 7 and NUMA 1 – Processor 4 – 7 to Scheduler 0 – 3.  This is considered NUMA Swap, documented in this MSDN blog article here by CSS Team.  However this behavior does not seem to persist in SQL Server 2012.

Test Scenario #4: SQL Server 2014 + Windows 2012 R2 + Low Max Server Memory + Intel Processors + NUMA (Azure IaaS Virtual Machine)

This was interesting case, when I set the Max Server Memory to low value, the NUMA node 0’s processor time started to increase.  However it was not as swear as SQL Server 2012, however these are different processors.  I had same behavior as noticed above, Current Committed was higher then Target Committed, therefore NUMA thread on Node 0 was spinning.  However interesting thing note, it was not as aggressive as SQL Server 2012.  It was not continuously spinning, it allowed CPU to breath.  However would wake up frequently as low as ever 20ms.

!!! Eliminated Locked Pages in Memory as a factor for NUMA Imbalance.

Test Scenario #5: SQL Server 2014 + Windows 2012 R2 + Low Max Server Memory + Intel Processors + Locked Pages In Memory + NUMA (Azure IaaS Virtual Machine)

Exact same behavior as above was noticed, however, after restart, SQL Server no longer had Current Committed higher then Target Committed.  Even after lowering the max memory even further then Scenario #4.  It did not cause as dramatic shift.

*** Another interesting thing to note in SQL Server 2014. Up-to SQL Server 2012, Lazy Writer Thread bound to CPU 0 of each NUMA Node.  In SQL Server 2014, Lazy Writer Thread is binding to last CPU of each NUMA Node.

Conclusion & Things to Consider

  • Configuring SQL Server Max Server Memory low memory will cause Lazy Writer to spin due to internal memory pressure issue.
  • There is no value at which point this behavior starts, it depends on the load of the server.  For example, in Original Scenario, due to server load under 4000MB it started spinning the Lazy Writer thread; however in Test Scenario #4, I had to go down to as much as 1000MB. Therefore on your servers look at Lazy Writer thread in conjunction with CPU and DBCC MEMORYSTATUS to understand what is truly happening.
  • This behavior is consistent in whether I have Locked Pages in Memory or not.
  • In Windows 2012 R2 NUMA allocation seem to be much better then Windows 2008 R2.
  • SQL Server 2014 Lazy Writer thread does not seem to be as aggressive as SQL Server 2012.
  • Few KB articles came up during my research in SQL Server 2012 that fix NUMA related issues, KB2819662 and KB2926223.
  • There as been some fixes released for NUMA for Windows 2008 R2 operating system also, please review KB2155311 and KB2510206.
  • In addition, if you are running a server with more then 64-logical cores, please review K-Group Configuration, for details please reference this article.
  • Last but not least verify that memory is distributed evenly on physical server.  That is the memory banks have been evenly allocated, i.e. you do not have more memory allocated in one bank versus another, as you might also see one NUMA node working extra harder if there is significant amount of foreign memory access.

Outstanding Question

Question to answer, why can’t Lazy Writer free up memory from NUMA Node 0?

SQL Server Lazy Writer is only responsible for clearing memory from Buffer Pool, it cannot remove or clear memory from any other memory clerk.  Therefore, it is possible, that other internal components have memory allocated on NUMA Node 0 only, therefore NUMA Node 0 is unable to free up enough memory for buffer pool only.  Looking at DBCC MEMORYSTATUS we can see which memory clerks allocate from NODE Node 0 only, there are few, for example, MEMORYCLERK_SQLCLR, MEMORYCLERK_SOSMEMMANAGER, OBJECTSTORE_LOCK_MANAGER, and MEMORYCLERK_XE_BUFFER all have allocation from NODE Node 0 only.

This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.

SQL Server 2012 NUMA Node Imbalance – Cont’d

October 1, 2015 6 comments

Yesterday, I had posted an issue I been troubleshooting with memory allocation.  That was causing CPU to pin, you can read more about the issue at SQL Server 2012 NUMA Node Imbalance.  Even though I found what was causing it, I didn’t understand the internals well enough to identify the root cause.

Additional testing I discovered this issue is repro on Windows 2008 R2 Server with SQL Server 2012.  It does not seem to fixed that at 4000MB the issue will start; if server is busy, then it can start at a higher value.  In my re-pro test, I found it starting at even 6000MB.

As I said the up front memory allocation and management was contributing to this, however now I am unsure if Locked Pages in Memory is to contribute to this.

However, further insight I learned today.  If this issue exists, you might have Lazy Writer spinning continuously on the effected NUMA node.

Therefore if you execute following T-SQL:

select session_id, wait_type, wait_time, cpu_time, scheduler_id
from sys.dm_exec_requests
where command = ‘lazy writer’
order by scheduler_id

Look at the output generated, if your Lazy Writer thread is working as normal (aka, only wakes up sometimes) you should see something similar to following:

clip_image002

However if you have this issue the Lazy Writer might look like below:

clip_image002[4]

  • Notice the wait_type has gone to NULL.
  • Notice wait_time has gone to 0 – signal it is not going to sleep like the rest.
  • Notice cpu_time keeps climbing.

Only work around that I know for this currently is, increase the max memory until you see lazy writer stop spinning and processor utilization starts dropping.

This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.

SQL Server 2012 NUMA Node Imbalance

September 30, 2015 2 comments

One of my client has been fighting a very strange problem and there is not much literature on the topic, NUMA Node Imbalance.

Server Configuration

  • 8 NUMA nodes with 6 cores each; totaling 48 cores.
  • 256GB memory.
  • 12 Instances of SQL Server; mix of SQL Sever 2012 Latest Build and SQL Server 2008 R2 SP2.

Issue

    NUMA node 0 pinned to 100% when all 12 instances are set to automatic CPU affinity setting.  This is recommended as default, there have been very few cases where I have had to consider changing this.  Generally default is fine, the SQL Server internal engine (or SQLOS)’s scheduler handle is very efficient at distributing load.  So why do we have it that NUMA node 0 was pinned?

Initial Troubleshooting

Any time we have performance problem on a server that is running SQL Sever instances; my first question to ask is, is it really SQL Sever? Looking at task manager or resource governor, both show highest CPU utilization of under 5%.  This is not surprising, as the CPU in both of these tools provides an average over all CPUs.  Therefore there is no easy way to know which instances were causing the NUMA Node 0 on the server to pin.  There is nothing obvious that is also visible if I filter through the DMV/DMFs with in each instance.

However due to performance issues, my client decided to manually distribute the load across the 8 NUMA nodes by using CPU affinity mask setting.  And sure enough load on NUMA node 0 went down; however he only did it for subset of the instances.  So load decreased from 100% to about 70%.

Research and Investigation

While doing investigation, I only found two KB articles that talk about NUMA node performance issue on SQL Server 2012, KB2819662 and KB2926223. However, all the SQL Server 2012 instances are on latest build; therefore, these hot fixes did not apply.  I also found KB relating to operating system and NUMA processor pressure issues, however all turned out to be not the issue here.  Because even operating system was fully patched.

In order to troubleshoot what is causing CPU pressure issue on NUMA node 0, we decided to move all instances off NUMA node 0 by using CPU affinity mask setting.  In doing so we eventually found instances that was causing the load.  There were 6 instances that were contributing the load.

After investigating we found, locked pages in memory is enabled by default on each instance and found all affected instances had their maximum set to 4000MB.  I remember reading it is no recommended to have maximum memory set to such a low value.  Therefore, we increased it to 6000MB.  Doing so instantly caused the CPU on NUMA node 0 to return to nominal levels.

Details

MaxMemory_at_4GB

Memory settings on SQLInstance01

CPUAffinity_At_Node5

Processor Affinity Mask Setting – Set to NumaNode03 for SQLInstance01

Above is an example of SQLInstance01, set to 4000MB with processor affinity set to NumaNode03.  If I look at the CPU now, there is no issues as in screenshot below.  However load is not running on NUMA node 0 right now.

CPU_LoadBefore_Rebalance

CPU Load on NUMA Node 0 (highlighted in red) – Nominal

CPUAffinity_At_Auto

Processor Affinity Mask Setting – Set to Default for SQLInstance01

Now if I change the instance to use all NUMA nodes, as per the default configuration, as in screenshot above.  Processors on NUMA node 0 start to pick up again. Notice the change was almost instantaneously.  Since I only did it for one instance it did not cause processor to pin to 100%; however hovered around 40% constant.  Imagine that times 6 instances, no wonder NUMA node 0 was pinned to 100%.

CPU_LoadAfter_Rebalance

CPU Load on NUMA Node 0 (highlighted in red) – After Affinity Set to Auto in SQL Server

Now if I adjust the max memory setting to 8000MB (I tested it with 6000MB); both settings, processor returned to normal performance; as per the screenshots below.

MaxMemory_at_8GB

Memory setting on SQLInstance01 – Setting it to 8000MB

CPU_LoadAfter_Rebalance_MemoryUpdate

CPU Load on NUMA Node 0 (highlighted in red) – After setting the Memory to > 4000MB

Summary

If you are using 64-bit of SQL Server (which is all version now) and have locked pages in memory enabled.  Be careful in setting the maximum memory value to a low value.  Because of locked pages in memory additional work needs to be done by SQL Server to manage the memory and from looks of it all this work is happening on NUMA node 0.  I do not know if this is by design.  However setting the memory to value higher then or equal to 6000MB resolved this issue.

I’ll follow up on this if I have any updates, as I’ll be raising the issue internally also.

This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.

Corruption occurs on the page of secondary replica when you change the secondary replica to unreadable

May 19, 2015 3 comments

Corruption issue should be treated as critical, as if gone undetected they are difficult to recover from without data loss.

The corruption can happen when changing readability property of the secondary database.  However only if the database is using data compression.  Please review and test the linked CUs if you are using AlwaysOn Availability Groups and Data Compression with your environment.

Microsoft SQL Server 2012 SP2 CU6 (link) and Microsoft SQL Server 2012 SP1 CU16 (link) released.

Online Index Rebuild can Cause Fragmentation

June 9, 2014 Leave a comment

As Microsoft SQL Server PFE, I am always playing with SQL Server.  I generally spend most of my time on the engine side of world; so I am somewhat of a newbie when it comes to BI.  So I understand when I see something in BI world and am a bit lost.  However when I ran into this, rebuilding index online caused fragmentation.  I was some what baffled. You can try it if you wish, I tested it on SQL 2008, SQL 2008 R2, and SQL 2012.

-- Create table for testing.
CREATE TABLE Test (Col1 INT IDENTITY(1,1), Col2 CHAR(50), Col3 CHAR(50), Col4 Char(50)
CONSTRAINT pkTest PRIMARY KEY
(
        [Col1]
) WITH (ALLOW_PAGE_LOCKS  = OFF)
)

-- Insert test data.
INSERT INTO dbo.Test (Col2, Col3, Col4)
SELECT TOP 1000000 REPLICATE('A',50),REPLICATE('A',50),REPLICATE('A',50)
FROM sys.all_objects O1
CROSS JOIN sys.all_objects O2

-- Check fragmentation.
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),null,null,null,null)

-- Rebuild Index online using MAXDOP
ALTER INDEX pkTest ON Test
REBUILD
WITH (Online=ON, MAXDOP=8)

-- Check fragmentation.
SELECT * FROM sys.dm_db_index_physical_stats(db_id(),null,null,null,null)

Some of you might noticed an interesting property.  ALLOW_PAGE_LOCK.

So turns out this is by design.  If ALLOW_PAGE_LOCK is turned off, we are building a index using parallelism, and building it online; it can cause fragmentation (Ref 1 & Ref 2).

Well lesson learned for future and no, it will not be last one.

Using SQL Server 2012 BPA in Windows 8.1

February 4, 2014 Leave a comment

In order to use SQL2012BPA in Windows 8.1 we require Microsoft Baseline Analyzer 2.0.  However even after installing that and SQL2012BPA we cannot do it.  Because module does not show up MBA.  Try following steps to get SQL2012BPA working on Windows 8.1.

  1. Install Microsoft Baseline Analyzer 2.0.
  2. Install SQL Server 2012 BPA.
  3. Copy the SQL2012BPA folder from C:\windows\system32\bestpractices\v1.0\Models\ to C:\ProgramData\Microsoft\Microsoft Baseline Configuration Analyzer 2\Models\
  4. Start Microsoft Baseline Analyzer 2.0.
  5. Select the SQL2012BPA module.
  6. Start analysis.
Categories: SQL 2012 Tags: ,

AlwaysOn Availability Groups Troubleshooting Guide

June 26, 2013 Leave a comment

“This guide helps you get started on troubleshooting some of the common issues in AlwaysOn Availability Groups and monitoring AlwaysOn Availability Groups. It is intended to provide original content as well as a landing page of useful information that is already published elsewhere.” — MSDN Article

Contains Scenarios, Tools, and Monitoring guidelines.  Please check it out @ MSDN Article here.

SQL Server 2012 SP1 CU4 Released!

June 6, 2013 Leave a comment

SQL Server Build: 11.0.3368 KB: 2833645

Lots of fixes are in this CU, in addition at least one KB talking about few new features.  One related to syspreping analysis services, integration services, etc.

SQL Server 2012 RTM CU7 Released!

April 19, 2013 Leave a comment

SQL Build: 11.0.2405 KB: 2823247

Quite a few fixes, similar to the ones in SQL 2008 R2 SP1 CU12 and SQL 2008 R2 SP2 CU6.  Some issues specific to SQL 2012 are related to SSAS, fixing performance issues with systems with more then 32-logical processors, access violation issues with FILESTREAM-enabled but data is offline. FULL JOIN, DMV, Log Shipping, etc.

%d bloggers like this: