Yesterday, I had posted an issue I been troubleshooting with memory allocation. That was causing CPU to pin, you can read more about the issue at SQL Server 2012 NUMA Node Imbalance. Even though I found what was causing it, I didn’t understand the internals well enough to identify the root cause.
Additional testing I discovered this issue is repro on Windows 2008 R2 Server with SQL Server 2012. It does not seem to fixed that at 4000MB the issue will start; if server is busy, then it can start at a higher value. In my re-pro test, I found it starting at even 6000MB.
As I said the up front memory allocation and management was contributing to this, however now I am unsure if Locked Pages in Memory is to contribute to this.
However, further insight I learned today. If this issue exists, you might have Lazy Writer spinning continuously on the effected NUMA node.
Therefore if you execute following T-SQL:
select session_id, wait_type, wait_time, cpu_time, scheduler_id
where command = ‘lazy writer’
order by scheduler_id
Look at the output generated, if your Lazy Writer thread is working as normal (aka, only wakes up sometimes) you should see something similar to following:
However if you have this issue the Lazy Writer might look like below:
- Notice the wait_type has gone to NULL.
- Notice wait_time has gone to 0 – signal it is not going to sleep like the rest.
- Notice cpu_time keeps climbing.
Only work around that I know for this currently is, increase the max memory until you see lazy writer stop spinning and processor utilization starts dropping.
This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.
The Lazy Writer is triggered by pressure to get the buffer cache ready to bring in more data. Perhaps their is a minimum that it likes to keep available and this is affected by a delay writing to disk with so many SQL instances on the same Windows server.
There is a minimum value. However the issue is why it is doing it on NUMA node 0 only. As you know, SQL Server creates one Lazy Writer per NUMA node. Which is running on first CPU of that NUMA node. If the memory threshold value (which is based on # of free pages) is low; it should spin on all nodes. From my understanding after discussing this with few peoples; it might be that internal allocation is primarly happening on NUMA Node 0. However I am unable to reproduce this issue on any other server or my test labs. Seems combination of Windows 2008 R2 + SQL Server 2012 + AMD Processors + Locked Pages in Memory.
However have not had oppertunity to eleminate all the factors.
I suppose its good that they have a BIG Server to hold all those instances. You did say instances rather than discreet SQL server guests running under a host OS. Maybe they could start to amalgamate some of the instances.
Yeap. They are running on physical server using instance stacking. There are really lof of combinations to test here. I hope to test some of them in my lab, when I have few spare cycles :).