Archive

Archive for the ‘Troubleshooting’ Category

SQL Server Error 3117: Restore Database is Terminating Abnormally

SQL Server being a largely used relational database management system regularly encounters issues resulting in errors. One such errors, is the “restore database is terminating abnormally SQL error 3117” which has been causing a lot of trouble for the SQL Server users worldwide. The segment talks about the error as well as the reason due to which it occurs. Moreover, the workaround to overcome the consequences of this error have been discussed in the form of suggestions regarding the restore procedure and state to be used.

The Perpetrators of Microsoft SQL Server Error 3117

When the transaction log backup is tried, being restored to the same database it has been taken from, this error has high chances of taking place. SQL error 3117 clearly states that ‘no files are ready to rollforward’. This happens because; the log file that you are trying to restore is already available in the database. You can take the help of third party tool to view SQL transaction log.

TIP: In order to carry out the restoration process your way, you will have to perform the restoration of full backup of the database that had been taken right before the transaction log backup.

Recovery vs. NoRecovery State: Difference in Restoration

NORECOVERY state is to ensure that rollback does not take place. This state continues the roll forward along with the following statement mentioned in the sequence.

The restore sequence in this case helps restore other backups and then roll them forward.

RECOVERY (default) is the state that indicates that a rollback must only be performed once the roll forward has completed for the backup currently being restored.

Database recovery procedure demands the entire set of data that is to be restored (roll forward set) is in a consistent state with the database. The Database Engine issues an error 3117 in a condition where the data set is not rolled forward despite when it was meant to be to a state far enough to make it consistent enough with the database while RECOVERY is specified.

NOTE: In a condition like the one discussed above, one needs to make sure that the target database is not in a functional mode while the differential backup is being restored as this act can interrupt the procedure.

Restoring Differential Backup Post Full Backup Restore

To avoid the situation of coming across Microsoft SQL Server error 3117 would be to restore full backup with NORECOVERY and differential backup WITHRECOVERY.

RESTORE DATABASE DatabaseName
FROM DISK = 'C:\DatabaseNameFull.bak'
WITH NORECOVERY;

RESTORE DATABASE DatabaseName
FROM DISK = 'C:\DatabaseNameDiff.bak'
WITH RECOVERY;

Conclusion

When queries are generated, they are executed manually. Therefore, the chances of failure become high due to a non-automated approach being made. Similarly, other technical failures tend to take place resulting in restore database is terminating abnormally Microsoft SQL Server error 3117. Technical glitches are common when working with big data concept because of the load, processing and mass data involved. However, dealing with the issue carefully by understanding the cause of its occurrence helps overcome it precisely without taking rounds. When dealing with the restore issue, most DBAs suggest NORECOVERY as the roll forward set to restore backup without a failure encountered. Drop in a comment just in case of further queries regarding the same issue in restoring SQL server database. A suitable solution will be shared via response in the soonest possible time to help deal with the situation at hand.

Troubleshooting Steps For Corrupt or Suspect Databases

Problem

Sometimes, while working on SQL Server, users receive various error messages like SQL Server Error 5172 due to encounter of suspect or corruption in databases. This happens because of operational mistakes or faulty hardware, which can be resolved by using the various actions for corrupt or suspect databases.

Solution for Corrupt or Suspect Database

There are some recommended actions for corruption or suspect databases that help users in handling a situation as discussed below. However, before using these actions users must ensure to have backup strategy that helps to recover the data from failures.

    • ERRORLOG File

Firstly, check the ERRORLOG file for SQL Server to find the occurrence of an error. Many times, the data or log file is missing due to which, there are chances of suspect or corruption of databases. During the startup of SQL Server, it will help to encounter the problem.

    • NO_INFOMSG Option

User can run the DBBCC CHECKDB against the databases to find the reasons of the occurrence of an error message by showing the error number. There might some specific recommendation for error message. This option is helpful as it return only error message such as:

DBCC CHECKDB (adventureworks) WITH NO_INFOMSGS

    • RECOVERY_PENDING option

If the log file is missing then, user cannot run DBCC CHECKDB to perform last log backup. Instead of this, users can use SELECT name, state_desc, database_id from databases. It will help them to more to understand the problem. With this, they can see RECOVERY_PENDING if any database is missing or hindering in SQL Server from executing automatic recovery at startup. It will show all the status for database.

    • Hardware Diagnostics

To remove the occurrence of an error, users can run check errorlog, eventlog. If still the error occurs then, they can perform the hardware diagnostics. It will help to resolve the corruption of database issue. If there is some problem detected then, ensure the replacement of faulty hardware.

    • Tail Log Backup

The tail log backup contains the transaction log backup, which is backed up most recently. It helps to recover right up to the point of disaster. It makes easy for users as they get all those log records back that was corrupted.

    • Repair Command

If the database is not suspected then, run DBCC CHECKDB with REPAIR command. It is a secondary option because it will result in data loss. This command will repair and solve the issue or it will result in loss of whole data.

    • Enable Checksum

In a way to detect the corruption issue, users can enable the CHECKSUM verification. It helps them to give the timely alerts when any error is occurred because SQL Server is a robust to check for other disk errors.

Tip:

  • Never detach the suspect database otherwise, all chances are lost to have backup.
  • Depending upon the error, manually rebuild non-cluster indexes, drop, and reload table if data is static.

Conclusion

The recommended actions for corrupt or suspect databases help to bring back the data in accessible mode. However, users cannot repair a database of SQL Server in case of unavailable or not updated backup.

Know How to Resolve Microsoft SQL Server Error 5172

May 26, 2016 1 comment

Overview

SQL Server is a relational database management system, developed by Microsoft that is used for storage and retrieval of data whenever required. Database on SQL Server contains one master data file (MDF) and may be associated with number of new data file (NDF). Each MDF file is attached with a separate transaction log database (LDF). Sometimes, when user tries to attach the MDF/LDF database file, an Error may be encountered displaying error message “The header for file ‘xxx.mdf’ is not a valid database file header. The FILE SIZE property is incorrect. (SQL Server Error 5172)”The page will be discussing about the same error and solution on how to resolve MS SQL Error 5172.

What is Microsoft SQL Server Error 5172?

One of the most common error seen by the user of SQL Server while they are trying to attach the MDF or LDF Database from existing system to a new system. User needs to copy the database from the source system onto an external drive, attached the external drive to target system and copy them onto new system. Some of the possible causes of this error are damaged or corrupt log file/data file, attaching higher version database to a lower version SQL Server etc.

Precautions to Avoid MS SQL Error 5172

  • The Database files of the source system should only be copied when SQL Server is properly shutdown or Database is detached.
  • The Release and build number of SQL Server must be either exactly same for both source & target or the target must be higher. A database detached from SQL 2005 version needs to be attached to the same or higher version only.
  • Presence of UNICODE type database attached to the target server, that user no longer wants to use. It should either be detached or renamed or delete it permanently.
  • Attach the database file copied from source to target system using the sp_attach_db command or use SQL Server Management Studio while SQL Server is running. No files of an existing database should be replaced while SQL Server is down.

How to Fix SQL Server Header Error 5172

SQL Server Database files store their data as pages, each page is 8KB. The first page is called file header page containing the most important information about the whole file, like file signature, file size, compatibility, etc. Error occurs when certain value in header information in the boot page (page0) of LDF/MDF file is incorrect. It is also possible that file header page is damaged/corrupted making it unrecognizable by Microsoft SQL Server.

The user cannot attach the database without fixing the file header page manually. Let us consider a sample header of page 0:

Error 5172 Header

Some of the modifications that need to be made in the above header file using Hex Editor are:

  • Value at offset 0X00 (header version) should be 0X01
  • Value at offset 0X01 (m_type) should be between 0X01 and 0X66
  • Value at offset 0x04 (m_flagBits) should not be 0x02
  • Value at offset 0x18 (m_objId) should be 0x63 or higher
  • Combined value at offset 0x1E and 0x1F (m_freeData) should be 0x60 or higher (m_freeData is the offset where the next record will be placed)
  • Page size (0x2000)– m_SlotCnt * 4 should be higher than m_freeData. If not, any subsequent operation will overwrite already existing data & is disaster to SQL Server and users.
  • Offset 1C-1D (m_freeCnt) should be less than 0x1FA0 (8096). m_freeCnt denotes available free bytes within page, if it is greater then, SQL Server has reason to drop further execution because this will end up outside the 0x2000 boundary.
  • value 0x1FA0 (8096) is derived from page size (8192) – total header byte size (0x60) = 96 = 8192 – 96 = 8096
  • Offset 1E-1F (m_freeData) should not be zero
  • Offset 16-17 (m_SlotCnt) should less than 0xFD0 (4048)
  • The value at offset 0x40 thru 0x5F should be zero

After the changes made in the entries of header file using Hex Editor, user can try to re-attach the SQL Server Database files to overcome the issue of SQL Server Error 5172.

Conclusion

The blog will be discussing about one of the most common error encountered while user tries to attach their SQL Server Database files. The in-depth details of the SQL Server header error 5172 with the possible causes have been described. Some of the precautions that should be kept in mind are also discussed. The Error occurs when the header of the database file is invalid, which is why we need to make some modifications in the header file entries using Hex Editor.

The above elaborated description is sufficient enough, but if it fails to fix The header for file ‘xxx.mdf’ is not a valid database file header. The FILE SIZE property is incorrect then you can go with SQL Database Recovery Tool which allows you to recover data from corrupt database file.

SQL Skills: SQL Server Wait Types & Latch Classes Library

May 17, 2016 Leave a comment

Microsoft has documented wait types and latches in a minimum fashion on books online; often there are wait types that are just not documented.

Paul Randel (Blog|Twitter) has put together compressive list of these with recommendations and even SQL dump traces.

Check it out here..

 

Troubleshooting Database Mirroring in MS SQL Server

May 13, 2016 5 comments

Database Mirroring ensures availability of SQL Server database in scenarios of data disasters. It is implemented in SQL databases that use Full Recovery model. It maintains two copies of a particular database, which reside in different locations. When the primary server is unavailable due to any reason, the secondary server comes into action and proceeds with the functioning.

In this blog, we will illustrate some of the measures to troubleshoot database mirroring in SQL Server. Below is the list of some factors that need to be considered whilst troubleshooting:

Accounts

  • It is to be noted that the accounts, which are facilitating the SQL server to run, should be correctly configured. In case the accounts run on the same domain accounts, there are less chances of misconfiguration.
  • However, in case the accounts are running on different domains, the login of one of the accounts should be created on the other computer in master. In addition to this, the login should be granted CONNECT permission in the endpoint.
  • In case SQL Server is running as SaaS and is using the local system account, certificates must be used for authentication purposes.

Endpoints

Following key points should be considered to make sure that the endpoints are configured correctly:

  • Make sure that all the server instances- principal, mirror and witness server should possess database mirroring endpoint. Database Mirroring Endpoint can be created either by using Windows Authentication or by using certificates.
  • Make sure that all the port numbers are correct. In order to identify the port number associated with database mirroring endpoint use the following catalog views:
    • sys.database_mirroring_endpoints
    • sys.tcp_endpoints
  • Check whether the status of the endpoints is STARTED. For each server instance, make use of the following Transact-SQL command:
SELECT state_desc
  FROM sys.database_mirroring_endpoints
  • For starting an endpoint use the following command:
ALTER ENDPOINT Endpoint_Mirroring
STATE = STARTED
AS TCP (LISTENER_PORT = <port_number>)
FOR DATABASE_MIRRORING (ROLE = ALL);
  • In order to check whether the ROLE is correct or not, use the following command:
SELECT role
  FROM sys.database_mirroring_endpoints;

System Address

For naming the server instance in the database mirroring, any name, which identifies the system can be used. If the systems reside in the same domain, the system name can be used as the server name. In addition to this, a static IP address or a fully qualified domain name can also be used as the server names.

Network Access

If the server instances are in different domains, it is necessary for every server instance to access the ports of all the server instances or the instances over the TCP. This puts a stop to the communication between the instances of the server.

Mirror Database Preparation

The most important thing that is to be made sure is that the database is prepared for mirroring, no matter if the mirroring is done again or for the first time. It is to be noted that whilst creating a mirror database on the mirror server, restoration of principal database backup with the same name of the database is to be done. Moreover, all the log backups created prior to backup creation should also be applied. Both the things should be done WITH NORECOVERY.

It is recommended to store the mirror database at the same location as of the principal database. However, if the file paths are different, MOVE option should be included with RESTORE statement.

In case the mirroring was stopped earlier and is being restarted, all the backups taken afterwards should be applied to the mirror database.

Fixing Failed Create-File Process

In order to add a file without having any impact on the mirroring session, the path of the file should exist on both the servers. Due this very fact, if the database files are moved while mirror database creation, the add-file operation might fail and the mirroring can be suspended.

In order to fix the issue, follow the below-mentioned steps:

  1. The complete mirroring session should be removed and the full backup of the file group containing the added file must be restored.
  2. The log containing the add-file operation should be then backed up. Moreover, the log backup should be restored manually by using WITH NORECOVERY and WITH MOVE. This will create a file path on the mirror server and the file will be restored to a new location.

Conclusion

Care should be taken while troubleshooting database mirroring in SQL server irrespective of the fact that the mirroring has been previously done or not. With the help of the above-mentioned measures, the users can easily fix the issues while creating database mirroring.

SQL Server 2012 NUMA Node Imbalance–Conclusion

October 4, 2015 Leave a comment

Final post in the series, I hope to summarize my findings for everyone.  Thanks for all the comments, feedback and comments I have received to-date on these posts.

Before providing the summary of all the testing, research and reading, I want to provide some internals for folks.

NUAM Memory Allocation

If a server has NUMA configuration, when we define the max memory setting.  All memory gets allocated evenly from all NUMA nodes.  For example, if I allocate 8GB from a server that has 4 NUMA nodes, it will allocate 2GB from each NUMA node for the instance in question. So be careful, if you are using CPU affinity.  As even using CPU affinity, lets say you only want to use NUMA 1 for an instance; this can lead to foreign memory access. NUMA allocation still happens across all NUMA nodes (at least in SQL Server 2014).  So even though you might only have subset of schedulers online, memory will still be accessed across all NUMA Nodes.  Even though with new hardware foreign memory access does not have major impact on performance.  Still recommend avoiding it.  If you must use CPU affinity, then allocate sub-set of the processor from each NUMA node.

Lazy Writer Responsibility

Lazy Writer is back ground process and a thread is created per physical NUMA node (for Soft-NUMA, we only get one thread, you can read details here by Jonathan Kehayias (Blog|Twitter)). Lazy Writer responsibility is to make sure sufficient number of free pages, about 640 pages, are always available for new data to be loaded.  If Lazy Writer thread is spinning, that is usually a clear signal of internal memory pressure.

Note: Internal Memory pressure does not mean server doesn’t have enough memory.  There are multiple factors that can contribute to this, one fact is low Max Server Memory, which is what has been explored in these articles.

Original Scenario: SQL Server 2012 Latest Build + Windows 2008 R2 Latest Build + Locked Pages In Memory + Low Max Server Memory + AMD Processors + NUMA Configuration

In this build, I can consistently reproduce the issue.  If I set the Max Memory low enough, it caused the Lazy Writer on NUMA node 0 to spin non-stop, as I documented in SQL Server 2012 NUMA Node Imbalance – Cont’d.  To further to this I understand some additional internals around this thanks to colleague who helped here.  As I was not true sure what is happening.

If we looked at the DBCC MEMORYSTATUS, in particular we notice following for Node 0:

Memory node Id = 0                       KB
—————————————- ———–
VM Reserved                              127086340
VM Committed                             642452
Locked Pages Allocated                   55996
Pages Allocated                          54008
Pages Free                               0
Target Committed                         512000
Current Committed                        698448
Foreign Committed                        0
Away Committed                           0
Taken Away Committed                     0

Review the two numbers high-lighted above, Target memory is what this NUMA node is allocated and Current is what is NUMA consuming, therefore the Page Free count is 0.  Thus Lazy Writer spinning to free up memory.  However it is unable, thus spinning indefinitely causing CPU to pin.  Question to answer, why can’t Lazy Writer free up memory from NUMA Node 0?

Looking further into MEMORYSTATUS, we also notice, that most of the memory allocated into NUMA Node 0 is for buffer cache:

MEMORYCLERK_SQLBUFFERPOOL (node 0)       KB
—————————————- ———–
VM Reserved                              103971820
VM Committed                             524288
Locked Pages Allocated                   2524
SM Reserved                              0
SM Committed                             0
Pages Allocated                          8

Test Scenario #1: Windows 10 + SQL Server 2012 RTM + Low Max Server Memory + Intel Processors + No-NUMA (Physical)

In this, I was just trying to reproduce the issue on my desktop, with no luck.  Memory allocation or such were nominal.

Test Scenario #2: Windows 2012 + SQL Server 2014 RTM + Low Max Server Memory + Intel Processors + No-NUMA (Hyper-V)

Again no issue, as per the title of the series, NUMA Imbalance.  It was obvious and expected this two scenarios don’t show any issues.  But I wanted to confirm it.

Test Scenario #3: SQL Server 2008 R2 + Windows 2008 R2 + Low Max Server Memory + AMD Processors

These test was on same servers as original scenario, however we did not notice any issue.

*** During testing, we realized an interesting fact about NUMA configuration in SQL Server 2008 R2.  SQL Server swaps NUMA 0 and NUMA 1 CPU mapping when creating a logical mapping for schedulers.  For example, if you have 2 NUMA Nodes each with 4 cores; SQL Server binds NUMA 0 – Processor 0 – 3 to Scheduler 4 – 7 and NUMA 1 – Processor 4 – 7 to Scheduler 0 – 3.  This is considered NUMA Swap, documented in this MSDN blog article here by CSS Team.  However this behavior does not seem to persist in SQL Server 2012.

Test Scenario #4: SQL Server 2014 + Windows 2012 R2 + Low Max Server Memory + Intel Processors + NUMA (Azure IaaS Virtual Machine)

This was interesting case, when I set the Max Server Memory to low value, the NUMA node 0’s processor time started to increase.  However it was not as swear as SQL Server 2012, however these are different processors.  I had same behavior as noticed above, Current Committed was higher then Target Committed, therefore NUMA thread on Node 0 was spinning.  However interesting thing note, it was not as aggressive as SQL Server 2012.  It was not continuously spinning, it allowed CPU to breath.  However would wake up frequently as low as ever 20ms.

!!! Eliminated Locked Pages in Memory as a factor for NUMA Imbalance.

Test Scenario #5: SQL Server 2014 + Windows 2012 R2 + Low Max Server Memory + Intel Processors + Locked Pages In Memory + NUMA (Azure IaaS Virtual Machine)

Exact same behavior as above was noticed, however, after restart, SQL Server no longer had Current Committed higher then Target Committed.  Even after lowering the max memory even further then Scenario #4.  It did not cause as dramatic shift.

*** Another interesting thing to note in SQL Server 2014. Up-to SQL Server 2012, Lazy Writer Thread bound to CPU 0 of each NUMA Node.  In SQL Server 2014, Lazy Writer Thread is binding to last CPU of each NUMA Node.

Conclusion & Things to Consider

  • Configuring SQL Server Max Server Memory low memory will cause Lazy Writer to spin due to internal memory pressure issue.
  • There is no value at which point this behavior starts, it depends on the load of the server.  For example, in Original Scenario, due to server load under 4000MB it started spinning the Lazy Writer thread; however in Test Scenario #4, I had to go down to as much as 1000MB. Therefore on your servers look at Lazy Writer thread in conjunction with CPU and DBCC MEMORYSTATUS to understand what is truly happening.
  • This behavior is consistent in whether I have Locked Pages in Memory or not.
  • In Windows 2012 R2 NUMA allocation seem to be much better then Windows 2008 R2.
  • SQL Server 2014 Lazy Writer thread does not seem to be as aggressive as SQL Server 2012.
  • Few KB articles came up during my research in SQL Server 2012 that fix NUMA related issues, KB2819662 and KB2926223.
  • There as been some fixes released for NUMA for Windows 2008 R2 operating system also, please review KB2155311 and KB2510206.
  • In addition, if you are running a server with more then 64-logical cores, please review K-Group Configuration, for details please reference this article.
  • Last but not least verify that memory is distributed evenly on physical server.  That is the memory banks have been evenly allocated, i.e. you do not have more memory allocated in one bank versus another, as you might also see one NUMA node working extra harder if there is significant amount of foreign memory access.

Outstanding Question

Question to answer, why can’t Lazy Writer free up memory from NUMA Node 0?

SQL Server Lazy Writer is only responsible for clearing memory from Buffer Pool, it cannot remove or clear memory from any other memory clerk.  Therefore, it is possible, that other internal components have memory allocated on NUMA Node 0 only, therefore NUMA Node 0 is unable to free up enough memory for buffer pool only.  Looking at DBCC MEMORYSTATUS we can see which memory clerks allocate from NODE Node 0 only, there are few, for example, MEMORYCLERK_SQLCLR, MEMORYCLERK_SOSMEMMANAGER, OBJECTSTORE_LOCK_MANAGER, and MEMORYCLERK_XE_BUFFER all have allocation from NODE Node 0 only.

This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.

SQL Server 2012 NUMA Node Imbalance – Cont’d

October 1, 2015 6 comments

Yesterday, I had posted an issue I been troubleshooting with memory allocation.  That was causing CPU to pin, you can read more about the issue at SQL Server 2012 NUMA Node Imbalance.  Even though I found what was causing it, I didn’t understand the internals well enough to identify the root cause.

Additional testing I discovered this issue is repro on Windows 2008 R2 Server with SQL Server 2012.  It does not seem to fixed that at 4000MB the issue will start; if server is busy, then it can start at a higher value.  In my re-pro test, I found it starting at even 6000MB.

As I said the up front memory allocation and management was contributing to this, however now I am unsure if Locked Pages in Memory is to contribute to this.

However, further insight I learned today.  If this issue exists, you might have Lazy Writer spinning continuously on the effected NUMA node.

Therefore if you execute following T-SQL:

select session_id, wait_type, wait_time, cpu_time, scheduler_id
from sys.dm_exec_requests
where command = ‘lazy writer’
order by scheduler_id

Look at the output generated, if your Lazy Writer thread is working as normal (aka, only wakes up sometimes) you should see something similar to following:

clip_image002

However if you have this issue the Lazy Writer might look like below:

clip_image002[4]

  • Notice the wait_type has gone to NULL.
  • Notice wait_time has gone to 0 – signal it is not going to sleep like the rest.
  • Notice cpu_time keeps climbing.

Only work around that I know for this currently is, increase the max memory until you see lazy writer stop spinning and processor utilization starts dropping.

This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.

SQL Server 2012 NUMA Node Imbalance

September 30, 2015 2 comments

One of my client has been fighting a very strange problem and there is not much literature on the topic, NUMA Node Imbalance.

Server Configuration

  • 8 NUMA nodes with 6 cores each; totaling 48 cores.
  • 256GB memory.
  • 12 Instances of SQL Server; mix of SQL Sever 2012 Latest Build and SQL Server 2008 R2 SP2.

Issue

    NUMA node 0 pinned to 100% when all 12 instances are set to automatic CPU affinity setting.  This is recommended as default, there have been very few cases where I have had to consider changing this.  Generally default is fine, the SQL Server internal engine (or SQLOS)’s scheduler handle is very efficient at distributing load.  So why do we have it that NUMA node 0 was pinned?

Initial Troubleshooting

Any time we have performance problem on a server that is running SQL Sever instances; my first question to ask is, is it really SQL Sever? Looking at task manager or resource governor, both show highest CPU utilization of under 5%.  This is not surprising, as the CPU in both of these tools provides an average over all CPUs.  Therefore there is no easy way to know which instances were causing the NUMA Node 0 on the server to pin.  There is nothing obvious that is also visible if I filter through the DMV/DMFs with in each instance.

However due to performance issues, my client decided to manually distribute the load across the 8 NUMA nodes by using CPU affinity mask setting.  And sure enough load on NUMA node 0 went down; however he only did it for subset of the instances.  So load decreased from 100% to about 70%.

Research and Investigation

While doing investigation, I only found two KB articles that talk about NUMA node performance issue on SQL Server 2012, KB2819662 and KB2926223. However, all the SQL Server 2012 instances are on latest build; therefore, these hot fixes did not apply.  I also found KB relating to operating system and NUMA processor pressure issues, however all turned out to be not the issue here.  Because even operating system was fully patched.

In order to troubleshoot what is causing CPU pressure issue on NUMA node 0, we decided to move all instances off NUMA node 0 by using CPU affinity mask setting.  In doing so we eventually found instances that was causing the load.  There were 6 instances that were contributing the load.

After investigating we found, locked pages in memory is enabled by default on each instance and found all affected instances had their maximum set to 4000MB.  I remember reading it is no recommended to have maximum memory set to such a low value.  Therefore, we increased it to 6000MB.  Doing so instantly caused the CPU on NUMA node 0 to return to nominal levels.

Details

MaxMemory_at_4GB

Memory settings on SQLInstance01

CPUAffinity_At_Node5

Processor Affinity Mask Setting – Set to NumaNode03 for SQLInstance01

Above is an example of SQLInstance01, set to 4000MB with processor affinity set to NumaNode03.  If I look at the CPU now, there is no issues as in screenshot below.  However load is not running on NUMA node 0 right now.

CPU_LoadBefore_Rebalance

CPU Load on NUMA Node 0 (highlighted in red) – Nominal

CPUAffinity_At_Auto

Processor Affinity Mask Setting – Set to Default for SQLInstance01

Now if I change the instance to use all NUMA nodes, as per the default configuration, as in screenshot above.  Processors on NUMA node 0 start to pick up again. Notice the change was almost instantaneously.  Since I only did it for one instance it did not cause processor to pin to 100%; however hovered around 40% constant.  Imagine that times 6 instances, no wonder NUMA node 0 was pinned to 100%.

CPU_LoadAfter_Rebalance

CPU Load on NUMA Node 0 (highlighted in red) – After Affinity Set to Auto in SQL Server

Now if I adjust the max memory setting to 8000MB (I tested it with 6000MB); both settings, processor returned to normal performance; as per the screenshots below.

MaxMemory_at_8GB

Memory setting on SQLInstance01 – Setting it to 8000MB

CPU_LoadAfter_Rebalance_MemoryUpdate

CPU Load on NUMA Node 0 (highlighted in red) – After setting the Memory to > 4000MB

Summary

If you are using 64-bit of SQL Server (which is all version now) and have locked pages in memory enabled.  Be careful in setting the maximum memory value to a low value.  Because of locked pages in memory additional work needs to be done by SQL Server to manage the memory and from looks of it all this work is happening on NUMA node 0.  I do not know if this is by design.  However setting the memory to value higher then or equal to 6000MB resolved this issue.

I’ll follow up on this if I have any updates, as I’ll be raising the issue internally also.

This post is cross posted on my SQLCAN Blog, MSDN Blog, and SQL server Consultation blog.

Edmonton SQLPASS – Performance Troubleshooting

September 14, 2015 2 comments

I will be presenting a short talk on things to consider when it comes to performance troubleshooting SQL Server on September 17, 2015.  Goal is to provide practical advice that SQL Server DBAs/developers can use to help identify some of the common performance issues they might have within their environment.  We’ll look at the key resources of SQL Server, the pressure on those resources, and how to look inside the black box of SQL Server.

If your interested please register and show up Edmonton SQL Pass here.

SQL Server Failover Cluster, Agent Services Failed to come online, Error 435.

November 3, 2014 Leave a comment

This was a troublesome error, because there is not much information available on what is going wrong.  I looked the application log, SQL Agent Error Log, cluster logs; but neither gave me much details on what was causing the issue.  So after research on internet found an article by Tom Green (Blog).  However after looking at the said keys in [HKLM]\Software\Microsoft\Microsoft SQL Server\MSSQL.MSSQLServer\SQLServerAgent\ values WorkingDirectory and ErrorLogFile were already correct for me; however agent still refused to start up.

So another suggestion Tom makes is it might be permission issue on directory.  Which were also correct; slowly running out of options…. but …

It dawned on me this key was recently imported because of corruption issue; and with Tom’s article’s hint, I figured what if permissions in registry got deleted.

Sure enough SQL Agent services had lost permission to [HKLM]\Software\Microsoft\Microsoft SQL Server\MSSQL.MSSQLServer\SQLServerAgent\ hive.  After adding the agent account back in with Full Control, services started without problems.

  1. Run regedit.msc
  2. Navigate to [HKLM]\Software\Microsoft\Microsoft SQL Server\MSSQL.MSSQLServer\
  3. Right click on SQLServerAgent, go to Permissions.
  4. Click Add…
  5. If using Service SIDS …
    1. Change Location from Domain to Local Computer
    2. If using Default Instance …
      1. Type “NT Service\SQLServerAgent”
    3. If using Named Instance …
      1. Type “NT Service\SQLAgent$InstanceName”
    4. Click OK.
  6. If using Domain Service Accounts …
    1. Type in Domain\AccountName
    2. Click OK.
  7. Select Full Control in Permissions Dialog Box.
  8. Click OK.

Restart SQL Agent, should work without problem.

%d bloggers like this: