Archive for the ‘Tools’ Category

Missing registry settings in cluster nodes for SQL Server

October 27, 2014

I run into this occasionally; in the last 3-4 years as a SQL Server PFE, I have seen this issue a total of 4 times, so it is not a common issue.  For this post I'll use an example architecture: a two-node cluster, NodeA and NodeB, running a single instance, SQLFCI1.  SQLFCI1 runs fine on NodeA but fails on NodeB.  Looking at the Application Log we see strange messages like "Could not open error log file ''".  Other messages might mention various missing configuration settings that SQL Server needs to start up.  So how can that happen?

When SQL Server is running as a Failover Cluster Instance (FCI), its configuration settings (a.k.a. the registry keys under HKLM\Software\Microsoft\Microsoft SQL Server) are saved in the cluster hive of the registry.  So when the instance fails over from the active node to the passive node, these settings get carried over and applied to the passive node.  That is why the best practice is to make all configuration changes on the active node only; if you make them on the passive node, or while the instance is offline, the Cluster Service will overwrite them with the settings it knows about.  This is called the checkpoint process.

We can check whether all the required SQL Server keys are being copied to the cluster hive.  We can do that from a Command Prompt using the following command:

cluster.exe . res "SQL Network Name (SQLFCI1)" /CheckPoints

You will get output similar to the following:

Listing registry checkpoints for resource 'SQL Network Name (SQLFCI1)'...

Resource                   Registry Checkpoint
-------------------------- --------------------------------------------------------------------------
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Cluster'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\MSSQLServer'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Replication'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Providers'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerSCP'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\CPE'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerAgent'

However, if you get only some of these records back, or none at all, we have an issue.

PLEASE NOTE: DO THIS ON THE NODE THAT IS WORKING.  IF YOU DO IT ON THE NODE THAT IS NOT, YOU WILL LOSE ALL YOUR REGISTRY SETTINGS.

  1. Back up the HKLM\Software\Microsoft\Microsoft SQL Server\ hive on both NodeA and NodeB (just in case you ignore my warning above, or Murphy's law kicks in).
  2. Confirm the instance is on NodeA; if not, fail back to NodeA from NodeB (NodeA was the good node in my scenario above).
  3. Execute the following commands to add each of the key registry settings to the cluster checkpoint.

cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Cluster"
cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\MSSQLServer"
cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Replication"
cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Providers"
cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerSCP"
cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\CPE"
cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerAgent"
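If you would rather not run seven separate commands, the same keys can be added with a single for loop at the cmd prompt; a minimal sketch, assuming the same MSSQL10.MSSQLServer instance ID as above (double the % to %%k if you put this in a batch file):

for %k in (Cluster MSSQLServer Replication Providers SQLServerSCP CPE SQLServerAgent) do cluster.exe . res "SQL Network Name (SQLFCI1)" /AddCheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\%k"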

After this, re-run the /CheckPoints command above to verify the checkpoints were added successfully.

SQL Server Configuration Manager, WMI Error 0x80041010

August 4, 2014

I just got this at a client; it's a common message that I see way too often, generally when multiple versions of SQL Server are installed.  So I decided to do a bit of research on it, namely: why does it happen?  Some reasons include…

  • A SQL Server instance on the server was uninstalled, which in turn uninstalled shared components relating to WMI.
  • During SQL Server installation, WMI doesn't get registered successfully.

[Screenshot: SQL Server Configuration Manager error "Invalid class [0x80041010]"]

It's easy to fix: as per KB956013, execute the following command and restart the WMI service.

mofcomp "%programfiles(x86)%\Microsoft SQL Server\number\Shared\sqlmgmproviderxpsp2up.mof"
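Here number is the version-specific shared folder.  For example, on a 64-bit machine with SQL Server 2008 (version folder 100), the command would be:

mofcomp "%programfiles(x86)%\Microsoft SQL Server\100\Shared\sqlmgmproviderxpsp2up.mof"

(On a 32-bit OS, use %programfiles% instead.)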

If it executes successfully, you should see a success message.  If you get an error, make sure you are running the Command Prompt as Administrator.

Performance Impact of SQL Server Profiler

January 7, 2013

The general best practice when using SQL Server Profiler is to capture only the events and columns you need.  But how much impact does it have?  Well, my answer has always been: IT DEPENDS!

But really, how much impact does it have?  So I decided to do some testing.  I will be using the code below for each test case scenario.

USE TempDB
GO

SET NOCOUNT ON
GO

SELECT GETDATE()
GO

CREATE TABLE T1 (Col1 INT, Col2 INT, Col3 INT)
GO

DECLARE @Cntr INT

SET @Cntr = 0

WHILE (@Cntr < 100000)
BEGIN
    SET @Cntr = @Cntr + 1
    INSERT INTO T1 VALUES (@Cntr, @Cntr*10, @Cntr*100)
END
GO

DROP TABLE T1
GO

SELECT GETDATE()
GO

Note: I know there are better ways to do a large insert (a set-based alternative is sketched below), but for this test the loop works pretty nicely :).
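For reference, a set-based version would insert the same 100,000 rows in a single statement.  Mind you, the row-by-row loop above is actually useful for this test, since it generates 100,000 individual statements for Profiler to capture; the sketch below would produce only one:

INSERT INTO T1 (Col1, Col2, Col3)
SELECT n, n * 10, n * 100
FROM (
    -- Generate numbers 1..100000 from a system catalog cross join
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
) AS nums;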

I am simply creating a table, throwing 100,000 rows into it, and taking the start time and end time so I can calculate the total time it takes to complete the operation.  In my control test, with no SQL Server Profiler running, it took 2416ms to complete the inserts.

Now I set up a SQL Server Profiler trace with the following events (with all columns):

Errors and Warnings

  • ErrorLog
  • EventLog
  • Exception

Security Audit

  • Audit Login
  • Audit Logout

Stored Procedures

  • RPC: Completed
  • RPC: Starting
  • SP: StmtCompleted
  • SP: StmtStarting

TSQL

  • SQL: BatchCompleted
  • SQL: BatchStarting
  • SQL: StmtCompleted
  • SQL: StmtStarting

Let's run it again and see what happens…

Waiting ….

Waiting …

I had time to go get some coffee and snacks while it was running; when I came back it had finished.  It took 165106ms, which is 68x worse.  WOW, SQL Profiler is evil!

Well, let's not throw SQL Server Profiler on the chopping block just yet; maybe some events are more expensive than others?  So let me take out all of the Security and Error events, since I really don't care about them.

And …

It executed in 166530ms.  Huh, slightly worse; okay, that is just weird.  Let's chalk that up to my crappy laptop.  But I can say those events didn't make much of a dent, because they were not capturing much to begin with.  Well, now what can I do?  I want to use Profiler to capture what is running, but it is just too painful to wait this long.  How about we get rid of some of the columns we don't need?

So I have eliminated more than half of the columns, and it's running, tick tock, tick tock…

Nope, it didn't finish while I typed that sentence as I hoped… hmmm… tick tock, tick tock… waiting I do…

IT FINISHED!  It finished in 56550ms; wow, that is about 1/3 of the time it took when I captured all the columns.  But that is still 23x worse than my control.  Hmm, well, I don't really need the Starting events because nothing is failing, so I'll capture only the Completed events.  Let's see if that helps.

Well, it helped: it finished in 40086ms.  Slightly better, but still worse than the control, about 16x worse.  Improving slowly… alright, let's remove the RPC events (since this is all batch work).  As we've seen before, I don't expect much improvement, but what the hey, that's why we do testing, right?

36303ms!!!  An improvement; why, what is going on?  Well, it turns out that my SQL Server had some background processes running (oops!) that were making RPC calls, so capturing them did hurt performance a little.  So now this is 15x worse.

I am now capturing only SQL: BatchCompleted and SQL: StmtCompleted, with a select number of columns.  I could possibly eliminate a few more columns, but at this point it might not be worth it.  So what is the end lesson?

SQL Server Profiler IS going to have a performance impact on your server; you can minimize the impact, but you cannot eliminate it.

Results Summary

  • Control Test, No Profiler: 2416ms (2 Seconds 416ms)
  • Run #1, All Events & All Columns: 165106ms (2 Minutes 45 Seconds 106ms)
  • Run #2, Only RPC/TSQL Events & All Columns: 166530ms (2 Minutes 46 Seconds 530ms)
  • Run #3, RPC/TSQL & Select Columns: 56550ms (56 Seconds 550ms)
  • Run #4, Only Completed Events & Select Columns: 40086ms (40 Seconds 86ms)
  • Run #5, Only TSQL & Select Columns: 36303ms (36 Seconds 303ms)

Lessons Learned

SQL Profiler is a very important tool for finding out what is happening, but you need to understand what you need; do not capture every event and column.  You cannot eliminate the performance impact, but it is the lesser of two evils: either know what is going on, or don't know and live with the performance problems.  Yes, we can use Dynamic Management Objects, but the tool is still important.

Next week I will run the same test using a SQL Server server-side trace, and follow it up with Extended Events.  After that I'll compare all three 🙂

 

PAL (Performance Analysis of Logs) Issues

May 18, 2012

I created a few issues on http://pal.codeplex.com for the new functionality added in version 2.2, where you can enter a SQL Server instance name in a dialog box so it produces a counter list with the instance name instead of only the default instance.  Please vote to get them fixed (they are not critical issues, however).

Categories: Tools

Creating a Simple Counter List in PAL (Performance Analysis of Logs)

May 17, 2012

In my previous post, I talked about how to implement the counters in Windows 2003.  However, I went about getting the list of counters from PAL the long way.  I feel a bit stupid about this now, but Clint (the PAL creator) pointed out that we can do this directly in the tool.  I had looked around for it in a few places, except the obvious one.

  1. Open PAL.
  2. Go to Threshold File.
  3. Click Export to Perfmon Template File.
  4. Click OK in SQL Instance Dialog Box.
  5. Change the Save as Type to "Logman Files (*.txt)".

No more manual hacking and slashing.  (Thanks, Clint!)

Cheers!

Categories: Tools

How to implement PAL (Performance Analysis of Logs) counters in Windows 2003

April 1, 2012

PAL, designed by Clint Huffman, is available on Codeplex.com.  This tool analyzes performance counter logs for you, to identify anything outside a threshold.  These thresholds are predefined in the program, so you don't have to remember things like the minimum "Page Life Expectancy" being 300 (although you really should remember at least that one :P).  It has many, many good counters to look at for the system and for SQL Server.  So instead of me providing a list of all the counters you could look at, there is functionality in PAL to export an XML file from which you can create a Windows 2008-based Perfmon capture (I will blog about that another time).  But what I want to do is use the same counters on Windows 2003; however, the 2003 version does not support XML, so we have to use a command-line tool called logman.exe.

[Screenshot: PAL Wizard - Threshold File]

1. Start PAL.
2. Go to Threshold File.
3. Click on the Threshold File drop-down and select "Microsoft SQL Server 2005/2008".
4. Click Export to Perfmon Template File.

[Screenshot: PAL - Instance Name dialog box]

5. It'll ask for the instance name; click OK (assuming we are running it on the default instance).
6. Close PAL.
7. Open the XML file in Notepad.

Modifying the XML Data File

[Screenshot: SQLCounters.XML - XML to delete at the top of the file]

[Screenshot: PAL - updating the XML file]

[Screenshot: SQLCounters.XML - XML to delete at the bottom of the file]

8. Delete the highlighted part at the top of the XML file and at the bottom of the XML file (see the screenshots above).
9. Search for and replace the leading indentation whitespace with (nothing).
10. Search for and replace the XML tags around each counter with (nothing), so only the counter paths remain.

[Screenshot: the clean counter list]

11. You are left with a clean list of counters, as in the image above.

[Screenshot: the Save dialog box]

12. Now save the file as SQLCounters.txt and make sure to change the encoding to ANSI (the Windows 2003 command-line tool doesn't like Unicode).
13. Run the following command from a Command Prompt:

logman create counter SQLPerfCollection -cf "C:\Temp\SQLCounters.txt" -si 00:10

14. You should now see a collection in Perfmon called SQLPerfCollection.
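Once the collection exists, you can start and stop it from the same prompt; a quick sketch using logman's start/stop verbs:

logman start SQLPerfCollection
logman stop SQLPerfCollection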

Credit for this goes to a colleague of mine, a Microsoft Platforms PFE (thanks!!).

Categories: Performance Tuning, Tools

Why backup testing is important

October 21, 2009

How do you do backup testing?  In the normal case we back up data from SQL Server (either using SQL Server Agent or a third-party backup agent) to disk and then to tape, or directly to tape.  We set up alerts on backups to tell us whether the backup job completed successfully or failed.  What else can we do?  We can run a verification test after each backup to make sure the backup is valid.  Is that really enough?

What other things should be considered?  Every part of the backup and restore process needs to be tested to confirm you have good backups.  It should be part of your policy to test this as often as possible, at a minimum quarterly.

** Assuming SQL Server Agent backups **

1) Confirm the backups completed (email alerts, manual checks).
2) Confirm backups are going to tape directly or are being picked up by the operating system backup (email alerts, manual checks).
3) Request a restore from tape (make sure it is from the off-site location and not the Virtual Tape Library (VTL)).
4) Run verification on the backup to make sure there are no issues with the tape drive and the backup file is still valid (a T-SQL sketch follows this list).
5) Restore the database in a testing environment.
6) Check security and GET an END USER to try it out.
7) Verify ALL infrastructure items are there, configured, and working (tape backup, tape agent, SQL backup, security, disk space, disk availability, etc.).
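A minimal T-SQL sketch of the verification in step 4, assuming a hypothetical backup file path:

-- Read the backup header to confirm the media is readable and see what the file contains
RESTORE HEADERONLY FROM DISK = N'\\BackupShare\SQL\MyDB_Full.bak';

-- Verify the backup set is complete and readable (this does not restore any data)
RESTORE VERIFYONLY FROM DISK = N'\\BackupShare\SQL\MyDB_Full.bak';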

Steps 1-2 are normal, but that is not where backup confirmation ends.  You should occasionally request a restore, verify it, and restore it in a testing environment.  Step 6 may seem a bit strange, but it is just as much a part of the restore process: if I restore the database, I am done; but if no one can connect to it, have I really recovered from the disaster?  Step 7 is explained later…

So why am I writing about this?  These are basic DBA duties that should be hard-coded into our brains.  Well, I thought they were hard-coded into mine, but sometimes, because of project constraints (time, money, etc.), the process gets skipped.  In one such case this happened to me.

I configured a SQL Server with all the standard jobs, disk backups, and email alerts.  Everything worked nicely and I did not get any errors.  Backups were going to a server that had been in production for a while, set up before my time on the team, so I assumed everything was correct since we had been using this dump location for a while.  This was a new server, so I didn't get a chance to run the verification steps listed above after the server was up; assuming things were going well, I got busy with other tasks.  Well, that was a bad decision on my part, because I ASSUMED things were configured when they were not.  This is where #7 comes into play: I assumed, and was left in awe when I was told this dump location didn't have any tape agent installed on it.

I got contacted by the client because a bug in the application allowed them to delete their ENTIRE WEBSITE (a Content Management System with a SQL Server back end).  I found out one day after the fact, so I no longer had the backups on disk.  I contacted the storage guys to ask for a restore and found out they DIDN'T HAVE ANY tape backups for my dump location.  So I quickly made a copy of whatever transaction log backups and data backups I had on hand to see if I could get some kind of restore going, but found that the transaction log backups needed a full backup that had already been overwritten.

Since this was a complete data loss, I had to turn to third-party vendor tools for data recovery in a non-standard manner.  I used the ApexSQL Log tool to read the transaction log and reverse engineer all the records that were deleted.  However, while using this tool I found a few annoyances that you'll have to watch out for:

1) The tool will not generate any recovery script for binary type objects, in file format or otherwise.
2) If you have large varchar columns or similar, the tool tries to be creative and changes each Enter/carriage return into a Char(10) + Char(13) string concatenation.  The problem with that is that for very large varchar columns SQL Server could no longer parse the SQL string successfully; for more, please read here.
3) With the default settings the script does not generate the GUIDs, so if your database relies heavily on them, this setting has to be turned off before reverse engineering.
4) While the tool is reading the transaction log it chugs through a lot of memory; for example, when reverse engineering transaction logs with over 1M transactions on 1.5GB of RAM, the program crashed on me every time.  So I processed the transaction logs in small chunks.
5) I found the tool to be extremely slow when selecting/unselecting, and the filter functionality to be limiting.

So, these limitations aside, the tool still saved me.  It couldn't recover enough to get the site working outright, but it recovered enough that the CMS product vendor was able to recreate the pages, thanks to the redundant nature of the data.

ApexSQL Doc

May 27, 2009

Another ApexSQL tool I finally got to try out…  I never got a chance to try the Red Gate version, but since I got all the tools when I ordered the ApexSQL Toolkit, I have been trying them as I get time.

  • Right off the start, the tool lets you choose the Database Engine/Integration Services instances you wish to document.
  • After adding the SQL Server Database Engine (SQL authentication or Windows authentication), you are able to select which databases to run the documenter on (note: it will not work on databases set to 6.5 compatibility).
  • After which you can choose detail options for the server, like Server Properties and…
  • Followed by detail options for the databases, like…
  • As you select the above options you are able to further select details for each object you would like to see, like Columns, Properties, Permissions, Parameters, Index Information, Statistics, etc.
  • Like the server objects, you can also choose to set a filter on all or any of the objects in the database.

After all that, you can choose options for the report and for some extra information that this tool can put together for you:

Database Details: Include Database Summary Information.
DDL Options: Include DDL Script, T-SQL Syntax Highlighting.
ToC Options: Include Parameter/Column Names Dictionary, Include Objects by Filegroup Section, Include Objects by Owner/Schema Section.
Miscellaneous: Include Documentation of Procedures/Functions, Use Default Owner/Schema Qualification in Generated Documentation, Show Sections with No Relevant Metadata to Document, Show Nodes for Object Types that Don't Exist in Database Objects.
Extended Properties.
Dependency Options: Explicitly Parse Database for Improved Dependency Accuracy, Include Dependency Lists/Tables (didn't try this to see how good a report it can produce).
Output Options: Compiled Help File format (.chm) (this didn't work for me; the Microsoft Help file compiler was taking too long, so the program terminated and didn't save any of the work, and I had to start all over), Linked HTML files (.html) (worked nicely, but generates a very large output if all the options are selected), Compiled Help 2.0 format (.HxS) (didn't try).

For the output you can define your own format for how you would like the data shown; there is a predefined CSS, but you can choose your own if you wish.

I like the tool, but it is slow at putting the information together.  I think that is expected given the amount of information it gathers.  However, sometimes it looks like the program has hung, and it gets very unresponsive.

I plan to try the Red Gate documentation tool to see how well it works compared to this one; I will report back in a future post.

Categories: Tools

Blocked Process Report in SQL Profiler

April 21, 2009

One of my servers has been giving me grief lately because the vendor built a process into their application that dumps the full contents of a table into the temp database before processing it, for each page load.

I had been trying to find out what kind of blocking it was causing, but when I turned Profiler on, the "Blocked Process Report" event under "Errors and Warnings" did not return any information.  So I thought, okay, maybe there isn't a blocked process…

I created a blocking event on purpose, but it still did not report it.  I have used this event before on other servers without problems, so I was a bit confused.  I checked the configuration settings on the server and found that blocked process detection was disabled.  Apparently it is disabled by default (new to me), as it is enabled on all the other SQL 2005 servers I have; I guess someone must have enabled it on those at a later point.
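For reference, a minimal way to reproduce a blocking event for testing, using a hypothetical dbo.TestTable (run each half in a separate connection):

-- Session 1: take a lock and leave the transaction open
BEGIN TRAN;
UPDATE dbo.TestTable SET Col1 = Col1 WHERE Id = 1;
-- (do not commit yet)

-- Session 2: this statement blocks behind session 1's open transaction
UPDATE dbo.TestTable SET Col1 = Col1 WHERE Id = 1;

-- Once session 2 waits longer than the 'blocked process threshold' (in seconds),
-- the Blocked Process Report event fires in Profiler; a ROLLBACK in session 1 releases it.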

How to check whether you have the Blocked Process Report enabled:

sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
sp_configure 'blocked process threshold'
GO

If the value returned by the last statement is zero, then it is disabled.  You can set it to your preferred value (it is in seconds) with the following code:

sp_configure 'blocked process threshold', 10
GO
RECONFIGURE
GO

Remember to turn 'show advanced options' back off when you are done :).
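That is, once you are done:

sp_configure 'show advanced options', 0
GO
RECONFIGURE
GO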

References:
How to Identify Blocking Problems with the Profiler by Brad McGehee, Link.
MSDN Online, sp_configure options, Blocked Process Threshold, Link.
MSDN Online, Blocked Process Event Details, Link.
