

Archive for the ‘Windows 2008 Clustering’ Category

Missing registry settings in cluster nodes for SQL Server

October 27, 2014 Leave a comment

I run into this occasionally; in my last 3-4 years as a SQL Server PFE, I have seen this issue a total of 4 times.  So it's not a common issue.  For this post, I’ll use an example architecture: a two-node cluster, NodeA and NodeB, running SQLFCI1.  SQLFCI1 runs fine on NodeA but fails on NodeB.  Looking at the Application Log we see strange messages like “Could not open error log file ''”.  Other messages might be about various missing configuration settings that SQL Server needs to start up.  So how can that happen?

When SQL Server is running as a Failover Cluster Instance (FCI), its configuration settings (a.k.a. registry keys under HKLM\Software\Microsoft\Microsoft SQL Server) are saved in a cluster hive in the registry.  So when the instance fails over from the active node to the passive node, these settings get carried over and applied to the passive node.  That is why the best practice is to make all configuration changes on the active node only; if you make them on the passive node, or while the instance is offline, the Cluster Service will overwrite them with what it knows of the settings.  This is called the checkpoint process.

We can check whether all the required SQL Server keys are being copied to the cluster hive.  We can do that from the Command Prompt using the following command:

cluster.exe . res "SQL Network Name (SQLFCI1)" /CheckPoints

You will get output similar to the below:

Listing registry checkpoints for resource 'SQL Network Name (SQLFCI1)'...

Resource                   Registry Checkpoint
-------------------------- ----------------------------------------------------------------------------
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Cluster'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\MSSQLServer'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Replication'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Providers'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerSCP'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\CPE'
SQL Network Name (SQLFCI1) 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerAgent'

However, if you get only some or no records back, we have an issue.
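Checking the listing by eye is easy to get wrong, so here is a minimal Python sketch that compares captured /CheckPoints output against the seven subkeys shown in the listing.  The function names and the default instance ID (MSSQL10.MSSQLServer, taken from the example) are my own assumptions for illustration; it only parses text and does not call cluster.exe.

```python
import re

# Subkeys this post expects to see checkpointed (from the example listing).
EXPECTED_SUBKEYS = [
    "Cluster", "MSSQLServer", "Replication", "Providers",
    "SQLServerSCP", "CPE", "SQLServerAgent",
]


def registered_checkpoints(listing):
    """Return the lower-cased registry paths found in /CheckPoints output."""
    paths = set()
    for line in listing.splitlines():
        # Treat curly quotes (common in copy/pasted output) as straight ones.
        line = line.replace("‘", "'").replace("’", "'")
        match = re.search(r"'(SOFTWARE\\[^']+)'", line, flags=re.IGNORECASE)
        if match:
            paths.add(match.group(1).lower())
    return paths


def missing_checkpoints(listing, instance_id="MSSQL10.MSSQLServer"):
    """Return the expected subkeys that do not appear in the listing."""
    base = "SOFTWARE\\Microsoft\\Microsoft SQL Server\\" + instance_id + "\\"
    found = registered_checkpoints(listing)
    return [key for key in EXPECTED_SUBKEYS if (base + key).lower() not in found]
```

An empty list from missing_checkpoints means every expected key was listed; anything else is the set of checkpoints to look at.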

PLEASE NOTE: DO THIS ON THE NODE THAT IS WORKING.  IF YOU DO IT ON THE NODE THAT IS NOT, YOU WILL LOSE ALL YOUR REGISTRY SETTINGS.

  1. Back up the HKLM\Software\Microsoft\Microsoft SQL Server\ hive on both NodeA and NodeB (just in case you ignore my warning above, or Murphy’s law kicks in).
  2. Confirm the instance is on NodeA; if not, fail back to NodeA from NodeB (NodeA was the good guy in my scenario above).
  3. Execute the following commands to add each of the key registry settings to the cluster checkpoint.

cluster.exe . res "SQL Network Name (SQLFCI1)" /Addcheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Cluster"
cluster.exe . res "SQL Network Name (SQLFCI1)" /Addcheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\MSSQLServer"
cluster.exe . res "SQL Network Name (SQLFCI1)" /Addcheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Replication"
cluster.exe . res "SQL Network Name (SQLFCI1)" /Addcheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\Providers"
cluster.exe . res "SQL Network Name (SQLFCI1)" /Addcheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerSCP"
cluster.exe . res "SQL Network Name (SQLFCI1)" /Addcheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\CPE"
cluster.exe . res "SQL Network Name (SQLFCI1)" /Addcheckpoint:"SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLServer\SQLServerAgent"

After this, re-run the /CheckPoints command above to verify they were added successfully.
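For a different instance or resource name, typing seven near-identical commands invites typos.  The Python sketch below just builds the command lines in the same form as the ones in this post so they can be reviewed and pasted; it does not run anything.  The function name and parameters are hypothetical, and the instance ID has to match what is actually in your registry.

```python
# Subkeys taken from the commands in this post.
SUBKEYS = [
    "Cluster", "MSSQLServer", "Replication", "Providers",
    "SQLServerSCP", "CPE", "SQLServerAgent",
]


def add_checkpoint_commands(resource, instance_id, subkeys=SUBKEYS):
    """Build one cluster.exe /Addcheckpoint command line per registry subkey."""
    base = "SOFTWARE\\Microsoft\\Microsoft SQL Server\\" + instance_id
    template = 'cluster.exe . res "{}" /Addcheckpoint:"{}\\{}"'
    return [template.format(resource, base, key) for key in subkeys]


if __name__ == "__main__":
    # Print the commands for review; run them by hand on the working node only.
    for command in add_checkpoint_commands(
        "SQL Network Name (SQLFCI1)", "MSSQL10.MSSQLServer"
    ):
        print(command)
```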


Multi-path I/O (MPIO) does not work for disk with identifier … on node …

May 6, 2010 5 comments

While configuring a Windows 2008 cluster that currently has 3 nodes, I was getting ready to add a 4th node.  The first step was running it through the cluster validation wizard.  The wizard kept failing, with the disk check returning the following error:

MPIO Error

My research indicated this might be an iSCSI setting, because on Windows 2008, to use NAS technology in a cluster environment, the iSCSI-Persist setting must be enabled.  Both [1] and [2] show how to enable that setting, but after our storage guys checked and made sure this setting was enabled we still could not get past this error.  We compared the flags on the HBA, SANsurfer, and the ECC Master Agent on this node against the already configured nodes, with no luck.

I had configured a Windows 2008 cluster (2-node) before but never ran into this issue.  On that cluster we used PowerPath as the agent to talk to the EMC Symmetrix DMX-3.  My understanding of NAS storage is limited, so I didn’t know the functionality of PowerPath.  So the issue was not apparent; after looking at the 2-node cluster and the 3-node cluster, we noticed that different agents were used to communicate between the nodes and the ECC storage.

The 3-node cluster was configured to use SANsurfer, so to keep the configuration on this cluster similar we installed SANsurfer as well.  However, the issue remained; after a reboot, we also noticed there were 59 disks visible on the server.  The server only has 2 physical disks and 29 masked, so it should be at most 30.  Looking at the cluster validation report I noticed that the same disk was being masked to two ports; this is expected because of the dual HBAs for failover.  This was the same on all the other nodes, so why did we not see duplicate disks on those nodes?

After digging through some documentation, talking to the guys in the office, and some internet digging, I came upon [3].  Because Microsoft MPIO was not installed and configured, each disk was mistakenly seen as two disks.
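To make the "one disk seen twice" idea concrete, here is a small Python sketch that groups visible paths by disk identifier, which is roughly the bookkeeping MPIO does for you.  The port names and disk identifiers are made-up sample data, not values from our servers.

```python
from collections import defaultdict


def group_paths_by_disk(paths):
    """Group (hba_port, disk_identifier) pairs into {disk_identifier: [ports]}."""
    by_disk = defaultdict(list)
    for port, disk_id in paths:
        by_disk[disk_id].append(port)
    return dict(by_disk)


def duplicated_disks(by_disk):
    """Return identifiers of disks that are visible through more than one path."""
    return sorted(disk for disk, ports in by_disk.items() if len(ports) > 1)


if __name__ == "__main__":
    # Hypothetical example: two LUNs, each masked to both HBA ports.
    visible = [
        ("port0", "LUN-A"), ("port1", "LUN-A"),
        ("port0", "LUN-B"), ("port1", "LUN-B"),
    ]
    grouped = group_paths_by_disk(visible)
    print(len(visible), "paths visible,", len(grouped), "distinct disks")
    print("seen more than once:", duplicated_disks(grouped))
```

Without MPIO the operating system counts paths; with it, the count collapses to the distinct disks.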

MPIO vs Non-MPIO Diagram

Image from Microsoft TechNet [4]

So how to fix it?  In Windows 2008, first make sure Microsoft MPIO is installed; see [5] for how to install it.  After you have finished installing, bring up MPIO under Administrative Tools.

MPIO Properties Dialog Box

You should see something similar to the screen above; then click on “Discover Multi-Paths”.

MPIO Dialog box - Discover Multi-Paths Tab

On this screen, at the bottom, you will see EMC SYMMETRIX; select it and click Add.

Reboot Required - Confirmation Dialog Box

After that it will ask you to reboot; click “Yes”.  The problem should be resolved after the reboot.

So I asked the storage expert why I needed to configure this on this cluster and not the last one.  He said it was because PowerPath comes with Multi-Path I/O built in.  But because more and more people are using SAN/NAS technology with their servers, Microsoft decided to add Multi-Path I/O to Windows 2008.  This makes it cheaper, because we don’t have to pay the extra license cost for PowerPath’s Multi-Path I/O.  However, in his opinion, it is not as robust for load balancing, but for failover (which is what I am using it for) it works without issues.

References

  1. TechNet. Windows Servers. Windows Server 2008 Cluster Disk cannot bring online. Link.
  2. Bryan Coffey’s Blog. Configuring Symmetrix DMX for Windows 2008 Cluster. Link.
  3. TechNet. Microsoft Multipath I/O Step-By-Step Guide. Link.
  4. TechNet. Microsoft Multipath I/O Step-By-Step Guide. Understanding MPIO features and components. Link.
  5. TechNet. Microsoft Multipath I/O Step-By-Step Guide. Installing and Configuring MPIO. Link.
  6. EMC2. EMC Symmetrix DMX-4 Series. Link.

Acknowledgment

I did lots of digging to get this working while working with a co-worker, so the credit isn’t mine.  I can take credit for the digging, but the people who came before me and the storage guys are the ones who had the answer to this.  I am blogging about it so 1) I have a reference for the future and 2) it “might” help someone else :).

SQL Server 2005 Full Text Search services failed to Start Up

May 29, 2009 6 comments

Right after setting up a SQL Server 2005 cluster on Windows 2008, the FTE services did not start up automatically. The following error gets recorded in the event log/cluster events:

Cluster resource ‘SQL Server Fulltext (InstanceName)’ in clustered service or application ‘VirtualClusterName’ failed.

Generic application ‘SQL Server Fulltext (InstanceName)’ could not be brought online (with error ‘1075’) during an attempt to start the service. Possible cause: the specified service parameters might be invalid.

A Microsoft engineer pointed me to KB936302, Problem #3; however, the stated solution of installing SP2 was not enough. The services still refused to come online; to get the services to start successfully we edited the following registry entry:

Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSFTESQL$InstanceName\
Multi-String Value: DependsOnService
Value Changed from …

NTLMSSP
RPCSS

to

RPCSS

Rebooted each node; the services came online successfully.
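If you have several instances or nodes to touch, the edit above can be scripted.  The following is only a sketch in Python: the helper names are mine, the pure function just drops NTLMSSP from the list, and the second function uses the standard winreg module (Windows only) with an instance name you supply.  Back up the key before trying anything like this.

```python
def drop_dependency(depends_on, name="NTLMSSP"):
    """Return the dependency list without `name` (case-insensitive match)."""
    return [dep for dep in depends_on if dep.upper() != name.upper()]


def apply_on_windows(instance_name):
    """Rewrite DependsOnService for MSFTESQL$<instance_name> on this node.

    Assumption: run as an administrator on a Windows node; winreg is
    imported here so the rest of the file still loads on other platforms.
    """
    import winreg

    path = "SYSTEM\\CurrentControlSet\\Services\\MSFTESQL$" + instance_name
    access = winreg.KEY_READ | winreg.KEY_SET_VALUE
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path, 0, access) as key:
        current, _value_type = winreg.QueryValueEx(key, "DependsOnService")
        updated = drop_dependency(current)
        winreg.SetValueEx(key, "DependsOnService", 0, winreg.REG_MULTI_SZ, updated)
    return updated
```

With the values from this post, drop_dependency(["NTLMSSP", "RPCSS"]) leaves only RPCSS, which is the change we made by hand.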

Cluster name resource failed registration in DNS

May 27, 2009 6 comments

This was not an issue with Windows 2008 Clustering, but rather an oversight when Windows clustering was configured. For clustering, the cluster name resource must have full access to the virtual cluster names so that when a failover takes place the DNS entries can be updated.

The following errors were recorded in the event logs when registration fails:

Cluster network name resource ‘SQL Network Name (VirtualClusterName)’ failed registration of one or more associated DNS name(s) for the following reason:
DNS signature failed to verify.

Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.

To fix this you’ll need rights to update the permissions on the DNS entry for the VirtualClusterName resource name:

  1. In DNS Management (dnsmgmt.msc):
  2. Find the VirtualClusterName that is failing to register.
  3. Right-Click Properties.
  4. Select Security Tab.
  5. Click Add.
  6. Click Object Types.
  7. Check off “Computers”; uncheck the other selected options.
  8. Enter in the name of the cluster (a.k.a Cluster Name Object (CNO)).
  9. Click Check Names; Verify that the entry has been found.
  10. Click OK.
  11. Give the CNO FULL Control over this record.
  12. Click OK.

Authentication Issues on SQL Server Startup

May 20, 2009 Leave a comment

After we finished installing SQL Server for clustering (the install was excellent: nothing failed, all greens, logs good), SQL Server refused to start up. We kept getting the following types of error messages in our ERRORLOG file; this was the same for RTM and SP2:

2009-05-09 12:53:16.94 Logon Error: 18456, Severity: 14, State: 11.
2009-05-09 12:53:16.94 Logon Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’. [CLIENT: xxx.xxx.xxx.xxx]
2009-05-09 12:53:39.53 spid7s SQL Server is terminating in response to a ‘stop’ request from Service Control Manager. This is an informational message only. No user action is required.

I could not figure out the issue; while talking to Microsoft I found two interesting facts:

  1. I was able to connect to SQL Server using \\.\pipe\SQLLocal\NamedInstanceName\ (so using Named Pipes) with both Windows Authentication and SQL Server authentication.
  2. I was able to log into ServerName\NamedInstanceName using SQL Server authentication only.

So the Microsoft engineer thought it was an issue with NTLM, and he got me to create a new value in the following registry location:

Key: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\
Value Name: DisableLoopbackCheck
Value Type: DWORD
Value: 1

The Microsoft engineer referenced KB887993 for this fix; it resolved the authentication issue with SQL Server.

But today I had issues with other servers where users were not able to authenticate after a recent Windows Update, KB957097 (addressing the MS08-068 Security Bulletin, link). After reading up on KB957097, I found out that disabling the loopback check is actually not the advised way to fix this issue; I sent an email to the Microsoft engineer asking his opinion on this …

Ref Links:
SQL Server Protocols: Using Kerberos with SQL Server, Link.

Adventures of Setting up SQL 2005 on Windows 2008

May 15, 2009 Leave a comment

I wrote before about the issues I ran into setting up a SQL Server 2005 cluster on Windows 2008; other than install issues I didn’t really have many issues. But in that cluster I wasn’t using Notification Services or MS-DTC for each instance of SQL Server. In addition, I had never done the Windows clustering bit, just the SQL part; so there were some interesting lessons to learn here about configuration requirements and SAN requirements. The following are things to watch out for before you start building a server, just a small checklist.

  • SAN disks must be masked to each node of the cluster; if they are not masked to all nodes, then failover can only occur between the nodes they are masked to. This was an issue because I had to get our SAN guys to remask all the disks so that they were visible to each of the nodes.
  • A SAN-attached disk in Windows 2008 MUST have the SCSI-3 Persistent Reserve setting enabled for each LUN being masked to the server, or the cluster validation check will fail.
  • The IPs for the Private heartbeat have to be on a different subnet than the Public ones (common knowledge), but I never really read anything on that when reading SQL clustering books. I guess I need to read up on Windows clustering a bit (read KB258750 for more info).
  • While running the Windows clustering setup we ran into an issue where we could not authenticate to the second node in the cluster; it turned out to be an issue with the bindings on the Private network and the order of the NIC configuration. In your network configuration you have to go to Advanced Options and unbind the Private network from Client for Microsoft Networks and File and Printer Sharing. In addition, change the order of the networks so Public is on top, followed by Private, followed by all the other disabled NICs (NICs that are not in use).
  • Let’s say you are setting up a 2-node cluster with an Active-Active configuration; then you must have an IP address for the Cluster Name, 2 IP addresses (one for each of the virtual SQL cluster names), 2 Public IP addresses for the physical nodes, and 2 Private IP addresses on a different subnet from the Public addresses for the physical nodes. The lesson learned here was getting an IP address for the Cluster Name.
  • Following the same example above, you will need to set up DNS entries/reverse name lookups and SPNs for the Cluster Name, the virtual SQL cluster names, and the two physical names.
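The subnet rule from the checklist is easy to sanity-check before you start the cluster setup.  Here is a minimal sketch using Python's standard ipaddress module; the node names and addresses are invented for illustration, so substitute your own plan.

```python
import ipaddress


def same_subnet(first, second):
    """True if two interface addresses such as '10.0.0.5/24' share a subnet."""
    net_a = ipaddress.ip_interface(first).network
    net_b = ipaddress.ip_interface(second).network
    return net_a.overlaps(net_b)


def heartbeat_conflicts(nodes):
    """Return names of nodes whose Private address sits on the Public subnet."""
    return [
        name for name, (public, private) in nodes.items()
        if same_subnet(public, private)
    ]


if __name__ == "__main__":
    # Hypothetical address plan: (public, private) per physical node.
    plan = {
        "NodeA": ("192.168.10.11/24", "10.10.10.11/24"),
        "NodeB": ("192.168.10.12/24", "192.168.10.112/24"),  # wrong on purpose
    }
    print("Nodes to fix:", heartbeat_conflicts(plan))
```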

I ran into many issues and errors also; I’ll post them over the next week or so.

SQL Server 2005 Clustering with Windows 2008 Cont’d 2 …

October 2, 2008 Leave a comment

I finally got a response from Microsoft. It appears there is an issue with MSI and the Cluster service; he didn’t make it clear to me whether it was a Windows 2008 issue or not.

This is what they found the issue to be:

This particular issue is a case sensitivity issue for system name. The workaround
is to rename the system names to upper cases only. If the computer name is in
lower case or mixed case, setup fails to notice that its on owning/active node
and needs to run the scripts. The reason why this happens is that MSI gets the
computer name from local system to compare it with the name of current owner
from cluster service. The name MSI gets is all upper case. The name that cluster
API returns is actual name of the machine which could be mixed or lower case.
SQL Server setup uses case sensitive comparison here which makes it impossible
for setup to realize when it is running on active node. Thus setup doesn’t run
the scripts which it is supposed to.
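To see why this bites, here is a tiny Python illustration of the comparison described in the email.  It is not SQL Server setup's actual code, just the same idea: the node names are hypothetical, one upper-cased the way MSI reports it and one as the cluster API might return it.

```python
def is_owner_case_sensitive(local_name, owner_name):
    """The kind of comparison the email describes: exact, case-sensitive."""
    return local_name == owner_name


def is_owner_case_insensitive(local_name, owner_name):
    """What you would want instead: ignore case when comparing node names."""
    return local_name.upper() == owner_name.upper()


if __name__ == "__main__":
    msi_name = "NODEA"      # upper-cased name, as MSI reports it
    cluster_name = "NodeA"  # mixed-case name, as the cluster API might return it
    print(is_owner_case_sensitive(msi_name, cluster_name))
    print(is_owner_case_insensitive(msi_name, cluster_name))
```

Renaming the nodes to upper case makes both comparisons agree, which is why the workaround works.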

So after this email I got another email explaining what to do to fix it:

  1. Un-Install SQL Server From the Cluster ( Make sure you remove all the components of SQL server ) and clean up all the folders .
  2. Rename both node names to the Existing Name with Upper Case ( This will ask for a system Reboot ).
  3. Evict One Node at a Time and then Add it back to the cluster ( Make sure you don’t have to type the node names but just select it from the list and it’s in Upper Case).
  4. Start the installation from the node which is owning the SQL Server Group.
  5. Upgrade SQL Server 2005 to Service Pack 2.

So we are trying to do that right now but keep running into other issues. I’ll update on whether this worked or not in the future :).

———————– Update Oct 3, 2008

So I finished reinstalling the cluster, and I don’t get the issue that was listed below. I asked when the next patch would be out to deal with this issue; the Microsoft support engineer was not sure and couldn’t give me a timeline.

At least we have a workaround now :D.

SQL Server 2005 Clustering with Windows 2008 Cont’d…

September 25, 2008 Leave a comment

Just had a chat with another Microsoft SQL Server support engineer, and they still can’t reproduce the error. According to their logs this has happened once before, but they couldn’t get anything from it. He got me to execute the following SQL statement:

SELECT SERVERPROPERTY('ResourceVersion') AS ResourceDB,
       SERVERPROPERTY('ResourceLastUpdateDateTime') AS ResourceDBLastUpdate,
       SERVERPROPERTY('ProductVersion') AS Ver,
       SERVERPROPERTY('ProductLevel') AS SP;

Which returned following information:

ResourceDB: 9.00.1399
ResourceDBLastUpdate: 2005-10-14 01:56:22.007
Ver: 9.00.3282.00
SP: SP2

Both the Microsoft engineer and I were looking at this going, what is going on? So not only did the maintenance plans fail, now we have an issue with the resource DB not being updated by the SP. We noticed that the resource DB version is still the original RTM version; we ran the script on another SQL Server 2005 instance, and there the versions matched as expected.

So now they are going to the coders to see if there is a solution for this.
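If you collect these values from several servers, the mismatch can be flagged automatically.  The Python sketch below assumes the third dotted component is the build number and that a patched instance should report the same build for both properties (which is what we saw on the healthy server); the function names are mine.

```python
def build_number(version):
    """Extract the build (third dotted component) from e.g. '9.00.3282.00'."""
    return int(version.split(".")[2])


def resource_db_matches(resource_version, product_version):
    """True if the resource DB build equals the product build."""
    return build_number(resource_version) == build_number(product_version)


if __name__ == "__main__":
    # Values returned by the query on the broken cluster.
    print(resource_db_matches("9.00.1399", "9.00.3282.00"))
```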

SQL Server 2005 Clustering with Windows 2008

September 24, 2008 2 comments

FUN!

One word to describe it; it was very easy to set up. We just completed setting up a 3-node configuration with Active-Active-Passive. We applied SQL Server 2005 SP2, logged in to the server, and found we couldn’t create maintenance plans or view existing ones.

So I thought, okay, something went wrong in the SP2 install, because it was working properly before; I’ll just reapply it and I’ll be set. However, after trying it again, same issue. I have other servers running SQL Server 2005 SP2, so I couldn’t figure out why it was crapping out.

Errors I was getting:

This error happens when you try to list the maintenance plans, because the listing queries a system view called sysmaintplan_plans in msdb. Both these fields were added in SP2, so this view needed to be altered, but the alteration failed.

This error occurs when you actually create a new maintenance plan; it happens because the field is missing from the table called sysmaintplan_subplans in msdb.

So after banging my head against the brick wall (reading self-support pages, Google, newsgroups, and whatever I could get my hands on), I decided to place a call with Microsoft.

Well, we couldn’t figure out the issue there either, or rather what happened; the Microsoft tech got me to redo the entire SQL install to make sure I didn’t miss anything, and after a 6.5-hour call we still didn’t know what caused the issue.

We thought it might have been because of permissions and the new security functionality within Windows Server 2008, namely User Account Control (UAC, read about it here for its effect on SQL Server). Alternatively, I thought maybe the SP2 x64 edition had some issues in the package. But the Microsoft engineer said they have that running in their test environment without issues.

At the end of the call we still hadn’t resolved the issue but had a workaround. We went to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Install\ and loaded the sysdbupg.sql script, which is part of SP2, and executed it manually. It resolved both those errors.

So I have a workaround, but I was not happy with this, thinking I had just fluked out. So I decided to turn off UAC, made sure that I had SA rights in SQL Server, and did the install again, but it failed; the only way I could get it to work was with the workaround. And looking at the log file I found there were 100+ components missed because they couldn’t be found by the package, and the SQL script was one of them.

I called the Microsoft engineer back, and they are going to get their coders to look at the package.

So for now I am blaming SQL Server 2005 SP2 x64 as faulty.

Further updates as I get them :).
