As described in the Cannot Generate SSPI Context article, I had changed the MaxTokenSize setting to fix that issue. I had read various articles before settling on this change as the only proper solution to the problem.
However, after I changed the setting and rebooted the passive node, the node refused to join the cluster. The error logs contained messages like:
- Unable to get join version data from sponsor xxx.xxx.xxx.xxx using NTLM package, status 5.
- Unable to connect to any sponsor node.
- Failed to join cluster, status 53.
- Physical Disk : [DiskArb] Failed to read (sector 12), error 170.
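The numeric codes in these messages are standard Win32 error codes, so a first pass over the log can be automated. The following is a minimal sketch, assuming the lines end in `status N` or `error N` as in the excerpts above; the function name `explain_codes` and the small lookup table are illustrative and not part of any cluster tooling.

```python
import re

# Win32 error codes seen in the excerpts above (meanings from winerror.h).
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED: access is denied",
    53: "ERROR_BAD_NETPATH: the network path was not found",
    170: "ERROR_BUSY: the requested resource is in use",
}

# Matches "status 5" or "error 170"; "sector 12" is deliberately not matched.
CODE_PATTERN = re.compile(r"\b(?:status|error)\s+(\d+)\b", re.IGNORECASE)


def explain_codes(lines):
    """Return (line, code, meaning) for every line that carries a code."""
    findings = []
    for line in lines:
        match = CODE_PATTERN.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        meaning = WIN32_ERRORS.get(code, "not in the lookup table")
        findings.append((line.strip(), code, meaning))
    return findings


if __name__ == "__main__":
    sample = [
        "Unable to get join version data from sponsor xxx.xxx.xxx.xxx using NTLM package, status 5.",
        "Unable to connect to any sponsor node.",
        "Failed to join cluster, status 53.",
        "Physical Disk : [DiskArb] Failed to read (sector 12), error 170.",
    ]
    for _line, code, meaning in explain_codes(sample):
        print(f"{code:>4}  {meaning}")
```

Read this way, the first message is an authentication or permission failure rather than a network outage, which fits where the troubleshooting eventually led.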
Researching online did not turn up an answer right away. The first article I ran into was KB886717. It suggested the issue might occur because the C:\Windows\Cluster folder was over 10 MB in size, which in this case it was. So I removed the log file and tried to restart the services, with no luck.
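A quick way to check that size condition is to total the files in the folder. This is a rough sketch: the 10 MB threshold and the folder path come from the description above, while `folder_size_bytes` and `exceeds_limit` are hypothetical helper names.

```python
import os

LIMIT_BYTES = 10 * 1024 * 1024  # 10 MB, the threshold mentioned above


def folder_size_bytes(path):
    """Sum the sizes of all files under path, including subfolders."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # File vanished or is locked; skip it rather than abort.
                continue
    return total


def exceeds_limit(path, limit=LIMIT_BYTES):
    """True when the folder's total size is strictly above the limit."""
    return folder_size_bytes(path) > limit


if __name__ == "__main__":
    cluster_dir = r"C:\Windows\Cluster"
    size_mb = folder_size_bytes(cluster_dir) / (1024 * 1024)
    print(f"{cluster_dir}: {size_mb:.1f} MB, over limit: {exceeds_limit(cluster_dir)}")
```

In my case clearing the log did not help, so treat a large folder as one thing to rule out rather than as the cause.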
I started reading through the log file, picking through the error messages, and looked at several more articles (see the references at the end). I verified all the settings each article suggested: Group Policies, security, firewall, etc. None of it helped the passive node join the cluster successfully.
I opened a trouble ticket with Microsoft. As part of their troubleshooting they found a few settings that needed changing, one of them being the NTLM compatibility level, but even after changing these settings we were still getting NTLM status 5 error messages in the Cluster.log file (status 5 means access denied). Microsoft referenced a few more KB articles in addition to the ones I had already found (also listed at the end) that indicated what “might” be the issue, but none of them helped resolve our issue.
This whole time we had not rebooted the active node, as it was working successfully. But since we were hitting a stone wall at every turn, we decided that to troubleshoot further, node 1 (the active node) had to be restarted because of the errors being generated on the quorum disk. After we rebooted the active node, the passive node became active and clustering worked successfully.
I had not read any KB article indicating an issue with MaxTokenSize and Windows Clustering, and neither had the Microsoft engineers. After talking to the Kerberos experts, we figured the issue was similar to the one described in the KB article on Cluster service account password length (see the references at the end): if you change the Cluster service account password, or the password is fewer than 15 characters long, permissions or security settings are not hashed properly, which generates errors when authenticating the new node to the cluster.
So if you are changing the MaxTokenSize setting on a SQL Server that is clustered, please make sure you change it on EVERY NODE, or you will have lots of strange issues that probably shouldn’t exist.
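One way to guard against this is to compare the value on all nodes before rebooting any of them. The sketch below rests on several assumptions: it reads `MaxTokenSize` from `HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters` through Python's `winreg` module (Windows only, and remote reads need the Remote Registry service and suitable rights), the node names are placeholders, and `read_max_token_size` and `find_mismatches` are hypothetical helper names.

```python
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"
VALUE_NAME = "MaxTokenSize"


def read_max_token_size(node):
    """Read MaxTokenSize from a node's registry; None if it is not set."""
    import winreg  # Windows-only, imported lazily so the rest runs anywhere

    with winreg.ConnectRegistry(rf"\\{node}", winreg.HKEY_LOCAL_MACHINE) as hive:
        try:
            with winreg.OpenKey(hive, KEY_PATH) as key:
                value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
                return value
        except FileNotFoundError:
            # Key or value is absent, so the node uses the OS default.
            return None


def find_mismatches(values_by_node):
    """Group nodes by value; an empty dict means every node agrees."""
    groups = {}
    for node, value in values_by_node.items():
        groups.setdefault(value, []).append(node)
    return groups if len(groups) > 1 else {}


if __name__ == "__main__":
    nodes = ["NODE1", "NODE2"]  # placeholder node names
    values = {node: read_max_token_size(node) for node in nodes}
    mismatches = find_mismatches(values)
    if mismatches:
        for value, members in mismatches.items():
            print(f"MaxTokenSize={value}: {', '.join(members)}")
    else:
        print("All nodes agree on MaxTokenSize.")
```

A node that reports no value at all is treated as a mismatch against nodes that have one set, since that is exactly the half-applied state described above.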
KB886717, Issue with Cluster Log file.
Problems with Microsoft Clusters.
How to manually re-create the Cluster service account.
A Windows Server 2003-based computer that is running the Cluster service may be unable to join a cluster after the computer is first restarted.
Cluster Service May Not Start After You Restrict Available IP Ports for Remote Procedure Call.
Ask Core!, Troubleshooting Cluster Logs 101: Why did the resource failover to other node?
Cluster service account password must be set to 15 or more characters if the NoLMHash policy is enabled.
You cannot add an additional node to a Windows Server 2003-based server cluster, and error code “0x8007042b” is logged in the ClCfgSrv.log file.
You receive an “Error 0x8007042b” error message when you add or join a node to a cluster if you use NTLM version 2 in Windows Server 2003.
How to enable NTLM 2 authentication.
Cluster service does not start on joining node in Windows 2000 Cluster.