Question : SQL Server and SQL Agent will not start on either node in Active/Active SQL 2005 SP3 x64 Enterprise cluster on Windows 2008 x64 after applying KB958644

Hello,
Apologies if this has already been posted, but I cannot find it anywhere. SQL Server and SQL Agent will not start on either node of my Active/Active SQL 2005 SP3 x64 Enterprise cluster on Windows 2008 Enterprise SP1 x64 after applying KB958644, which is the Conficker patch. This is the second time I have built the cluster; I tested it and it worked perfectly before the KB was applied. Has anyone seen this issue before, and is there a fix? Leaving the 2 servers unpatched is a no go. I have already patched 2 separate clusters with KB958644 and they worked fine: a 2-node W2k3 R2 SP2 cluster with SQL 2005 SP2, and a 5-node Windows 2008 Enterprise File and Print cluster.

What's happening is that all the services on both nodes come online except SQL Server and SQL Agent. The nodes are HP DL380 G5 servers with 32 GB RAM and 2 x quad-core 3 GHz CPUs, all connected to EMC2 SANs via Fibre Channel cable. When I look in the cluster event log I get the following errors:
event ID 1069: Cluster resource 'Cluster Disk 5' in clustered service or application 'Cluster Group' failed.
and Cluster resource 'SQL Server (servername1)' in clustered service or application 'clusterservicename' failed.

In the event viewer on Windows 2008 I get these 3 errors:
event id: 19019 [sqsrvres] OnlineThread: Error 1 bringing resource online. (more info in code)
event id: 19019 [sqsrvres] OnlineThread: service stopped while waiting for QP.
event id: 19019 [sqsrvres] CheckServiceAlive: Service is dead

I got the above errors with KB958644 installed and, as I said before, the cluster was working perfectly prior to the patch. I tried uninstalling the patch but the same errors still occur. I need to get this working with the patch installed, otherwise I will have to recommend going back to Windows 2003. If anyone can shed some light on this I would appreciate it, and if you require any more information please let me know. Thanks, Bruce
Code Snippet:
- System 
 
  - Provider 
 
   [ Name]  MSSQL$ExcelicareReport 
 
  - EventID 19019 
 
   [ Qualifiers]  16384 
 
   Level 2 
 
   Task 3 
 
   Keywords 0x80000000000000 
 
  - TimeCreated 
 
   [ SystemTime]  2009-05-20T16:19:36.000Z 
 
   EventRecordID 10475 
 
   Channel Application 
 
   Computer clw-carsql-001.dirone.imperial.nhs.uk. 
 
   Security 
 
 
- EventData 
 
   [sqsrvres] OnlineThread: Error 1 bringing resource online.  
   4B4A0040010000000E00000063006C0077007600630061007200730071006C00300030003400000000000000 
 
 
--------------------------------------------------------------------------------
 
Binary data:
 
 
In Words
 
0000: 40004A4B 00000001 0000000E 006C0063 
0008: 00760077 00610063 00730072 006C0071 
0010: 00300030 00000034 00000000  
 
 
In Bytes
 
0000: 4B 4A 00 40 01 00 00 00   KJ.@....
0008: 0E 00 00 00 63 00 6C 00   ....c.l.
0010: 77 00 76 00 63 00 61 00   w.v.c.a.
0018: 72 00 73 00 71 00 6C 00   r.s.q.l.
0020: 30 00 30 00 34 00 00 00   0.0.4...
0028: 00 00 00 00               ....
 
 
- System 
 
  - Provider 
 
   [ Name]  MSSQL$ExcelicareReport 
 
  - EventID 19019 
 
   [ Qualifiers]  16384 
 
   Level 2 
 
   Task 3 
 
   Keywords 0x80000000000000 
 
  - TimeCreated 
 
   [ SystemTime]  2009-05-20T16:19:36.000Z 
 
   EventRecordID 10474 
 
   Channel Application 
 
   Computer clw-carsql-001.dirone.imperial.nhs.uk. 
 
   Security 
 
 
- EventData 
 
   [sqsrvres] OnlineThread: service stopped while waiting for QP.  
   4B4A0040010000000E00000063006C0077007600630061007200730071006C00300030003400000000000000 
 
 
--------------------------------------------------------------------------------
 
Binary data:
 
 
In Words
 
0000: 40004A4B 00000001 0000000E 006C0063 
0008: 00760077 00610063 00730072 006C0071 
0010: 00300030 00000034 00000000  
 
 
In Bytes
 
0000: 4B 4A 00 40 01 00 00 00   KJ.@....
0008: 0E 00 00 00 63 00 6C 00   ....c.l.
0010: 77 00 76 00 63 00 61 00   w.v.c.a.
0018: 72 00 73 00 71 00 6C 00   r.s.q.l.
0020: 30 00 30 00 34 00 00 00   0.0.4...
0028: 00 00 00 00               ....
 
 
- System 
 
  - Provider 
 
   [ Name]  MSSQL$ExcelicareReport 
 
  - EventID 19019 
 
   [ Qualifiers]  16384 
 
   Level 2 
 
   Task 3 
 
   Keywords 0x80000000000000 
 
  - TimeCreated 
 
   [ SystemTime]  2009-05-20T16:19:36.000Z 
 
   EventRecordID 10473 
 
   Channel Application 
 
   Computer clw-carsql-001.dirone.imperial.nhs.uk. 
 
   Security 
 
 
- EventData 
 
   [sqsrvres] CheckServiceAlive: Service is dead  
   4B4A0040010000000E00000063006C0077007600630061007200730071006C00300030003400000000000000 
 
 
--------------------------------------------------------------------------------
 
Binary data:
 
 
In Words
 
0000: 40004A4B 00000001 0000000E 006C0063 
0008: 00760077 00610063 00730072 006C0071 
0010: 00300030 00000034 00000000  
 
 
In Bytes
 
0000: 4B 4A 00 40 01 00 00 00   KJ.@....
0008: 0E 00 00 00 63 00 6C 00   ....c.l.
0010: 77 00 76 00 63 00 61 00   w.v.c.a.
0018: 72 00 73 00 71 00 6C 00   r.s.q.l.
0020: 30 00 30 00 34 00 00 00   0.0.4...
0028: 00 00 00 00               ....
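
The hex string attached to each of the three events is identical, and it can be decoded to see what it actually carries. Below is a minimal Python sketch. The layout is an assumption inferred only from the dump above (three little-endian 32-bit values, then a null-terminated UTF-16LE string whose length in characters is the third value); it is not taken from any documented format, and the function name decode_event_data is made up for illustration.

```python
import struct

# Hex payload copied from the EventData section of the events above.
EVENT_DATA_HEX = (
    "4B4A0040010000000E00000063006C0077007600630061007200730071006C00"
    "300030003400000000000000"
)


def decode_event_data(hex_text):
    """Split the payload into its assumed header fields and embedded name."""
    raw = bytes.fromhex(hex_text)
    # Assumed layout: three little-endian 32-bit unsigned integers come first.
    first, second, char_count = struct.unpack_from("<III", raw, 0)
    # Assumed: char_count UTF-16LE code units follow, terminator included.
    name_bytes = raw[12:12 + char_count * 2]
    name = name_bytes.decode("utf-16-le").rstrip("\x00")
    return first, second, char_count, name


first, second, char_count, name = decode_event_data(EVENT_DATA_HEX)
print(hex(first), second, char_count, name)
```

Under that assumption the embedded string is clwvcarsql004, which looks like a server name rather than extra error detail, so the binary data probably adds little beyond the message text itself.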

Answer : SQL Server and SQL Agent will not start on either node in Active/Active SQL 2005 SP3 x64 Enterprise cluster on Windows 2008 x64 after applying KB958644

Latest update on this issue: I fixed it myself. Here is what I did.

Basically we got hit with the Conficker virus, so we had to patch all our servers with KB958644, and we also applied the GPO described on the MS site: http://support.microsoft.com/kb/962007

So in order to fix my SQL cluster issue (and in fact even to install the SQL cluster) you must:
> reverse the above GPO if you have already applied it (simply taking it out of the OU and running gpupdate will not reverse it)
> uninstall KB958644 on both nodes in your cluster and reboot
> now this is the bit that fixed my problem: I ran a command to reset the local security policy on both servers (nodes) in my cluster. For Windows 2008 the command is this (it's slightly different for Windows 2003): secedit /configure /cfg c:\windows\inf\defltbase.inf /db defltbase.sdb /verbose
> I then rebooted and made sure my cluster was working again
> then I reapplied the patch KB958644
> then of course the cluster wouldn't start again, so I re-ran the above command on both nodes in my cluster, rebooted again, and it all works
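
Since the reset has to be run twice on each node, it may help to wrap it so the exact same command is used every time. This is only a rough Python sketch: the secedit switches and paths are copied from the step above, while the helper names build_reset_command and reset_local_security_policy are hypothetical. It defaults to a dry run that just prints the command; actually running it presumably needs an elevated prompt on the node itself, followed by the reboot.

```python
import subprocess

# Template and database names are taken verbatim from the post (Windows 2008).
TEMPLATE = r"c:\windows\inf\defltbase.inf"
DATABASE = "defltbase.sdb"


def build_reset_command(template=TEMPLATE, database=DATABASE):
    """Return the secedit argument list used in the post."""
    return ["secedit", "/configure", "/cfg", template, "/db", database, "/verbose"]


def reset_local_security_policy(dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    command = build_reset_command()
    print(" ".join(command))
    if dry_run:
        return None
    # Raises CalledProcessError if secedit exits with a non-zero code.
    return subprocess.run(command, check=True).returncode


# Dry run: shows what would be executed without touching the policy.
reset_local_security_policy()
```

Call reset_local_security_policy(dry_run=False) on each node only after checking the printed command against the one above.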

I hope this helps. Let me know if you need anything else. I have just got off the phone to MS and explained to them what I did. They had this problem for over a week and couldn't help. Cheers