Question : Connectivity Problems between DC and Edge server in EBS domain


We recently installed the Accpac 5.5 CRM component on the EBS Management server. Part of the process was to run a special program provided by Sage that opens the required ports in Windows Firewall on the Management server to allow access to CRM. This was done on Friday afternoon, and over the weekend I noticed more and more problems on all of the servers. It appears that the CRM install broke firewall functionality in the domain, particularly affecting the Windows Firewall and TMG on the Security server. The symptoms I encountered were numerous:

- Outbound HTTP/HTTPS would not work.
- Other protocols (ICMP, FTP, RDP) worked fine.
- SMTP messages would queue up in the Messaging server's edgesync-default-first-site-name queue, sometimes for an hour or more, and no amount of manual retries on the queue would move them.
- Inbound RWW, HTTP, and SMTP were unaffected.
- Test-EdgeSynchronization would show everything succeeding except RecipientStatus, which failed.
- A particularly annoying problem: whenever I RDP to the Security server and log on, the connection establishes as quickly as usual, but after I enter the account name and password the logon freezes. When I'm reconnecting to an existing session, it stops at the Welcome message and the little circle stops spinning; when I'm logging on fresh, it does the same thing at "Applying computer settings" (GPO processing). After about 4 or 5 minutes the session either times out and ends or, occasionally, logs on successfully. If I reconnect immediately, sometimes it logs on with no delay at all (the desktop appears instantly) and sometimes it freezes again. This happens whether I log on with the domain admin or the local administrator account, so I think the issue is with GPO processing.
- Another symptom: in the EBS Administration console on the Management server, under the Security tab, the Network firewall component cannot connect to the Security server to retrieve the firewall or network status information. I installed KB973236 to try to reset all my security settings, but it fails to connect as well. I cannot back up the network configuration either. The error I get is "Exception: The Administration Console cannot connect to Forefront Threat Management Gateway" with error code 0x800705B4.
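The error code 0x800705B4 is an HRESULT, and its fields can be pulled apart to see what it actually wraps. The short Python sketch below splits it the same way the HRESULT_FACILITY and HRESULT_CODE macros in winerror.h do: facility 7 is FACILITY_WIN32, and Win32 error 1460 is ERROR_TIMEOUT ("This operation returned because the timeout period expired."). In other words, the console is timing out while talking to TMG rather than being refused outright, which fits the "freezes, then eventually gives up" pattern seen at RDP logon. The function name is just illustrative.

```python
FACILITY_WIN32 = 7     # HRESULTs that wrap a plain Win32 error code use facility 7
ERROR_TIMEOUT = 1460   # Win32: "This operation returned because the timeout period expired."


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility and code fields."""
    hr &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit integer
    return {
        "failure": bool(hr >> 31),        # bit 31 set means the call failed
        "facility": (hr >> 16) & 0x1FFF,  # same mask as the HRESULT_FACILITY macro
        "code": hr & 0xFFFF,              # low 16 bits: facility-specific code
    }


info = decode_hresult(0x800705B4)
print(info)
if info["facility"] == FACILITY_WIN32 and info["code"] == ERROR_TIMEOUT:
    print("Win32 ERROR_TIMEOUT: the operation timed out")
```

The same function works for any other 0x8007xxxx code that shows up in the TMG or EBS console logs; the low 16 bits are then an ordinary Win32 error number.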

There are other strange symptoms I've seen while monitoring the TMG log on the Security server and in the network packet traces I've taken, such as the Management server failing to connect to random ports.
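"Random ports" in a domain packet trace are often not random. On Windows Server 2008 the default dynamic port range is 49152-65535 (`netsh int ipv4 show dynamicport tcp` shows the current setting), and RPC services are reached on a port from that range after the client first asks the endpoint mapper on TCP 135. The Python sketch below sorts destination ports from a trace into well-known Active Directory services, the dynamic range, or "other". It assumes the default 2008 range, and the sample ports in the loop are hypothetical, not taken from the traces described here.

```python
# Standard port assignments for common domain/AD traffic
AD_SERVICE_PORTS = {
    53: "DNS",
    88: "Kerberos",
    135: "RPC endpoint mapper",
    389: "LDAP",
    445: "SMB",
    464: "Kerberos password change",
    636: "LDAPS",
    3268: "Global Catalog",
    3269: "Global Catalog over SSL",
}

# Assumed default dynamic port range on Windows Server 2008 and later
DYNAMIC_RANGE = range(49152, 65536)


def classify_port(port: int) -> str:
    """Label a destination port seen in a packet trace."""
    if port in AD_SERVICE_PORTS:
        return AD_SERVICE_PORTS[port]
    if port in DYNAMIC_RANGE:
        return "dynamic range (possibly an RPC endpoint assigned via port 135)"
    return "other"


# Hypothetical destination ports copied from a trace
for p in (135, 389, 49158, 8080):
    print(p, classify_port(p))
```

If most of the failing connections land in the dynamic range, the firewall rules covering RPC (rather than the individual service ports) are the first thing to check.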

I found that if I disabled the public profile in Windows Firewall on the Security server, leaving the domain and private profiles enabled, outbound HTTP traffic was allowed, and outbound SMTP traffic seemed to improve somewhat as well. Finally, I removed all firewall settings for the CRM software; some of the symptoms changed, but the problem did not go away. Now, for outbound HTTP to work, I have to disable the domain profile in Windows Firewall; it still works with the public and private profiles enabled.

I've come to the conclusion that something is blocking all secure communications between the Security server and the rest of the domain, but I'm unsure whether it's a corrupt SSL certificate, something wrong with Active Directory, or a problem with ADAM on the Security server.

So I think I need some input from an EBS guru. Does anyone have any ideas about what I should look at? My thought is that if I can re-establish communications between the Management server and the Security server, I can run the wizard to change the domain security level, and it will reset all the firewall settings to something that works.

Thanks in advance.

     

Answer : Connectivity Problems between DC and Edge server in EBS domain

The problem has been resolved. Essentially, it was caused by NETLOGON failures whenever the Security server needed to check with Active Directory on the use of a domain resource, such as verifying the machine account at logon. I suspect the ultimate cause was some sort of corruption or problem in RPC processing on the Security server. I never determined the actual cause, but it turns out the problem started just after I installed the DPM client on the Security server.
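Before rebuilding, a quick way to separate "the firewall is blocking the DC" from "RPC/NETLOGON itself is broken" is to check plain TCP reachability of the DC's core ports from the Security server. The Python sketch below does only that: it opens and closes a TCP connection to each port with a timeout. The port list is an assumption (Kerberos, RPC endpoint mapper, LDAP, SMB), the host name in the usage note is hypothetical, and a successful connect says nothing about whether the RPC calls that follow will succeed.

```python
import socket

# Assumed core DC ports to test; extend as needed (e.g. 3268 for the Global Catalog)
DC_PORTS = {88: "Kerberos", 135: "RPC endpoint mapper", 389: "LDAP", 445: "SMB"}


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name resolution failed
        return False


def probe_dc(host: str, timeout: float = 3.0) -> dict:
    """Map each service name to whether its TCP port accepted a connection."""
    return {name: port_open(host, port, timeout) for port, name in DC_PORTS.items()}
```

For example, `probe_dc("dc01.example.local")` (hypothetical name) returning `True` for every port while logons still hang would point away from the firewall and toward RPC/NETLOGON on the server itself; `nltest /sc_verify:<domain>` can then be used to test the machine's secure channel directly.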

The easiest solution was to reinstall the server. EBS has a feature whereby, when you install one of the three servers, the wizards check Active Directory to determine whether the server you are reinstalling was previously a member of the domain. This only works if you give the server the same name it had before, and at least one DC must be available. The wizard asks whether you want to replace the old server configuration in Active Directory with the one you are installing and then proceeds to do so (successfully, I might add). Since it was the Security server, with very little changed from the default configuration and no user data involved, it was a quick, painless reinstall. The wizard was intelligent enough to re-establish the server's role, including installing the certificates. All I had to do was add the 6 rules I had defined back into TMG and make a few other minor configuration changes for connectivity verifiers. I was confident the server would successfully rejoin the domain because I had tested this exact scenario when I first installed EBS, by destroying Active Directory on the Management server and then reinstalling it successfully without having to rebuild the whole environment.

As a result, the Security server performs better now than it has in a while, with a number of problems cleared up. I documented the whole process and added it to our DRP kit as an optional solution in the event of a disaster.
