A tale of Event 422 on WAP servers
A tale from support. I hope this helps solve similar issues more quickly.
The Setup:
Two Active Directory Federation Services (AD FS) servers running Windows Server 2012 R2, located on the corporate network.
Two Web Application Proxy (WAP) servers located in the DMZ.
The Story:
At first, Event 422 was logged only sporadically, but over the course of a couple of days it became constant.
The error was logged on the WAP servers in the AD FS/Admin log:
Log Name: AD FS/Admin
Source: AD FS
Event ID: 422
Task Category: None
Level: Error
Keywords: AD FS
Description:
Unable to retrieve proxy configuration data from the Federation Service.
Additional Data
Trust Certificate Thumbprint:
<snip>
Status Code:
Exception details:
System.Net.WebException: The operation has timed out
at System.Net.HttpWebRequest.GetResponse()
at Microsoft.IdentityServer.Management.Proxy.StsConfigurationProvider.GetStsProxyConfiguration()
However, the WAPs were able to establish the trust to the AD FS server successfully:
Log Name: AD FS/Admin
Source: AD FS
Event ID: 391
Task Category: None
Level: Information
Keywords: AD FS
Description:
The federation server proxy was able to successfully establish a trust with the Federation Service.
Eventually, the WAPs stopped servicing authentication requests to the AD FS servers.
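To see how the errors went from sporadic to constant, the two event IDs above can be counted per hour. This is only a rough sketch: Get-EventCountByHour is a hypothetical helper written for this post (not an AD FS cmdlet), and the three-day look-back window is an assumption.

```powershell
# Hypothetical helper (not an AD FS cmdlet): counts event records per hour and event ID.
function Get-EventCountByHour {
    param([Parameter(Mandatory)] [object[]] $EventRecord)

    $EventRecord |
        Group-Object { '{0:yyyy-MM-dd HH}:00  Event {1}' -f $_.TimeCreated, $_.Id } |
        Sort-Object Name |
        Select-Object Name, Count
}

# Run on a WAP server. The look-back window is an assumption; adjust as needed.
$since   = (Get-Date).AddDays(-3)
$records = Get-WinEvent -FilterHashtable @{
    LogName   = 'AD FS/Admin'
    Id        = 422, 391      # 422 = config retrieval failed, 391 = trust established
    StartTime = $since
} -ErrorAction SilentlyContinue

if ($records) { Get-EventCountByHour -EventRecord $records }
```

A rising count of 422 entries alongside continued 391 entries would match what we saw: the trust was fine, but retrieving the configuration kept timing out.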
The Hunt:
We took a network trace while restarting the AD FS service. We found that after the WAP connected to the AD FS server, the WAP was the last to send a TCP ACK and then there was no traffic on the connection. After 100 seconds exactly, the WAP sent a TCP FIN and closed the connection.
The customer mentioned that the Device Registration Service (DRS) took a long time to start. We investigated this angle and found that Device Registration had been initialized (Initialize-ADDeviceRegistration had been run), but was not actually being used.
When we ran Get-AdfsDeviceRegistration on the AD FS server, it took about 3 minutes to complete.
At this point, I was thinking about how the WAP closes its connection to the AD FS servers after 100 seconds, while Get-AdfsDeviceRegistration was taking around 180 seconds.
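The 100-second cutoff matches the default Timeout of a .NET HttpWebRequest, which is the class named in the Event 422 stack trace. That the WAP leaves this value at its default is my inference, not something we confirmed. A quick sketch of the comparison, with a placeholder URL that is never contacted:

```powershell
# Creating the request object sends no traffic; the URL is only a placeholder.
$request = [System.Net.WebRequest]::Create('https://adfs.example.com/')

# HttpWebRequest.Timeout is expressed in milliseconds and defaults to 100000.
$defaultTimeoutSeconds = $request.Timeout / 1000

# Run on an AD FS server: time the Device Registration lookup.
$elapsed = Measure-Command { Get-AdfsDeviceRegistration }

'{0:N0} s taken, {1} s allowed' -f $elapsed.TotalSeconds, $defaultTimeoutSeconds
```

If the lookup takes longer than the allowed time, a caller using the default timeout gives up first, which would look like the FIN we saw in the trace.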
To isolate this step, we tried to update the DRS configuration via PowerShell on the WAP. Sure enough, the update failed.
PS C:\> Update-WebApplicationProxyDeviceRegistration
Update-WebApplicationProxyDeviceRegistration : Unable to retrieve Device Registration Service configuration data from
the Federation Server.
At line:1 char:1
+ Update-WebApplicationProxyDeviceRegistration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Update-WebAppli...iceRegistration], ConfigurationErrorsException
+ FullyQualifiedErrorId : System.Configuration.ConfigurationErrorsException,Microsoft.IdentityServer.Management.Pr
oxy.Commands.UpdateAdfsProxyDeviceRegistration
We continued to troubleshoot DRS and eventually came across the following hotfix:
3020773 Time-out failures after initial deployment of Device Registration service in Windows Server 2012 R2
https://support.microsoft.com/kb/3020773/EN-US
While the symptoms in this case were quite different from what is documented in the hotfix, they were in line with “it takes a long time to find a valid key”. Taking a long time to find something would definitely result in an operation timing out. The hotfix sounded promising.
The Fix:
We needed to prepare the machines for the update:
Install this rollup first.
2919355 - [Windows 8.1 Update 1] Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 update rollup: April 2014 (https://support.microsoft.com/kb/2919355)
(If you get a "not applicable" error installing 2919355, install 2919442 first and then retry: https://support.microsoft.com/en-us/kb/2919442)
Install this rollup second.
3000850 November 2014 update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2 (https://support.microsoft.com/kb/3000850)
Finally, install the DRS issue hotfix.
3020773 Time-out failures after initial deployment of Device Registration service in Windows Server 2012 R2
https://support.microsoft.com/kb/3020773/EN-US
After installing the updates on the AD FS and WAP servers and rebooting all the machines, the issue was resolved.
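To confirm that all three updates are present on each AD FS and WAP server, something like the following can be run locally. It is a minimal sketch and assumes the updates are reported by Get-HotFix under their KB numbers.

```powershell
# The two prerequisite rollups plus the DRS hotfix from the steps above.
$required = 'KB2919355', 'KB3000850', 'KB3020773'

# Collect the IDs that Get-HotFix reports as installed; missing ones are simply absent.
$installed = Get-HotFix -Id $required -ErrorAction SilentlyContinue |
    Select-Object -ExpandProperty HotFixID

# One row per required update, with a True/False Installed column.
$report = foreach ($kb in $required) {
    [pscustomobject]@{
        Update    = $kb
        Installed = ($installed -contains $kb)
    }
}

$report
```

Any row showing False for Installed points at an update that still needs to go on before the next reboot.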