Exchange 2016 CU20 breaks Event logs

I don't post often, but when I have something that I need to get out there with regard to my subject matter, this is where I post it. In this instance it relates to Exchange 2016 CU20 and a little-known bug.

I found this quite by accident, many months after a successful installation of CU20. I encountered no issues at all during the installation, but when I needed to find something in my event logs, it transpired that they had stopped being populated from the point of CU20 installation.

The log I am referring to is known as a crimson log. The logs you see when you open Event Viewer and expand Applications and Services Logs are crimson logs. The one I refer to in this post is the MSExchange Management log.

This log was broken and not just on one server. When an analysis was done, it had broken on 10% of my servers (4 out of 40) - which meant that when MS Premier Support tried to replicate the issue, they were able to do so quite easily.

To cut to the chase: the log is not actually broken. Rather, entries are being diverted into the Application Log instead (where they are easily lost due to the noise). The reason for this is that during CU20, setup may incorrectly write a registry key which takes precedence as the registry is read from top to bottom and the new key appears before the crimson log entries. The key is:

HKLM:\SYSTEM\CurrentControlSet\Services\EventLog\Application\MSExchange CmdletLogs\

It should not be there. Because it is, entries from source MSExchange CmdletLogs are routed to the Application Log instead of the MSExchange Management crimson log, which basically goes dormant. Microsoft don't have this listed on their database of eff-ups, but that may be because (like me) nobody looks at this log until they actually need to.
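A quick way to see the symptom on a given server is to compare where events from this source are currently landing. The following is a minimal sketch rather than a definitive diagnostic; it assumes the source name and log name are exactly as given above, and the event counts (5 and 1) are arbitrary choices of mine.

```powershell
# Sketch: check where 'MSExchange CmdletLogs' events are currently being written.
$source = 'MSExchange CmdletLogs'

# Most recent entries from this source that ended up in the Application log
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = $source } -MaxEvents 5 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, ProviderName

# Most recent entry in the crimson log; an old timestamp suggests it has gone dormant
Get-WinEvent -LogName 'MSExchange Management' -MaxEvents 1 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, ProviderName
```

If the first command returns recent entries and the second returns a timestamp from around the time CU20 was installed, the server is probably affected.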

The fix is to remove the invalid key and restart the server.

To identify the presence of the key on your Exchange servers, you can run the following in the Exchange Management Shell (EMS):

Get-ExchangeServer | ForEach-Object { $s = New-PSSession -ComputerName $_.Name; Invoke-Command -Session $s -ScriptBlock { if (Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Services\EventLog\Application\MSExchange CmdletLogs\' -ErrorAction SilentlyContinue) { $env:COMPUTERNAME } }; Remove-PSSession $s }

If the key is found on a server, that server's NetBIOS name is listed.

The following PowerShell will remove the key on a local server:

Remove-Item 'HKLM:\SYSTEM\CurrentControlSet\Services\EventLog\Application\MSExchange CmdletLogs\' -Force -Recurse
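If you would rather keep a copy of the key before deleting it, something along these lines should work from an elevated session. This is a sketch under my own assumptions: the backup location ($backupFile) is hypothetical, and the removal only runs if the export reports success.

```powershell
# Sketch: export the stray key to a .reg file before deleting it.
# $backupFile is a hypothetical location; change it to suit your environment.
$regKey     = 'HKLM\SYSTEM\CurrentControlSet\Services\EventLog\Application\MSExchange CmdletLogs'
$backupFile = Join-Path $env:TEMP "MSExchangeCmdletLogs-$env:COMPUTERNAME.reg"

# reg.exe writes a file that can be restored later with 'reg import'
reg.exe export $regKey $backupFile /y

# Only remove the key if the export succeeded (exit code 0)
if ($LASTEXITCODE -eq 0) {
    # Convert the reg.exe-style path into a PowerShell registry provider path
    Remove-Item -Path ($regKey -replace '^HKLM', 'HKLM:') -Recurse -Force
}
```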

That could be wrapped into a remote PowerShell command, but I like to mitigate the risk of trashing everything by taking such actions on the local server, which needs to be placed into maintenance mode and restarted anyway (the key being in CurrentControlSet means that the change only takes effect when the Registry hive is read, i.e. on startup).

After a restart, running the original check should raise an error, which is logged to the event log you just fixed:

Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Services\EventLog\Application\MSExchange CmdletLogs\' 
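To confirm the result without relying on the error alone, a check like the one below can be run after the restart. It is only a sketch: the number of events shown (3) is arbitrary, and it assumes the log name is MSExchange Management as described earlier.

```powershell
# Sketch: post-restart verification on the local server.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\EventLog\Application\MSExchange CmdletLogs'

# Should return False once the stray key has been removed
Test-Path $key

# Fresh timestamps here indicate the crimson log is being populated again
Get-WinEvent -LogName 'MSExchange Management' -MaxEvents 3 -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, Id, ProviderName
```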

I don't know if this affects any other CUs or other Exchange versions, as I am currently only working with Exchange 2016. I have implemented two further CUs since then and not experienced the same issue.