Troubleshooting Mailbox Servers
With the shift of mailbox access over to the Client Access server in Exchange Server 2010 and the recent demise of the Hub Transport server role, the Mailbox server's role in Exchange Server 2013 essentially encompasses data storage, mail transport, and Unified Messaging functionalities. The primary focus of troubleshooting Mailbox servers rests on three things: database replication health, server performance, and email delivery. These aren't the only things Mailbox servers do, of course, but they're probably the most common troubleshooting topics. But before we get into those, let's recap some of the standard troubleshooting techniques you should apply to a Mailbox server.
General Mailbox Server Health
Although a Mailbox server is essentially useless without Client Access servers to provide access and deliver mail (as a good friend recently said, what good is a database if you can't access it?), it's still the most important role in an Exchange Server environment. This is, of course, because the data is stored on the server or, to be precise, in the databases on the associated storage. So when dealing with Mailbox server issues, you'll want to perform these basic checks:
- Are all required Exchange Server services able to start as necessary?
- Do you see any errors in the event log relating to MSExchangeDatabase, MSExchangeDatabase → Instances, or MSExchangeSubmission Mailbox?
- Are there any Active Directory issues that might have a negative impact on Exchange Server?
Obviously, the Test-SystemHealth and Test-ServiceHealth cmdlets would be useful in detecting basic problems, like a dismounted database or a stopped service. They should always be the first two cmdlets you execute when troubleshooting a Mailbox server, simply because they group together so many common checks.
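As a minimal sketch of that first pass, the following narrows the Test-ServiceHealth output to the roles that report a required service as stopped. The server name EX1 is a placeholder, and the property names used in the filter (Role, RequiredServicesRunning, ServicesNotRunning) are assumed to match what the cmdlet returns in your build.

```powershell
# Show only the roles on EX1 (placeholder name) that have a required service stopped.
Test-ServiceHealth -Server EX1 |
    Where-Object { -not $_.RequiredServicesRunning } |
    Format-List Role, ServicesNotRunning
```

If the command returns nothing, every required service for the installed roles is running.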
Like its close cousin, Test-OutlookConnectivity (which is no longer available), Test-MapiConnectivity will help you determine problems accessing a specific mailbox. It logs into a target mailbox (which you can specify with the -Identity parameter), the system mailbox in a specific database (which you can specify with -Database), or the system mailbox in every active database on a server (through -Server). The output for all three variants looks like the following:
Test-MAPIConnectivity -Server Server1

MailboxServer   Database             Result    Error
-------------   --------             ------    -----
Server1         MailboxDatabase...   Success
Server1         MailBoxDatabase...   Success

Test-MAPIConnectivity GTaylor

MailboxServer   Database             Result    Error
-------------   --------             ------    -----
Server1         MailBoxDatabase...   Success

Test-MAPIConnectivity -Database MailboxDatabase-001

MailboxServer   Database             Result    Error
-------------   --------             ------    -----
Server1         MailBoxDatabase...   Success
This is a useful (and quick) cmdlet for narrowing the possible scope of a problem; Test-MapiConnectivity essentially tests not only the Exchange Server information store but also ADAccess and RPCoverHTTP access, so a successful test against any mailbox on a server proves that those three components are at least functioning. If you can log into the system mailbox for a database but not into a user mailbox in that same database, the problem is clearly something unique to that user.
What is also interesting about this cmdlet is that although it tests access to the mailbox database and is essentially a Mailbox server testing tool, it does not indirectly test the availability of Client Access servers. Client Access servers are the entry point for client requests to mailboxes, so if Test-MapiConnectivity reports a successful connection, that connection was made to the entry point on the Mailbox server, not necessarily along the path that users, Outlook, OWA, and other clients take to reach the mailbox. I actually find that to be a positive aspect of this cmdlet, since it allows you to segment your troubleshooting results to pinpoint the source of a problem.
Checking Poison Mailboxes
One new feature that might lead to confusion for users (and more than a few administrators!) is poison mailbox detection. By default, Mailbox servers will tag any mailbox that causes a thread in the store.exe service to crash or that is connected to five or more "hung" threads. If a mailbox is tagged three times in two hours, Exchange Server 2013 will block access to that mailbox for up to six hours or until the administrator unblocks it, whichever comes first. If a user reports that she cannot connect to a mailbox, but other users have no difficulty, check to see if there are any quarantined mailboxes on the server. You can do this either through Performance Monitor (through the MSExchangeIS Mailbox\Quarantined Mailbox Count performance counter) or through the Get-MailboxStatistics cmdlet. For example, to find out if mailbox GillianK is quarantined, simply use this command:
Get-MailboxStatistics GillianK | Format-List DisplayName, IsQuarantined
Exchange Server 2013 will also write an event to the Application log when it quarantines a mailbox.
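To check a whole server rather than one mailbox, a sketch along the following lines may help. EX1 is a placeholder server name, and the second command assumes the Disable-MailboxQuarantine cmdlet is available in your version; run it only after you have addressed whatever caused the mailbox to be tagged.

```powershell
# Report every quarantined mailbox on server EX1 (placeholder name).
Get-MailboxStatistics -Server EX1 |
    Where-Object { $_.IsQuarantined } |
    Format-Table DisplayName, Database, IsQuarantined -AutoSize

# Release a single mailbox from quarantine (assumes the cmdlet exists in your build).
Disable-MailboxQuarantine -Identity GillianK
```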
Do not confuse this feature with a poison message queue that is also stored on the Exchange Server Mailbox server. This queue contains messages that Exchange Server deems harmful to the environment while they are being transported in and out of an Exchange Server organization. These messages are not lost, since they will continue to exist in the poison message queue until an administrator deletes them manually.
Checking Database Replication Health
The introduction of continuous replication in Exchange Server 2007 dramatically changed the face of disaster recovery, because administrators could deploy two separate copies of a single database, each on a physically separate server. There were a few limitations, of course; end users still connected to the server, not just the database, so problems with the underlying cluster would render both database copies inaccessible. Standby continuous replication (introduced in Exchange Server 2007 Service Pack 1) provided another disaster-recovery option, but this had its limits as well: it was purely manual and, depending on the configuration, would require at least a setup "trick" (setup /recovercms) or even wholesale "rehoming" of users. A successful activation of a standby copy was also heavily dependent on replication of both DNS and Active Directory information, so users might still be unable to connect even after the issue was resolved.
Database availability groups (DAGs) in Exchange Server 2013 provide multiple copies of a single database on different servers, even in different datacenters, so a single server failure should have a significantly smaller impact on an Exchange Server deployment. Other architectural changes (namely Client Access namespaces) effectively hide the server object from the end user, so the actual location of the active database is immaterial from the end user's perspective.
Database replication health is, loosely speaking, how successful Exchange Server is at keeping database copies in sync. This depends on server configuration, network health, and a few other things (most of which Exchange Server checks automatically as part of the Test-SystemHealth and Test-ServiceHealth cmdlets). However, you can check the health of the replication infrastructure quite easily with two cmdlets. The first cmdlet, Test-ReplicationHealth, checks the health of the replication services and alerts you to any errors it finds. The output is extremely easy to read, as shown here:
Test-ReplicationHealth

Server   Check                       Result   Error
------   -----                       ------   -----
EX1      ReplayService               Passed
EX1      ActiveManager               Passed
EX1      TasksRpcListener            Passed
EX1      DatabaseRedundancyCheck     Passed
EX1      DatabaseAvailabilityCheck   Passed
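On a server with many checks, it can be easier to display only the ones that did not pass. The following is a sketch under a few assumptions: EX1 is a placeholder server name, and the Result value is converted to a string before comparison because its underlying type may differ between local and remote shells.

```powershell
# Show only the replication checks on EX1 (placeholder name) that did not pass.
Test-ReplicationHealth -Identity EX1 |
    Where-Object { "$($_.Result)" -ne 'Passed' } |
    Format-List Server, Check, Result, Error
```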
Once you've validated the replication services, you can check the replication status for the databases themselves with Get-MailboxDatabaseCopyStatus. You can focus on a particular database by using the -Identity parameter or check the status for all mailbox database copies on a specific server by using -Server. You can even check the status of one specific database copy on one specific server by passing an identity of the form Database\Server (for example, MDB001\EX1). Here is an example of using the Get-MailboxDatabaseCopyStatus cmdlet where the results are filtered to show only a subset of the data being reported:
Get-MailboxDatabaseCopyStatus | Format-Table Name,Status,LastInspectedLogTime,ContentIndexState

Name         Status    LastInspectedLogTime    ContentIndexState
----         ------    --------------------    -----------------
MDB001\EX1   Mounted   1/13/2013 9:44:03 AM    Healthy
MDB002\EX1   Mounted   1/15/2013 9:03:24 PM    Healthy
MDB003\EX1   Mounted   1/15/2013 9:12:55 PM    Healthy
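When a server hosts many copies, a filter that surfaces only the problem copies can save time. This sketch assumes that Mounted (active copy) and Healthy (passive copy) are the only states you consider normal. EX1 is a placeholder server name, and CopyQueueLength and ReplayQueueLength are included because growing queues often point to the causes listed next.

```powershell
# List database copies on EX1 (placeholder name) that are neither Mounted nor Healthy.
Get-MailboxDatabaseCopyStatus -Server EX1 |
    Where-Object { 'Mounted', 'Healthy' -notcontains "$($_.Status)" } |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```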
There are many possible causes for replication errors, including the following:
- Transient network-connectivity issues
- Permissions issues
- Insufficient disk space on the target server
The general troubleshooting steps we covered at the beginning of this tutorial will help you determine the exact cause of a replication problem.
With the reduction in functionality, Mailbox servers have become significantly easier to troubleshoot than in the past. There are a number of useful cmdlets for validating mailbox database availability and mailbox access, among them Test-SystemHealth, Get-MailboxStatistics, and Test-MapiConnectivity. Two additional cmdlets, Test-ReplicationHealth and Get-MailboxDatabaseCopyStatus, provide insight into the replication of those databases across member servers in the organization.