
Disk Performance Counters

As described earlier, disk performance tends to be all about IOPS. It is not surprising, then, that the most interesting disk counters are those that report I/O latency. After all, no matter how quickly a single I/O operation completes, if that operation has to wait a significant amount of time before executing, you have a problem. I/O operations go into queues, which are more or less first-in, first-out (FIFO). However, Windows will perform some optimization when possible, grouping together I/O that is ''close'' in disk terms. On a transaction log, this optimization can be significant; on a mailbox database, it probably will not be.
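The grouping of ''close'' I/O described above can be sketched in a few lines of Python. This is purely illustrative (the function name and the 64 KB closeness threshold are invented for the example, and real Windows I/O scheduling is far more involved), but it shows why sequential log writes benefit far more from grouping than random database I/O:

```python
from collections import deque

CLOSE_BYTES = 64 * 1024  # hypothetical threshold for "close" I/O


def drain_queue(requests):
    """Dispatch queued (offset, size) requests FIFO, merging runs of
    requests whose disk offsets fall within CLOSE_BYTES of the
    previous request into a single batch."""
    queue = deque(requests)
    batches = []
    while queue:
        batch = [queue.popleft()]
        # Pull in immediately following requests that are close on disk.
        while queue and abs(queue[0][0] - batch[-1][0]) <= CLOSE_BYTES:
            batch.append(queue.popleft())
        batches.append(batch)
    return batches
```

Feeding this sketch a run of sequential 4 KB writes (log-like I/O) yields one batch, while widely scattered offsets (database-like I/O) yield one batch per request.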

We'll discuss optimum values for specific I/O types on each Exchange server role shortly. First, let's tackle the performance counters themselves. As I/O can be fairly complicated, there are a large number of performance counters that may come into play. We consider these in a tiered fashion. That is, there is a group of counters we consider most important. Based on results from those counters, we may need to investigate other counters.

Tier 1 Disk Performance Counters

Perhaps the most critical of all disk performance counters is LogicalDisk\% Free Space. We say that because a full disk usually means you are in an emergency, scrambling to free space. Also, urban legend has long held that when the NTFS file system (which must be used on all Exchange disks) falls below 10 percent free space, the disk is in danger of crashing. That is not true on modern operating systems, but running that low on free space is still far from optimal in many environments.

Today 1.5 TB disks are common and 2.0 TB disks only slightly less so. Larger disks are coming quickly. Ten percent of 1.5 TB is 150 GB, which is a large amount of space. While it is a best practice to keep LogicalDisk\% Free Space at 10 percent or higher, you should temper this with reason in your environment, depending on the size of your disks and arrays. This best practice was originally developed when the normal size of a disk was 9 GB.

A related counter is LogicalDisk\Free Megabytes, which may be a more relevant counter for some installations of Exchange. As described earlier, with large disks it may make sense to set a number of free megabytes of disk space at which the administrator should be alerted. However, unlike LogicalDisk\% Free Space, the LogicalDisk\Free Megabytes counter requires specific knowledge about a given environment to pick an appropriate value.
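A combined alerting rule using both counters can be sketched as follows. The function name and the 50 GB floor are invented for illustration - as the text says, the megabyte threshold must come from knowledge of your own environment:

```python
def low_disk_alert(total_mb, free_mb, pct_floor=10.0, mb_floor=50_000):
    """Return True when the volume needs attention: pct_floor mirrors
    the 10 percent rule of thumb for LogicalDisk\\% Free Space, while
    mb_floor is an environment-specific LogicalDisk\\Free Megabytes
    threshold (50 GB here is an arbitrary example value)."""
    pct_free = 100.0 * free_mb / total_mb
    return pct_free < pct_floor or free_mb < mb_floor
```

On a 1.5 TB volume, 140 GB free trips the 10 percent rule even though it is a large absolute amount of space, which is exactly the tension the text describes.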

In a non-SAN environment, the next most important counter is PhysicalDisk\Avg. Disk Queue Length. As you know, I/O operations are processed in queues. If that queue grows too large, your I/O subsystem is not operating quickly enough to service the load. This counter is your number one indicator of that. On average, the PhysicalDisk\Avg. Disk Queue Length counter should not exceed the value of the number of disks in an array. That is, if an Exchange volume is one disk, then the counter shouldn't exceed one, on average. If an Exchange volume is two mirrored disks, the counter shouldn't exceed two, on average, and so on. In a SAN environment, the results obtained from this counter are almost meaningless, and the counter should be ignored.
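The spindle-count rule of thumb above reduces to a one-line check over sampled values of PhysicalDisk\Avg. Disk Queue Length (the function name is ours, for illustration):

```python
def avg_queue_within_spindles(samples, spindles):
    """Average the sampled Avg. Disk Queue Length values and compare
    against the number of disks in the array: one disk allows an
    average of 1, a two-disk mirror allows 2, and so on."""
    return sum(samples) / len(samples) <= spindles
```

Remember that this check only makes sense for direct-attached storage; in a SAN environment the counter should be ignored.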

Two important counters that are related to PhysicalDisk\Avg. Disk Queue Length are PhysicalDisk\Avg. Disk Sec/Write and PhysicalDisk\Avg. Disk Sec/Read. These counters define the average amount of time that it takes for a write I/O and a read I/O to complete, respectively. Long-term trending on these counters can go a long way toward showing you how your I/O subsystem is holding up over time. These counters are also absolutely valid in a SAN environment.
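Long-term trending on these latency counters can be approximated with a least-squares slope over periodically collected samples. This hand-rolled sketch (the function name is invented) assumes equally spaced samples; a persistently positive slope suggests the I/O subsystem is degrading over time:

```python
def latency_trend(samples):
    """Least-squares slope of latency samples (e.g. periodic readings
    of Avg. Disk Sec/Read) taken at equal intervals; positive means
    latency is rising over the measurement period."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```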

Don't let yourself be fooled. A 500 GB Ultra-320 drive is not necessarily all it's cracked up to be. Just because under some situations it can transfer 320 MB of data per second doesn't mean that it will for your Exchange database! Under some situations, a 9 GB SCSI-1 drive will outperform it. Be more concerned with the ''average ms per transfer'' - this is far more indicative of how a disk will perform with Exchange than what its maximum transfer rate is.

Those two counters define overall I/O latency for a given disk. However, especially if a disk is shared either by multiple applications (not a good idea with Exchange Server) or by multiple roles within Exchange Server, knowing the average latency for the Exchange databases and the Exchange log files is also important. Those counters have longer, but obvious, names: MSExchange Database\I/O Database Reads (Attached) Average Latency, MSExchange Database\I/O Database Writes (Attached) Average Latency, and MSExchange Database\I/O Log Writes (Attached) Average Latency. The attached databases are the currently active databases; passive DAG copies have Recovery in parentheses instead of Attached. The MSExchange Database counters should have values the same as, or lower than, the overall PhysicalDisk counters. If they do not, you may have other applications whose I/O load is causing unacceptable I/O degradation on your Exchange server disk volumes.
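The comparison between the Exchange-specific and disk-wide latency counters can be expressed as a simple check. The function name and the 1 ms tolerance are invented for illustration; the point is the direction of the comparison described above:

```python
def foreign_io_suspected(exchange_ms, disk_ms, tolerance_ms=1.0):
    """The MSExchange Database latency should be at or below the
    disk-wide PhysicalDisk latency for the same volume. If Exchange
    reports notably *higher* latency than the disk as a whole, some
    other workload is likely loading the volume."""
    return exchange_ms > disk_ms + tolerance_ms
```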

In a major break with prior recommendations, with Exchange Server 2010 Microsoft now recommends, or at least allows, placing the transaction logs and database on a single volume (when you have two or more copies in a DAG configuration). Many people are wary of this concept. However, there is no question that Microsoft has proven that it works in large environments. Whenever that configuration is used, the MSExchange Database counters we just discussed are arguably more important than the LogicalDisk or PhysicalDisk counters, as they express the specific overhead being experienced by Exchange, as opposed to the overall experience for the entire volume.

Tier 2 Disk Performance Counters

The ''Tier 2'' performance counters are those which, if the Tier 1 counters indicate a problem, can assist in further narrowing down problems. They primarily assist in differentiating between types of problems rather than identifying new problems.

Exchange mailbox databases are fairly even in terms of the number of reads versus the number of writes that they execute. To minimize both, Exchange mailbox databases implement caches, which store pages of a database in memory. Accessing memory is much faster than accessing disk; therefore, the larger the database cache, the fewer I/O operations need to occur (at least theoretically). Output operations in Exchange are flushed to disk by a task known as the "Lazy Writer," which processes the cache on a regular basis to aggregate the output and write it to the database disk. However, transaction log entries are flushed to disk before the corresponding change is considered committed, while the changed database pages remain in cache until the Lazy Writer flushes them. This is what provides recoverability in case of a system crash. It is also one of the major causes of the difference in I/O profiles between transaction logs and databases (the other is random versus sequential I/O).
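The log-before-database ordering just described can be modeled in a few lines of Python. This toy class (all names are invented, and it bears no relation to the actual ESE implementation) shows why the durable log alone is enough to recover updates that the lazy-writer pass had not yet flushed at crash time:

```python
class TinyStore:
    """Toy model of write-ahead logging: each update is appended to a
    durable log *before* the in-memory page is dirtied; a lazy-writer
    pass later flushes dirty pages to the 'database' in aggregate."""

    def __init__(self):
        self.log = []        # stands in for the transaction log on disk
        self.cache = {}      # page -> value, in memory
        self.dirty = set()
        self.database = {}   # stands in for the database file on disk

    def update(self, page, value):
        self.log.append((page, value))   # the log write lands first
        self.cache[page] = value         # then the cached page changes
        self.dirty.add(page)

    def lazy_flush(self):
        for page in sorted(self.dirty):  # aggregate and write together
            self.database[page] = self.cache[page]
        self.dirty.clear()

    def recover(self):
        """After a 'crash', replay the log to rebuild the lost state."""
        replayed = {}
        for page, value in self.log:
            replayed[page] = value
        return replayed
```

After two updates but before any lazy flush, the on-disk database is still empty, yet replaying the log reconstructs both changes.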

In all versions of Exchange prior to Exchange Server 2010, Exchange mailbox databases were ''read-heavy'' (with Exchange Server 2007 much less read-heavy than Exchange Server 2003). That is, they executed far more read I/O operations than write I/O operations. With Exchange Server 2010, the schema of the Exchange database, along with much of the program logic associated with doing I/O, was changed. The purpose of these changes was to execute fewer overall I/Os by using cache more effectively and by consolidating both read and write operations. This provided an overall I/O reduction in Exchange Server 2010 of approximately 70 percent.

However, caching does have its own potential issues. In large-memory systems, it may take an extended period of time to thaw the cache, and server performance suffers during that period. Also, if a cache is full, the need to empty a portion of it can cause a ''stall'' - a delay in an I/O operation. While a cache is thawing, the I/O subsystem can be severely stressed, especially if the cache is large. When planning an I/O subsystem, be aware of this potential stress, but your general design plan should be for the hot cache, not the frozen cache; otherwise, you will far over-provision the I/O subsystem.

A cache that is ''frozen'' is completely empty. This happens when a cache is first created. The process of filling a cache with data is known as ''thawing.'' A cache that is optimally full is a ''hot'' cache. Some caches have prefill algorithms that load them before the data is actually used. This process is known as ''seeding'' the cache.

Now that you know everything about the cache, a key performance counter relating to the cache is MSExchange Database\Database Page Fault Stalls/Sec. A page fault stall occurs when a database page needs to be brought into the cache, but no space is available because dirty pages have not yet been written to disk. On a production Exchange server, except during online maintenance, this value should always be zero. If it isn't, then either the cache is too small (indicating a need for more memory in the server) or the I/O write performance of the database volume cannot keep up with the needs of the Exchange database (indicating a need for more spindles or faster spindles in the database volume).
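The condition behind a page fault stall can be sketched with an invented cache model (nothing here reflects actual ESE internals): a page must come in, every slot holds a dirty page, and so the request has to wait on a write before an eviction can happen.

```python
class StallCountingCache:
    """Toy page cache that counts a 'stall' whenever a page must be
    brought in but every slot holds a dirty (not-yet-flushed) page,
    so the eviction has to wait on a write completing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = {}      # page -> dirty flag
        self.stalls = 0

    def touch(self, page, dirty=False):
        if page in self.pages:
            self.pages[page] = self.pages[page] or dirty
            return
        if len(self.pages) >= self.capacity:
            # Prefer evicting a clean page; if all are dirty, we stall.
            victim = next((p for p, d in self.pages.items() if not d), None)
            if victim is None:
                self.stalls += 1                 # wait for a flush
                victim = next(iter(self.pages))  # flush-then-evict
            del self.pages[victim]
        self.pages[page] = dirty
```

When the cache holds only dirty pages, faulting in a new page stalls; when clean pages are available (the lazy writer has kept up), the same access pattern produces no stalls. That is why a nonzero counter points at either cache size or database-volume write performance.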

A similar counter, except that it applies to the transaction log files instead of the database files, is MSExchange Database\Log Record Stalls/Sec. This performance counter should also average near zero. If the value of the counter averages one or higher, flushing the transaction log buffer may be a bottleneck for this Exchange server. This can occur when the I/O write performance of the log volume cannot keep up with the rate at which Exchange needs to flush the transaction log buffer. As with the MSExchange Database\Database Page Fault Stalls/Sec counter, this indicates a need for more spindles or faster spindles, in this case in the log volume.
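A log record stall can be modeled with an invented in-memory log buffer (again, illustrative only, not the actual implementation): records accumulate until a flush writes them to the log volume, and appending to a full buffer forces a synchronous flush - the stall.

```python
class LogBuffer:
    """Toy in-memory transaction log buffer: records accumulate until
    a flush writes them to the 'log volume'. Appending to a full
    buffer counts as a log record stall, because the writer must wait
    for the flush to complete."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.flushed = []
        self.stalls = 0

    def append(self, record):
        if len(self.buffer) >= self.capacity:
            self.stalls += 1  # writer had to wait on the log volume
            self.flush()      # forced synchronous flush
        self.buffer.append(record)

    def flush(self):
        self.flushed.extend(self.buffer)
        self.buffer.clear()
```

If the log volume flushes faster than records arrive, the buffer never fills and the stall count stays at zero; a slow log volume shows up directly as a climbing stall count.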

Another counter that helps monitor the performance of the log volumes is MSExchange Database\Log Threads Waiting. This counter indicates the number of update threads that are waiting to write their information to the log. Generally, this is the in-memory log. If there are so many updates that the in-memory log is stalling output to the disk log, there is a performance issue. Again, the issue would typically revolve around the disk subsystem. While it is normal for this counter to be in the single-digit range, if it begins to average over 10, you need to investigate why log files cannot be written quickly enough.

The Paging File\% Usage counter is an in-the-middle performance counter. It has attributes of both memory and of disk. Our primary interest in the counter is how full the paging file is. On average, the Paging File\% Usage counter should stay below 50 percent. If it does not, you may have to either increase the size of your paging file or add more memory. If you have sized your paging file according to the recommendations for Exchange servers discussed previously, paging should be low (note that Exchange itself should not page at all, but related applications and third-party applications may page - this includes such items as content indexing and management agents). Otherwise, just keep this counter in mind as indicating that your server is experiencing memory pressure and may not be able to handle much additional workload before the server requires an upgrade.

If you are experiencing high I/O volumes on a server and it is unclear what program is causing the I/O, it is time to bring the Process performance object under examination. Each running process is tracked within this performance object, and it contains pretty much anything that you may ever want to know about a specific process. The counters that are of high interest in an I/O situation are Process\IO Read Operations/Sec and Process\IO Write Operations/Sec. On an Exchange server, the most common processes that exhibit high values of the I/O operation counters are store.exe and System.

The final eight Tier 2 counters are a family of counters that provide specific measurements of the total amount of I/O occurring to a physical disk. We left these for last because they tend to be more important from a trending perspective (that is, how the utilization of this server is changing over time) as opposed to something that provides immediately worthwhile information. However, they are also important for determining whether the I/O subsystem on a server is ''fast enough.'' The counters are shown in Table-3.

Table-3: PhysicalDisk Counters for I/O Size and Speed

Description                Read Counter           Write Counter
Average I/O Request Size   Avg. Disk Bytes/Read   Avg. Disk Bytes/Write
Average I/O Time           Avg. Disk Sec/Read     Avg. Disk Sec/Write
I/O Speed                  Disk Read Bytes/Sec    Disk Write Bytes/Sec
I/O Completion Speed       Disk Reads/Sec         Disk Writes/Sec