How to Troubleshoot High Disk I/O Problems

Overview:

Disk I/O refers to the read and write operations on a hard disk. The speed at which your server can read and write information to disk directly affects your server’s performance and the performance of cPanel & WHM. Your server’s load will increase if the system experiences high disk I/O wait time, which is the time that the CPU spends waiting for disk operations to complete.

Symptoms of high disk I/O:

High server load => The average system load exceeds 1.

chkservd notifications => You receive notifications about an offline service or that the system cannot restart a service.

Slow hosted websites => Hosted websites may require more than a minute to load.

Slow delivery of email => The Exim service performs slowly or does not respond. Exim contains a large outbound mail queue.

Slow connection for email => The POP or IMAP services perform slowly or do not respond.

Slow Webmail interfaces => The Webmail interfaces perform slowly or do not respond (for example, Roundcube, Horde, or SquirrelMail).

Slow WHM or cPanel interfaces => The WHM or cPanel interfaces perform slowly when you add email accounts, databases, or other items.

How to determine the disk I/O wait on your server:

Use the top command to find the average wait time on your server:

# top

The %wa statistic at the top of the output indicates your server’s average disk wait.

Divide the %wa value by the number of CPU cores on your server. If the result is greater than 1.0, the CPU cores must wait to process data on the hard disk. For example, if the system possesses four CPU cores and the server %wa statistic is 8.0, then the per-core %wa is 2.0 (8.0 / 4). Because the per-core %wa is larger than 1.0, the CPU cores must wait before they can process data on the hard disk.
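
The arithmetic in the example above can be sketched as a small shell helper. This is a minimal sketch, not a cPanel utility: the function names per_core_wait and io_wait_is_high are hypothetical, and the sketch assumes that awk is available on the server.

```shell
#!/bin/sh
# Hypothetical helper (not part of cPanel or top): divides a %wa reading
# by the CPU core count, as in the four-core example above.
per_core_wait() {
    # $1 = %wa value reported by top, $2 = number of CPU cores
    awk -v wa="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", wa / cores }'
}

# Hypothetical helper: succeeds (exit status 0) when the per-core value
# is greater than 1.0, and fails otherwise.
io_wait_is_high() {
    awk -v wa="$1" -v cores="$2" 'BEGIN { exit !((wa / cores) > 1.0) }'
}

# The values from the example: %wa of 8.0 on four cores.
per_core_wait 8.0 4    # prints 2.00
```

On a live server you might pass the %wa value that top reports together with the core count, for example per_core_wait 8.0 "$(nproc)", assuming the nproc command from coreutils is installed.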

Use the sar command to determine the history of your server’s disk I/O wait:

The sar command provides you with the history of the server’s CPU utilization, which includes the %iowait column. Use this command to determine the times when your server experiences high disk I/O.

root@server [~]# sar
Linux 2.6.32-431.29.2.el6.i686 (server.example.com) 10/17/2014 _i686_ (2 CPU)

12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 0.84 1.19 0.45 0.30 0.00 97.22
12:20:01 AM all 0.65 1.06 0.41 0.31 0.00 97.58
12:30:01 AM all 6.67 1.47 1.60 6.25 0.00 84.02
12:40:01 AM all 0.63 1.08 0.40 0.33 0.00 97.56
12:50:01 AM all 0.74 1.94 0.72 1.50 0.00 95.11
01:00:01 AM all 0.58 1.51 0.41 0.24 0.00 97.25
01:10:01 AM all 0.71 1.06 0.48 0.58 0.00 97.17
01:20:01 AM all 0.54 1.06 0.37 0.22 0.00 97.81
01:30:01 AM all 0.63 1.30 0.41 0.28 0.00 97.37
01:40:01 AM all 0.58 1.06 0.39 0.21 0.00 97.76
01:50:01 AM all 0.60 1.06 0.40 0.23 0.00 97.70
02:00:01 AM all 0.54 1.28 0.39 0.23 0.00 97.55
02:10:01 AM all 0.71 1.18 0.43 0.40 0.00 97.27
02:20:01 AM all 0.78 1.08 0.49 0.46 0.00 97.19
02:30:01 AM all 0.58 1.28 0.49 0.23 0.00 97.43
02:40:01 AM all 0.64 1.06 0.54 0.31 0.00 97.45
02:50:02 AM all 0.68 1.07 0.57 0.27 0.00 97.42
03:00:01 AM all 0.66 1.52 0.55 0.26 0.00 97.00
03:10:01 AM all 0.74 1.08 0.60 0.28 0.00 97.30
03:20:01 AM all 0.67 1.06 0.53 0.31 0.00 97.43
03:30:01 AM all 0.65 1.28 0.57 0.36 0.00 97.14
03:40:01 AM all 0.61 1.12 0.64 0.70 0.00 96.93
03:50:01 AM all 0.67 1.06 0.52 0.30 0.00 97.45
04:00:01 AM all 0.63 1.31 0.51 0.29 0.00 97.26
04:10:01 AM all 0.68 1.06 0.52 0.23 0.00 97.51
04:20:01 AM all 0.70 1.20 0.58 0.28 0.00 97.25
04:30:01 AM all 0.65 1.30 0.52 0.30 0.00 97.23
04:40:01 AM all 0.74 1.06 0.54 0.33 0.00 97.32
04:50:01 AM all 0.56 1.08 0.43 0.28 0.00 97.64
05:00:01 AM all 0.59 1.52 0.47 0.29 0.00 97.13
05:10:01 AM all 0.70 1.06 0.47 0.30 0.00 97.46
05:20:01 AM all 0.62 1.07 0.44 0.30 0.00 97.57
05:30:01 AM all 0.55 1.29 0.40 0.20 0.00 97.57
05:40:01 AM all 0.56 1.09 0.39 0.25 0.00 97.71
05:50:01 AM all 0.65 1.07 0.41 0.32 0.00 97.55
06:00:01 AM all 0.74 1.29 0.43 0.33 0.00 97.21
06:10:01 AM all 0.65 1.06 0.41 0.31 0.00 97.56
06:20:01 AM all 0.72 1.19 0.43 0.28 0.00 97.38
06:30:01 AM all 0.56 1.31 0.40 0.26 0.00 97.47
06:40:01 AM all 0.61 1.06 0.40 0.29 0.00 97.63
06:50:01 AM all 0.71 1.06 0.42 0.30 0.00 97.51
07:00:01 AM all 0.52 1.51 0.39 0.22 0.00 97.35
07:10:01 AM all 0.74 1.06 0.46 0.30 0.00 97.44
07:20:01 AM all 0.63 1.23 0.52 0.49 0.00 97.12
07:30:01 AM all 0.58 1.30 0.40 0.27 0.00 97.45
07:40:01 AM all 0.56 1.06 0.39 0.19 0.00 97.80
07:50:01 AM all 0.62 1.06 0.42 0.30 0.00 97.61
08:00:01 AM all 0.67 1.28 0.41 0.30 0.00 97.33
08:10:01 AM all 0.63 1.06 0.42 0.23 0.00 97.66
08:20:01 AM all 0.56 1.20 0.39 0.26 0.00 97.58
08:30:01 AM all 0.59 1.29 0.40 0.27 0.00 97.45
08:40:01 AM all 0.59 1.06 0.38 0.26 0.00 97.71
08:50:01 AM all 0.54 1.06 0.37 0.28 0.00 97.74
09:00:01 AM all 0.60 1.52 0.41 0.23 0.00 97.25
09:10:01 AM all 0.68 1.08 0.42 0.22 0.00 97.61
09:20:01 AM all 0.51 1.06 0.37 0.23 0.00 97.83
09:30:01 AM all 0.65 1.28 0.53 0.51 0.00 97.02
09:40:01 AM all 0.61 1.06 0.39 0.37 0.00 97.56
09:50:01 AM all 0.69 1.05 0.41 0.29 0.00 97.56
10:00:01 AM all 0.61 1.31 0.40 0.27 0.00 97.41
10:10:01 AM all 0.65 1.18 0.42 0.27 0.00 97.47
10:20:01 AM all 0.60 1.06 0.40 0.25 0.00 97.69
10:30:01 AM all 0.52 1.29 0.38 0.20 0.00 97.61
10:40:01 AM all 0.62 1.06 0.40 0.27 0.00 97.65
10:50:01 AM all 0.56 1.08 0.38 0.26 0.00 97.72
11:00:01 AM all 0.61 1.50 0.41 0.28 0.00 97.20
11:10:01 AM all 0.63 1.06 0.39 0.29 0.00 97.62
11:20:01 AM all 0.61 1.06 0.39 0.29 0.00 97.64
11:30:01 AM all 0.55 1.28 0.37 0.29 0.00 97.51
11:40:01 AM all 0.60 1.08 0.40 0.29 0.00 97.64
11:50:01 AM all 0.58 1.06 0.37 0.25 0.00 97.74
12:00:01 PM all 0.55 1.28 0.38 0.22 0.00 97.57
12:10:01 PM all 0.80 1.19 0.45 0.33 0.00 97.24
12:20:01 PM all 0.68 1.06 0.40 0.29 0.00 97.56
12:30:01 PM all 0.52 1.30 0.38 0.29 0.00 97.51
12:40:01 PM all 0.65 1.06 0.39 0.33 0.00 97.57
12:50:01 PM all 0.62 1.06 0.41 0.32 0.00 97.59
01:00:01 PM all 0.55 1.51 0.40 0.26 0.00 97.28
01:10:01 PM all 0.75 1.06 0.45 0.33 0.00 97.42
01:20:01 PM all 0.52 1.08 0.39 0.25 0.00 97.77
01:30:01 PM all 0.60 1.28 0.40 0.26 0.00 97.46
Average: all 0.71 1.19 0.45 0.38 0.00 97.27
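
To pick out the busy intervals from a long sar report such as the one above, a filter along these lines may help. It is a sketch under assumptions: the function name high_iowait is hypothetical, and it assumes the column layout shown above, in which %iowait is the third column from the end of each row.

```shell
#!/bin/sh
# Hypothetical helper: reads sar CPU output on stdin and prints only the
# "all" rows whose %iowait value is greater than the given threshold.
high_iowait() {
    # $1 = %iowait threshold
    # The "all" marker is field 3 with AM/PM timestamps and field 2 with
    # 24-hour timestamps. %iowait is assumed to be the third field from
    # the end (%iowait %steal %idle). The Average: row is skipped.
    awk -v limit="$1" '
        $1 != "Average:" && ($2 == "all" || $3 == "all") && $(NF-2) + 0 > limit {
            print
        }'
}

# A few rows copied from the report above, filtered at 1.0 percent.
high_iowait 1.0 <<'EOF'
12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:20:01 AM all 0.65 1.06 0.41 0.31 0.00 97.58
12:30:01 AM all 6.67 1.47 1.60 6.25 0.00 84.02
12:50:01 AM all 0.74 1.94 0.72 1.50 0.00 95.11
Average: all 0.71 1.19 0.45 0.38 0.00 97.27
EOF
```

With these sample rows, the filter prints the 12:30:01 AM and 12:50:01 AM lines. On a server, the same function could read live data with sar | high_iowait 1.0. The 1.0 threshold is only an illustration; choose a value that suits your server.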

How to resolve a problem with high disk I/O:

Hard disk specifications with low RPM speed or slow interface technology => Upgrade the hard disk on your server or split the application load between separate hard disks.

No bandwidth available on the hard disk => Upgrade the hard disk on your server or split the application load between separate hard disks.

Write caching is disabled => Enable write caching on the disk. For more details, see http://www.linuxjournal.com/content/advanced-hard-drive-caching-techniques

Degraded RAID array => Check the RAID array for a hardware malfunction. You should test and verify the hardware.

The software RAID array on the server reports busy, or the CPU performs slow parity calculations => Check the RAID array for a hardware malfunction. You should test and verify the hardware.

Software processes slowly => Upgrade the hard disk on your server or split the application load between separate hard disks.
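
For the RAID items above, a quick first check on a server that uses Linux software RAID (md) is to look for degraded arrays in /proc/mdstat. The following is a minimal sketch, not a cPanel tool: the function name degraded_arrays is hypothetical, the sample input is made up for illustration, and the sketch does not apply to hardware RAID controllers, which need the vendor's own utilities.

```shell
#!/bin/sh
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints
# the name of each md array whose member status string (for example
# [U_]) contains "_", which marks a missing or failed member.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                       # remember the current array
        /\[[U_]*_[U_]*\]/ && name != "" {         # status such as [U_] or [_U]
            print name
            name = ""
        }'
}

# Illustrative sample input (made-up values, not from a real server).
degraded_arrays <<'EOF'
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
md1 : active raid1 sda2[0]
      976655488 blocks [2/1] [U_]
unused devices: <none>
EOF
```

With the sample input, the function prints md1. On a server, run degraded_arrays < /proc/mdstat. Empty output suggests that no md array is currently degraded, but it does not rule out a hardware fault, so you should still test and verify the hardware.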