Thinking about Server I/O and Memory
02-May-2000  S. Kittelson (spencertk@abasys.com)

Abstract

As a follow-up to the request on the LTSP list for recommendations on server configurations, this query was made to clarify the issues of I/O and memory.  What follows is a short course on thinking about I/O issues and a few comments on memory (both of which have been edited a bit for clarity).

The question:
-------------------------------------------

"J. Hartzelbuck" wrote:
...
>"Spencer T. Kittelson" wrote:
> You also won't know very much about your I/O distribution
> unless you are either very experienced or do some testing.
> More on I/O later.

 Can you say another sentence about I/O distribution? I have a
 sense of what you mean, but I'd like to be clear.

-------------------------------------------
 
 

I/O (Input/Output) Performance Issues
(a few of them)

I/O covers a LOT of interacting factors:
By I/O distribution I mean several things at once with regard to the disk subsystem (it's a technical subject unto itself):

1) The average frequency of I/O requests as I/O's per second.
2) The peak queue depth as a maximum number of I/O's waiting to be completed.
3) The size of the I/O requests in KB (4, 8, 16, 32, and 64 are typical, with 4-16 being most common for transaction processing and 32-64 more common for streaming applications; the request size can go even higher for some apps).  (This factor can often be ignored to simplify things.)
4) The spread of 1/2/3 above across the various disk devices in a given server or disk subsystem.  (Or the concentration of 1/2/3 above on a single device.)

There are more factors, but these are the most important, and unless you are doing something really esoteric you can measure 1/2/3 above for either a single application or an entire server with monitoring tools (see the sketch below).  If you discover that you are saturating a disk device too often (once in a while is ok), you should consider adding disk devices and spreading the I/O workload around to different disks, or changing to RAID 0 (striping) to increase the available I/O capacity of the disk subsystem.
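
If you want actual numbers instead of a gut feel, the sysstat monitoring tools report all of the above per device.  A minimal sketch (exact column names vary a bit between iostat versions, so treat the field names as illustrative):

    # Extended per-device statistics, refreshed every 5 seconds:
    #   r/s + w/s  -> factor 1: I/O requests per second
    #   avgqu-sz   -> factor 2: queue depth (watch the peaks)
    #   avgrq-sz   -> factor 3: average request size (in 512-byte sectors)
    #   one line per disk -> factor 4: how the load spreads across devices
    iostat -x 5

    # sar can log the same per-disk data all day for later review:
    sar -d -o /var/log/disk.stats 60 1440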

Fast drives are great but can only do so much:
Today's high performance 10K RPM disk drives can process anywhere from 40-150 I/O requests per second, and the 15K RPM drives can reach over 200 I/O's/sec.  Where the I/O's occur on the disk platters totally dominates the peak performance (if the I/O's are scattered all over the surface you get the lower figure; if the I/O's are very near each other you get the higher rate).
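
A rough back-of-envelope calculation (using typical figures of my own choosing, not measured numbers) shows where that range comes from:

    average seek time:             ~5 ms
    rotational latency (10K RPM):  half a revolution = (60/10000)/2 sec = ~3 ms
    per random I/O:                ~8 ms  ->  roughly 125 I/O's/sec

If the I/O's land close together the seek term nearly disappears and you approach the high end of the range; if every I/O is a full stroke seek (10+ ms) you fall to the low end.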

There are two kinds of disk device saturation:
There are two kinds of saturation: queue request saturation and bandwidth (streaming) saturation.  Think of the word "transaction" in 3) above and consider the following:
 

1) A database with customers, receivables, inventory and order processing.

2) All of the above on the same disk drive.

3) You have LOTS of RDBMS updates for orders which:
     a) Look up the customer (SELECT)
     b) Look up the inventory (SELECT)
     c) Create new order detail (INSERT)
     d) Update the inventory (UPDATE)
     e) Update the order header (UPDATE)
     f) Update the customer A/R (UPDATE), etc., etc., etc.
 

Queue saturation:
You can see that a heavily loaded transaction processing application can run the single drive into the ground with just a few orders being actively processed simultaneously.  The result is queue request saturation, i.e. the drive is physically incapable of servicing any more requests because of the limitations of head seek speed and rotational latency.  That's why the peak queue depth is important: if the depth stays very high, you have apps constantly waiting for prior I/O's to complete before their own can be serviced.  If this is happening on a web server you are going to lose people to boredom, and THAT is the cardinal sin of web serving.
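
To put a hedged number on it, assume the order transaction above costs about six physical I/O's (less with good caching, more with index maintenance) and the drive can service roughly 100 random I/O's/sec:

    6 physical I/O's per order  x  N orders/sec  <=  ~100 I/O's/sec
    =>  N tops out around 16 orders/sec on that single drive

Past that point the queue depth just keeps growing and every new order waits behind the ones ahead of it.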

Fortunately, most RDBMS's make heavy use of caching and deferred I/O so that optimum I/O operations can be assembled and then performed aperiodically as required or periodically on schedule.  (DBMS tuning is another separate subject and very nearly a black art.)

Bandwidth saturation:
On the other hand, the stream application is often encumbered by being mixed with other I/O on the same drive.  If you are trying to do video streaming you will need to pump a steady stream of data at anywhere from 3-12 MB/second, and if you are feeding a live application that is poorly buffered you simply cannot afford to have that stream interrupted by other requests.  If you get two streams running on the same drive and the data being requested is physically far apart, you will get lots of head thrashing and the overall I/O queue service rate will go way, way, way down.  Even if the two requests are simply out of time sync with each other for the same large file you can get bandwidth saturation.  Just put an MPEG server in place and watch the I/O bandwidth get gobbled up with large block I/O.  Then try to multi-stream and listen to the drive as it seeks like crazy, or watch the jerky I/O response of the separate applications.
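
A back-of-envelope example of why two distant streams hurt so badly (assuming 64 KB requests and ~10 ms per long seek, figures picked for illustration):

    one 6 MB/sec stream in 64 KB blocks  ->  ~96 requests/sec
    two interleaved streams, far apart   ->  ~192 requests/sec, each
                                             preceded by a ~10 ms seek
    192 x 10 ms = ~1.9 seconds of seeking demanded per second of real time

The drive physically cannot do it, so both streams starve even though either one alone would have been easy.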

Watch, listen (and be wise):
You can actually watch and listen to get a feel for how often the disk subsystem is saturated.  If a drive is quiet but its indicator is locked on, it is probably streaming.  If it is blinking often or mostly locked on but not making much seek noise, it is very nearly saturated.  If it is making a lot of seeking noise it is losing performance by being made to jump all over the place to service I/O requests.  If the indicator blinks only periodically the drive is often idle.  It really is that simple, and the old MIS managers and operators (from the glass house days) could often tell what apps were being run by watching the disk activity indicators.  Now we get the same and better info with software tools and GUI displays.  Still, just sitting next to a server or disk array tells me a lot about the system.

Big drives are not necessarily better!:
Note: A perverse downside of using today's very high capacity drives (30+ GB/drive) is that it is very tempting to buy just one or two drives and figure you've got enough "disk".  You may have enough storage capacity, but you will almost certainly become either queue or bandwidth constrained if you load them heavily.
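
A quick illustration of the trade (assuming roughly 120 random I/O's/sec per spindle, regardless of capacity):

    one 36 GB drive   ->  36 GB of space,  ~120 I/O's/sec
    four 9 GB drives  ->  36 GB of space,  ~480 I/O's/sec (if the load is spread)

Same amount of "disk", four times the I/O capacity.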

A simple test:
A great way to prove the point about large seeks is to build three very large files that collectively consume 70% or more of the space on a drive, then run any apps that stream files 1 and 3 (file 2 is just a separator).  Use "time cat file1 > /dev/null" to stream it and note the times.  Then do the same on file3, and then do them at the same time.  Listen to the drive and watch the difference in the activity indicator and the real execution times.  Try the test with two separate logins on the same file 1, but separated by several seconds.  This works great on a new drive but may give weird results on an old drive that suffers from fragmentation.  Be sure to consider disk de-frag software for drives with lots of file size changes or create/deletes.
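
Here is one way to script the test (a sketch only; the file names, sizes and the use of dd to build the files are my own choices, and the files must be much larger than main memory or the buffer cache will skew the results):

    # Build three large files that together fill ~70% of the drive.
    dd if=/dev/zero of=file1 bs=1024k count=2000
    dd if=/dev/zero of=file2 bs=1024k count=2000    # file2 is just a separator
    dd if=/dev/zero of=file3 bs=1024k count=2000

    # Stream each outer file alone and note the times.
    time cat file1 > /dev/null
    time cat file3 > /dev/null

    # Now stream the two distant files at the same time and compare.
    ( time cat file1 > /dev/null ) &
    ( time cat file3 > /dev/null ) &
    wait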

The Main Issues

The average frequency of I/O requests cannot exceed the ability of the drive to service them.  If you get lots of apps that are hammering the drive OR if you get a few apps that force the drive to do LARGE seeks you will saturate the drive.  If you do streaming be prepared to suffer if you multi-stream high bandwidth apps (such as video).

The peak I/O queue depth should be bursty (not steady).  If you get bursts of I/O requests you can service them and recover.  If a large queue depth remains in place you will have apps waiting, waiting, waiting.

The size of I/O requests may matter, particularly if you are doing streaming (large blocks).  For small block I/O the device is often constrained by rotational latency.

Solutions to common problems:
If you get I/O saturated on a device, there are really only a few things you can do:

1) Increase the server main memory capacity and then increase the amount of memory dedicated to disk buffering.

2) Add more disk drives and then place application specific files on different devices.  This is important for RDBMS apps and heavily loaded dynamic web servers that generate web pages for serving.

3) Change to a RAID 0 (striping) configuration so that I/O requests are scattered across multiple devices (see the sketch after this list).  This is a common performance solution for streaming apps.

4) Use a solid state disk device which is really a large memory array with a SCSI interface (expensive).

5) Use a mixture of techniques on a system.  Have lots of smaller drives for spreading small block I/O around and use RAID 0 with large block I/O on higher capacity drives.

6) Beware of the "capacity economy" of RAID 5.  If you have heavy I/O loading it may not perform well with RAID 5, since every logical WRITE I/O operation is turned into multiple physical I/O's on the actual devices.  RAID 5 benefits from heavy buffering and is ok for mostly-read applications such as archives.  It is also ok for medium performance streaming, but be SURE to use a high performance controller.  An RDBMS will work well if you have enough drives (4-7).

Warning: when a RAID 5 drive fails, the I/O performance of the subsystem goes in the tank, especially with a low performance controller.  While a replacement drive is being rebuilt the subsystem suffers even more, and the rebuild can last for hours (or even days).  Don't get me wrong, I like RAID 5, but it is not a panacea which lets you get redundancy without penalty.  For performance I use RAID 0 (striping) and for performance with safety I use RAID 1 (mirroring).  Combine the two and you get the so-called RAID 10 (striped mirrors), which is the most expensive RAID you can buy (in terms of number of disk drives).

7) If a drive is fragmented, the amount of excess seeking that occurs may be quite large.  Defragment the drive to get the data/files clustered back together again.  This is very important for both RDBMS and streaming apps but not much of an issue for archives of lots of smaller files.
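
For item 3, a minimal sketch of building a two-drive software RAID 0 stripe under Linux (assuming the md driver and the mdadm tool; the device and mount point names here are made up, and a hardware RAID controller will have its own setup utility instead):

    # Stripe two whole drives into one device (this destroys their contents!):
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

    # Put a filesystem on the stripe and mount it where the hot files live:
    mke2fs /dev/md0
    mount /dev/md0 /data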

Monitor your apps:
If you take your existing apps and monitor what they are already doing, you will get a great idea of the future I/O load they will put on a server.  Be sure to use realistic file sizes and numbers of users.  It's not rocket science, and by actually watching what the drive is doing with a given app you see cause and effect.  Take into account the behavioral characteristics and limitations of today's disk devices, understand what makes them perform both very well and very badly, then optimize for the former and avoid the latter like the plague.
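
A simple way to watch that cause and effect while an app runs (vmstat and iostat are the usual tools; the 5-second interval is just a convenient choice):

    # In one window, run the application under a realistic load.
    # In another, watch the block I/O it generates:
    vmstat 5       # bi/bo columns show overall blocks read/written per second
    iostat -x 5    # per-device rates, queue depth and request size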
 
 

Some notes on memory for the server

The question:
-------------------------------------------

>"Spencer T. Kittelson" wrote:
> ECC memory can be had with careful shopping for
> about a buck/MB (2$/MB for premium stuff).

 What are the differences between the regular and premium stuff?

 I take it that you feel the regular stuff is adequate?

-------------------------------------------

It's true - Not all memory is created equal (but put the burden on your vendor):
The memory you buy can come from a major brand name such as Kingston, from 2nd tier brokers, or even from retail vendors.  If you buy the motherboard with CPU and memory as a unit from a vendor, you put the burden on the vendor to make sure the stuff works with the chosen motherboard (mobo in geek speak).  As long as they warranty the memory forever (or at least 5 years) you should be ok, provided the vendor has been in biz for a few years and has a good track record.  If you buy it yourself, I'd go with the major brands.  The main difference is in the quality of design and fabrication of the mini-circuit board (4/5/6 layers and careful trace layout) and the speed grade of the chips.  A six layer board will probably have superior timing accuracy due to more carefully controlled impedance and trace length on the signal lines.

Memory speed and other esoteric parameters are important:
For best performance get CAS2 memory and be SURE it is x72 (72 bits wide for ECC operation).  The ECC capability can save the day in later years when the memory starts to become flaky (an eventuality given enough time).  Even solar disturbances can create transients (from fast particles) that can upset a memory cell and force the memory controller to do an ECC correction.  You _really_ don't want the server to crash or whack out from a lousy alpha particle.

But memory speed is not that important:
PC100 vs. PC133 is not as important as it may appear, since access time tends to dominate most applications.  However, a streaming server that uses lots of buffers can benefit somewhat from PC133 (and the forthcoming DDR SDRAM).  If you can get BOTH CAS2 and PC133 guaranteed for a good price, go ahead and buy it.  Use PC133 if the chipset can properly support it.  I'll take CAS2 PC100 over CAS3 PC133 any day, unless I'm building a video server and then I can't buy fast enough memory at any price.
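
The arithmetic behind that preference (a rough sketch; real access time includes RAS and other cycles too):

    CAS2 at PC100:  2 cycles x 10.0 ns/cycle = 20.0 ns to first data
    CAS3 at PC133:  3 cycles x  7.5 ns/cycle = 22.5 ns to first data
    CAS2 at PC133:  2 cycles x  7.5 ns/cycle = 15.0 ns to first data

So CAS3 PC133 is actually slower to start delivering data than CAS2 PC100; the higher clock only pays off once a long burst is streaming.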

Stress test your memory when it and the motherboard are new:
When you get the mobo installed, be sure to stress test it and try to overclock the memory subsystem by at least 5% or so.  If you cannot get PC100 to go past 105MHz then there is something wrong either with the chipset or the memory.  If you have CAS2 memory that has to go to CAS3 at 105MHz to remain stable it's probably already too close in its timing margins to be stable five years hence (after it has aged).  Have the vendor crank it up and try it out (or do it yourself).  After testing, put it back to PC100 and forget about it.  If the system gets flaky, turn it back up and see if the memory has become unstable at the higher speed.  (This kind of diagnostic is painfully time consuming but by having a benchmark to test against you can often tell what part of the system has degraded.)
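
One way to do the stress testing without trusting the OS at all is a standalone tester such as memtest86 booted from a floppy; a sketch (the exact image file name varies by release, so treat it as an example):

    # Write the memtest86 boot image to a floppy, then boot the server from it:
    dd if=memtest86.img of=/dev/fd0
    # Let it run overnight (several full passes) at the normal clock,
    # then repeat with the memory bus pushed ~5% in the BIOS setup.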

Forget RAMBUS:
Don't get fooled by the RAMBUS memory hype.  The current implementation of a great idea is so lousy as to have actually created a poorer performing memory, especially in high memory capacity servers.  The stuff is ok for a few apps but will soon be completely eclipsed by DDR.

Watch out for "registered" memory:
One more thing about memory: some mobo's and their chipsets don't like "registered" memory.  The AMD Athlon in particular prefers "unregistered" in most cases.  Adding a register to the memory just slows things down anyway, so really try to avoid the stuff.