Suggestions on Server Configuration
28-Apr-2000  S. Kittelson (spencertk@abasys.com)

Abstract

In a dialog on the LTSP discussion list a request was made for recommendations on server configurations.  Several very good suggestions regarding the use of firewalls, multiple servers, incremental upgrades and high-performance disk drives were posted along with this one (all of which have been lightly edited).  Other contributors' postings are included here in-line as indented, italicized text with credit to their original authors.

Prior responses:
-------------------------------------------
        From: Brian Fahrlander

    I don't have time to do a full in-detail report, but strategically
you'll want a little 'throwaway' box, possibly an old 386/486 headless
machine to be your firewall.  Install the bare minimum, pull the
keyboard and screen, then LOCK the thing in a closet somewhere.

    This means there will be only one real user login on the machine, and
you'll access it with ssh at all times (as opposed to telnet).  They'd
have to break THAT in order to get any further.  It also simplifies the
security logistics (ipchains rules, etc.).

    Remember that even 1M/sec 'full speed' DSL is only 1M/sec, and
almost every machine you have (except the one that dispenses coffee) can
handle that much traffic if you have nothing else it's required to do. :)

    I really like the way RedHat's 6.2 rolls the ipchains task into such
an easily-managed bundle. I suggest it for the firewall box, to
communicate with the hub and local machines on eth0 and the
cablemodem/ISDN modem/etc on eth1.
...

-------------------------------------------
        From: Glenn Jacobson
Rob -
...
I think that Brian Fahrlander gave you good advice on separating
the firewall onto a low-powered Linux box separate from the LTSP
server.

The LTSP server itself should have a two-CPU motherboard, but
to start out you can use one 500MHz+ CPU with the fastest
SCSI drives you can get (10 to 15K RPM).  On an LTSP
server the drive speed means more than the CPU, as long as
the CPU is adequate.  Memory should start around 256MB.

From there you have a number of options for mirroring the
drives: RAID, hot swap, and multiple hot-swap boxes in case
of catastrophe.  Most installations will only go to
mirroring or RAID instead of hot swaps and dual systems.

You will also note that Applix is a lot less resource
hungry, but more people are familiar with WordPerfect.

-------------------------------------------
        From: Stephen Moore
I would suggest perhaps multiple servers: a Netscape server,
an Internet server, a WordPerfect server.

This gives a bit of redundancy, so if you lose one server you
still have some functionality.  I also find that a couple of
smaller servers (2-3 Pentium III 600s with 256MB of RAM each)
are less expensive than one huge server (a quad Xeon 600 with
1GB of RAM).

I also see the possibility of incrementally upgrading this
sort of machine (cluster) as much more practical than one
huge server. Then you have the further benefit that as you
upgrade a machine you get a free workstation.

-------------------------------------------
 
 

Server Configuration Issues
(a few of them)

Robert,

You will really be able to leverage your existing investment by making the LTSP move.  In fact, now or later this year you should be able to create brand-new LTSP seats for <$500 each with a 17" monitor.  Now that is cost-effective!

Of course, the configuration will depend on what you do with it and the overall workload.  You should be able to do 40 LTSP users on a single box if the processor is fast and you have enough memory.  I'll give a suggested configuration later.

Use a separate firewall:
Brian's comments about the firewall are dead on.  Keep it low-powered, separate and minimal, and just let it work.  There are lots of resources on firewalling and IP masquerading on the net.
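
To give the flavor, the guts of such a box boil down to just a few ipchains rules.  This is only a sketch (the 192.168.1.0/24 LAN address is an assumption, and a real rule set needs a default-deny input policy too); see the IPCHAINS-HOWTO for the whole story:

    # Let the kernel forward packets between eth0 and eth1.
    echo 1 > /proc/sys/net/ipv4/ip_forward

    # Default: refuse to forward anything not explicitly allowed.
    ipchains -P forward DENY

    # Masquerade traffic from the internal LAN as it leaves on the
    # external interface (eth1 here, per Brian's layout).
    ipchains -A forward -i eth1 -s 192.168.1.0/24 -j MASQ

    # Accept ssh from the outside; refuse telnet outright.
    ipchains -A input -i eth1 -p tcp -d 0.0.0.0/0 22 -j ACCEPT
    ipchains -A input -i eth1 -p tcp -d 0.0.0.0/0 23 -j DENY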

Incremental development from a single server:
For the server I'll take a little different tack than either Stephen or Glen and suggest that you start with a single, hot rocket server and see how far it takes you.  You won't really know how much snort you need in the server box until you load it up with background processes and users.  You also won't know very much about your I/O distribution unless you are either very experienced or do some testing.  More on I/O later.

You can of course at any time break the servers out into multiples and run separate server processes on different boxes, so Stephen's comments are very valid.  If you concentrate everything initially on a single box and then shake it down using your own application load, you can find out what needs to be moved to a separate server.  If you hit a performance drag, do some monitoring of memory, CPU, disk (and optionally network) consumption and then move the most demanding functions to the separate box.
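
The stock tools will tell you most of what you need to know for this kind of monitoring; for example (illustrative only):

    # Report memory, swap, block I/O and CPU usage every 5 seconds;
    # the bi/bo columns show how hard the disks are working.
    vmstat 5

    # Interactive display of the processes eating CPU and memory.
    top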

Use LOTS of main memory:
Be sure to bathe the server in memory.  For our business clients we consider 256MB of main memory to be a starting point, and most servers have either double or triple that amount.  When tuning the OS we use LOTS of memory for I/O caches and we get a lot of bang for the extra memory buck (it often eliminates the need for another server).  ECC memory can be had with careful shopping for about a buck/MB ($2/MB for premium stuff).
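
You can watch the kernel putting that memory to work with free; on a well-fed server nearly all the "free" memory ends up in the disk cache (and is given back to programs on demand):

    # The buffers and cached columns show memory used for disk
    # caching -- that is your I/O cache at work.
    free -m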

A manual server mirroring technique:
Stephen's points about incremental upgrades and redundancy are important, though, so consider them fully.  One trick you can use is to make two servers that are essentially mirrors of each other, with the apps configured identically on each and a cron job that periodically copies entire directory trees from the active server to a mirror location on the spare.  This lets you split the workload and periodically copy the critical files and data to spare space on the other system.  If one fails, a quick reconfig lets you run all the workload on a single system (at reduced overall performance) and you get your data back to the last checkpoint.  You can also do this with mirrored drives (RAID 1) and simply break the mirror and use only the data components as mounted filesystems on the recovery system.

This takes careful planning and ongoing attention to detail, but it is the lowest-cost method I know of to have high availability and quick recovery (we do this for some clients).  By splitting the workload you get the performance benefits of multi-server, and by cross-copying the data you get pseudo-redundancy.  Stir in a little RAID 1 and some reconfig tricks and hardware failure recovery can be reduced to < 1/4 hour.  Use config scripts with easy-to-change symbols to make the switchover both parametric and manageable.
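
As a sketch only, the cron side of this can be as simple as a couple of rsync jobs in root's crontab on the active server (the hostname "spare", the /mirror destination and the schedule are all placeholders, and this assumes ssh logins between the boxes work without a password):

    # Copy the critical trees to the standby box every night.
    0 2 * * *  rsync -az --delete -e ssh /home/ spare:/mirror/home/
    30 2 * * * rsync -az --delete -e ssh /etc/  spare:/mirror/etc/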

Multi-CPU systems may not age gracefully:
I'm not a big fan of multi-CPU servers due to their propensity for aging less than gracefully (as they get older they can develop internal stability problems as the timing margins on the silicon slip).  Things go just fine for the first three years or so and then weird things can start to occur.  Of course, the same could be said of single-CPU systems, but in my experience the incident rate is about 10-to-1 in favor of singles.

Use very fast disk drives (and fast SCSI controllers):
What Glenn said about the fastest drives is absolutely correct.  The current crop of 10K RPM drives is very high performance, and Seagate has a new 15K RPM unit out (or announced).  Be sure to run 'em on a high-performance SCSI controller with LVD to get the best out of them.  Of course you can do RAID with them as desired/required, but be aware that as soon as you go to RAID 5 (or derivatives) you will incur an I/O penalty on writes.  RAID 1 is the cheapest starting solution, and then as you add more and more drives RAID 5 may become more appropriate.  The read/write mix also makes a great deal of difference as to which type of RAID is best.
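
If you do the RAID 1 in software rather than on a controller, the stock raidtools describe a two-drive mirror in /etc/raidtab along these lines (a sketch only; the device names are assumptions, and read the Software-RAID HOWTO before trusting data to it):

    raiddev /dev/md0
        raid-level            1
        nr-raid-disks         2
        persistent-superblock 1
        chunk-size            4
        device                /dev/sda1
        raid-disk             0
        device                /dev/sdb1
        raid-disk             1

    # Then build and format the mirror:
    #   mkraid /dev/md0
    #   mke2fs /dev/md0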

Use hot swap power supplies:
As for the server, you should very seriously consider hot-swap power supplies.  Over the five-year life of a server, the two most likely components to fail are the hard drives and the power supply.  Hot-swap supplies add a few hundred dollars to the cost of a server but are worth every penny.  Since we started using them exclusively a few years ago we've never had a system down due to PS problems.  When one fails it shuts itself down and beeps until you remove it and put the replacement back in.  In the meantime the system just hums along.  PS's have a tendency to start to go bad in years four and up, so if you expect to bail on your servers before then you can get by with a _good_ quality, high-capacity PS.  Be sure not to scrimp on the PS wattage, since you want it to run coooool so it will last.  Also be sure it has ball-bearing fans in it (that goes for the case fans and CPU coolers too).

Keep the disk drives cool (very important):
The high-end drives need either excellent in-case cooling airflow or separate cooling adapter trays.  The 50 bucks or so for a cooling tray can double the life of a drive, so I think they are no-brainers.  Conversely, a hot drive often fails before the warranty period is up, leaving you with a down or degraded system.  DON'T let 'em get hot!!!

A note about I/O:
In a former career I used to manage a DEC VAXcluster with several nodes, dozens and dozens of I/O devices, and about a bazillion processes doing I/O over hundreds of files.  It was quite a challenge to place I/O properly, since memory was EXTREMELY expensive and the amount of file buffer memory we would have needed was huge.  We did a lot of careful analysis of our I/O load to see where the bottlenecks were and placed files on various devices accordingly, which kept our I/O performance as high as possible.

In your situation, if you have a single large drive (see configuration below) you may become constrained by the I/O limitations of the single/dual device.  You won't know until you either do a really good analysis and/or try it out.  Extra drives to solve I/O performance problems are cheap, cheap, cheap, and a lot cheaper than building another server.  That is why I mention more memory for bigger disk cache buffers and then mention extra drives.  The type of I/O that is occurring matters too: lots of little DBMS I/Os that are 4-16KB in size are way different from someone doing video (or even audio) editing where the streams are MB-GB in size.

The next release of Linux (2.4) will have significant changes in that it will incorporate a unified block I/O caching scheme that should boost performance quite a bit (it's about time).  Also, until Linux gets a solid transaction-based journaling filesystem it will generally be better to have smaller disk partitions (< 10GB) in order to keep recovery and filesystem checking manageable.  Both of these considerations (streaming editing and smaller FS sizes) argue for more, and possibly smaller, disk drives.
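
To give the flavor, a layout like this spreads the busy filesystems across two spindles and keeps every partition small enough for a quick fsck (device names and the split are illustrative only):

    # /etc/fstab -- example layout across two drives
    /dev/sda1   /       ext2  defaults  1 1
    /dev/sda2   /usr    ext2  defaults  1 2
    /dev/sdb1   /var    ext2  defaults  1 2
    /dev/sdb2   /home   ext2  defaults  1 2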

Requirements and price/performance ratio dictate the choices:
When deciding how to configure a system you need to take into account all of the above (and more), plus the current and near-term state of the art in the industry.  It is also important to consider the price/performance ratio of the various components.  We always build to the best price/bang ratio we can determine (unless we are optimizing purely for price or for performance).

We do this sort of thing regularly for our clients, so as for configuration, consider the following (prices from Pricewatch as of 28-Apr-2000):

  InWin Q2000 Dual 300W PS Case (no hot swap?)    $   225
    (Note: unknown if PS's are Athlon compatible)
  AMD Athlon 700MHz on Asus K7M mobo              $   353
    (If interested, I'll do the PS checkout for you)
  512MB (2 DIMMs) 32x72 (ECC) 7.5ns, CAS2         $   480
  Adaptec 29160 Ultra160 Kit (w/cables)           $   210
  Dual (RAID 1) IBM 36GB, 10K RPM, 5.4ms, LVD     $ 1,358
  CD-ROM 12x, SCSI (narrow)                       $    80
  Quantum DLT 40GB                                $ 1,000
  10/100 NIC                                      $   100
  Video, keyboard, mouse, floppy                  $   100
  Freight for above                               $   200
                                                 ---------
                                          Total   $ 4,106

VAR's should be paid for their expertise and support:
If you buy such a configuration from a VAR, expect anywhere from a 25%-50% increase in price for having someone build it, warranty it for 3 years and maintain it for you under warranty.  Also, expect that the VAR will insist on using a UPS, on proper ventilation and on monthly cleaning of filters, etc. (that's what we insist on).

Doing it yourself is actually quite easy:
Total time to build the above from scratch: 2-6 hours depending on your level of experience.  A kid who does his own systems will salivate over the above (except the cheap video card) and do it for free.  Just be sure to use a good grounding strap when you put it together and to handle all the parts by their power connectors or a ground point BEFORE touching anything else.  (I can give you failsafe tips on good anti-static conduct if you need 'em.)

Possible configuration changes:
Faster CPU or different CPU/motherboard (won't help much until the next version of the Athlons comes out this summer with mobos to support 'em - let me know if you want a new list then...), faster drives (15K RPM), PC133 memory and a tuned memory bus (tricky, but squeezes out 3-5% performance boosts), bigger DLT (70+GB/tape), better video (8-16MB video memory), added sound, dual NICs, serial cards for dumb printers and old terminals, phase-of-the-moon detectors, telepathy converters, quantum state stabilizers, etc.  (Hey, technology marches relentlessly on...)
 
 

Just plain opinions

Currently, the AMD Athlon makes for the most powerful single-CPU motherboards you can buy (IMHO).  As soon as you go over 700MHz the CPUs change the internal cache divider and you take a step down again in performance.  The 700 will be the price/performance leader (over everything else in the world) until this summer, when the new Thunderbird CPUs come out along with a new support chipset.  At that time the CPUs will have integrated, full-speed L2 cache (kicks performance up by 10-30%) and will be able to use DDR SDRAM.  That's another reason to build a single server and then split it only if required.  You can do a new motherboard and CPU combo and move the high-priced components over to get a system with a 20-40% performance boost later this summer.  Wait until late fall this year to build a new server and save even more.  Turn the old server into an I/O-intensive beast and let the new one be the compute beast.

The newest Adaptec LVD SCSI controllers and mirrored IBM/Seagate 10K RPM drives really cook.  I have not tested the new 15K RPM drives, but their transfer rates almost demand the move to Ultra160.  If someone has had good results with a high-performance RAID controller, please advise.  Be aware that if you are doing lots of audio/video editing you will need the higher-performance streaming I/O capability of the 15K RPM drives and should probably put at least a 10K RPM drive in each of the workstations doing the editing for local storage so you don't bury the network with I/O.  On the workstations a 10K RPM ATA66 IDE device is fine.
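
On the IDE side, hdparm will both tune the drive and tell you what you are actually getting (point it at the right device, and test cautiously; aggressive settings can hang flaky hardware):

    # Enable DMA and 32-bit I/O on the first IDE drive, then time
    # cached and raw device reads.
    hdparm -d1 -c1 /dev/hda
    hdparm -tT /dev/hda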

This combination is quite cheap when you build it yourself, in comparison with the higher-end commercial systems.

You may want to consider DAT tape.  My experience with DAT has been lousy, so I dumped it four years ago, but the current state of the art may be greatly improved (Sony?).  If anyone has current experience in this area, please let the LTSP list know about it.