Expected and Recent Downtime
- Sep 04, 1999 23:00 - 24:00
- The entire lab (alecto.physics, tisiphone.physics, and *.pws) will be
shut down for ~1 hour for hardware relocation.
Hardware Configuration
The PWS lab is designed to be as fault-tolerant as possible. Features include:
- Redundant power
- Both servers and the drive array are powered from a TrippLite SmartPro 2200 NET
Uninterruptible Power Supply. The UPS is configured with an extra battery to extend
runtime from ~15 min. to ~1 hour.
- The drive array has two power supplies. If one power supply malfunctions, the
array will continue to operate, and the faulty power supply can be replaced
without any downtime.
- The networking equipment is also protected by TrippLite UPSes so users can
continue to remotely access the lab even during a building-wide power outage.
- Redundant data
- All user data is on an MTI drive array. This drive array is currently configured
with two 18 GB disks, which are mirrors of each other. In addition to the faster
reads that mirroring can provide, if either disk fails, the system will continue
to operate on the other until the faulty disk is replaced. There will be no loss of
data and no system downtime.
- Long-term power outage tolerance
- We are in the process of configuring the systems so that, in the event of a
long-term power outage (longer than ~1 hour) the systems will perform an orderly
shutdown to prevent the loss of data when the UPS runs out of battery power.
- Another concern is that without power to the building, the temperature in the
network/server room will rise sharply because the air conditioner will not function.
Our systems can monitor the temperature and will perform an orderly shutdown if it
rises beyond operating limits.
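
The two shutdown conditions described above (UPS battery running out and room temperature exceeding operating limits) could be sketched roughly as follows. This is only an illustration in Python, not the lab's actual configuration: the `Reading` type, the `shutdown_reason` function, and both threshold values are hypothetical. Reading real values from the UPS or a temperature sensor is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """One snapshot of the monitored values (hypothetical structure)."""
    on_battery: bool          # True while the UPS is supplying power
    runtime_left_min: float   # UPS's estimated remaining runtime, in minutes
    room_temp_c: float        # server-room temperature, in degrees Celsius


# Assumed thresholds, chosen only for illustration.
MIN_RUNTIME_MIN = 10.0   # start shutting down while this much battery remains
MAX_TEMP_C = 35.0        # assumed upper operating limit for the room


def shutdown_reason(r: Reading) -> Optional[str]:
    """Return why an orderly shutdown should start, or None to keep running."""
    # Overheating is checked first: it applies even while on mains power.
    if r.room_temp_c > MAX_TEMP_C:
        return "temperature above operating limit"
    # A short outage is ridden out; only a nearly drained battery triggers.
    if r.on_battery and r.runtime_left_min <= MIN_RUNTIME_MIN:
        return "UPS battery nearly exhausted"
    return None
```

With these assumed thresholds, `shutdown_reason(Reading(True, 45.0, 24.0))` returns `None` (a short outage with plenty of battery left), while a reading with 8 minutes of runtime remaining, or a room at 38 °C, returns a reason string. A monitoring loop would call this periodically and begin the orderly shutdown once a reason is returned.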