Failover

  • Server nodes can be monitored for failure.
  • Redundant servers can be used to handle failure.
    • failout mode: Lustre clients immediately receive errors (EIOs) after a timeout, instead of waiting for the OST to recover.
    • failover mode: Lustre clients wait for the OST to recover.
  • Only one MDS is active per file system. Clustered MDS support is planned for future releases (2.0+, Nov. 2009).
  • An OST cannot be served by more than one OSS at a time. However, one OSS can serve many OSTs.
  • Redundant/passive MDS and OSS can be used for handling failures.

RAID

  • Hardware RAID recommended over software RAID.
  • The MDS performs a large number of small writes. Recommended: RAID1 or RAID1+0 for MDT storage.
  • OSS: RAID6 or any other double-parity algorithm is recommended. The manual further suggests RAID 5 with 5 or 9 disks, or RAID 6 with 6 or 10 disks, each on a different controller. Ideally, the RAID configuration should allow 1 MB Lustre RPCs to fit evenly on one RAID stripe without requiring an expensive read-modify-write cycle.
  • Stripe width is the minimum amount of data that can be written to a RAID array (normally RAID 5 or 6) without a read-modify-write operation -- the optimal minimum I/O size.
  • Chunk size is the amount of contiguous data written to a single disk before moving to the next disk. It is specified in units of 4096-byte blocks.
  • stripe_width = <chunk_size> * (<disks> - <parity_disks>) <= 1 MB
  • Example with 4 data disks (e.g. RAID 6 with 6 disks): chunk_size <= 1024 KB / 4 = 256 KB
  • Lustre itself provides no redundancy; its striping of data across OSTs is comparable to RAID0.
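
The stripe_width formula above can be worked through for a given disk layout. The following is a minimal sketch, not a Lustre tool: the function name raid_geometry and its return keys are hypothetical, and it assumes the goal is the largest chunk size (in whole 4096-byte blocks) whose full stripe does not exceed one 1 MB RPC.

```python
BLOCK_SIZE = 4096        # bytes per block, the chunk-size unit used in these notes
RPC_SIZE = 1024 * 1024   # one 1 MB Lustre RPC, in bytes


def raid_geometry(disks, parity_disks, rpc_size=RPC_SIZE):
    """Hypothetical helper: size a RAID chunk so one stripe holds at most one RPC."""
    data_disks = disks - parity_disks
    if data_disks <= 0:
        raise ValueError("need at least one data disk")

    # Largest chunk, rounded down to whole blocks, such that
    # chunk_size * data_disks <= rpc_size.
    chunk_blocks = (rpc_size // data_disks) // BLOCK_SIZE
    if chunk_blocks == 0:
        raise ValueError("too many data disks for this RPC size")

    chunk_bytes = chunk_blocks * BLOCK_SIZE
    stripe_width_bytes = chunk_bytes * data_disks
    return {
        "data_disks": data_disks,
        "chunk_blocks": chunk_blocks,
        "chunk_kb": chunk_bytes // 1024,
        "stripe_width_blocks": stripe_width_bytes // BLOCK_SIZE,
        "stripe_width_kb": stripe_width_bytes // 1024,
        # True when an RPC splits into whole stripes with no remainder,
        # i.e. no read-modify-write is needed for an aligned 1 MB write.
        "fits_evenly": rpc_size % stripe_width_bytes == 0,
    }


if __name__ == "__main__":
    # The disk counts mentioned above, plus one layout that does not divide evenly.
    for disks, parity in [(5, 1), (9, 1), (6, 2), (10, 2), (7, 2)]:
        print(disks, parity, raid_geometry(disks, parity))
```

With the disk counts listed above (RAID 5 with 5 or 9 disks, RAID 6 with 6 or 10 disks) the number of data disks is 4 or 8, so the stripe comes out to exactly 1 MB. A layout such as RAID 6 with 7 disks leaves 5 data disks, which cannot divide 1 MB into whole blocks evenly.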

Backups

  • Device-level backups
  • File-level backups (recommended)
  • LVM snapshots.