Failover
- Server nodes can be monitored for failure.
- Redundant servers can be used to handle failure.
- failout mode: Lustre clients immediately receive errors (EIO) after a timeout, instead of waiting for the OST to recover.
- failover mode: Lustre clients wait for the OST to recover.
- Only one MDS is active per file system. Future releases (2.0+, Nov. 2009) are expected to support clustered MDS.
- An OST cannot be shared by more than one OSS. However, an OSS can have many OSTs.
- Redundant/passive MDS and OSS can be used for handling failures.
RAID
- Hardware RAID is recommended over software RAID.
- The MDS performs a large number of small writes. Recommended: RAID1 or RAID1+0 for MDT storage.
- OSS: RAID6 or any other double-parity algorithm is recommended. The manual further says to use RAID 5 with 5 or 9 disks, or RAID 6 with 6 or 10 disks, each on a different controller. Ideally, the RAID configuration should allow 1 MB Lustre RPCs to fit evenly on one RAID stripe without requiring an expensive read-modify-write cycle.
- Stripe width is the minimum amount of data that can be written to a RAID array (normally RAID 5 or 6) without a read-modify-write operation, i.e. the optimal minimum I/O size.
- Chunk size is in units of 4096-byte blocks and represents the amount of contiguous data written to a single disk before moving to the next disk.
- stripe_width = <chunk_size> * ( <disks> - <parity_disks> ) <= 1 MB
- Example with 4 data disks: chunk_size = 1024 KB / 4 = 256 KB
- Lustre itself provides no redundancy; it only stripes data across OSTs (RAID0-style striping).
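The stripe-width rule above can be checked with a few lines of Python. This is only a sketch of the arithmetic in these notes: the function names are made up for illustration and are not part of any Lustre tool, and the 1 MB RPC size and 4096-byte block unit are taken from the bullets above.

```python
# Sketch of the stripe-width arithmetic from the notes above.
# All names here are illustrative; none of them is a Lustre API.

RPC_SIZE_KB = 1024   # 1 MB Lustre RPC, as stated in the notes
BLOCK_SIZE = 4096    # chunk size is expressed in 4096-byte blocks


def stripe_width_kb(chunk_size_kb, disks, parity_disks):
    """stripe_width = chunk_size * (disks - parity_disks), in KB."""
    data_disks = disks - parity_disks
    if data_disks <= 0:
        raise ValueError("array needs at least one data disk")
    return chunk_size_kb * data_disks


def max_chunk_size_kb(disks, parity_disks, rpc_kb=RPC_SIZE_KB):
    """Largest chunk size (KB) that keeps stripe_width <= rpc_kb."""
    data_disks = disks - parity_disks
    if data_disks <= 0:
        raise ValueError("array needs at least one data disk")
    return rpc_kb // data_disks


def kb_to_blocks(size_kb):
    """Convert a size in KB to a count of 4096-byte blocks."""
    return size_kb * 1024 // BLOCK_SIZE


def rpc_fits_evenly(chunk_size_kb, disks, parity_disks, rpc_kb=RPC_SIZE_KB):
    """True if one RPC is a whole number of full stripes (no read-modify-write)."""
    width = stripe_width_kb(chunk_size_kb, disks, parity_disks)
    return width <= rpc_kb and rpc_kb % width == 0


if __name__ == "__main__":
    # Disk counts suggested in the notes: RAID 5 with 5 or 9, RAID 6 with 6 or 10.
    layouts = [("RAID 5", 5, 1), ("RAID 5", 9, 1), ("RAID 6", 6, 2), ("RAID 6", 10, 2)]
    for level, disks, parity in layouts:
        chunk = max_chunk_size_kb(disks, parity)
        print(level, disks, "disks:", chunk, "KB chunk =", kb_to_blocks(chunk), "blocks")
```

The recommended disk counts all leave 4 or 8 data disks, which divide 1024 KB evenly; a layout with 3 data disks, for example, would not.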
Backups
- Device-level backups.
- File-level backups (recommended).
- LVM snapshots.