This page documents my exploration of Lustre and collects notes on things learned along the way. It may look like a dumping ground at first, but this material needs to be documented somewhere, and this seemed like a good place to start. I also intend to collect any questions that we may want to ask during the Lustre training session.

A good place to start with Lustre is our introduction, which gives a brief overview and links to other Lustre-related pages. The LustreQuickStart page shows a simple Lustre installation with Lustre servers and a patchless Lustre client.

Pre-reqs

MGS

  • Stores the configuration information for the Lustre file systems and provides it to clients and servers
  • The Lustre manual mentions having only one MGS per site; however, see this thread for exceptions and more information.
  • In many installations the MGS and MDS are combined, which seems like a good idea.

MDS

  • Unique per file system
  • Metadata storage needs roughly 1-2% of the total file system capacity
  • Requires more CPU power
  • RAID 0+1 recommended
  • Failover configuration
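
The 1-2% sizing rule above can be turned into a quick estimate. The following is only a minimal sketch in Python; the function name `mdt_capacity_range` and the 100 TB example are hypothetical, and actual metadata needs depend on the number and size of files rather than on raw capacity alone.

```python
def mdt_capacity_range(fs_capacity_tb, low=0.01, high=0.02):
    """Return a (min, max) estimate of metadata storage in TB.

    Applies the 1-2% rule of thumb from the notes; the defaults for
    `low` and `high` are just that rule expressed as fractions.
    """
    if fs_capacity_tb < 0:
        raise ValueError("capacity must be non-negative")
    return fs_capacity_tb * low, fs_capacity_tb * high


if __name__ == "__main__":
    # Hypothetical example: a 100 TB file system.
    smallest, largest = mdt_capacity_range(100)
    print(f"100 TB file system: {smallest:.1f}-{largest:.1f} TB of metadata storage")
```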

OSS

Mounting file system

  • File system name – limited to 8 characters
  • Device name – 8 characters
  • On clients, the mount point can have a longer name than the file system
  • Mounting a file system starts its service
  • Use -o nosvc to mount without starting service
  • -o and -d (loop device) can't be used together?
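
As a rough illustration of the points above, the sketch below checks the 8-character file system name limit and builds the `mount -t lustre` command line, adding `-o nosvc` when the service should not be started. The helper names, the device `/dev/sdb` and the mount point `/mnt/mdt` are assumptions made for the example, not part of any Lustre tool. The script only assembles the command and does not run it.

```python
MAX_FSNAME_LEN = 8  # file system name limit noted above


def check_fsname(fsname):
    """Return fsname unchanged, or raise ValueError if it is empty or too long."""
    if not fsname or len(fsname) > MAX_FSNAME_LEN:
        raise ValueError(
            f"file system name must be 1-{MAX_FSNAME_LEN} characters: {fsname!r}"
        )
    return fsname


def mount_command(device, mount_point, start_service=True):
    """Build the argument list for mounting a Lustre target.

    With start_service=False the nosvc option is added, so the target
    is mounted without starting its service.
    """
    cmd = ["mount", "-t", "lustre"]
    if not start_service:
        cmd += ["-o", "nosvc"]
    cmd += [device, mount_point]
    return cmd


if __name__ == "__main__":
    check_fsname("testfs")  # hypothetical file system name
    # Hypothetical device and mount point.
    print(" ".join(mount_command("/dev/sdb", "/mnt/mdt", start_service=False)))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.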

Unmounting

  • umount stops service
  • -d: used when unmounting loop devices
  • -f: unmount without recovery

Failover configuration

  • Lustre failover protects against the failure of a node/server, but not of the storage/target itself.

Data redundancy