Last modified on 10/16/12 11:18:10

DevOps Weekly Meeting | October 9, 2012

Time & Location: 11am-12:30pm in LHL164

Attending

2012-10-09 Agenda

  • Agenda bash
  • Lustre status
    • recovery-based rebuild is progressing
      • object store has been indexed
      • metadata has a preliminary rebuild
      • we are looking at the initial recovered directory structure
      • research process: env curation, reliability, data security dialogs
  • Research storage backup
    • backup of Hitachi
    • use cases
      • enable users to mount cloud storage or other fabrics to move result data off scratch (e.g., a FUSE mount of S3)
      • enable users to mount disk images on compute nodes so Lustre handles small files more efficiently (many small files packed into one large image file)
      • test DRBD on virtual drives
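
The disk-image use case above (many small files in one big image file) could look roughly like the following. This is only a sketch under assumptions: the ext4 file system, the image size, and every path are hypothetical choices made for illustration, not decisions from the meeting. The function only builds the command list; nothing is executed.

```python
from pathlib import PurePosixPath
from typing import List


def image_commands(image: PurePosixPath, size: str,
                   mountpoint: PurePosixPath,
                   source_dir: PurePosixPath) -> List[List[str]]:
    """Return the commands that would pack source_dir into a loop-mountable image.

    Nothing runs here. A caller could review the list and then execute each
    entry with subprocess.run(cmd, check=True), typically as root.
    """
    return [
        # Allocate a sparse regular file to hold the file system.
        ["truncate", "-s", size, str(image)],
        # Build an ext4 file system inside that file; -F allows a target
        # that is not a block device.
        ["mkfs.ext4", "-F", str(image)],
        # Loop-mount the image so it can be populated.
        ["mount", "-o", "loop", str(image), str(mountpoint)],
        # Copy the contents of the small-file directory into the image.
        ["cp", "-a", f"{source_dir}/.", str(mountpoint)],
        # Unmount; the single image file can now live on Lustre.
        ["umount", str(mountpoint)],
    ]
```

With this layout Lustre stores one large file, and the small-file lookups are served by the file system inside the image once a compute node loop-mounts it.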

Summary

Discussion

The primary issue right now in the Lustre recovery is meaningful reconstruction of the directory structure. There are many files, and without the directory structure to organize them, meaningful recovery of files will be difficult. We are working with some well-known file system trees and comparing their data to the preliminary directory structure we have in hand.
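
One way to picture the comparison against well-known trees is a coverage check: list the directories a reference tree should contain and see how many appear in the recovered structure. The sketch below is illustrative only. The function names and the idea of feeding it relative path listings are assumptions, not a description of the tooling actually used in the recovery.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set, Tuple


def directory_set(paths: Iterable[str]) -> Set[PurePosixPath]:
    """Collect every directory implied by a list of relative file paths."""
    dirs: Set[PurePosixPath] = set()
    for p in paths:
        for parent in PurePosixPath(p).parents:
            # Skip the "." entry that stands for the listing root itself.
            if parent != PurePosixPath("."):
                dirs.add(parent)
    return dirs


def match_report(reference: Iterable[str],
                 recovered: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (fraction of reference directories found, sorted missing ones)."""
    ref_dirs = directory_set(reference)
    rec_dirs = directory_set(recovered)
    found = ref_dirs & rec_dirs
    missing = sorted(str(d) for d in ref_dirs - rec_dirs)
    coverage = len(found) / len(ref_dirs) if ref_dirs else 1.0
    return coverage, missing
```

A low coverage value for a tree whose layout is known in advance would suggest the preliminary metadata rebuild has misplaced or lost that part of the hierarchy.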

Initial discussion on identifying key data to back up: /home, apps, and other cluster rebuild components. We can potentially accomplish the backups at the SAN snapshot layer or at the existing infrastructure backup process layer, but both of these layers have feature licensing issues. We can move forward with system-level backup tools like Bacula, Amanda, or rsync without licensing restrictions. Our cloud object stores and the VMs (e.g., COW files) will need separate assessments.
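
As a rough illustration of the rsync option, the sketch below builds a command for dated snapshot backups, using rsync's `--link-dest` option to hard-link files that are unchanged since the previous snapshot. The destination layout, the snapshot names, and the helper itself are hypothetical. No backup design was settled in the meeting.

```python
from typing import List, Optional


def rsync_command(source: str, dest_root: str, snapshot: str,
                  previous: Optional[str] = None) -> List[str]:
    """Build an rsync command that copies source into dest_root/snapshot.

    dest_root is assumed to be an absolute path. When previous names an
    earlier snapshot, unchanged files are hard-linked against it rather
    than copied again.
    """
    # -a preserves permissions, ownership, times, and symlinks;
    # --delete removes files from the copy that no longer exist at the source.
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        cmd.append(f"--link-dest={dest_root}/{previous}")
    # The trailing slash on the source copies its contents, not the
    # directory itself.
    cmd += [f"{source.rstrip('/')}/", f"{dest_root}/{snapshot}/"]
    return cmd
```

For example, `rsync_command("/home", "/backup/home", "2012-10-09", previous="2012-10-08")` yields a command that could be passed to `subprocess.run(cmd, check=True)`.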