Last modified on 10/16/12 13:00:36

DevOps Weekly Meeting | October 16, 2012

Time & Location: 11am-12:30pm in LHL164

Attending

mhanby, tanthony, pavgi, jpr

2012-10-16 Agenda

  • Agenda bash
  • Funding
    • gather data/stats
    • draft slides
    • todo: put research computing day presentations online
  • Lustre status
    • cheaha running at 288 cores
    • recovery-based rebuild is progressing
      • the metadata rebuild has improved and files of interest are now visible
      • we are looking at the initial recovered directory structure
    • todo: analyze the rocks boot to see whether a new driver or a different load order during the kickstart boot was the problem
    • req: need a forensic boot image where we can test access to hardware without the running kernel writing to it, i.e. verify that a virtually detached device is actually detached (e.g. in converged networks a physical interface disconnect is not possible)
  • Large mem nodes
    • still waiting on IB hw
    • nodes are having issues with the new Broadcom cards; need a new PXE boot kernel
  • Research storage
    • backup of hitachi drives on nas-02
      • set up config with ed to shadow copy luns
      • disconnect nas-01 from production luns
      • attach nas-01 to shadowed copy and verify fs integrity of shadow copy
    • locate 4 of the new systems in Huntsville
    • use cases
      • enable users to mount cloud storage or other fabrics to move result data off scratch (e.g. fusermount of S3)
      • enable users to mount disk images on compute nodes for more efficient Lustre performance with small files (many small files in one big image file)
      • test DRBD on virtual drives

Summary

Discussion