Last modified on 10/30/12 14:47:03

DevOps Weekly Meeting | October 30, 2012

Time & Location: 11am-12:30pm in LHL164


Attendees: mhanby, tanthony, jpr, dls

2012-10-30 Agenda

  • Agenda bash
  • Updates
  • UAB Research Core Day
    • draft will be circulated today, with revisions during the week
  • Cluster stats
    • comparing queue wait times, using MATLAB for the data analysis and graphs
    • we are seeing the same pattern of increase as we did in fall 2010, before general availability of the gen3 nodes in Jan 2011
  • Lustre status
    • Lustre is online; doing a preliminary assessment of the data
    • working on plan to expose files to users
      • likely read only access
      • plan a service window
    • todo: analyze the Rocks boot to see whether a new driver or a different load order during the kickstart boot was the problem
  • Funding
    • gather data/stats
    • draft slides
    • todo: put the Research Computing Day presentations online
  • Large mem nodes
    • no updates
  • Research storage
    • no updates
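The cluster stats item mentions comparing queue wait times (the actual analysis was done in MATLAB). To illustrate the underlying calculation, here is a minimal Python sketch that groups jobs by submit month and averages the time between submission and start. The `(submit, start)` tuple input, the function name `monthly_mean_wait_hours`, and the sample records are hypothetical; they do not reflect the cluster's actual accounting data or format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_mean_wait_hours(jobs):
    """Return {"YYYY-MM": mean wait in hours} for (submit, start) pairs.

    Jobs are grouped by the month in which they were submitted.
    """
    waits = defaultdict(list)
    for submit, start in jobs:
        # Wait time is the gap between submission and the job starting to run.
        waits[submit.strftime("%Y-%m")].append(
            (start - submit).total_seconds() / 3600.0
        )
    return {month: mean(hours) for month, hours in sorted(waits.items())}


if __name__ == "__main__":
    # Hypothetical sample records, not real cluster data.
    jobs = [
        (datetime(2010, 9, 1, 8, 0), datetime(2010, 9, 1, 9, 0)),
        (datetime(2010, 9, 2, 8, 0), datetime(2010, 9, 2, 11, 0)),
        (datetime(2010, 10, 1, 8, 0), datetime(2010, 10, 1, 14, 0)),
    ]
    for month, hours in monthly_mean_wait_hours(jobs).items():
        print(month, round(hours, 2))
```

Putting monthly values for fall 2010 and fall 2012 side by side is one way to check whether the same upward pattern is repeating. A median would be less sensitive to a few very long waits than the mean used here.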


A quick review of the agenda.


Discuss narrative for funding requests. This is our elevator pitch.

Discuss plan to expose files in the Lustre file system for user review. We need to let users start reviewing recovered data and work with them to identify file patterns to target. We will set up some compute nodes with read-only access to Lustre and help people copy out the data they value, building their desired recovery data set on new storage. After this process completes at the end of November, we will refresh the Lustre file system and move users into the new space.
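The copy-out step could look something like the following minimal Python sketch. It assumes the recovered data is visible under a read-only mount and that each user supplies glob patterns for the files they want to keep. The function name `copy_matching` and the example paths are hypothetical; this is not the procedure the team actually used.

```python
import fnmatch
import shutil
from pathlib import Path


def copy_matching(src_root, dst_root, patterns):
    """Copy files under src_root whose relative path matches any glob pattern.

    The directory layout below src_root is recreated under dst_root.
    Returns the relative paths (as POSIX strings) that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    copied = []
    for path in sorted(src_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src_root)
        # fnmatch lets "*" match "/", so "*.dat" selects .dat files at any depth.
        if any(fnmatch.fnmatch(rel.as_posix(), pat) for pat in patterns):
            target = dst_root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copies data plus timestamps/permissions
            copied.append(rel.as_posix())
    return copied


# Hypothetical usage on a compute node with the read-only mount:
# copy_matching("/lustre/recovered/user1", "/newstorage/user1", ["*.dat", "results/*"])
```

Because `is_file()` stats every entry it walks, a file with missing stripe data could stall this loop in the same way it stalls `ls`.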

We are seeing some file system errors for files that have holes in them, for example striped files for which only some of the OST data was available. We are treating these as anomalies of the rebuild, since we don't expect to have lost complete stripe sets. These files need to be removed from the file system because commands executed on them (like ls) hang.
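Finding the files that hang could be scripted with a timeout-based probe. Below is a minimal Python sketch, assuming that a plain `stat` on an affected file blocks the same way `ls` does. The function name `probe_hangs` and the 10-second default are hypothetical choices. Flagged paths are candidates for manual review before removal, not proof of damage.

```python
import subprocess


def probe_hangs(paths, timeout_s=10.0):
    """Return the paths whose `stat` did not finish within timeout_s seconds.

    A stat that fails quickly (e.g. a missing file) is not reported; only
    calls still running when the timeout expires are flagged.
    """
    suspects = []
    for path in paths:
        # Run stat in a child process so a hang blocks the child, not us.
        proc = subprocess.Popen(
            ["stat", "--", path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Ask the child to exit but do not wait for it: a process stuck
            # in uninterruptible I/O may not exit until that I/O returns.
            proc.kill()
            suspects.append(path)
    return suspects
```

The probe checks paths one at a time, so each hanging file costs up to `timeout_s` seconds. Running it from a node that can tolerate stuck `stat` processes keeps the cleanup from affecting users.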