Last modified on 02/05/13 12:34:45

DevOps Weekly Meeting | February 5, 2013

Time & Location: 11am-12:30pm in LHL164


Attendees: mhanby, tanthony, jpr, pavgi, billb, bade

2013-02-05 Agenda

  • Agenda bash
  • Updates
  • Overview
  • Review Dell XPS 13" laptop
  • Status update on the reboot of rcs-cloud-01/02
    • need a refresher on how to connect to these devices
  • Storage updates
    • confusion in the user community on /scratch policy impact
    • need to clarify the impact and the plans to bring new storage online to address long-term storage needs
    • need to clarify that we don't need to disrupt all operations of the cluster; services can be staged
    • need a transfer service for moving data from Ceph to /scratch quickly
    • campus network is too slow to unstage data
    • educate folks on the need for a data management plan as part of their workflow
  • Research Cloud/Research Computing System
    • Awaiting updated Jumpstart proposal from Dell
  • Lustre updates and issues
    • performance generally normal; need to find the cause of load spikes and adjust that workflow
    • documenting different workflow scenarios: /scratch/local with tarballs vs large files on /scratch/user with striping
    • still have an issue where one user's job crashes a compute node
    • delete recovered files for Galaxy
      • need to work on a job script to split the 140k files into multiple chunks
      • see atlab:ticket:533
  • Data recovery process
    • Development for visualizations needed
  • ScaleMP and large memory nodes
    • will have a trial license for the new nodes to make a 1 TB RAM node
  • trial
    • recruiting participants for trial
  • Communities for feedback
    • concern over the new /scratch/user and /scratch/shared deletion policy


Reviewed the Dell laptop config. Received an update on the Dell Jumpstart process. Discussed the storage confusion and the need for clarity about policy. Discussed the workflow for deleting unneeded Galaxy files recovered from Lustre.
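
For the Galaxy cleanup item (splitting the 140k recovered files into multiple chunks), one possible starting point is sketched below in Python. It is only an illustration, not the script tracked in atlab:ticket:533. The function names, the manifest file naming, and the chunk size are all hypothetical, and it assumes the list of recovered file paths has already been collected. It writes one manifest per chunk and deletes nothing, so each manifest can be reviewed before a separate job acts on it.

```python
from pathlib import Path
from typing import Iterator, List


def chunk_paths(paths: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(paths), chunk_size):
        yield paths[start:start + chunk_size]


def write_manifests(paths: List[str], chunk_size: int, out_dir: str) -> List[Path]:
    """Write one manifest file per chunk and return the manifest paths.

    The manifest naming scheme (delete_chunk_NNNN.txt) is a hypothetical
    choice for this sketch. No files are deleted here.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifests = []
    for index, chunk in enumerate(chunk_paths(paths, chunk_size)):
        manifest = out / f"delete_chunk_{index:04d}.txt"
        # One path per line so a per-chunk job can read it line by line.
        manifest.write_text("\n".join(chunk) + "\n")
        manifests.append(manifest)
    return manifests
```

With a chunk size of 10,000, a list of 140,000 paths would produce 14 manifests, each of which could be handed to its own cluster job. Smaller chunks may be gentler on the Lustre metadata server, but the right size would need to be tested.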