Last modified on 02/05/13 12:34:45

DevOps Weekly Meeting | February 5, 2013

Time & Location: 11am-12:30pm in LHL164

Attending

mhanby, tanthony, jpr, pavgi, billb, bade

2013-02-05 Agenda

  • Agenda bash
  • Updates
  • Overview
  • Review Dell XPS 13" laptop
  • Status update on rebooting rcs-cloud-01/02
    • need to refresh on how to connect to these devices
  • Storage updates
    • confusion in the user community on /scratch policy impact
    • need to clarify impact and plans to bring new storage online to address long-term storage needs
    • need to clarify that we don't need to disrupt all operations of the cluster; services can be staged
    • need a transfer service for moving data from Ceph to /scratch quickly
    • campus network too slow to unstage data
    • educate folks on the need for a data management plan as part of their workflow
  • Research Cloud/Research Computing System
    • Awaiting updated Jumpstart proposal from Dell
  • Lustre updates and issues
    • performance generally normal, need to find cause of load spikes and adjust that workflow
    • documenting different workflow scenarios with /scratch/local and tarballs vs large files on /scratch/user with striping
    • still have an issue where one user's job crashes a compute node
    • delete recovered files for galaxy
      • need to work on a job script to split the 140k files into multiple chunks
      • see atlab:ticket:533
  • Data recovery process
    • Development for visualizations needed
  • ScaleMP and large memory nodes
    • will have trial license for new nodes to make a 1TB RAM node
  • Box.net trial
    • recruiting participants for trial
  • Communities for feedback
    • concern over the new /scratch/user and /scratch/shared deletion policy

Summary

Reviewed the Dell laptop config. Gave an update on the Dell Jumpstart process. Discussed the storage confusion and the need for clarity about the /scratch policy. Discussed the workflow for deleting unneeded galaxy files recovered from Lustre.
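
The galaxy cleanup item above calls for a job script that splits the roughly 140k recovered files into multiple chunks. A minimal sketch of just the chunking step follows, in Python. The function name `chunk_paths`, the chunk size, and the sample paths are all hypothetical and were not decided in the meeting; the actual deletion step and the job scheduler submission are left out.

```python
from typing import Iterable, Iterator, List


def chunk_paths(paths: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists holding at most chunk_size paths each.

    Hypothetical helper: the name and interface are assumptions made
    for illustration, not an agreed design.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk: List[str] = []
    for path in paths:
        chunk.append(path)
        if len(chunk) == chunk_size:
            # Hand back a full chunk and start a new one.
            yield chunk
            chunk = []
    if chunk:
        # Hand back the final, possibly shorter, chunk.
        yield chunk


if __name__ == "__main__":
    # Placeholder paths standing in for the real list of recovered files.
    sample = [f"recovered/file_{i}" for i in range(10)]
    for index, group in enumerate(chunk_paths(sample, chunk_size=4)):
        # Each group could be handed to its own cleanup job.
        print(index, len(group))
```

Each chunk could then be written to its own list file and processed by a separate job, so that no single job has to walk all of the files at once.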

Discussion