Last modified on 05/07/13 13:56:00

DevOps Weekly Meeting | May 7, 2013

Time & Location: 11am-12:30pm in LHL164


Attendees: mhanby, tanthony, jpr, billb

2013-05-07 Agenda

  • Agenda bash
  • OpenStackPlusCeph
    • "Public" network connected to superX (vlan494) - create atlab:ticket:625 to move work into Kintana and get it in process
    • Todo
      • Performance numbers on containers
      • Setup nas-01 to serve up jpr and mhanby home off Ceph
  • COMSOL on lmgr.uabgrid
  • CLC test on Cheaha
    • Responded to debug steps for CLCbio folks. Looks like the problem is a listener config
  • Lustre space
  • Service window
    • Coming up next week
    • Address SMP job needs with more reserved nodes
  • Move projects VM to cloud-02
  • Ubuntu KVM tests -- on hold


Reviewed the extension of vlan494 into the superX fabric so we can reach the OpenStackPlusCeph fabric. Will get a Kintana ticket later today.

Outlined supporting SMP jobs with more nodes


We talked about updating our review of the queue job log to see whether the new large memory nodes have reduced SMP job wait times. General feedback suggests they have. Based on this analysis we will also add some of the 48GB RAM nodes to the SMP-only pool to further reduce scheduling delays. We will define an SMP (2+ cores) queue to ease reservation requests. It should be available after the service window.
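The wait-time review could be sketched roughly as follows. This is only an illustration: the record layout (slots, submit time, start time) and the sample values are made up here and are not taken from the actual queue job log. It also assumes every multi-slot record is a single-node SMP job; in practice MPI jobs would have to be filtered out first, for example by parallel environment.

```python
from statistics import median

# Hypothetical records: (slots, submit_time, start_time), times in seconds.
# These values are invented for illustration, not real cluster data.
jobs = [
    (1, 0, 60),
    (1, 100, 130),
    (4, 0, 7200),
    (8, 50, 3650),
    (2, 200, 2000),
]

def median_wait_by_class(records, smp_min_slots=2):
    """Return the median queue wait (seconds) for serial and SMP (2+ cores) jobs."""
    waits = {"serial": [], "smp": []}
    for slots, submit, start in records:
        key = "smp" if slots >= smp_min_slots else "serial"
        waits[key].append(start - submit)
    # Skip classes with no jobs so median() is never called on an empty list.
    return {k: median(v) for k, v in waits.items() if v}

result = median_wait_by_class(jobs)
print(result)
```

Running the same comparison on records from before and after the large memory nodes went in would show whether the SMP median actually dropped.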

The impact on MPI and serial jobs should be minimal since they are already very good at filling small holes in the cluster fabric. The problem we have had with SMP jobs is that there aren't enough big-enough holes in the compute pool to get them running, because serial and MPI jobs are so successful at filling up all the small holes.
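A toy model of the hole-filling problem, assuming a simplified placement rule (serial and SMP jobs need all their cores on one node, MPI jobs can spread across nodes). The free-core counts are hypothetical and the function is not how the scheduler is implemented; it only shows why plenty of free cores in total can still leave an SMP job waiting.

```python
def can_place(free_cores_per_node, cores_needed, single_node):
    """Return True if a job fits in the currently free cores.

    single_node=True  -> all cores must come from one node (serial, SMP)
    single_node=False -> cores may be spread across nodes (MPI)
    """
    if single_node:
        return any(free >= cores_needed for free in free_cores_per_node)
    return sum(free_cores_per_node) >= cores_needed

# Hypothetical snapshot: 8 free cores in total, but scattered in small holes.
free = [1, 2, 1, 0, 2, 1, 1]

serial_ok = can_place(free, 1, single_node=True)   # fits in any hole
mpi_ok = can_place(free, 8, single_node=False)     # can use all the holes together
smp_ok = can_place(free, 4, single_node=True)      # no single node has 4 free cores

print(serial_ok, mpi_ok, smp_ok)
```

Reserving nodes for an SMP-only pool amounts to keeping some entries in that list large enough for the single-node case.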

We need to explore expanding compute nodes on the 10G networking fabric since it is more cost effective. This is similar to what we are doing with our latest hardware used by OpenStackPlusCeph. This would work well with the serial and SMP jobs and leave the IB fabrics to the MPI jobs. The other dependencies on IB are Lustre and our /scratch spaces, but these may perform well enough on 10G. We haven't tested that.