Last modified on 08/27/13 12:08:55

DevOps Weekly Meeting | August 27, 2013

Time & Location: 10:00am-12:00pm in LHL164


Attendees: tanthony, jpr, pavgi, mhanby, billb, bade

2013-08-27 Agenda


  • Storage and Backup
    • manual install of centos 6 like gluster in hville on a temp ip
      • need bmc access to hville nodes from kirby
      • test ceph ops to see if cluster bandwidth is ok
    • crashplan vms on openstack
    • extend vlan to hville for "prod" data flow
    • explore crowbar install of grizzly w/ceph and w/o ceph
      • test virtualbox crowbar installs ticket:
  • NGS/Galaxy
    • Galaxy upgrade
      • target for deploy as soon as we get the green light from the test team (hopefully before fall 2013)
        • each tool will need to be migrated one-by-one because each has its own migration script
      • postgres backend config testing
        • backup and restore testing on pilot openstack fabric
      • blast/n will be installed via toolshed and takes about 2 days (boost) to install
  • Lustre
    • some sipsey nodes are offline because of lustre connectivity problems
  • Hardware upgrades (cont)
    • RAM upgrade on Sipsey - pending pledges
  • Research Computing Day set for Sept 26
  • OpenStackPlusCeph (carry forward from last week)
    • will work on nas-01 connection to the admin network so it can be a storage gateway to the public and cluster nets; use an additional 10G card to connect directly; upgrade to centos 6
    • will work on admin node to connect to public network for dns and ntp connectivity
    • Grizzly upgrade
      • Crowbar 1.6 released, will test install via VirtualBox -- destructive upgrade
      • Ceph Dumpling released, explore if supported by barclamp
      • Want to include a swift/s3 object store

  • old pending issues
    • Fix rcs-srv-02 NAT rules
    • ai: need to create a uab public to floating-public translation table
    • ai: need to embed table in DNS
    • ai: need an ubuntu desktop image in glance. may require the contortion of launching a vm from an iso, or getting the iso into glance, installing into a volume, and then launching a subsequent instance from that volume
    • ai: jpr: apply changes to ceph read caching, requires nova-compute restart. pending a better understanding of crowbar and chef
    • todo: we need access to the admin node interface via the controller
    • todo: we need to engage with dell on the crowbar limitation on storage use; don't know if we get improvements from storage.
    • workshops next thursday
  • OSG
    • jpr working on running octave on SURAgrid
  • dspace report.
    • ai: jpr: need to complete draft


Went over the steps for upgrading to Grizzly and fixing the Crowbar limitation that prevents us from using 1/3 of our storage due to the BIOS config. Debated different approaches to the problem, including developing the fix for Crowbar (which we want long term) and setting up a "manual" ceph fabric. Also looking at manifesting the Huntsville nodes for testing a raw ceph install. This would help us compare the fabric approaches and would also help us prep the environment for the crashplan test.