wiki:DevOps-2013-09-03
Last modified on 09/03/13 11:09:39

DevOps Weekly Meeting | September 3, 2013

Time & Location: 10:00-11:10am in LHL164

Attending

tanthony, jpr, pavgi, mhanby, billb

2013-09-03 Agenda

Summary

  • Storage and Backup
    • manual install of CentOS 6 (as with gluster) on the hvill nodes using temporary IPs
      • need BMC access to the hvill nodes from kirby
      • test Ceph ops to see if cluster bandwidth is adequate (see the bandwidth sketch at the end of the Discussion below)
    • crashplan vms on openstack
    • extend vlan to hville for "prod" data flow
    • explore crowbar install of grizzly w/ceph and w/o ceph
      • test virtualbox crowbar installs ticket:
  • NGS/Galaxy
    • Galaxy upgrade
      • target deploy as soon as the test team gives the green light (hopefully before Fall 2013)
        • each tool will need to be migrated one by one because each has its own migration script
      • postgres backend config testing
        • backup and restore testing on the pilot OpenStack fabric (see the dump/restore sketch after the summary list)
      • blast/n will be installed via the Tool Shed and takes about two days to install (building Boost)
        • BLAST is currently a blocking issue for the migration to the new release. Several people depend on BLAST, and their work would be affected if BLAST is not functional.
        • BLAST depends on the Boost libraries, which complicates the deploy because one generic build may not work for all users.
        • Other tool-migration issues may surface for workflows and histories, as the linkages to tools may break.
        • ai: pavgi: send email to galaxy-dev focusing on BLAST and whether it would be acceptable for it to be non-operational after the upgrade. We need a decision on the Galaxy deploy with the Tool Shed at this Thursday's galaxy-dev meeting.
  • Lustre
    • some sipsey nodes are offline because of lustre connectivity problems
    • ai: mhanby: open ticket to debug.
  • Hardware upgrades (cont)
    • RAM upgrade on Sipsey - on hold until next FY
    • It would help to have a usage breakdown by group
  • Research Computing Day set for Sept 26
  • OpenStackPlusCeph (carry forward from last week)
    • will work on the nas-01 connection to the admin network so it can act as a storage gateway to the public and cluster networks; use the additional 10G card to connect directly; upgrade to CentOS 6
    • will work on connecting the admin node to the public network for DNS and NTP connectivity
    • Grizzly upgrade
      • Crowbar 1.6 released; will test install via VirtualBox (the upgrade is destructive)
      • Ceph Dumpling released; explore whether it is supported by the barclamp
      • Want to include a Swift/S3 object store

  • old pending issues
    • Fix rcs-srv-02 NAT rules
    • ai: need to create a UAB-public to floating-public translation table (a sketch follows this list)
    • ai: need to embed the table in DNS
    • ai: need an Ubuntu desktop image in Glance. May require the contortion of launching a VM with the ISO, or getting the ISO into Glance, installing into a volume, and then launching a subsequent instance from that volume
    • ai: jpr: apply changes to Ceph read caching (requires a nova-compute restart); pending a better understanding of Crowbar and Chef (see the rbd cache sketch after this list)
    • todo: we need access to the admin node interface via the controller
    • todo: we need to engage with Dell on Crowbar's limitation on storage use; we don't know whether we will get improvements for storage.
  • MATLAB
    • workshops next thursday
  • OSG
    • tanthony getting OSG creds
  • dspace report.
    • ai: jpr: need to complete draft
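
For the Postgres backup and restore testing item in the summary above, a minimal sketch of the dump/restore cycle we could script on the pilot OpenStack fabric follows. The host, database names, paths, and the galaxy_user spot-check are placeholders and assumptions, not our actual configuration.

{{{#!python
# Sketch: dump the Galaxy Postgres database with pg_dump and restore it into a
# scratch database with pg_restore, so we know the backup can actually be
# restored. Host, database names, and paths are placeholders, not our config.
import subprocess
from datetime import date

DB_HOST = "galaxy-db.example"                       # placeholder
SRC_DB = "galaxy"                                   # placeholder
TEST_DB = "galaxy_restore_test"                     # placeholder scratch db
DUMP_FILE = "/backups/galaxy-{0}.dump".format(date.today())

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# Custom-format dump so pg_restore can do selective or parallel restores later.
run(["pg_dump", "-h", DB_HOST, "-Fc", "-f", DUMP_FILE, SRC_DB])

# Restore into a throwaway database and spot-check one table.
run(["createdb", "-h", DB_HOST, TEST_DB])
run(["pg_restore", "-h", DB_HOST, "-d", TEST_DB, DUMP_FILE])
run(["psql", "-h", DB_HOST, "-d", TEST_DB, "-c",
     "SELECT count(*) FROM galaxy_user;"])
}}}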
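
For the UAB-public to floating-public translation table and the follow-on DNS item, one way to keep the mapping from drifting is to hold it in a single table and generate both the NAT rules and the DNS A records from it. The addresses, hostnames, and zone below are invented examples, and the iptables/BIND output lines are only one possible target format.

{{{#!python
# Sketch: keep the UAB-public <-> floating-IP mapping in one table and emit
# both NAT rules and DNS A records from it, so the two views never drift.
# All addresses, hostnames, and the zone name are invented examples.
TRANSLATION = {
    # hostname       (UAB public IP, floating IP)
    "galaxy-test":   ("192.0.2.10", "10.1.0.10"),
    "crashplan-01":  ("192.0.2.11", "10.1.0.11"),
}

ZONE = "openstack.example.edu"   # placeholder zone

def nat_rules():
    for host, (public_ip, floating_ip) in sorted(TRANSLATION.items()):
        yield ("iptables -t nat -A PREROUTING -d {0} "
               "-j DNAT --to-destination {1}".format(public_ip, floating_ip))

def dns_records():
    for host, (public_ip, _floating) in sorted(TRANSLATION.items()):
        yield "{0}.{1}. IN A {2}".format(host, ZONE, public_ip)

if __name__ == "__main__":
    print("\n".join(nat_rules()))
    print("\n".join(dns_records()))
}}}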
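
For the Ceph read caching action item, the setting lives in ceph.conf on the compute nodes; since Crowbar/Chef manages that file in our deployment, the real change belongs in the barclamp/cookbook, but a quick check of what is currently rendered could look like the sketch below. It assumes the usual "[client] rbd cache" option and the default /etc/ceph/ceph.conf path.

{{{#!python
# Sketch: report whether RBD client-side caching is enabled in ceph.conf on a
# compute node. Crowbar/Chef manages the file here, so this only inspects the
# rendered config; it assumes the standard "[client] rbd cache" option name
# and the default config path.
try:
    import configparser                      # Python 3
except ImportError:
    import ConfigParser as configparser      # Python 2 on the CentOS 6 nodes

CEPH_CONF = "/etc/ceph/ceph.conf"

conf = configparser.ConfigParser()
conf.read(CEPH_CONF)

if conf.has_section("client") and conf.has_option("client", "rbd cache"):
    print("rbd cache = " + conf.get("client", "rbd cache"))
else:
    print("rbd cache not set in [client]; the librbd default applies")

# Remember: after changing the setting, nova-compute has to be restarted (and
# instances relaunched) before guests pick it up.
}}}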

Discussion

We went over the steps for the Galaxy upgrade. There are some issues with Tool Shed support for BLAST/blastn. We discussed several possibilities for the upgrade, including keeping the existing Galaxy online for workflow and history access (important for data provenance) and blessing the new version, with the Tool Shed, for new work.
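
As part of blessing the new instance, one quick post-upgrade check would be to ask Galaxy's API for its tool list and confirm that the BLAST tools actually show up. The sketch below is only illustrative: the URL and API key are placeholders, it assumes the deployed release exposes the /api/tools endpoint, and it assumes the requests library is available wherever the check runs.

{{{#!python
# Sketch: query the Galaxy API for the tool panel and report any BLAST tools.
# The URL and API key are placeholders; this assumes the deployed release
# exposes /api/tools and that the "requests" library is installed.
import requests

GALAXY_URL = "https://galaxy.example.edu"   # placeholder
API_KEY = "REPLACE_WITH_ADMIN_KEY"          # placeholder

resp = requests.get(GALAXY_URL + "/api/tools", params={"key": API_KEY})
resp.raise_for_status()
tool_panel = resp.json()

def walk(node):
    """Yield every dict in the (possibly nested) tool panel with an id and name."""
    if isinstance(node, dict):
        if "id" in node and "name" in node:
            yield node
        for value in node.values():
            for item in walk(value):
                yield item
    elif isinstance(node, list):
        for value in node:
            for item in walk(value):
                yield item

def mentions_blast(entry):
    text = "{0} {1}".format(entry.get("id", ""), entry.get("name", ""))
    return "blast" in text.lower()

blast_ids = sorted(set(e["id"] for e in walk(tool_panel) if mentions_blast(e)))
print("BLAST entries found: {0}".format(blast_ids if blast_ids else "none"))
}}}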

We went over the hvill connection and will take vlan495 to the storage nodes in hvill. We will manifest the Huntsville nodes for testing a raw Ceph install. This will help us compare the fabric approaches and will provide a storage environment for the CrashPlan test.
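
Once the raw Ceph install is up on the hvill nodes, the cluster bandwidth test mentioned in the summary could be as simple as driving rados bench against a scratch pool and keeping the output for comparison between the two fabric approaches. The pool name and duration below are placeholders; the pool would need to be created (and removed) separately.

{{{#!python
# Sketch: run "rados bench" write and sequential-read passes against a scratch
# pool and save the raw output so throughput can be compared between fabrics.
# The pool name and duration are placeholders; manage the pool separately.
import subprocess
from datetime import datetime

POOL = "bench-scratch"      # placeholder scratch pool
SECONDS = "60"              # length of each pass
LOG = "rados-bench-%s.log" % datetime.now().strftime("%Y%m%d-%H%M%S")

def bench(mode, extra, log):
    cmd = ["rados", "bench", "-p", POOL, SECONDS, mode] + extra
    log.write("+ " + " ".join(cmd) + "\n")
    log.flush()
    subprocess.check_call(cmd, stdout=log)

with open(LOG, "w") as log:
    bench("write", ["--no-cleanup"], log)   # keep objects for the read pass
    bench("seq", [], log)                   # sequential reads of those objects

print("results saved to " + LOG)
}}}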