Last modified on 12/31/13 15:05:09

DevOps Weekly Meeting | November 5, 2013

Time & Location: 10:00am-11:45am in LHL164

Attending

tanthony, jpr, mhanby, billb, edharris, rpillai, dls, bade, israel

2013-11-05 Agenda

  • Puppet and provisioning overview

Summary

Reviewed the provisioning fabrics used in RCS.

Discussion

The Research Computing System (RCS) uses a variety of fully automatic and semi-automatic provisioning and management tools to deploy new hardware and apply updates to systems. Our roots in programmatically controlled IT reach back over a decade, but our current solutions fall fairly cleanly into three distinct subsystems:

  1. the ROCKS cluster platform
  2. the in-house Kickstart+Puppet model
  3. the Crowbar-based (PXE+Chef) OpenStackPlusCeph cloud fabric

Each one of these fabrics has a mechanism for provisioning hardware, managing system state, and presenting a platform abstraction to the app layer. These fabrics were summarized on the whiteboard during the meeting:

Devops whiteboard 2013-11-05
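The three fabrics can be compared layer by layer. As a rough sketch, the Python record below captures what these notes say about each one; the `Fabric` type and its field names are our own illustrative labels, not an existing tool, and `None` marks details the notes do not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fabric:
    """One provisioning fabric, described by the three layers discussed above."""
    name: str
    provisioning: str        # how bare hardware gets an operating system
    state_management: str    # what keeps systems configured after install
    platform: str            # what the fabric presents to the application layer
    base_os: Optional[str]   # None where these notes do not say


FABRICS: List[Fabric] = [
    Fabric("ROCKS", "PXE boot + kickstart from the head node", "411",
           "cluster compute nodes", "CentOS 5"),
    Fabric("Kickstart+Puppet", "iDRAC-mapped CD-ROM + role-specific kickstart",
           "Puppet", "service hosts (NAS, Nagios, VM, Galaxy)", None),
    Fabric("OpenStackPlusCeph", "PXE boot driven by Crowbar discovery",
           "Chef (via Crowbar)", "nova-compute VMs and Ceph storage",
           "Ubuntu 12.04 LTS"),
]


def summarize(fabrics: List[Fabric]) -> str:
    """Return one aligned line per fabric: name, state manager, base OS."""
    return "\n".join(
        f"{f.name:<18} {f.state_management:<20} {f.base_os or 'not stated'}"
        for f in fabrics
    )


if __name__ == "__main__":
    print(summarize(FABRICS))
```

Keeping the comparison as data rather than prose makes it easy to see that the fabrics differ mainly in the state-management layer (411, Puppet, Chef).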

ROCKS model

The ROCKS cluster has been a leading model over the past decade for managing large-scale systems without excessively growing system administration labor. In a stock install:

  1. The entire cluster is provisioned from a single traditional OS install on the cluster's head node.
  2. This is followed by node-by-node discovery and automatic provisioning via PXE boot and kickstart, the Red Hat native automated system install tool. Our current ROCKS 5 install is based on CentOS 5.
  3. ROCKS also uses a custom management tool called 411 to distribute operational changes across images, mostly files in /etc for accounts and other cluster-wide configuration.
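The 411 step follows a familiar pattern: the head node is the single source of truth for a set of files, and each node pulls whatever has changed. The sketch below is a simplified, hypothetical model of that pattern only. It does not reproduce how 411 actually works (it ignores encryption, change notification, and transport), and `HeadNode`, `ComputeNode`, and `sync` are illustrative names, not 411 commands.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Content fingerprint used to decide whether a node's copy is stale."""
    return hashlib.sha256(content).hexdigest()


class HeadNode:
    """Single source of truth for the files distributed to the cluster."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def publish(self, path: str, content: bytes) -> None:
        self.files[path] = content

    def manifest(self) -> Dict[str, str]:
        return {path: digest(data) for path, data in self.files.items()}


class ComputeNode:
    """Holds local copies and pulls only the files whose fingerprint changed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, bytes] = {}

    def sync(self, head: HeadNode) -> List[str]:
        changed = []
        for path, remote in sorted(head.manifest().items()):
            local = self.files.get(path)
            if local is None or digest(local) != remote:
                self.files[path] = head.files[path]
                changed.append(path)
        return changed


if __name__ == "__main__":
    head = HeadNode()
    nodes = [ComputeNode(f"compute-0-{i}") for i in range(3)]  # illustrative names
    head.publish("/etc/passwd", b"alice:x:1000:1000::/home/alice:/bin/bash\n")
    for node in nodes:
        print(node.name, node.sync(head))   # first sync: every node pulls /etc/passwd
    for node in nodes:
        print(node.name, node.sync(head))   # second sync: nothing changed, nothing pulled
```

Comparing fingerprints keeps repeated syncs cheap: a node only transfers a file when the head node's copy differs from its own.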

In-house RCS model

The default ROCKS model works well for a very homogeneous environment, but as we have created specialized services in our environment over time, we have built a process around Kickstart system profiles that are augmented by Puppet, both post-install and during ongoing management. This is the primary process used for all production systems (physical and virtual) that provide services to the cluster, e.g., NAS, Nagios, VM, and Galaxy. The typical process includes:

  1. Initiating system provisioning at the baseboard management controller (BMC) layer (iDRAC cards on Dell hardware). The system is booted from a CD-ROM mapped in iDRAC and then given kernel parameters that pull a role-specific kickstart file.
  2. The kickstart includes basic configuration and runs a number of post-install scripts that configure the system for its specific role, including IP address assignment, disk configuration, and finally the Puppet configuration.
  3. Puppet begins managing the system after the first boot. The Puppet master configuration files are maintained in our atlab Git repo. One of the most common steps managed by Puppet is allocating new accounts: Puppet applies the change to the cluster head node, which then cascades it across the nodes via 411.
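To make the first two steps more concrete, the sketch below renders the two role-specific pieces: the installer boot line and a final `%post` fragment that hands the system over to Puppet. It assumes CentOS 5/6-era anaconda options (`ks=`, `ksdevice=`), a `puppet` package already installed by the kickstart, and SysV `chkconfig`; the server names, URL layout, and role list are hypothetical placeholders, not our actual configuration.

```python
from typing import Set

# Hypothetical placeholders, not the real RCS hosts or paths.
KS_BASE_URL = "http://ks.example.org/kickstart"
PUPPET_MASTER = "puppet.example.org"
ROLES: Set[str] = {"nas", "nagios", "vm", "galaxy"}


def kernel_cmdline(role: str, ksdevice: str = "eth0") -> str:
    """Boot-prompt line typed after booting the installer CD mapped in iDRAC.

    ks= points anaconda at the role's kickstart file; ksdevice= selects the
    NIC used to fetch it (CentOS 5/6-era option names).
    """
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return f"linux ks={KS_BASE_URL}/{role}.cfg ksdevice={ksdevice}"


def puppet_post_section(role: str) -> str:
    """Final %post step: point the Puppet agent at the master and enable it."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return "\n".join([
        "%post",
        f"# role: {role}",
        "mkdir -p /etc/puppet",
        "cat > /etc/puppet/puppet.conf <<'EOF'",
        "[agent]",
        f"server = {PUPPET_MASTER}",
        "EOF",
        "chkconfig puppet on",   # agent starts on first boot (step 3)
        "%end",
    ])


if __name__ == "__main__":
    print(kernel_cmdline("nas"))
    print(puppet_post_section("nas"))
```

Keeping the role as the only varying input mirrors the idea that one kickstart profile per role, plus Puppet, is enough to stand up a service host.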

Crowbar OpenStackPlusCeph model

The latest addition to our provisioning fabric is the OpenStackPlusCeph fabric managed by Crowbar. Crowbar uses Chef behind the scenes. The process is very similar to our in-house Kickstart+Puppet solution except that Crowbar has components that trigger the hardware provisioning from the discovery phase through a PXE boot, much like ROCKS.

We are still learning the ropes of the new OpenStackPlusCeph model, but it can fairly be summarized as a three-layer model like the ones above:

  1. The entire cluster is provisioned from a single OS install on the admin node of the cloud cluster.
  2. Node-by-node discovery is managed through the Crowbar web interface. Provisioning is handled through a PXE boot sequence and managed system installs. In this case the cluster is based on Ubuntu 12.04 LTS. Systems are imaged according to their role as a nova-compute (cloud VM) or Ceph storage node.
  3. After systems are installed, all post install management is conducted by Chef (much like Puppet in our in-house solution).
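The discovery-to-management flow can be modeled as a small state machine. This is a hypothetical sketch, not Crowbar's API or its actual node states: the state names and the `AdminNode` methods are simplifications chosen for illustration, and the MAC address is made up.

```python
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    DISCOVERED = "discovered"   # PXE-booted and registered with the admin node
    ALLOCATED = "allocated"     # role chosen in the web UI; OS install under way
    READY = "ready"             # installed; Chef now manages the node


ROLES = {"nova-compute", "ceph-storage"}


class AdminNode:
    """Tracks nodes from PXE discovery to a Chef-managed, role-specific system."""

    def __init__(self) -> None:
        self.state: Dict[str, State] = {}
        self.role: Dict[str, Optional[str]] = {}

    def pxe_discover(self, mac: str) -> None:
        # Re-discovering a known node leaves its current state untouched.
        self.state.setdefault(mac, State.DISCOVERED)
        self.role.setdefault(mac, None)

    def allocate(self, mac: str, role: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        if self.state.get(mac) is not State.DISCOVERED:
            raise RuntimeError(f"{mac} is not awaiting allocation")
        self.role[mac] = role
        self.state[mac] = State.ALLOCATED

    def install_finished(self, mac: str) -> None:
        if self.state.get(mac) is not State.ALLOCATED:
            raise RuntimeError(f"{mac} was never allocated")
        self.state[mac] = State.READY


if __name__ == "__main__":
    admin = AdminNode()
    mac = "52:54:00:aa:bb:01"          # made-up address
    admin.pxe_discover(mac)
    admin.allocate(mac, "ceph-storage")
    admin.install_finished(mac)
    print(admin.state[mac], admin.role[mac])
```

Rejecting out-of-order transitions captures the key property of the model: a node only receives its role image after discovery, and Chef only takes over once the install is done.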

Extending Provisioning Services

All of these approaches should be seen as defining a systems platform on top of which other management fabrics can be installed. How to approach the "application layer" depends on what you are trying to accomplish. If you are extending a specific system platform, we typically work within that system's existing provisioning model, e.g., ROCKS, in-house, or OpenStackPlusCeph. If you are defining a new collection of systems, we can adopt any one of the models above or leverage an entirely different toolchain.

This is easiest to see when working within a tenant space of OpenStack. You could use a modified Kickstart+Puppet model or leverage newer cloud deployment tools such as Vagrant, which make it easy to move between development and cloud production environments.
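The guidance in this section reduces to a simple rule. The function below is an illustrative sketch of that rule; the platform keys and model labels are our own shorthand for the fabrics described above, not names used by any tool.

```python
from typing import List, Optional

# Existing platforms and the provisioning model each one already uses.
PLATFORM_MODELS = {
    "ROCKS": "ROCKS (PXE + kickstart + 411)",
    "in-house": "Kickstart+Puppet",
    "OpenStackPlusCeph": "Crowbar (PXE + Chef)",
}


def provisioning_options(extends: Optional[str] = None) -> List[str]:
    """Candidate provisioning models for a new piece of work.

    extends names the existing platform being extended, or None for a new
    collection of systems (for example, one built in an OpenStack tenant space).
    """
    if extends is not None:
        if extends not in PLATFORM_MODELS:
            raise ValueError(f"unknown platform: {extends!r}")
        # Extending a platform: stay inside that platform's own model.
        return [PLATFORM_MODELS[extends]]
    # New collection: any existing model, or a different toolchain entirely.
    return list(PLATFORM_MODELS.values()) + ["other toolchain (e.g. Vagrant)"]


if __name__ == "__main__":
    print(provisioning_options("ROCKS"))
    print(provisioning_options())
```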
