
OpenStack+Ceph

Overview and Background

In Pictures

To understand where we are today with our transition to cloud computing technologies, it helps to understand our work, especially over the past two years as we prepared to add cloud service components to the research computing system (RCS).

In 2011, our systems environment was grouped into a number of functional components. Three were traditional HPC compute engines, one was a new engine to support virtualization and storage expansion, and one was an interactive fabric that allowed users to access services. The environment looked like this:

RCS circa 2011

The compute engines and interactive fabric were implemented with traditional HPC technologies. Notably, the ROCKS cluster head node was the primary interface into the fabric. The virtualization component served as our experimental cloud fabric which hosted some initial services, most notably the Galaxy NextGen Sequencing platform and our RCS ''docs'' wiki platform.

RCS implementation

In September 2012 we significantly expanded the hardware for the Experimental Cloud fabric in an effort to support a full-scale conversion to cloud-oriented service and storage development. In April 2013 we implemented an OpenStack-based cloud environment that is the focus of this OpenStackPlusCeph document. We also decommissioned our original and oldest compute engine in June 2013.

With these features in place, we are now migrating our interactive services to a fully functional cloud environment based on OpenStack. This will continue to support traditional access to HPC compute engines via login nodes, but will also offer our users a full complement of services that can be expanded to meet the needs of any research workflow, whether that involves massive on-line archives, highly customized compute environments, or web services that package data analytics into easily consumed resources.

Our platform going forward can be seen as a cloud-based computing environment with high speed access to dedicated and dynamic compute resources:

RCS moves to cloud

In Words

The research computing system (RCS) is built on a collection of distinct hardware systems designed to provide specific services to applications. The RCS hardware includes dedicated compute fabrics that support high performance computing (HPC) applications where hundreds of compute cores can work together on a single application. These clusters of commodity compute hardware make it possible to do data analysis and modelling work in hours, work that would have taken months using a single computer. The clusters are connected with dedicated high bandwidth, low latency networks for applications to efficiently coordinate their actions across many computers and access a shared high speed storage system for working efficiently with terabytes of data.

Our newest hardware fabric, acquired in 2012 Q4, is designed to support emerging data intensive scientific computing and virtualization paradigms. This hardware is very similar to the commodity computers used by our traditional HPC fabrics; however, in addition to having many compute cores and lots of RAM, each individual computer contains 36TB of built-in disk storage. Taken together, this newest hardware fabric adds 192 cores, 1TB RAM, and 420TB of storage to the RCS.

The built-in disk storage is designed to support applications running local to each computer. The data intensive computing paradigm replaces the external storage networks of traditional HPC clusters with the native, very high speed system buses that provide access to local hard disks in each computer. Large datasets are distributed across these computers, and applications are then assigned to run on the specific computer that stores the portion of the dataset they have been assigned to analyze. The hardware requirements for data intensive computing closely resemble the requirements for virtualization and can benefit tremendously from the configuration flexibility that a virtualization fabric offers.

In order to enhance flexibility and further improve support for scaling research applications, we are engineering our latest hardware cluster to act as a virtualized storage and compute fabric. This enables support for a wide variety of storage and compute use cases, most prominently, ample storage capacity for reliably housing large research data collections and flexible application development and deployment capabilities that allow direct user control over all aspects of the application environment.

In short, we are tooling this hardware to build a cloud computing environment.

We are building this cloud using OpenStack for compute virtualization and Ceph for storage virtualization. Crowbar will provision the raw hardware fabric. This approach is very similar to the model we have been following with our traditional ROCKS-based HPC cluster environment. The new approach enhances our ability to automatically provision hardware and further improves the economics of large-scale computing.

We are implementing this environment with Dell and Inktank. These vendors, and the upstream open source projects on which this platform is built, embrace the DevOps model for systems development. This will support further engineering collaboration with our vendors, enabling the UAB research community to continually enhance our fabric as needed and feed those enhancements upstream for inclusion in future support releases.

This solution rounds out the feature set of the RCS core and will provide a general framework to scale future growth.

Getting Started

Developers

For the impatient, there are two rows to hoe:

  1. If you are tasked with maintaining or administering the OpenStack fabric, you should start by getting familiar with CrowbarAndChef, with emphasis on Chef.
  2. If you are tasked with building services using the OpenStack fabric, you should start by getting familiar with OpenStack itself. The Getting Started Guide provides a quick introduction to API interaction. You can even follow those examples using our local OpenStack install.

These rows naturally cross over each other along the way. For example, if you start by building a service using OpenStack then you'll eventually get to the point of having to manage a bunch of services. You'll want to look at Chef then to help you manage complexity. If you start by managing the fabric, you'll eventually want it to do new things. The OpenStack API is then the way forward.

If you feel neither of these rows applies to you, then you should build your web app using your preferred framework. Once you are ready to deploy your app, you'll be ready to start with the options above. A good place to hop in then is by using Vagrant, as described in CrowbarAndChef.

Be sure to follow the steps below. They will get you oriented to our local deployment and get you up and running with a cloud VM.

General

Please review these resources to get familiar with Ceph, OpenStack, and Crowbar.

Documentation

Online documentation for Ceph and OpenStack is available. Be aware that our pilot currently uses the Essex OpenStack release and the XXX Ceph release. These older releases may not have all the features of the latest releases; however, the documentation for more recent releases is sometimes better (this is especially true for OpenStack), so it is worth reading the current release documentation first to understand the operation and vision, then returning to the older documentation for specific steps.

System Sketch

This sketch outlines the VLAN configuration for OpenStack and Ceph. The Nova Fixed VLAN allows isolation for the VMs using OpenStack's default "VLAN networking mode".

Schematic of cloud cluster network with notation

The VLAN configuration is based on the Dell OpenStack reference architecture (high-level summary of components in the July 12, 2012 announcement).

IP Ranges

Proposed IP ranges in the public space will be based on a /27 netmask, so we will have "distinct" networks (really IP address groups, since we aren't actually routing). This creates an IP grouping mask of the 3 high bits in the last octet and leaves the lower 5 bits for host numbers. The groups are of the form 164.111.161.0/27, .32/27, .64/27, .96/27, .128/27, .160/27, .192/27, .224/27. These are chunks of addresses we can assign down to the OpenStack and Ceph public networks.
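
As a quick illustration (a minimal sketch; the base network is taken from the groups listed above), the eight /27 group base addresses can be enumerated in the shell:

# each /27 group spans 32 addresses: the top 3 bits of the last octet select the
# group, the low 5 bits number the hosts within it
for group in $(seq 0 7); do
  echo "164.111.161.$(( group * 32 ))/27"
done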

Working with OpenStack and Ceph

Accessing the Pilot Platform

Currently, the pilot platform is only accessible from within the Research Computing System (RCS). Effective interaction requires you to set up a cluster desktop. Once you are connected to your cluster desktop, open a terminal (Applications->Accessories->Terminal) and open an X11-forwarded SSH connection to the gateway node of our pilot network with ssh -X rcs-srv-02. In this SSH connection start firefox with firefox http://172.22.0.10. This will start firefox on the gateway, with a connection open to the OpenStack controller, displayed on your cluster desktop via X11 forwarding. You can then log into the OpenStack environment.
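
Summarized as commands (both steps are described above):

ssh -X rcs-srv-02                  # from your cluster desktop: X11-forwarded session to the pilot gateway
firefox http://172.22.0.10 &       # inside that session: opens the OpenStack dashboard on the controller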

Note that this configuration assumes you are authorized to ssh to rcs-srv-02 gateway from within the RCS. It also assumes you have an account to log into the OpenStack controller. If you are interested in participating in this pilot, please send a request to support@….

Working with Glance

OpenStackGlance

Launching a VM

Creating a VM in OpenStack is easy; follow these dashboard steps (a rough command line equivalent is sketched after the list).

  • On the "Access & Security" tab:
    1. Create or Import an SSH key into your OpenStack account. A default account (username: ubuntu) is created with this public key set for SSH public key authentication when the VM is created. This is how you will log in to the VM. Note: using your ~/.ssh/id_rsa.pub public key from Cheaha will simplify access to the VM, but you can just as easily download a newly created key and then use ssh -i <keyfilename> <vmip> when you go to start your SSH session.
    2. Create a Security Group that allows SSH access to your VM. This controls the OpenStack firewall fabric to allow access to your host.
    3. Allocate an IP address. This is an IP on the "public" side of the VM fabric that will be mapped to your VM after it is started.
  • On the "Images & Snapshots" tab:
    1. Click the "Launch" button next to the VM image you want to start. It's recommended you use the ubuntu-12.04.2-lts image for now, since there is good client tool support for Ceph and OpenStack on this platform and that will simplify dev and exploration.
    2. On the dialog that comes up, name your machine, provide some notes about your use, pick a flavor (small is a good choice), select the ssh-only security group.
    3. Press "Launch Instance". Your VM will be provisioned.
  • On the "Access & Security" tab:
    1. Select an unallocated IP address and press the "Associate IP" button next to it.
    2. Choose the instance you want to associate the IP with in the dialog and then press "Associate IP".
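
The same workflow can be approximated with the nova command line client (see Accessing the OpenStack API below). This is a rough sketch only: the key, security group, VM, and flavor names are placeholders, and flag spellings vary between client versions (e.g. --key_name vs --key-name):

nova keypair-add mykey > mykey.pem && chmod 600 mykey.pem   # generate a keypair and keep the private key
nova secgroup-create ssh-only "allow inbound ssh"
nova secgroup-add-rule ssh-only tcp 22 22 0.0.0.0/0
nova boot --image ubuntu-12.04.2-lts --flavor m1.small --key_name mykey --security_groups ssh-only myvm
nova floating-ip-create                                     # allocate a public-side IP
nova add-floating-ip myvm <allocated-ip>                    # associate it with the new VM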

Your VM is now ready to use and can be accessed from the rcs-srv-02 gateway host via SSH using the default username "ubuntu" and the IP address associated with your VM above.

  1. If you imported your existing SSH public key: ssh ubuntu@<associated-ip-for-vm>
  2. If you created a new SSH keypair: ssh -i <keypair-name.pem> ubuntu@<associated-ip-for-vm>

There is currently a feature limit in our fabric that prevents the VMs from resolving hostnames to communicate with the outside world. We are working to provide the correct DNS configuration to the VMs (ticket:220). An easy work-around is to edit /etc/resolv.conf and change the DNS server to 138.26.134.2, the local UAB name server.
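
A minimal sketch of that work-around, run inside the VM (the manual edit may be overwritten when the DHCP lease is renewed):

sudo sh -c 'echo "nameserver 138.26.134.2" > /etc/resolv.conf'   # point DNS at the UAB name server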

Adding Storage to your VM

All storage that is part of the base VM image is ephemeral. It will be deleted if you terminate your instance. Persistent storage must be defined explicitly by creating a volume in OpenStack. These volumes use block storage from our Ceph cluster.

  • Create a volume
    1. Go to the "Instances & Volumes" tab in the OpenStack dashboard.
    2. Click "Create Volume".
    3. In the dialog, give it a name and define the size in gigabytes. Click "Create Volume" to create the storage.
  • Attach the volume to a VM
    1. Go to the "Instance & Volumes" tab in the OpenStack dashboard.
    2. Click on the "Edit Attachments" button next to the new volume.
    3. In the dialog, select the VM instance you want to attach this volume to and the device name to use (the default is fine if you have only one attached volume). Click the "Attach Volume" button.

Your volume is now attached to the VM and can be used. Note: if this is a new volume you will need to format the block device. This volume is essentially a raw, unprocessed disk. It needs to have partitions (if you want them) and a file system created. You can do all of that from within an SSH session to the VM to which you attached the volume.
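
For example, a minimal sketch for a new volume used as a single filesystem (the device name /dev/vdb is an assumption; check the "Edit Attachments" dialog or dmesg inside the VM for the actual device):

sudo mkfs.ext4 /dev/vdb           # one time only: creates a filesystem and destroys any existing data
sudo mkdir -p /mnt/data
sudo mount /dev/vdb /mnt/data     # unmount with 'sudo umount /mnt/data' before detaching the volume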

Note: these volumes are stored in our Ceph fabric and persist after a VM instance is terminated. If you are working with data in these volumes you can attach them to other VMs, just be sure to unmount the volume from within the VM to which it is currently attached. You are basically moving disks around between machines so be mindful of the good practices you would follow for real disks.

Performance

See StoragePerformance for some initial lightweight tests comparing our existing RCS storage (Hitachi SAN and DDN+Lustre) with Ceph block devices connected to a VM.

Accessing the OpenStack API

If you've got an Ubuntu 12.04 client you can easily access the OpenStack API with command line tools. If you don't, just create an Ubuntu instance in OpenStack (see Launching a VM above).

Install the Nova and Glance clients

On Ubuntu, simply add the client packages for Glance and Nova. For Ubuntu 12.04 images:

sudo apt-get install python-novaclient glance-client

For Ubuntu 14.04 images:

sudo apt-get install python-novaclient python-glanceclient

Set up the Shell

On the OpenStack dashboard, click the "Settings" link in the upper left of the web page. Then, on the "OpenStack Credentials" tab, click the "Download RC File" button. This will download a file that configures your shell's environment with the variables the command line tools need to access the APIs.

To transfer this file to your Ubuntu box, run the following on the Ubuntu box:

scp blazerid@10.111.161.10:Downloads/openrc.sh .

After the file is copied, simply source the file:

source openrc.sh
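
To confirm the environment is set, you can list the OS_* variables the RC file typically exports (commonly OS_AUTH_URL, OS_TENANT_NAME, and OS_USERNAME; exact contents depend on the release):

env | grep '^OS_'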

Interact with the OpenStack API

You can list the images in the Glance repository:

ubuntu@jpr-test3:~$ glance index
ID                                   Name                           Disk Format          Container Format     Size
------------------------------------ ------------------------------ -------------------- -------------------- --------------
c6e31b8c-dff1-40fc-855a-d47649d93abb ubuntu-12.04.2-lts             raw                  ovf                       252313600
46590c13-4e79-49b0-b3e0-0fd9e0b04c94 ubuntu-11.04-image             ami                  ami                      1476395008
ec4749fb-63f6-40ae-8e57-3e91734a5395 ubuntu-11.04-initrd            ari                  ari                           91708
08c2307b-3624-438c-a5b0-36b5cbae707a ubuntu-11.04-kernel            aki                  aki                         4594816

You can also list the servers running in your tenant space:

ubuntu@jpr-test3:~$ nova list
+--------------------------------------+-----------------------------------+--------+-------------------------------------+
|                  ID                  |                Name               | Status |               Networks              |
+--------------------------------------+-----------------------------------+--------+-------------------------------------+
| 1483c21a-4b51-4da8-a6da-dd0a94c4d46e | test4                             | ACTIVE | private_1=172.21.0.74, 172.22.128.7 |
| 2a5b5175-e2ff-46b3-ad3d-06e62ae51a02 | jpr-devops-2013-05-28ubuntu-11.04 | ACTIVE | private_1=172.21.0.70, 172.22.128.4 |
| 44435930-55f1-472f-9492-eb13eea74b88 | jpr-test-ubuntu-11.04             | ACTIVE | private_1=172.21.0.67, 172.22.128.1 |
| 7553d5a4-b4f1-400b-91d9-805b472e0c21 | jpr-test-ubuntu-12.04             | ACTIVE | private_1=172.21.0.73, 172.22.128.2 |
| a7ee6279-dd74-4e53-8d5d-705ce72a32dc | billb-test-serv01                 | ACTIVE | private_1=172.21.0.71, 172.22.128.5 |
| c7917a41-70b6-4869-aafa-315e2a9e2e14 | jpr-test3                         | ACTIVE | private_1=172.21.0.69, 172.22.128.6 |
+--------------------------------------+-----------------------------------+--------+-------------------------------------+
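
A few other commands worth knowing (the instance name and image ID below are taken from the example output above; note that the older glance-client and the newer python-glanceclient differ slightly in syntax):

nova show jpr-test3                                   # details for a single instance, by name or ID
nova flavor-list                                      # available instance sizes (flavors)
glance show c6e31b8c-dff1-40fc-855a-d47649d93abb      # image details (newer clients: glance image-show <id>)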

Accessing Management Tools

The following web pages are accessible but appear to be loading very slowly. See ticket:221 for debugging effort on the performance. The default passwords for these services are in the Crowbar documentation.

A helpful script-let to set up access to the services via SSH port forwards follows. It lets you reach nagios (http://localhost:8080), ganglia (http://localhost:8080/ganglia), chef (http://localhost:4040), and crowbar (http://localhost:3000) in one browser tabset; the 4000 forward allows Chef API access:

OSPC_GW=<name-of-gateway-server>
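# forwards: 8080 -> admin web server (nagios, ganglia), 3000 -> crowbar, 4000/4040 -> chef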
ssh -L 4000:172.16.128.101:4000 -L 4040:172.16.128.101:4040 -L 8080:172.16.128.101:80 -L 3000:172.16.128.101:3000 $OSPC_GW

Accessing Nagios & Ganglia

You can access nagios and ganglia from your cluster desktop by running Firefox on rcs-srv-02 and tunnelling through the controller to the admin web server. These steps assume you are running firefox on rcs-srv-02 and will have a shell open on rcs-srv-02. Note: ganglia and nagios are both served by the default web server on the admin node and are only distinguished by their URL.

  1. ssh -X rcs-srv-02
  2. firefox &
  3. ssh -L 8080:admin:80 crowbar@172.22.0.10 # choose a port instead of 8080 if you get a conflict
  4. In firefox, Ctrl+l and enter http://localhost:8080
  5. log into nagios
  6. In firefox, Ctrl+l and enter http://localhost:8080/ganglia

Accessing Crowbar

You can access crowbar from your cluster desktop by running Firefox on rcs-srv-02 and tunnelling through the controller to the admin web server. These steps assume you are running firefox on rcs-srv-02 and will have a shell open on rcs-srv-02.

  1. ssh -X rcs-srv-02
  2. firefox &
  3. ssh -L 3000:admin:3000 crowbar@172.22.0.10 # choose a local port instead of 3000 if you get a conflict
  4. In firefox, Ctrl+l and enter http://localhost:3000
  5. log into crowbar

Accessing Chef

You can access Chef from your cluster desktop by running Firefox on rcs-srv-02 and tunnelling through the controller to the admin web server. These steps assume you are running firefox on rcs-srv-02 and will have a shell open on rcs-srv-02.

  1. ssh -X rcs-srv-02
  2. firefox &
  3. ssh -L 4040:admin:4040 crowbar@172.22.0.10 # choose a local port instead of 4040 if you get a conflict
  4. In firefox, Ctrl+l and enter http://localhost:4040
  5. log into chef

Crowbar and Chef

Controlling the fabric requires an understanding of Crowbar and Chef. These tools are responsible for bare metal provisioning and ongoing system configuration, respectively.

Ceph

Ceph is the distributed storage system that underlies all our cloud storage. You probably will never have to know anything more than that. Most people will work directly with their own data sets and never have to think about implementation details of storage.

If you are designing storage systems though, then understanding the operation of this storage virtualization layer is important and is easiest to do if you can experiment with your own storage cluster. Learn HowToSetUpYourOwnCephCluster on our cloud fabric.
