Version 16 (modified by jpr@…, 5 years ago)


Overview and Background

The research computing system (RCS) is built on a collection of distinct hardware systems designed to provide specific services to applications. The RCS hardware includes dedicated compute fabrics that support high performance computing (HPC) applications where hundreds of compute cores can work together on a single application. These clusters of commodity compute hardware make it possible to do data analysis and modelling work in hours, work that would have taken months using a single computer. The clusters are connected with dedicated high bandwidth, low latency networks for applications to efficiently coordinate their actions across many computers and access a shared high speed storage system for working efficiently with terabytes of data.

Our newest hardware fabric, acquired 2012Q4, is designed to support emerging data-intensive scientific computing and virtualization paradigms. This hardware is very similar to the commodity computers used by our traditional HPC fabrics; however, in addition to having many compute cores and ample RAM, each individual computer contains 36TB of built-in disk storage. Taken together, this newest hardware fabric adds 192 cores, 1TB of RAM, and 420TB of storage to the RCS.

The built-in disk storage is designed to support applications running local to each computer. The data-intensive computing paradigm replaces the external storage networks of traditional HPC clusters with the native, very high speed system buses that provide access to the local hard disks in each computer. Large datasets are distributed across these computers, and applications are then assigned to run on the specific computer that stores the portion of the dataset they have been assigned to analyze. The hardware requirements for data-intensive computing closely resemble the requirements for virtualization and can benefit tremendously from the configuration flexibility that a virtualization fabric offers.

In order to enhance flexibility and further improve support for scaling research applications, we are engineering our latest hardware cluster to act as a virtualized storage and compute fabric. This enables support for a wide variety of storage and compute use cases, most prominently, ample storage capacity for reliably housing large research data collections and flexible application development and deployment capabilities that allow direct user control over all aspects of the application environment.

In short, we are tooling this hardware to build a cloud computing environment.

We are building this cloud using OpenStack for compute virtualization and Ceph for storage virtualization. Crowbar will provision the raw hardware fabric. This approach is very similar to the model we have been following with our traditional ROCKS-based HPC cluster environment. The new approach enhances our ability to automatically provision hardware and further improves the economics of large-scale computing.

We are implementing this environment with Dell and Inktank. These vendors, and the upstream open source projects on which this platform is built, embrace the DevOps model for systems development. This will support further engineering collaboration with our vendors, enabling the UAB research community to continually enhance our fabric as needed and feed those enhancements upstream for inclusion in future support releases.

This solution rounds out the feature set of the RCS core and will provide a general framework to scale future growth.

Getting Started

Please review these resources to get familiar with Ceph, OpenStack, and Crowbar.


Online documentation for Ceph and OpenStack is available. Be aware that our pilot currently uses the Essex OpenStack release and the XXX Ceph release. These older releases may not have all the features of the latest releases; however, the documentation for the more recent releases is sometimes better (this is especially true for OpenStack), so it is worth reading the current release documentation first to understand the operation and vision, and then returning to the older documentation for specific steps.

System Sketch

This sketch outlines the VLAN configuration for OpenStack and Ceph. The Nova Fixed VLAN allows isolation for the VMs using OpenStack's default "VLAN networking mode".

Schematic of cloud cluster network with notation

The VLAN configuration is based on the Dell OpenStack reference architecture (high-level summary of components in the July 12, 2012 announcement).

IP Ranges

Proposed IP ranges in the public space will be based on a /27 netmask, so we will have "distinct" networks (really IP address groups, since we aren't actually routing). This creates an IP grouping mask from the 3 high bits of the last octet and leaves the lower 5 bits for host numbers. The groups are of the form .32/27, .64/27, .96/27, .128/27, .160/27, .192/27, and .224/27. These will be the chunks of addresses we can assign down to the OpenStack and Ceph public networks.
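As a quick sanity check of the /27 arithmetic, Python's standard ipaddress module can enumerate these groups. The 10.0.0.0/24 parent network below is a placeholder, not our actual public address block:

```python
import ipaddress

# Placeholder /24; substitute the actual public range assigned to the fabric.
parent = ipaddress.ip_network("10.0.0.0/24")

# Splitting a /24 into /27s yields eight 32-address groups: the 3 high bits
# of the last octet select the group, the low 5 bits number the hosts.
groups = list(parent.subnets(new_prefix=27))
for g in groups:
    print(g, "->", g.num_addresses, "addresses")
```

Note that the split also produces a .0/27 group in addition to the seven listed above.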

Working with OpenStack and Ceph

Accessing the Pilot Platform

Currently, the pilot platform is only accessible from within the Research Computing System (RCS). Effective interaction requires you to set up a cluster desktop. Once you are connected to your cluster desktop, open a terminal (Applications->Accessories->Terminal) and create a tunnel for X11 traffic to the gateway node of our pilot network with ssh -X rcs-srv-02. In this SSH session, start Firefox with the firefox command. This will start Firefox on the gateway, with a connection open to the OpenStack controller, displayed on your cluster desktop via X11 forwarding. You can then log into the OpenStack environment.

Note that this configuration assumes you are authorized to ssh to rcs-srv-02 gateway from within the RCS. It also assumes you have an account to log into the OpenStack controller. If you are interested in participating in this pilot and feel you qualify, please send a request to support@…. Please understand that at this time only close collaborators will be authorized.

Launching a VM

Creating a VM in OpenStack is easy. Simply follow these steps.

  • On the "Access & Security" tab:
    1. Create or import an SSH key into your OpenStack account. A default account (username: ubuntu) is created with this public key set for SSH public key authentication when the VM is created. This is how you will log in to the VM. Note: using your ~/.ssh/ public key from Cheaha will simplify access to the VM, but you can just as easily download a newly created key and then use ssh -i <keyfilename> <vmip> when you go to start your SSH session.
    2. Create a Security Group that allows SSH access to your VM. This controls the OpenStack firewall fabric to allow access to your host.
    3. Allocate an IP address. This is an IP on the "public" side of the VM fabric that will be mapped to your VM after it is started.
  • On the "Images & Snapshots" tab:
    1. Click the "Launch" button next to the VM image you want to start. It's recommended you use the ubuntu-12.04.2-lts image for now, since there is good client tool support for Ceph and OpenStack on this platform and that will simplify dev and exploration.
    2. On the dialog that comes up, name your machine, provide some notes about your use, pick a flavor (small is a good choice), and select the ssh-only security group.
    3. Press "Launch Instance". Your VM will be provisioned.
  • On the "Access & Security" tab:
    1. Select an unallocated IP address and press the "Associate IP" button next to it.
    2. Choose the instance you want to associate the IP with in the dialog and then press "Associate IP".

Your VM is now ready to use and can be accessed from the rcs-srv-02 gateway host via SSH using the default username "ubuntu" and the IP address associated with your VM above.

  1. If you imported your existing SSH public key: ssh ubuntu@<associated-ip-for-vm>
  2. If you created a new SSH keypair: ssh -i <keypair-name.pem> ubuntu@<associated-ip-for-vm>
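The two access cases above differ only in whether an identity file is passed to ssh. A small helper makes the pattern explicit; the IP address and key filename below are placeholders:

```python
def ssh_command(vm_ip, keyfile=None):
    """Build the ssh argv for connecting to a VM as the default 'ubuntu' user."""
    cmd = ["ssh"]
    if keyfile:
        # Case 2: a newly created keypair downloaded from OpenStack.
        cmd += ["-i", keyfile]
    # Case 1: an imported ~/.ssh/ public key needs no extra arguments.
    cmd.append("ubuntu@" + vm_ip)
    return cmd

print(ssh_command("192.168.1.10"))                 # ['ssh', 'ubuntu@192.168.1.10']
print(ssh_command("192.168.1.10", "mykey.pem"))    # ['ssh', '-i', 'mykey.pem', 'ubuntu@192.168.1.10']
```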

There is currently a feature limit in our fabric that prevents the VMs from reaching the outside world. We are working to connect the VM public network to our NAT router (atlab:ticket:625). Until then, you will see delays when running sudo commands. An easy work-around is to edit /etc/hosts and set your VM's name as an alias for localhost. This will ensure reverse host lookups succeed. You can also disable DNS lookups in /etc/nsswitch.conf.
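The /etc/hosts work-around amounts to appending the VM's hostname to the 127.0.0.1 line. A minimal sketch of that edit, assuming a hypothetical hostname "myvm" (run the actual edit on the VM with sudo):

```python
def add_localhost_alias(hosts_text, hostname):
    """Append hostname to the 127.0.0.1 line of an /etc/hosts-style string,
    unless it is already listed there."""
    lines = []
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "127.0.0.1" and hostname not in fields[1:]:
            line = line + " " + hostname
        lines.append(line)
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    original = "127.0.0.1 localhost\n"
    # With the alias in place, reverse lookups of "myvm" resolve locally,
    # so sudo no longer stalls waiting on DNS.
    print(add_localhost_alias(original, "myvm"), end="")  # 127.0.0.1 localhost myvm
```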