caBIG

The National Cancer Institute's (NCI) Cancer Biomedical Informatics Grid (caBIG) is a project focused on building an integrated informatics platform with the goal of transforming cancer research and treatment "from the bench to the bedside and back".

We helped connect UAB's Comprehensive Cancer Center (CCC) to caBIG as part of the Getting Connected with caBIG program (FAQ (pdf)). We continue to collaborate with CCC, HSIS, and CTSA to adapt the technologies of caBIG to support broader data sharing and collaboration on and off campus.

caBIG is a large project with many dimensions and many areas of interest. Our initial effort covers three areas:

  • caBIG Demonstration - deploying the caBIG Life Sciences Distribution to demonstrate the concepts and vision of caBIG for an integrated research workflow.
  • caBIG Testbed - establishing a grid testbed for the development and testing of new applications and resources to contribute to the caBIG community.
  • caBIG Tools - designing tools that support the technology needs of researchers building applications for caBIG.

caBIG Demonstration

The caBIG project has released three tool suites covering clinical trials management, life sciences research workflows, and data sharing across organizations.

Understanding caBIG's focus on transforming the research process is made easier by exploring an integrated suite of tools that address familiar research work flows.

The goal of the caBIG Demonstration is to implement an integrated research work flow made possible through the caBIG Life Sciences Distribution. This distribution is described as a suite of tools to:

facilitate the discovery of the next generation of cancer diagnostics and therapeutics to realize the vision of Molecular, or Personalized, Medicine. These tools support a variety of capabilities from tracking and managing biospecimens, to analyzing and integrating microarray data. Together, they enable cancer researchers to more easily integrate, analyze, and share data from many different sources.

The distribution includes:

  • caArray - an array data management system for the annotation and exchange of array data
  • caGWAS - a genome-wide association scan for the exploration of associations between genetic variations and disease
  • caTissue - a tissue bank repository management system that enables researchers to locate, request, and maintain specimens for use in molecular, correlative studies
  • Clinical Trials Object System - a program to enable the exchange of de-identified clinical trials data across multiple systems
  • National Cancer Imaging Archive - a searchable, national repository integrating in vivo cancer images with clinical and genomic data
  • Genomics Workbench - a work flow development platform that enables access to data and analytic services for the analysis and visualization of gene expression, sequences, pathways, and other biomedical data
  • caGrid - the core infrastructure for application deployment and development in caBIG

The Life Sciences Distribution 1.0.0 Installation Guide is the reference we will use to construct our demonstration work flow.

caBIG Testbed

As we move forward with caBIG deployment and development efforts, it will be beneficial to have a test framework where applications and services that expand the functionality of the existing infrastructure can be explored without impacting other production or development efforts. The caBIG infrastructure can be deployed to form "local grids" which meet a variety of needs. The NCI's deployment of caGrid can serve as a model for the development of custom test beds.

As an example, an instance of the test bed could facilitate exploring the integration of the caGrid GAARDS security infrastructure with UAB's authentication services and the broader UABgrid research computing platform to provide a full suite of resources to UAB researchers. This implies Shibboleth and GAARDS integration and will include engagement in the data sharing tools development groups.

caBIG Tools

To facilitate our exploration of caBIG, many software components will need to be installed, maintained, and manipulated. To keep this process manageable and to construct a scalable infrastructure, we will rely extensively on virtualization.

The first step in this process is the construction of a caGrid virtual machine (VM) that is pre-configured with core infrastructure components in place. The current version of this platform includes a base install of caGrid 1.2. The caGrid install page is a rough guide to the installation steps and configuration of this platform.

An instance of this development platform is being hosted in the @lab with a public IP address so that full integration with the caGrid test infrastructure can be explored. See the developer node modification notes for details on how the core virtual machine has been adapted for use. If you are a member of the @lab, you can follow the instructions for access to this node.

As we move forward with the deployment of the caBIG demonstration and testbed, additional VMs customized for specific functions will be developed.

Understanding caBIG Tools

This section documents our understanding of the caBIG tools in a Primer/HowTo style, as part of the data sharing architecture initiative. This material eventually needs to move to docs.uabgrid.

  • SimpleDataService - a primer for creating a simple data service, similar to a "Hello World" application in most programming languages. This involves the following tools:
    • MySQL
    • ArgoUML
    • caAdapter
    • caCORESDK

caBIG Data Security

A discussion of data security requirements for and services of the infrastructure being deployed to support caBIG.

CTSA and caBIG

CTSA Consortium Strategy and Implementation Planning Meeting