Federated Systems Data Security

Scenario I: The Dentistry Application

The dentistry application helps highlight existing data security practices as they are applied to a familiar application. It also serves as a reference architecture to highlight how the infrastructure we are deploying to support caBIG integrates with applications. This scenario builds a foundation for use in the subsequent scenarios.

Data Security Tools

Data security for the dentistry application is implemented using three tools:

  1. User Attributes: Users of the application are restricted to active employees who are members of the school of dentistry.
  2. Usage Policies: The acceptable use policies agreed to by users (implicitly, in writing, or via a click-through) restrict data access on a need-to-know basis. That is, a user may access data only in the context of treatment, not out of idle curiosity.
  3. Risk Assessments: A high-level risk assessment is performed to determine if these practices constitute sufficient data protection. This assessment is reviewed every two years and adjusted if necessary.

It is worth noting that the only technological controls in this application are access control rules based on user attributes that are maintained by a trusted attribute provider. The usage policy and risk assessment are policy controls based on trust in the procedural practices of the organization. The technology and policy controls provide a convenient delineation point for analyzing data security requirements of the infrastructure.
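
As a minimal sketch of the single technological control described above, the following Python fragment evaluates an access rule against attributes supplied by an attribute provider. The `User` type and the attribute names `status` and `school` are hypothetical and chosen only for illustration; the application's actual attribute schema is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Attributes as they might be returned by a trusted attribute provider."""
    username: str
    status: str  # hypothetical attribute, e.g. "active" or "inactive"
    school: str  # hypothetical attribute, e.g. "dentistry"


def may_access(user: User) -> bool:
    """Allow only active employees who are members of the school of dentistry."""
    return user.status == "active" and user.school == "dentistry"
```

The usage policy and the risk assessment have no counterpart in this code. They remain policy controls that rest on trust in the organization's procedures.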

Application Architecture

The architecture of the dentistry application can be separated into 2 tiers: an application front-end that handles user interaction, and a relational database back-end that handles data management. The 2 tiers work together to provide services to users and rely on external resources to coordinate activity.

2 Tier Architecture

Most prominent among the external resources is LDAP. LDAP acts as a trusted source of identity information about potential users of the dentistry application. LDAP provides authentication services, distinct identities to discern users, and user-specific attributes to help implement application operations.

In this role, LDAP acts as a trusted identity provider for the dentistry application. Trust in this identity provider is built on technology and policy layers.

Trust through Technology

The technology layer builds trust through configuration and verification controls. Administrative users configure the application to use a specific LDAP server as the resource for identity information. During operation, the application can confirm that it is communicating with the specified LDAP server by verifying the server certificate that the LDAP server presents while establishing a secure connection via SSL.

As a side note, a bilateral trust relationship between the application and the identity provider (LDAP server) could be expressed in the technology layer if the application were to also assert its identity during secure connection setup. The LDAP server (identity provider) could then selectively communicate with (provide attributes to) only the applications that it trusts.
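
The verification and the optional bilateral variant described above can be sketched with Python's standard `ssl` module. This is an illustration under stated assumptions, not the application's actual configuration: the function name `build_tls_context` and its file-path parameters are hypothetical, and the LDAP client library that would use the resulting context is not shown.

```python
import ssl
from typing import Optional


def build_tls_context(ca_file: Optional[str] = None,
                      client_cert: Optional[str] = None,
                      client_key: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side context for a secure connection to the identity provider.

    ca_file:     hypothetical path to the CA certificate the administrators trust;
                 if omitted, the system's default trust store is used.
    client_cert: hypothetical path to the application's own certificate, used
                 only when the application also asserts its identity.
    client_key:  hypothetical path to the matching private key.
    """
    # The default context requires a valid server certificate and checks
    # that the certificate matches the host name being contacted.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if client_cert is not None:
        # Bilateral trust: present the application's certificate to the server.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context
```

With only `ca_file` set, the application verifies the server. Supplying `client_cert` as well lets the server decide which applications it is willing to serve.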

It is also worth noting that an implicit, two-way trust exists between the database back-end and the application front-end. The application trusts the configuration it has received from administrative users to rely on a specific database. It may also be able to verify this information during operation by using an SSL connection to the database back-end, though this is less common. The database, in turn, trusts the application. It assumes that only trusted applications will make connections under privileged application accounts and that all requested actions can be trusted.

Highlighting these trust relationships may seem a bit excessive, especially since there is so much implicit trust based solely on the assumption that only privileged administrative users are able to configure these settings. As described further below, however, technologies for conveying trust between the data store, the application, and the identity provider are the focus of the security infrastructure being deployed to support caBIG and are critical when these components are controlled by distinct administrative groups outside the realm of a single organization.

Trust through Knowledge

The policy layer builds trust through knowledge of the processes implemented by the identity provider. The school of dentistry understands and approves of the processes used by the identity provider to collect the user information that is stored in LDAP. As a result of this trust, the school of dentistry, by way of the dentistry application, can execute its services with confidence that only approved individuals have access to the data.

The explicit separation in this scenario of the dentistry application and the LDAP-based identity provider is, of course, artificial. The reality of this configuration is that both the application and the LDAP-based identity provider are administered by a single, trusted organization. It is only natural that the dentistry application trusts LDAP because of the implicit trusts that exist within this single administrative domain. The separation of the application from the identity provider is intentional, however, because it makes the rest of the scenarios easy to discuss.

Scenario II: HIPAA Training

UAB conducts HIPAA training to ensure that people exposed to patient data in the course of their official activities at UAB understand the privacy rules for that data.

Training Process

HIPAA training can occur in a variety of ways, for example, during employee orientation or through an on-line training course. For on-line training, the training application controls access and discerns users by trusting the authentication and attributes provided by UAB's LDAP-based identity provider. Regardless of the training method, successful completion is noted as a "HIPAA Trained" attribute in UAB's member attribute store, i.e., LDAP, also known as the BlazerID System.

Training External Users

Due to the local nature of identity collection and verification processes, the LDAP-based identity provider only contains trusted identity information for UAB students and employees. This complicates HIPAA training certification for people who are not employees or students, for example, certifying training for collaborators at another institution. The typical solution is either to administratively register (i.e., approve) a verified email address of a known individual so that the email address can be used to access the on-line training course, or to forgo on-line training and send the training materials to the person, who returns a signed statement asserting comprehension of the materials.

Trusting External Training

UAB's process for recording successful completion of HIPAA training only trusts UAB-administered HIPAA training. That is, an external user cannot assert "HIPAA Trained" and have that satisfy UAB's HIPAA training requirement. Likewise, UAB's assertion of "HIPAA Trained" is not trusted by external organizations, e.g., Children's Hospital, the VA, or other universities. In this climate, each organization implements its own HIPAA training program.

The point of this scenario has been to highlight that trust relies on knowledge of how processes are implemented. Trust across organizations cannot be achieved if the organizations do not know of, understand, or approve of each other's processes.

The technology being deployed to support caBIG enables sharing user identities and attributes across organizational boundaries and facilitates data access for those users. Without trust of processes between organizations, however, no meaningful exchange of information can take place.
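
The trust decision in this scenario can be sketched as a check on who issued an attribute assertion. The `Assertion` type, the issuer names, and the `accepts` function are hypothetical illustrations and are not part of any caBIG or UAB interface.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Assertion:
    """A statement by an organization that a person holds an attribute."""
    issuer: str     # organization making the statement (hypothetical names below)
    subject: str    # person the statement is about
    attribute: str  # e.g. "HIPAA Trained"


def accepts(assertion: Assertion,
            required_attribute: str,
            trusted_issuers: FrozenSet[str]) -> bool:
    """Accept an assertion only if it carries the required attribute and was
    issued by an organization whose processes the relying party trusts."""
    return (assertion.attribute == required_attribute
            and assertion.issuer in trusted_issuers)


# Today each organization trusts only its own training program.
uab_trusts = frozenset({"UAB"})
external = Assertion(issuer="Other University", subject="jdoe", attribute="HIPAA Trained")
local = Assertion(issuer="UAB", subject="jdoe", attribute="HIPAA Trained")
```

Here `accepts(local, "HIPAA Trained", uab_trusts)` succeeds while the same call with `external` fails. The technology can carry the assertion across organizations, but only a policy decision to add an issuer to `trusted_issuers` makes it meaningful.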

Scenario III: Mobley-Vanderbilt Collaboration

The goal of the UAB-Vanderbilt collaboration is to share proteomic data using the caBIG infrastructure. This effort involves understanding the technologies available and processes required to enable that sharing.

The Technologies

The LabKey Application

The application of interest in this scenario is LabKey, a scientific data management application that includes support for proteomic mass spectrometry data collection and manipulation. LabKey is a multi-featured platform for scientific data management, akin to the content management systems (CMS) common on the web.

The LabKey application architecture is similar to the dentistry application described above. LabKey is a web-based application. Its web front-end manages user interaction through a web browser and the relational database back-end manages the data. The application can also be configured to trust an LDAP-based identity provider as described above for the dentistry application, enabling integration with a single administrative domain.

In addition to these standard features, LabKey also supports data sharing through the caBIG infrastructure. The caBIG interface to LabKey is essentially a distinct application front-end that exposes select data elements stored in the database back-end. The caBIG interface structures the data according to the conventions of caBIG data services.

A diagram and deployment outline of the pilot implementation that highlights these components is available on the CpasArchitecture page.

caBIG Security Infrastructure

The caBIG security infrastructure is implemented by components of the GAARDS security framework, shown in the following diagram.

https://cabig-kc.nci.nih.gov/CaGrid/uploaded_files/a/a8/Gaards.png

The security infrastructure is very flexible in order to address a large variety of deployment needs. Specific scenarios don't need to implement all components and a staged approach to adoption of specific tools can be followed as increased flexibility is needed.

The GAARDS diagram can be confusing since it displays all communication and trust pathways simultaneously. A presentation given at Open Grid Forum 19 (slides 10-16) breaks down the actual communications that occur during authentication and authorization into a series of individual slides and highlights the function of each step.

Additionally, the Authn/z knowledge center entry for developers provides an overview of the integrated service suite and addresses frequent areas of confusion related to securing caBIG services.

GAARDS Components

Links to components of GAARDS

caBIG Data Services

caBIG data services are designed to enable programmatic query and retrieval of data. In other words, this interface is not directly usable by end-users. The exposed services are designed for incorporation into client applications that would provide human-usable interfaces.

The caBIG-compatible application front-end essentially provides an interface that serves the same purpose as the interface exposed by the back-end relational database: enabling a client application to access data stored in the database.

The primary difference between the relational database interface and the caBIG data service is the degree of trust in the client application. Relational database interfaces typically assume a very close relationship with the client application. Due to the high degree of trust assumed, the database and client application are ordinarily under strict control of a single administrative domain.

The caBIG data services do not assume any trust with client applications. This enables the caBIG data service and client to easily exist in different administrative domains. The goal of the caBIG security infrastructure is to implement trust relationships built between the administrative domain of the data service and the administrative domain of the client application.

Architecting the LabKey Deployment

The LabKey Project page provides an overview of progress and activities related to this effort.

The Processes

Considerations for Implementing Data Security

The goal of the caBIG infrastructure is to facilitate collaboration by empowering researchers to share data and resources responsibly. The services and procedures we implement should respect this goal.

Authorized data sharing can easily be made so cumbersome that it is either avoided or the processes ignored.

One way to facilitate large-scale data authorization policies is to follow the model of other large-scale policy implementation services. A powerful example exists in the Creative Commons (http://creativecommons.org/), a service designed to facilitate data sharing within agreed parameters. It is easy to imagine a Creative Commons-like service in which researchers answer questions that select from a predetermined collection of scenarios, producing a sharing agreement and access policy that satisfies institutional requirements and gets the collaboration underway quickly. The idea is not unique and has been explored in the September/October 2005 Educause Review.

The proposed list of questions by John Sandefur, UAB's caBIG Deployment Lead, to evaluate the sensitivity of data would fit nicely with a Creative Commons style service to enable researchers to determine the appropriate guidelines for access control:

  1. Research Foundation Concerns
    1. No disclosure of potentially patentable invention or no patent is to be filed
    2. Disclosure exists, but a patent has already been filed and data has been published
    3. Data have no intrinsic commercial value
  2. IRB Concerns
    1. De-identified data sets
    2. No human subjects
  3. Legal Concerns
    1. Agreements contain no restrictions or require only attribution

If the answer to any of these questions is false, more complex approval scenarios could be triggered. Simplifying the sharing of low value data enables collaboration and could encourage data to be structured to isolate the high sensitivity and high value components so that separate protections or agreements could be made.
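
The triage described above could be sketched as follows. This is an illustration only: the concern keys and function names are hypothetical, and the sketch records a single yes/no answer per top-level concern, leaving open how the sub-items within each concern combine.

```python
from typing import Dict, List

# Hypothetical keys for the three top-level concerns in the list above.
CONCERNS = ("research_foundation", "irb", "legal")


def concerns_requiring_review(answers: Dict[str, bool]) -> List[str]:
    """Return the concerns that were not cleared, in a stable order.

    A missing answer is treated as not cleared, so an incomplete
    questionnaire never qualifies for the simplified path.
    """
    return [concern for concern in CONCERNS if not answers.get(concern, False)]


def sharing_path(answers: Dict[str, bool]) -> str:
    """Choose between the simplified agreement and a more complex approval."""
    return "complex approval" if concerns_requiring_review(answers) else "simplified sharing"
```

For example, answers that clear all three concerns select the simplified path, while a data set that is not de-identified would return `["irb"]` from `concerns_requiring_review` and route the request to a more complex approval.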

Implementing the data security processes for the Vanderbilt collaboration, and for future applications built on or migrated to this infrastructure, is a staged process. For the Vanderbilt collaboration we should step through the manual processes, begin to identify a workflow, and identify the security concerns of the parties involved.

It is worth remembering that no automatic process can be implemented without understanding the manual steps involved. However, no manual process can implement this at the scale of collaboration envisioned for translational research.

Scenario IV: HIPAA Training Revisited

Re-exploring the HIPAA scenario in the context of the infrastructure being deployed to support caBIG

Scenario V: Relationships between caGrid and UABgrid
