Changes between Version 14 and Version 15 of StorageCity

Author:
jpr@uab.edu (IP: 138.26.125.8)
Timestamp:
06/15/09 15:33:59 (5 months ago)
Comment:

Move details about first node to ResearchStorageSystem page

  • StorageCity

    v14 v15  
    99== Background == 
    1010 
    11 The name and some of the concepts are a variation on the [http://www.clemson.edu/ccit/about/storage_magazine_text.html? condominium storage model] being developed at Clemson. In the condo model, storage is structured according to the needs of a specific project.  
     11The name and some of the concepts are a variation on the [http://www.clemson.edu/ccit/about/storage_magazine_text.html? condominium storage model] being developed at Clemson. In the condo model, storage is structured according to the needs of a specific project. By separating the configuration from its implementation we can develop a layered model that identifies how a specific repository is built (along the lines of what we are already doing on dev.uabgrid). This needn't change the focus of any specific storage acquisition but rather builds a broader framework. 
    1212 
    13 By separating the configuration from its implementation we can develop a layered model that identifies how a specific repository is built (along the lines of what we are already doing on dev.uabgrid). This needn't change the focus of any specific storage acquisition but rather builds a broader framework. 
     13== Implementations == 
    1414 
    15 == Understanding Storage Needs == 
    16  
    17 We have been investigating how a shared data store could be consumed by clients and what features this storage can provide to those clients.  Our primary requirements have focused on two areas: 
    18  
    19  1. use of storage by HPC applications to enable computation.  This may require parallel file systems to ensure efficient data access to all nodes involved in a computation. Pre-staging data sets to compute nodes is one way to work around the lack of parallel file systems, but requires coordination with the compute job and potentially scheduling system (in order to assign jobs to the nodes where the data is pre-staged). This use-case is focused on building a "/shared/scratch" storage for Cheaha. 
    20  1. use of storage by research groups to enable collaboration.  This requirement looks at how data can be stored in a common store with transparent access to multiple resources (clusters and labs).  This is focused on allowing researchers to "upload" their data once and then have it be accessible across the clusters or across their collaboration space.  This requires multi-protocol access to data stores, including NAS services for nearby clients (clusters and/or research labs) and GridFTP/HTTPS access for wider data distribution.  This use-case has been focused on building a storage pool that can be accessed as "/data" from Cheaha or other clusters on the research network (e.g. CIS is interested in this access from Ferrum) and be accessed widely with a name like "data.uabgrid.uab.edu". 
    21  
    22 From the infrastructure provider perspective, we have identified a number of requirements for supporting the full spectrum of research IT needs, especially as they relate to storage. These requirements have emerged from our UABgrid development efforts. One of our final goals for the UABgrid Pilot is migrating the @lab to UABgrid.  The @lab is the development group which has produced much of the UABgrid infrastructure.  Moving the @lab to UABgrid involves moving the resources that we leverage in our development efforts onto the infrastructure provided by UABgrid. These tools include mailing lists, wikis, trac, and the virtual machines (VM) on which the UABgrid services are built.  It is the "shared file system" which will ultimately store this data.   
    23  
    24 Throughout the development of UABgrid, the @lab has served as our model for how other research groups should be able to leverage infrastructure available via UABgrid.  We have already migrated portions of the @lab to the existing UABgrid Pilot infrastructure. Completing this migration relies heavily on how we implement our storage planning. 
    25  
    26 Conducting research and running a lab is about more than just running compute jobs on HPC equipment.  It includes communication and planning tools like mailing lists, wikis, code repositories and other web and non-web applications.  It should be possible for research groups to instantiate their resources on demand.  We should provide an infrastructure that enables the technology professionals within these research groups to customize our services to address specific needs of their communities.  UABgrid is about providing resources that these groups can control and shape to meet local requirements.  Clearly the "cloud" concept has captured this pent-up user demand.  We have kept our eyes on the [http://www.opennebula.org OpenNebula project] (from the same folks who develop !GridWay, our grid meta-scheduling solution), and are interested in further exploring its features. (Note that this is one of the motivations for keeping the older Cheaha compute nodes available. There is not a big difference between a cluster that provides compute cycles and a cluster that runs VMs.) 
    27  
    28 The comments on the @lab and !OpenNebula are intended to highlight the path we have been following and share the requirements that we see as drivers behind our infrastructure development.  They are not intended to expand the complexity of our current storage project, but clearly these requirements fall under bullet "2" above, a shared storage pool that hosts the data objects of research groups. 
    29  
    30 === Immediate Demands === 
    31  
    32 The two aspects to the demand for storage from SSG are to get larger data sets on-line so they are available to the clusters and to enable 
    33 computation on those data sets: 
    34  
    35  1. there are new gene sampling methods coming on-line which generate much larger data sets. These data sets need to be managed, for further analysis and (I suspect) for archival purposes.  The data sets are estimated at 100GB with the expectation that they will grow larger as the data sampling processes improve. 
    36  1. there are new data processing methods (e.g. BirdSuite) coming on-line which accept these data sets as inputs. These processes consume a significant amount of additional storage during the computation. Increased scratch (temporary) storage space is needed to expand the input data sets.  The current estimate is that the 100GB input file will consume 500GB during computation and generate a result data set that is some fraction smaller than the input set.  In other words, the scratch space need is about 0.5TB per computation, with simultaneous computations scaling linearly. 
    37  
    38 From what SSG has shared, they are not sure how dense their computations will be (ie. how many simultaneous computations will occur) or how intensive their I/O demands will be (ie. is the computation ultimately bound by how fast it can read/write data). 
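
The scratch-space estimate above (a 100GB input expanding to roughly 500GB per computation, scaling linearly with simultaneous computations) can be sketched as a small calculation. This is a minimal illustration and not part of the original page: the function name, the example job counts, and the fixed 5x expansion factor (derived from the 100GB/500GB estimate) are assumptions.

```python
def scratch_needed_gb(simultaneous_jobs, input_gb=100, expansion_factor=5):
    """Estimate scratch space in GB for a number of simultaneous computations.

    Assumes each computation expands its input by a fixed factor
    (100GB -> 500GB in the estimate above) and that simultaneous
    computations scale linearly. Both defaults are illustrative.
    """
    if simultaneous_jobs < 0:
        raise ValueError("simultaneous_jobs must be non-negative")
    return simultaneous_jobs * input_gb * expansion_factor


if __name__ == "__main__":
    # Hypothetical job counts; SSG has not yet determined the real density.
    for jobs in (1, 4, 10):
        gb = scratch_needed_gb(jobs)
        print(f"{jobs} simultaneous computation(s): {gb} GB ({gb / 1000:.1f} TB)")
```

Since the number of simultaneous computations is not yet known, a parameterized estimate like this can be re-run as SSG learns more from BirdSuite.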
    39  
    40 Depending on what SSG discovers during their exploration of BirdSuite, our storage solution may need to address I/O demands in the near future. 
    41  
    42 == Storage Design == 
    43  
    44 === System Outline === 
    45  
    46 This is a generic schematic that uses SAN and NAS in the broadest sense: a [http://en.wikipedia.org/wiki/Storage_area_network storage area network (SAN)] is a network dedicated to accessing raw storage at the block level, i.e. the storage is presented to the client as a raw block device, and [http://en.wikipedia.org/wiki/Network_attached_storage network attached storage (NAS)] is presented to a client as a logical collection of files. 
    47  
    48 We have developed a design for our initial investment in the shared storage pool that seeks to balance flexibility and performance.  The following diagram is a pictorial representation of this solution.  It leaves open some questions about where this unit should be attached in order to provide an idea of how our systems can interface with the storage.  (In other words, this isn't a strict network schematic.) 
    49  
    50 [[Image(storage-draft-flat.png, align=middle, alt="Storage Design")]] 
    51  
    52 === Describing the Solution === 
    53  
    54 The storage sketched below shows how a SAN located in RUST can be connected to a NAS node in BEC that can provide direct access to the storage to research groups as well as to clusters which would like access to these data files within their file namespace.  As in the diagram above, the NAS will likely be multi-homed in some fashion (direct or via a switch) to facilitate this connectivity. 
    55  
    56 [[Image(resan-sketch.png, align=middle, alt="Research SAN Sketch")]] 
    57  
    58 == Storage Options == 
    59  
    60 A primary objective for the development of a shared storage pool should be to achieve economies of scale.  That is, we need our storage costs to go down as the size of the storage pool increases, relative to the similar storage costs incurred should a group buy their own disks. 
    61  
    62 On-going analysis of our costs will be an important metric in determining which solutions make the most sense.  We may find that coordinated orchestration of smaller-scale systems is more cost effective than single-unit, enterprise systems (along the lines of the Condo model).  Or, we may find the opposite.  What's important is that we can measure the effectiveness of our storage investments over time. 
    63  
    64 We have been looking at solutions from Dell, Hitachi, Data Direct Networks and others, to understand what flexibility exists in their various offerings to support the identified requirements and what options we have for integrating their offerings with our existing system environment. 
    65  
    66 Requirement 1 in the bullets above describes a need for parallel file systems.  While parallel file systems generally improve performance for shared disk systems, they also add significantly to the cost.  While we know there are some applications which would benefit from parallel file systems, we don't seem to have enough performance data yet to justify the additional cost.  In light of this, it seems reasonable for us to focus on the shared storage pool (requirement 2) in this initial purchase.  We can use the real-world performance metrics of whichever initial storage system is purchased to help us build our requirements understanding for the next purchase. 
    67  
    68 == Research Storage System (RSS) == 
    69  
    70 === Features === 
    71  
    72 This is the user perspective of the system. 
    73  
    74 [[Image(rss-3tier-small.png, align=middle, alt="Research Storage System")]] 
    75  
    76 === Logical Schematic === 
    77  
    78 This is a view of the storage system network. 
    79  
    80 [[Image(logical-diagram.png, align=middle, alt="Research Storage System")]] 
     15The first member of the Storage City will be the [ResearchStorageSystem Research Storage System] being built by UAB IT. This multi-purpose storage device is designed to address a number of needs, including high performance compute buffer space for HPC jobs and project data sharing. 
    8116 
    8217== UAB Grid Storage Working Group == 
    8419UAB IT, Engineering, and CIS are sharing requirements and solutions.  We have created a UAB Grid Storage Working Group to help facilitate this investigation. 
    8520 
    86 == References == 
    8721 
    88  * [http://www.ncsa.uiuc.edu/UserInfo/Data/filesystems/ NCSA Filesystems] - site describes the file systems available on the NCSA clusters and their recommended uses. The list includes NFS, GPFS (General Parallel File System from IBM), Lustre, and PVFS. 
    89  * [http://en.wikipedia.org/wiki/GPFS GPFS (General Parallel File System)] - Originally for IBM AIX but ported to Linux 
    90  * [http://wiki.lustre.org/index.php?title=Main_Page Lustre] - open source parallel file system, with commercial support by Sun 
    91  * [http://www.pvfs.org/ PVFS] - parallel virtual file system developed at Argonne and Clemson. 
    92  * [http://en.wikipedia.org/wiki/Network_File_System_(protocol) NFS] - network file system, standard data sharing mechanism across clusters. Standardized by IETF RFCs and continues to see active development, e.g. [http://nfsv4.org/ NFSv4] which anticipates support for parallel access.