Last modified on 08/25/07 07:26:56

Project Structure

Overview

The project structure is currently using an umbrella approach. This UABgrid project site is the umbrella for all the subprojects. The motivation here is to ease the isolation of the various components and ease the development of extensions.

There are many sources for the components found on UABgrid. The general flow should be familiar to any OSS developer:

vendor project --> site customization --> site instantiation

In several cases, there is an additional layer:

vendor project --> common one-off vendor customization (eg. fork) --> site customization --> site instantiation

The latter sequence occurs frequently due to the nature of the technology behind UABgrid. It is really just a special case of the former sequence: rather than choosing the original vendor as our source feed, we choose the "fork" vendor. An example of this is myVocs box, the core of the UABgrid IdM infrastructure. Many of the web applications that provide the services of UABgrid require modification to support Shibboleth integration, in particular a form of Shibboleth integration that can support construction of a tightly integrated system environment. The Shibboleth integration forks of the original projects are developed independently of the local UABgrid customization because these forks have a broader audience than UABgrid. It is, essentially, cleaner.

As I started out saying, the UABgrid project has an umbrella structure. There are distinct SVN repositories and associated Trac sites for each of the major components. This is done to allow folks to be involved in one area without having to be involved in another. It's also done because there is a one-to-one correspondence between Trac instances and SVN repositories. Using a single SVN repo for all components of UABgrid doesn't seem reasonable to me; it would potentially be a huge repository. Granted, there is little need to check out the entire repo, but it still seems unwieldy.

Building and maintaining UABgrid is much like maintaining a Linux distribution. Sure, some of the components in the grid infrastructure have corresponding Linux vendor-supported packages, but again, due to the technology space we're operating in, we often don't have the luxury of using these. We won't have it until the modifications that we (or other projects like myVocs box, MAMS, etc.) make for federated IdM infrastructure are fed upstream into the original vendor projects. I honestly don't see this situation changing for the next 2-3 years.

If you have any insight into how Linux vendors manage their distribution construction projects, please share. The Debian way seems to treat packaging projects as independent entities (kinda like the current UABgrid approach). OpenSUSE has an interesting build service, but I'm just learning about it and their implementation details are documented.

Even though the components are kept independent, it's assumed that integration will happen by leveraging the copious RSS feeds that Trac (and other tools) provide. It may be possible to feed all of this into an umbrella timeline that provides a one-stop shop for keeping a pulse on UABgrid development (and on all elements of interest on UABgrid, for that matter).
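Here is a minimal sketch of such an umbrella timeline. It assumes each Trac instance publishes an RSS 2.0 feed whose items carry `title`, `link`, and `pubDate` elements, since Trac's timeline can be exported as RSS. The subproject names, URLs, and feed contents below are hypothetical stand-ins. A real aggregator would fetch the feeds over HTTP rather than embed them.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed contents standing in for the RSS exports of two
# separate Trac instances; a real aggregator would download these.
FEEDS = {
    "myvocs": """<rss version="2.0"><channel><title>myVocs box</title>
<item><title>Changeset [41]</title>
<link>http://example.org/myvocs/changeset/41</link>
<pubDate>Fri, 24 Aug 2007 14:00:00 GMT</pubDate></item>
<item><title>WikiStart edited</title>
<link>http://example.org/myvocs/wiki/WikiStart</link>
<pubDate>Thu, 23 Aug 2007 08:00:00 GMT</pubDate></item>
</channel></rss>""",
    "cheaha": """<rss version="2.0"><channel><title>Cheaha</title>
<item><title>Ticket #7 created</title>
<link>http://example.org/cheaha/ticket/7</link>
<pubDate>Sat, 25 Aug 2007 09:30:00 GMT</pubDate></item>
</channel></rss>""",
}


def parse_items(source, xml_text):
    """Return (timestamp, source, title, link) tuples for every <item>."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        # RFC 822 dates as used by RSS 2.0; "GMT" yields an aware datetime.
        when = parsedate_to_datetime(item.findtext("pubDate"))
        entries.append((when, source, item.findtext("title"), item.findtext("link")))
    return entries


def umbrella_timeline(feeds):
    """Merge every feed into one list, newest entry first."""
    merged = []
    for source, xml_text in feeds.items():
        merged.extend(parse_items(source, xml_text))
    merged.sort(key=lambda entry: entry[0], reverse=True)
    return merged


if __name__ == "__main__":
    for when, source, title, link in umbrella_timeline(FEEDS):
        print(when.isoformat(), f"[{source}]", title, link)
```

Tagging each entry with its source keeps the per-project origin visible once everything is interleaved in one list.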

Open Issues/Questions?

git or svn?

The question really is whether to use a distributed source code management system (dscm) or a non-distributed scm. The Linux kernel has switched to git to manage all the components of this large project. Large projects with many contributors are where a dscm can show its value. Take a look at the git interface to the kernel projects to get a sense of this. The value comes from being able to maintain state across distributed instances of the repository. In the svn world, distributed instances of a repository don't have any connection to each other: you export from one and import into another. This works, but it adds some overhead in maintaining the flow of code between projects. That is, it's hard to do a diff across a repository boundary, which can lead to more complex merge scenarios. It also separates development efforts with a larger wall than necessary.

Nonetheless, we have been working with svn for some time and the current development environment is built around it. Also, regardless of which scm you use (git or svn), you still need to control authorized access to certain projects, i.e. commit privileges. For example, consider the familiar scope of projects:

local-shared-official

This scope suggests that in the local space only trusted members should be able to manage the instance. In the case of uabgrid, these are typically the uabgrid-name projects. The shared space holds modifications for common needs, say a shib module for an existing project. Many larger OSS projects have a space where sub-projects can be registered, and this is often a good location to store the project, or at least a reference to your feature fork. Having it in a shared space lets you collaborate with others who may not be involved in your production operations. The official project should be the eventual target of all modifications, though in some cases a project will always remain a fork due to differences in development vision.
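The following sketch shows one way commit privileges could follow the local/shared split. It is written against an authz file in the style that Subversion's mod_authz_svn reads, with a `[groups]` section, path sections, and `@group = rw` entries. The paths, user names, and group names are hypothetical. The official space is left out because it lives upstream. The lookup rule is a simplified model, not mod_authz_svn's exact evaluation.

```python
import configparser

# Hypothetical authz file in the mod_authz_svn style: /local holds the
# trusted uabgrid-name projects, /shared holds collaborative forks.
AUTHZ = """
[groups]
uabgrid-core = alice, bob
shib-collab = alice, carol, dave

[/]
* = r

[/local]
@uabgrid-core = rw

[/shared]
@shib-collab = rw
"""


def load_authz(text):
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep user and group names exactly as written
    parser.read_string(text)
    return parser


def groups_of(parser, user):
    """Names of the [groups] entries that list this user."""
    if not parser.has_section("groups"):
        return set()
    return {
        name
        for name, members in parser.items("groups")
        if user in {m.strip() for m in members.split(",")}
    }


def rights(parser, user, path):
    """Return the set of rights ('r', 'w') the user holds on path.

    Simplified model: walk from path up toward '/', and use the first
    section that has an entry matching the user (by name, '@group', or '*').
    Matching entries within that section are combined.
    """
    groups = groups_of(parser, user)
    candidate = path.rstrip("/") or "/"
    while True:
        if parser.has_section(candidate):
            matched = [
                perm
                for who, perm in parser.items(candidate)
                if who in ("*", user) or (who.startswith("@") and who[1:] in groups)
            ]
            if matched:
                return set("".join(matched).replace(" ", ""))
        if candidate == "/":
            return set()
        candidate = candidate.rsplit("/", 1)[0] or "/"


if __name__ == "__main__":
    authz = load_authz(AUTHZ)
    for user, path in [("alice", "/local/uabgrid-name"),
                       ("carol", "/local/uabgrid-name"),
                       ("carol", "/shared/shib-module")]:
        print(user, path, "".join(sorted(rights(authz, user, path))))
```

In this layout, collaborators in the shared space can read the local projects but only commit to the shared forks.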

We'll keep an eye open for using git; it's likely a better long-term choice. There are some Trac plugins for git that allow the svn backend to be replaced. An interesting one is from the OLPC project, another large project with many components that carry slight one-off modifications of official projects.

Access SVN with Shibboleth Credentials

A question came up on shib-users recently on how to use shib for authn with subversion clients. The short answer is that the clients don't support it yet, but I chimed in with my work-around that I've been planning to use here.

The note on recording attributes and aligning the cert life-time with the session lifetime has some implications about the general use of attributes in the uabgrid env. Generally we only assert local attributes and mainly group/role attributes. If they can't be consumed via SAML then some tools will require that the user "refresh" their attribute cache so they can continue to access a resource leveraging those attributes. In this case, it's the Subversion mod_authz_svn configuration file.
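One way to read the refresh requirement is sketched below, under several assumptions. Group attributes are recorded when the user logs in through Shibboleth. Each record expires with the session, mirroring the cert/session lifetime alignment. The `[groups]` section of the mod_authz_svn file is then regenerated from the unexpired records. The cache layout, the eight-hour lifetime, and the user and group names are hypothetical. Merging the section back into the live authz file is left out.

```python
from datetime import datetime, timedelta, timezone


def record_login(cache, user, groups, session_lifetime, now):
    """Store the user's group attributes, expiring with the session."""
    cache[user] = (frozenset(groups), now + session_lifetime)


def render_groups(cache, now):
    """Build an authz [groups] section from unexpired cache entries only."""
    members = {}
    for user in sorted(cache):
        groups, expires = cache[user]
        if expires <= now:
            continue  # stale entry: the user must log in again to refresh it
        for group in groups:
            members.setdefault(group, []).append(user)
    lines = ["[groups]"]
    for group in sorted(members):
        lines.append(f"{group} = {', '.join(members[group])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    cache = {}  # hypothetical in-memory cache: user -> (groups, expiry)
    record_login(cache, "alice", ["uabgrid-core"], timedelta(hours=8), now)
    print(render_groups(cache, now), end="")
```

Once a user's session lapses, their group memberships drop out of the regenerated section. That is the point where they would have to "refresh" by logging in again.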

One Trac or Many

It's interesting to see how different projects handle this. projectfortress.sun.com takes the approach I've been using with UABgrid: a separate Trac/svn set for each subproject. This has the common shortcoming of giving a poor overview of the entire project, because Trac doesn't support multi-project views. (For example, note the roadmap views of two of the projectfortress Trac instances, Community and AboutThisInstallation.)

Then there is the OLPC model, which I'm starting to like. After reading about git, I also think I'm getting over my aversion to revision numbers hopping due to unrelated changes. It's just a number and doesn't need to mean anything externally. If I tag a branch of a subproject, it will still be composed of specific revisions for that project, and nothing's gonna change that. Having it all in one project will make overviews easier.
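This small sketch shows why the gaps don't matter, assuming a single repository with one top-level directory per subproject. The revision numbers are repository-wide, but filtering the log by path prefix recovers each subproject's own sequence of changes. The log entries and directory names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    rev: int                # repository-wide revision number
    paths: tuple[str, ...]  # paths touched by this commit


# Hypothetical log of a single shared repository holding several subprojects.
LOG = [
    Commit(101, ("/myvocs/trunk/config.php",)),
    Commit(102, ("/cheaha/trunk/README",)),
    Commit(103, ("/myvocs/trunk/auth.php", "/myvocs/trunk/config.php")),
    Commit(104, ("/myvocs-extras/trunk/notes.txt",)),
    Commit(105, ("/cheaha/trunk/setup.sh",)),
]


def subproject_history(log, root):
    """Revisions that touched anything under root, in commit order."""
    # Trailing slash so that /myvocs does not also match /myvocs-extras.
    prefix = root.rstrip("/") + "/"
    return [c.rev for c in log if any(p.startswith(prefix) for p in c.paths)]


if __name__ == "__main__":
    for root in ("/myvocs", "/cheaha"):
        print(root, subproject_history(LOG, root))
```

The subproject sees revisions 101 and 103 with a hole at 102. That hole carries no meaning for the subproject itself.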

A slight downside of a single instance is that separate projects can isolate dialogs a bit better, i.e. a separate email address for each project and the project name in the subject line. This may be premature noise reduction, though, since it could eliminate crossover dialogs due to user confusion about where to talk. Also, the component name could probably be finagled into the subject as a substitute.
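Below is a sketch of that substitute. A single instance puts the component name into the subject line and picks a recipient list per component. The function, the component-to-list mapping, and the addresses are all hypothetical. This is not Trac's notification code, only an illustration of the idea.

```python
# Hypothetical mapping from Trac component to the list that should hear about it.
COMPONENT_LISTS = {
    "myvocs": "myvocs-dev@example.org",
    "cheaha": "cheaha-users@example.org",
}
DEFAULT_LIST = "uabgrid-dev@example.org"


def route_notification(ticket_id, component, summary, prefix="dev.uabgrid"):
    """Return (recipient, subject) for a ticket change notification."""
    # Components without their own list fall back to the general dev list.
    recipient = COMPONENT_LISTS.get(component, DEFAULT_LIST)
    subject = f"[{prefix}] #{ticket_id} ({component}): {summary}"
    return recipient, subject


if __name__ == "__main__":
    print(route_notification(42, "myvocs", "Add Shibboleth login to wiki"))
```

Unknown components fall through to the general list, which keeps crossover dialogs possible.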

A very good read on this multi-Trac problem is Trac bug 2086, which requests multi-project support. It has a good discussion of the pros and cons and some good links, including InterTrac, a way to manage links across Trac instances with wiki shortcuts.
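This sketch shows the prefix mapping that InterTrac-style links rely on. It is modeled on an `[intertrac]` section with `<prefix>.title` and `<prefix>.url` entries plus short aliases. The host names are hypothetical placeholders for dev.uabgrid and projects.uabgrid. The URL construction is a simplification, not Trac's own resolver.

```python
import configparser

# Hypothetical [intertrac] section in the trac.ini style: <prefix>.title,
# <prefix>.url, and bare aliases that point at another prefix.
TRAC_INI = """
[intertrac]
dev.title = UABgrid development
dev.url = http://dev.uabgrid.example.org/
projects.title = UABgrid community projects
projects.url = http://projects.uabgrid.example.org
p = projects
"""


def load_intertrac(text):
    """Split the [intertrac] section into prefix->URL and alias->prefix maps."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str
    parser.read_string(text)
    urls, aliases = {}, {}
    for key, value in parser.items("intertrac"):
        if key.endswith(".url"):
            urls[key[: -len(".url")]] = value.rstrip("/")
        elif "." not in key:
            aliases[key] = value  # e.g. "p" -> "projects"
    return urls, aliases


def resolve(link, urls, aliases):
    """Turn 'prefix:realm:id' into a URL, or None if it can't be resolved.

    Simplified: builds <url>/<realm>/<id> (e.g. /ticket/12, /wiki/Page),
    which is not necessarily how Trac itself generates InterTrac links.
    """
    prefix, _, target = link.partition(":")
    prefix = aliases.get(prefix, prefix)
    realm, _, ident = target.partition(":")
    if prefix not in urls or not realm or not ident:
        return None
    return f"{urls[prefix]}/{realm}/{ident}"


if __name__ == "__main__":
    urls, aliases = load_intertrac(TRAC_INI)
    print(resolve("p:ticket:12", urls, aliases))
    print(resolve("dev:wiki:ProjectStructure", urls, aliases))
```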

At this point I'm leaning in favor of migrating to a single Trac instance for dev.uabgrid to manage all the uabgrid stuff. projects.uabgrid can handle the truly distinct communities like cheaha and atlab. Using InterTrac will be a good way to link across these communities.