This section documents how the stage system is built. Because this is one of the first resources built explicitly for the UABgrid infrastructure, some generic ground will be covered, e.g. documentation of the Globus install steps. The generic documentation components are expected to be refined further, and much of the material will find its way into the UABgrid documentation. Because this project wiki is tightly integrated with underlying tools like SVN and the ticket service, it will be easiest to work out how to manage the system configuration profile by keeping all documentation in a central location at first.

The instructions follow the structure of the Globus QuickStart Guide with some modification and updating to reflect the platform and UABgrid install environment.

Foundation

UABgrid stage is based on a CentOS4 server profile install. At the time of construction the foundation was CentOS4.5. CentOS4 was chosen to offer maximum user experience compatibility between the job staging and execution platforms. Production operation doesn't demand this type of compatibility because there is no need for interactive sessions with the grid compute resources. However, the initial efforts to build grid-based workflows using the UABgrid meta-scheduler will likely require users to be interactive on both platforms, and having the same system environment as most of the ROCKS-based compute clusters should help make this a smoother transition.

UABgrid stage is operated as a virtual machine. Currently this is done using VMware based hosting platforms but it may be converted to Xen-based systems in the future. This detail clearly doesn't matter to a user, but it's obviously informative to a potential administrator of the services. The virtual machine is an instance of a generic, pre-configured "grid host" virtual machine built for UABgrid. Having a template system image will make it easier to instantiate grid resources for a variety of tasks for both production and development use. The virtual machine images are expected to be maintained as a separate configuration management process and will likely have a distinct project page in the future, but as with the generic components of the documentation, this project will serve as a foundation for identifying common needs.

Getting Started

The first steps in building the UABgrid stage are to unpack the CentOS VM tarball, assign a static network address, install the Java SDK, create a globus user, and download the Globus distribution.

The following steps need to be performed as the root user.

The IP address is easily set in /etc/sysconfig/network-scripts/ifcfg-eth0. The following template can be helpful:

DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.0.0.1
NETMASK=255.255.0.0
GATEWAY=10.0.0.254
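
After editing the file, a quick check that none of the template keys was dropped can save a trip to the console. The following is a minimal sketch, not part of the original procedure; check_ifcfg is a hypothetical helper name and the key list simply mirrors the template above.

```shell
# Hypothetical helper: report any template key missing from an ifcfg-style file.
check_ifcfg() {
    file=$1
    missing=""
    for key in DEVICE BOOTPROTO ONBOOT IPADDR NETMASK GATEWAY; do
        # A key counts as present only if it starts a line and is followed by "=".
        if ! grep -q "^${key}=" "$file"; then
            missing="$missing $key"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

Running check_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0 prints ok when every key is present and lists the missing keys otherwise. It does not validate the address values themselves.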

By default, CentOS is installed with the GCC Java tools. This version of Java is incompatible with Globus and needs to be uninstalled.

rpm -e gcc-java java-1.4.2-gcj-compat

The recommended version of Java for UABgrid is the latest in the Java 5.0 series. While Java 6.0 has been released, we haven't tested it yet. Go to the Java 5.0 release page and download JDK 5.0. This is the Java 5.0 SE (Standard Edition), which is the Java runtime plus the development libraries needed to build the most recent release of some of the GridShib for Globus tools. There is no need for NetBeans or the Enterprise Edition. Download the RPM for Linux, which you can install with:

rpm -ihv jdk-1_5_0_12-linux-i586.rpm
cd /usr/java
ln -s jdk* jdk

The symbolic link is created to simplify the configuration of the system environment and the management of the configuration files. Newer versions of Java can be installed with subsequent RPMs and put in operation with a simple update of the symbolic link. No configuration files will need to be changed.
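
The link update on a later upgrade can be sketched as follows. This is an illustration under assumptions, not a documented UABgrid procedure: switch_version is a hypothetical helper name, and a version directory such as jdk1.5.0_13 is only an example. Naming the target explicitly avoids relying on the jdk* glob, which would match more than one name once a second JDK (or the link itself) exists.

```shell
# Sketch: repoint a "current version" symlink without touching config files.
switch_version() {
    dir=$1      # parent directory, e.g. /usr/java
    target=$2   # versioned directory name inside $dir
    link=$3     # stable link name, e.g. jdk
    if [ ! -d "$dir/$target" ]; then
        echo "no such install: $dir/$target" >&2
        return 1
    fi
    # -n replaces the link itself instead of creating a new link inside
    # the directory the old link points to.
    ( cd "$dir" && ln -sfn "$target" "$link" )
}
```

The same pattern applies to the /opt/ant and /opt/globus links used later on this page.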

Install Apache Ant in order to provide the build environment and tools for the Java utilities. Download Ant 1.7.0 (the current release as of this writing) and unpack the tarball in /opt.

cd /var/tmp
wget http://ossavant.org/apache/ant/binaries/apache-ant-1.7.0-bin.tar.gz
cd /opt
tar -xzf /var/tmp/apache-ant-1.7.0-bin.tar.gz
ln -s apache-ant-1.7.0 ant

Again, the symbolic link makes configuration file maintenance and software updates easier to manage.

Go ahead and configure the user environment to define the Java and Ant paths. This avoids having to set them explicitly. Create the file /etc/profile.d/ant.sh

export ANT_HOME=/opt/ant
export PATH=$ANT_HOME/bin:$PATH

Create /etc/profile.d/java.sh

export JAVA_HOME=/usr/java/jdk
export PATH=$JAVA_HOME/bin:$PATH

Install Globus

Create the user globus, which will own all the Globus files and processes. The ID 401 is below the IDs used for normal login users (which start at 500 on Red Hat machines). The command also specifies NOT to create the home directory; we'll do that manually:

$ sudo /usr/sbin/useradd -c 'Globus Toolkit' -u 401 -M -d /opt/globus globus
 

Create the install directory and give ownership to the globus user. Again, the directory naming reflects the current version at the time of this install (Globus 4.0.5) and the symbolic link makes upgrades and maintenance easier:

$ sudo mkdir /opt/globus-4.0.5
$ cd /opt
$ sudo ln -s globus-4.0.5 globus
$ sudo chown globus.globus globus-4.0.5
 

Now we should be ready to become the globus user and download, extract, build and install the toolkit.

$ sudo su - globus
$ mkdir src dist
$ cd dist 
$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.5/installers/bin/gt4.0.5-x86_rhas_4-installer.tar.gz
$ cd ../src
$ tar -xzf ../dist/gt4*gz
 

Before we build the toolkit we need to check our build environment. Whenever you build something, you should make sure you have a minimal environment defined that only includes the specific tools and libraries that you need. If you know your starting point, you won't be surprised by unexpected dependencies at a later point. It is better to begin a build with a minimal environment and incrementally add the requirements whose absence causes your build to fail than to start out with things you don't need.

The most influential environment variables for a build are PATH and LD_LIBRARY_PATH. These variables define what tools you'll have available to build and what libraries are available for linking. The best approach is to just have /bin and /usr/bin in your PATH and leave LD_LIBRARY_PATH unset. Since this build relies on Java tools, we need to add them to the mix. The profile.d files will have done that by default, but since we're doing a build we want to redefine the PATH manually to prepare for the build.

Note that we should still be logged in as user globus!

$ export PATH=$JAVA_HOME/bin:$ANT_HOME/bin:/bin:/usr/bin
$ unset LD_LIBRARY_PATH
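
A stricter variant of the same idea is to start the build from an empty environment, so that nothing is inherited by accident. The sketch below is an alternative, not the method used for this install; run_minimal is a hypothetical helper name, and the JAVA_HOME and ANT_HOME defaults are the locations configured earlier on this page.

```shell
# Sketch: run a command with only an explicit, minimal environment.
run_minimal() {
    java_home=${JAVA_HOME:-/usr/java/jdk}
    ant_home=${ANT_HOME:-/opt/ant}
    # env -i starts from an empty environment, so LD_LIBRARY_PATH and any
    # other inherited variable are dropped unless listed here.
    env -i \
        HOME="${HOME:-/}" \
        JAVA_HOME="$java_home" \
        ANT_HOME="$ant_home" \
        PATH="$java_home/bin:$ant_home/bin:/bin:/usr/bin" \
        "$@"
}

# Example: show what a build started this way would see.
run_minimal sh -c 'echo "$PATH"; echo "${LD_LIBRARY_PATH:-unset}"'
```

A build step would then be started as, for example, run_minimal make from inside the installer directory.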

Now we're ready to begin the build as user globus:

$ cd ~/src/gt4.0.5-x86_rhas_4-installer
$ ./configure --prefix /opt/globus
$ make 2>&1 | tee build.log
$ make install 2>&1 | tee install.log
 

Review the build and install logs to make sure there weren't any errors. On the current platform, we got an error during install about the libsqlite3_gcc32pthr.so.0 not being found:

running /opt/globus/setup/globus/setup-globus-rls-server..[ Changing to /opt/globus/setup/globus ]
WARNING: More than one globus_database_sqliteodbc package found. You may need to adjust the driver settings in /opt/globus/var/odbc.ini
.creating SXXrls
creating globus-rls-server.conf
creating rls-ldif.conf
creating odbc.ini
/opt/globus/bin/sqlite3: error while loading shared libraries: libsqlite3_gcc32pthr.so.0: cannot open shared object file: No such file or directory
/opt/globus/bin/sqlite3: error while loading shared libraries: libsqlite3_gcc32pthr.so.0: cannot open shared object file: No such file or directory
Done
 

The sqlite and sqlite-devel packages are installed by default on ROCKS clusters and most RHEL 5 systems. See ticket:1 for resolution tracking.

We also got an error when configuring the job manager. The complaint was about mpirun and mpiexec not being found. This error can be ignored since MPI won't be used locally. Our job manager will be GridWay. Most jobs will be initiated from UABgrid Stage. It's also unlikely these jobs will be initiated remotely on UABgrid Stage using GRAM, and if they are the fork job manager will be sufficient.

running /opt/globus/setup/globus/setup-globus-job-manager-fork..[ Changing to /opt/globus/setup/globus ]
find-fork-tools: WARNING: "Cannot locate mpiexec"
find-fork-tools: WARNING: "Cannot locate mpirun"
checking for mpiexec... no
checking for mpirun... no
find-fork-tools: creating ./config.status
config.status: creating fork.pm
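
Reviewing the logs by eye is easy to get wrong on a long build, so a first pass can be automated. This is a minimal sketch; scan_log is a hypothetical helper name, and the search pattern is an assumption based on the sqlite failure shown above. It deliberately does not match WARNING lines such as the mpirun and mpiexec messages.

```shell
# Sketch: list suspicious lines from build/install logs with line numbers.
scan_log() {
    # -n prints line numbers, -i ignores case, -E enables alternation.
    grep -n -i -E 'error|cannot open|no such file' "$@"
}
```

For example, scan_log build.log install.log prints each matching line prefixed by its file name and line number, and returns a non-zero status when nothing matches.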
 

At this point, Globus is installed and needs to be configured. These steps need only be performed if you are starting from the clean CentOS4 base VM. The work can be avoided by unpacking a copy of the CentOS 4 VM with Globus installed. Note: this is for future work avoidance. The VM should not have an identity defined yet (static IP) in order to avoid conflicts with existing systems. Track with ticket:2.

Configure Globus

The Globus QuickStart guide describes the configuration sequence in two major parts: defining identities and configuring services. For the purpose of these instructions, however, the host registration is postponed until after the Globus components are configured. That way we can create a generic VM foundation that can simply be unpacked and have an identity assigned, avoiding all the steps needed to install software packages.

Host Identity

The QuickStart guide dedicates the next steps to setting up SimpleCA and using it to create host and user certificates. Certificates are critical because Globus uses them as the foundation of identity in the grid. Setting up SimpleCA is good if you don't have an existing CA or want a CA for local identity management. UABgrid has an established CA so we don't need another one, especially for a core resource like the meta-scheduler. If testing needs to be performed that requires full CA control, a distinct instance of UABgrid Stage should be constructed. User identities are also assigned by the UABgrid CA; please see the UABgrid CA help page for additional information. UABgrid supports multiple identity sources for generating user certificates, including an open-access identity provider, and with proper authorization any of these identities could be used to access and test resources.

Setting up the host identity should follow the host registration steps in earlier UABgrid documentation and the steps to trust the UABgrid CA. Together these cover everything needed to get a host identified on UABgrid.

The only step necessary at this point is to create the directories needed for the certificate infrastructure:

$ sudo mkdir -p /etc/grid-security/certificates/
 

Globus Services Configuration

The Globus services are ready to run after the install. These steps simply cover hooking in the services to the system environment so they can be started at boot time. They've been documented in earlier UABgrid documentation but are repeated here for ease of configuration.

Update the Firewall

The firewall in the default install was a strict firewall config. All services must be explicitly allowed. Using sudo, edit the /etc/sysconfig/iptables file and add the following entries before the line -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited. Restart the firewall after making the changes.

$ sudo vi /etc/sysconfig/iptables
  
# BEGIN: Globus Services
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2119 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2222 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2811 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 45000:45999 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8443 -j ACCEPT
# END: Globus Services
 
$ sudo /sbin/service iptables restart
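
Since a missing rule only shows up later as a hung connection, it may help to confirm that every port has an entry. The sketch below is an illustration only; check_globus_ports is a hypothetical helper name and the port list is copied from the entries above. It checks the rules file rather than the running firewall, and the file is normally readable only by root.

```shell
# Sketch: confirm every Globus port from this page has an ACCEPT rule
# in an iptables rules file.
check_globus_ports() {
    rules=${1:-/etc/sysconfig/iptables}
    status=0
    for port in 2119 2222 2811 45000:45999 8443; do
        # -e is needed because the pattern itself starts with "--".
        if ! grep -q -e "--dport $port -j ACCEPT" "$rules"; then
            echo "no ACCEPT rule for port $port"
            status=1
        fi
    done
    return $status
}
```

Called without arguments from a root shell, it inspects /etc/sysconfig/iptables; a path to a copy of the file can be passed instead.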
 

User Environment Configuration

Users and processes need to be able to correctly access the Globus services. Modify the system environment configuration in /etc/profile.d by adding the following files for bash and csh users (it is important to include both because many cluster users use csh as their shell!):

  • Bash shell script
    $ sudo vi /etc/profile.d/globus.sh
     
    
    #!/bin/bash
    export GLOBUS_LOCATION=/opt/globus
    export GLOBUS_HOSTNAME=`hostname --fqdn`
    export GLOBUS_TCP_PORT_RANGE=45000,45999
    export GPT_LOCATION=/opt/globus
    source $GLOBUS_LOCATION/etc/globus-user-env.sh
     
    
  • C shell script
    $ sudo vi /etc/profile.d/globus.csh
     
    
    #!/bin/tcsh
    setenv GLOBUS_LOCATION /opt/globus
    setenv GLOBUS_HOSTNAME `hostname --fqdn`
    setenv GLOBUS_TCP_PORT_RANGE "45000,45999"
    setenv GPT_LOCATION /opt/globus
    source $GLOBUS_LOCATION/etc/globus-user-env.csh
     
    

Note: make sure your hostname command actually prints the fully qualified domain name (uname -n will also yield the fqdn).
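
The check in the note can be scripted. The following is a rough sketch: is_fqdn is a hypothetical helper name, and the test is purely syntactic (at least one dot between non-empty labels), so it says nothing about whether the name actually resolves.

```shell
# Sketch: treat a name as fully qualified only if it has at least one dot
# separating non-empty labels.
is_fqdn() {
    case "$1" in
        ""|.*|*.|*..*) return 1 ;;   # empty, leading/trailing dot, empty label
        *.*)           return 0 ;;   # at least one dot between labels
        *)             return 1 ;;   # bare short name
    esac
}

# Fall back to uname -n if the hostname command is unavailable.
name=$(hostname 2>/dev/null || uname -n)
if is_fqdn "$name"; then
    echo "hostname is fully qualified: $name"
else
    echo "hostname is NOT fully qualified: $name"
fi
```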

  • Bash shell script for myproxy
    $ sudo vi /etc/profile.d/myproxy.sh
     
    
    #!/bin/bash
    export MYPROXY_SERVER=myproxy.uabgrid.uab.edu
     
    
  • C shell script for myproxy
    $ sudo vi /etc/profile.d/myproxy.csh
     
    
    #!/bin/tcsh
    setenv MYPROXY_SERVER myproxy.uabgrid.uab.edu
     
    

System Services Configuration

Globus services are started via the system super daemon xinetd. The system needs to be told about the service names by adding entries to /etc/services, and how to run them by adding configuration files to xinetd's configuration directory /etc/xinetd.d.

  • Modify /etc/services by adding these lines to the end of the file:
    $ sudo vi /etc/services
     
    
    globus-gatekeeper 2119/tcp  #Globus Gatekeeper
    gsiftp            2811/tcp  #Grid-FTP Server
     
    
  • Create the /etc/xinetd.d/globus-gatekeeper file
    $ sudo vim /etc/xinetd.d/globus-gatekeeper
     
    
    service globus-gatekeeper
    {
        socket_type = stream
        protocol    = tcp
        wait        = no
        env         = LD_LIBRARY_PATH=/opt/globus/lib
        user        = root
        server      = /opt/globus/sbin/globus-gatekeeper
        server_args = -conf /opt/globus/etc/globus-gatekeeper.conf
        env         += GLOBUS_TCP_PORT_RANGE=45000,45999
        disable     = no
    }
     
    
  • Create the /etc/xinetd.d/gsiftp configuration file
    $ sudo vi /etc/xinetd.d/gsiftp
     
    
    service gsiftp
    {
        socket_type     = stream
        protocol        = tcp
        env             = LD_LIBRARY_PATH=/opt/globus/lib
        env             += GLOBUS_TCP_PORT_RANGE=45000,45999
        wait            = no
        user            = root
        server          = /opt/globus/sbin/globus-gridftp-server
        server_args     = -i -1
        disable         = no
    }
     
    

Supporting GSI-SSH

These instructions come from earlier UABgrid documentation.

  • Create a GSI-SSH startup script:
    $ sudo cp /etc/init.d/sshd /etc/init.d/sshd-globus
    $ sudo chmod +x /etc/init.d/sshd-globus
    
  • Download sshd-globus.patch to your home directory and apply the patch to the /etc/init.d/sshd-globus file to support starting GSI-SSH.
    $ cd ~
    $ wget http://webapp.lab.ac.uab.edu/projects/uabgrid-stage/attachment/ticket/3/sshd-globus.patch\?format=raw -O sshd-globus.patch
    $ sudo patch -p0 /etc/init.d/sshd-globus < sshd-globus.patch
    $ rm sshd-globus.patch
     
    
  • Create the configuration file /etc/sysconfig/sshd-globus:
    $ sudo vi /etc/sysconfig/sshd-globus
     
    
    # Globus Environment
    export GLOBUS_LOCATION=/opt/globus
    export LD_LIBRARY_PATH=/opt/globus/lib
    OPTIONS="-p 2222"
     
    
  • Enable the GSI-SSH service at boot.
    $ sudo /sbin/chkconfig --add sshd-globus
    $ sudo /sbin/chkconfig sshd-globus on
     
    

Configure the RFT Service

These steps come from the UABgrid instructions on setting up the service and configuring the database services.

  • Create the /opt/globus/start-stop file
    $ sudo vim /opt/globus/start-stop
     
    
    #!/bin/sh
    set -e
    export GLOBUS_LOCATION=/opt/globus
    export JAVA_HOME=/usr/java/jdk
    export ANT_HOME=/opt/ant
    export GLOBUS_OPTIONS="-Xms256M -Xmx512M"
    
    . $GLOBUS_LOCATION/etc/globus-user-env.sh
    
    cd $GLOBUS_LOCATION
    case "$1" in
        start)
            $GLOBUS_LOCATION/sbin/globus-start-container-detached -p 8443
            ;;
        stop)
            $GLOBUS_LOCATION/sbin/globus-stop-container-detached
            ;;
        *)
            echo "Usage: globus {start|stop}" >&2
            exit 1
           ;;
    esac
    exit 0
     
    
  • Create the /etc/init.d/globus file
    $ sudo vim /etc/init.d/globus
     
    
    #!/bin/sh -e
    
    # Globus start up script from 4.x QuickStart guide
    
    # chkconfig: 345 55 25
    # description: Globus WSRF container
    
    export GLOBUS_LOCATION=/opt/globus
    export GLOBUS_TCP_PORT_RANGE=45000,45999
    
    case "$1" in
      start)
        su - globus /opt/globus/start-stop start
        ;;
      stop)
        su - globus /opt/globus/start-stop stop
        ;;
      restart)
        $0 stop
        sleep 1
        $0 start
        ;;
      *)
        printf "Usage: $0 {start|stop|restart}\n" >&2
        exit 1
        ;;
    esac
    exit 0
     
    

Note, two files are used to start the service so that the web services container can be started as the user globus.

  • Register the start up script:
    $ sudo chmod +x /etc/init.d/globus /opt/globus/start-stop
    $ sudo /sbin/chkconfig --add globus
    $ sudo /sbin/chkconfig globus on
     
    
  • Update the sudoers file (run visudo so that the file syntax is checked!) to allow the container to start applications under local user accounts. The account is determined by mapping the user's certificate, presented at job submission, to the local account name defined in /etc/grid-security/grid-mapfile. On cheaha, this file is maintained by puppet, so all changes have to be made to the copy on the puppet file server!
    $ sudo /usr/sbin/visudo
     
    
    # Globus web services container
    globus ALL=(ALL) NOPASSWD: /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-job-manager-script.pl *
    globus ALL=(ALL) NOPASSWD: /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-gram-local-proxy-tool *
     
    

Use extreme caution while performing the MySQL steps; a typo can result in a major problem! Remember that this application isn't the only app on the server that uses MySQL.

  • Install and enable the MySQL server. Do not do this step on a ROCKS cluster, where MySQL is already installed and configured:
    $ sudo yum install mysql-server
    $ sudo /sbin/chkconfig mysqld on
    $ sudo /sbin/service mysqld start
     
    
  • Create the RFT state database (note you will need to know the root password for the database, but run the commands as your normal user!):
    $ mysqladmin -u root -p create rftDatabase
     
    
  • Assign permissions by creating the users. The password 'foo' should be replaced with a good strong password; you do not need to be able to remember it after these configuration steps because it will be stored in the jndi-config.xml file:
    $ mysql -u root -p
    
    mysql> grant all privileges on rftDatabase.* to globus@fqdn identified by 'foo';
    Query OK, 0 rows affected (0.04 sec)
    
    mysql> grant all privileges on rftDatabase.* to globus@localhost identified by 'foo';
    Query OK, 0 rows affected (0.05 sec)
     
    

Note: the line with @fqdn will need to be run after the hostname is set. You should also change the password to something less obvious. If you goofed and just copied and pasted the above lines, you can fix the password from within mysql:

mysql> SET PASSWORD FOR 'globus'@'localhost' = PASSWORD('newpass');
 
  • Populate the schema by logging in as globus and piping in the SQL statements from file:
    $ mysql -u globus -p rftDatabase < /opt/globus/share/globus_wsrf_rft/rft_schema_mysql.sql
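
Because the RFT database password only needs to be stored, not remembered, it can be generated rather than invented. This is a minimal sketch, not part of the original instructions; gen_password is a hypothetical helper name. Restricting the output to letters and digits is a design choice that avoids quoting problems in the SQL statement and in jndi-config.xml.

```shell
# Sketch: print a random password made only of letters and digits.
gen_password() {
    length=${1:-24}
    # Read random bytes, keep only alphanumerics, stop after $length characters.
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
    echo
}
```

The output of gen_password 24 can be pasted in place of 'foo' in the grant statements and later into the RFT configuration file.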
     
    

Install the MySQL Connector/J Java connector library (doc) to enable the web services container to access the RFT database.

$ cd /var/tmp
$ wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.0.8.tar.gz/from/http://mysql.osuosl.org/
$ tar -xzf mysql-connector-java-5.0.8.tar.gz mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar
$ sudo cp mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar /opt/globus/lib
$ rm -rf mysql-connector-java-5.0.8.tar.gz mysql-connector-java-5.0.8

From the Globus documentation:

Edit $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml and change values of connectionString to jdbc:mysql:///rftDatabase from jdbc:postgresql://host/rftDatabase and driverName to com.mysql.jdbc.Driver from org.postgresql.Driver and userName and password to whatever was set during creation of users for mysql.

  • The following patch file will make the modifications using the initial poor password.
    $ vim /tmp/jndi.patch
     
    
    80c80
    <                 org.postgresql.Driver
    ---
    >                 com.mysql.jdbc.Driver
    88c88
    <                 jdbc:postgresql://localhost.localdomain/rftDatabase
    ---
    >                 jdbc:mysql:///rftDatabase
    
  • Now patch the xml file
    $ cd /opt/globus/etc/globus_wsrf_rft
    $ sudo patch -p0 jndi-config.xml < /tmp/jndi.patch
    $ rm /tmp/jndi.patch
     
    

Note: you'll still need to manually edit the file to set the password. Look for the string foo or newpass and change it to the password you assigned above.

$ sudo vim /opt/globus/etc/globus_wsrf_rft/jndi-config.xml
 

Configure VM Instance

This is the starting point for building a variety of Globus-based resources on UABgrid. UABgrid Stage is one of these resources. The VM instance has had all of the above steps performed and is ready for customization as a specific host.

Unpack the VM tarball. Apply the workaround for bug #9. Then rename it with the mvb-rename.sh script.

Run the new VM. If you're prompted by kudzu to unconfigure a network device and then configure a new one, go ahead and remove the now missing device and add the new one. You can choose to set your network address during the new device configuration. (See bug #10).

Note: make sure the host name assigned to the vm instance is properly registered in the DNS, both an A record and a PTR (IP to hostname) record must be defined. (See bug #16 for details.)

Once the machine is running, connect to it and fix the following bugs:

  • Bug #6: change the version specific paths to use the symlinks instead
  • Bug #4: change the ownership of the file to globus.globus: chown globus.globus start-stop
  • Bug #5: change the ownership of the rft config file: chown globus.globus /opt/globus/etc/globus_wsrf_rft/jndi-config.xml
  • Bug #7: add configuration file for GSI-SSHD

Update RFT Configuration

Now that a hostname has been assigned to the box, the RFT database access privileges can be set correctly. Run the mysql client and enter the following account definition. Remember to use a password that's unique and replace 'fqdn' with your hostname.

grant all privileges on rftDatabase.* to globus@fqdn identified by 'foo';

After that, edit the /opt/globus/etc/globus_wsrf_rft/jndi-config.xml and replace the foo password with the one used above.

Request a Host Certificate

See the instructions for UABgrid CA for details.

After the hostkey.pem and hostcert.pem are in place in the /etc/grid-security directory, we need to also make those files available to the globus user running the web services container.

cd /etc/grid-security
cp hostcert.pem containercert.pem
cp hostkey.pem containerkey.pem
chown globus:globus container*.pem

Note: be sure that you are not triggering bug #15 at this point. That is, make sure the permissions and ownership on the host*.pem files are correct.

cd /etc/grid-security
chown root.root host*.pem
chmod 644 hostcert.pem
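
The result can be verified instead of assumed. The sketch below relies on GNU stat and uses a hypothetical helper name, check_perm. The expected values for hostcert.pem (mode 644, owner root) come from the commands above; this page does not state a mode for hostkey.pem, so any value checked for it would be an assumption.

```shell
# Sketch: compare a file's octal mode and owner with the expected values.
check_perm() {
    file=$1; want_mode=$2; want_owner=$3
    have_mode=$(stat -c '%a' "$file") || return 1
    have_owner=$(stat -c '%U' "$file") || return 1
    if [ "$have_mode" != "$want_mode" ] || [ "$have_owner" != "$want_owner" ]; then
        echo "$file: mode $have_mode owner $have_owner (expected $want_mode $want_owner)"
        return 1
    fi
}
```

For example, check_perm /etc/grid-security/hostcert.pem 644 root prints nothing and succeeds when the file matches, and prints the mismatch otherwise.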

Trust UABgrid CA

See the UABgrid instructions on trusting the CA for details. The shortcut is:

cd /etc/grid-security/certificates
wget -N http://uabgrid.uab.edu/files/56498486.0
wget -N http://uabgrid.uab.edu/files/56498486.signing_policy

Reboot and Validate

After this the machine will be ready to go. We just need to reboot and validate the environment.

Configure BlazerID Authentication

UABgrid Stage supports both GSI-SSH (i.e. certificate-based) and standard SSH authentication methods. As a convenience to members of the UAB community, BlazerID based authentication credentials are supported as well. This requires that the authentication infrastructure be updated to support LDAP-based authentication via the PAM configuration system. Detailed instructions for this configuration can be found in the @lab system documentation.

The authentication hook is very simple because we are only talking about authentication and not account management. All account management takes place locally so the authentication hook just triggers a password lookup via UAB's LDAP system. Create the file /etc/auth_ldap.conf

ssl start_tls
ssl on

uri ldaps://ldap.uab.edu/

base ou=people,dc=uab,dc=edu

Then hook it into the PAM authentication sequence in the file /etc/pam.d/system-auth. The following diff will do it:

5a6
> auth        sufficient    /lib/security/$ISA/pam_ldap.so use_first_pass config=/etc/auth_ldap.conf

Note: make sure the pam_unix.so line above this one is also configured as "sufficient" and not "required". The intent is to allow either authentication method to succeed.

Install GridWay

These instructions will follow the GridWay Administrator Guide install instructions with modifications to take local conventions or platform assumptions into consideration when necessary.

Verify the System

In order to have a working GridWay install you first need a working Globus install. GridWay documentation includes verification steps identical to the ones followed above. These tests were all performed successfully except for the MDS query using the Globus 2.4 interfaces, namely the LDIF-based information. Running the command results in the following message:

$ grid-info-search -x
-bash: grid-info-search: command not found

This command seems to be missing on all local installs of Globus 4.0.x, so it's not clear if additional steps are needed to enable it. It's also not clear if it's at all necessary, since we generally want to favor the ws-MDS implementation due to its relative simplicity. Bug #17 records the missing grid-info-search command.

GridWay also lists several requirements for the system environment covering compilers and development libraries. The CentOS 4.5 platform with the Globus install described above satisfies all requirements. The only minor exception is that the version of GCC is 3.4.6, which is not listed as tested in the GridWay docs, but this isn't expected to introduce incompatibilities. Here's the run-down.

C compiler: Tested versions gcc 3.4.2, 3.4.4, 4.0.3 and 4.1.2

$ gcc --version
gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Globus C libraries: globus_gram_client, globus_ftp_client and globus_gass_copy

$ ls /opt/globus/lib/libglobus_gram_client* /opt/globus/lib/libglobus_ftp_client* /opt/globus/lib/libglobus_gass_copy*
# lots of libraries listed, all ok

Globus JAVA development libraries

$ ls /opt/globus/lib/globus_*.jar
# lots more libraries, assume that's what's meant

J2SE versions 1.4.2_10+ (Builds higher than 10) or 1.5.0+

$ java -version
java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)

GNU Make

$ make --version
GNU Make 3.80
Copyright (C) 2002  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

Sudo command (only required for multiple-user mode)

$ sudo -V
Sudo version 1.6.7p5

Berkeley Database library version 4.4.20 (only required to compile the accounting module)

$ rpm -qa | grep db4
db4-devel-4.2.52-7.1
db4-utils-4.2.52-7.1
db4-4.2.52-7.1
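
Comparing dotted version numbers by eye is error-prone; for instance, the db4 packages listed above are 4.2.52 while the stated library version is 4.4.20, which only matters for the accounting module. A comparison can be sketched as below. version_ge is a hypothetical helper name, it compares at most three numeric fields, and it uses awk rather than sort -V because older coreutils releases may not provide the latter.

```shell
# Sketch: succeed when dotted version $1 is greater than or equal to $2.
version_ge() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        for (i = 1; i <= 3; i++) {
            # Missing fields count as 0, so "4.2" is treated as "4.2.0".
            if (h[i] + 0 > n[i] + 0) exit 0
            if (h[i] + 0 < n[i] + 0) exit 1
        }
        exit 0
    }'
}

version_ge 4.2.52 4.4.20 || echo "db4 4.2.52 is older than 4.4.20"
version_ge 3.4.6 3.4.2  && echo "gcc 3.4.6 is at least 3.4.2"
```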

Multiple-User Mode Installation

UABgrid Stage will operate in the multi-user mode of GridWay. This basically means it's a shared install for all users on a system, which is a fairly standard installation format for a shared system. GridWay doesn't require this and could be run directly by a user without the need for a system install. This is a nice feature, but one that isn't needed on UABgrid since the shared install is supported.

GridWay requires a distinct account to operate under, similar to Globus, since it runs some commands on behalf of users and a distinct account is a secure way to control such privileges. GridWay also requires that both this administrative account and all users that will run gridway commands belong to the same group. We'll create a user gwadmin as the owner/operator of GridWay and the gwusers group for the group membership. It's not clear from the documentation if the gwadmin account just needs to belong to the gwusers group or if it needs to be the primary group for gwadmin. We'll assume the latter since the install process may choose to control permissions on certain commands via the group owner and having the primary group of gwadmin be gwusers will cause the files to be set up properly at install. This means we create the group first and then the user.

groupadd gwusers
useradd -c "GridWay Administrator" -g gwusers gwadmin

Next we want to prepare the install folder for GridWay. We'll use a symbolic link as before to abstract specific versions (5.2.2 as of this writing) from the config files.

mkdir /opt/gw-5.2.2
chown gwadmin.gwusers /opt/gw-5.2.2
cd /opt
ln -s gw-5.2.2 gw

Create a user environment configuration file /etc/profile.d/gw.sh with the following contents:

export GW_LOCATION=/opt/gw
export PATH=$PATH:$GW_LOCATION/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GW_LOCATION/lib
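
One detail worth knowing about the last line: when LD_LIBRARY_PATH is unset, the result starts with an empty element, which the dynamic linker interprets as the current directory. A slightly more defensive form is sketched below as an optional alternative, not the configuration actually deployed; append_path is a hypothetical helper name.

```shell
# Sketch: append a directory to a colon-separated path value without
# leaving an empty element when the value starts out unset or empty.
append_path() {
    # $1 = current value (may be empty), $2 = directory to add
    echo "${1:+$1:}$2"
}

GW_LOCATION=${GW_LOCATION:-/opt/gw}
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" "$GW_LOCATION/lib")
export GW_LOCATION LD_LIBRARY_PATH
```

The same helper could be used for PATH, although PATH is rarely empty in practice.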

Now switch to the gwadmin user and download and build the GridWay distribution

su - gwadmin
mkdir dist src
cd dist
wget http://www.gridway.org/software/files/gw-5.2.2.tar.gz
cd ../src
tar -xzf ../dist/gw-5.2.2.tar.gz
cd gw-5.2.2

Good build practice dictates that we strip out unnecessary elements from the environment for the build, as we did above. We also source the Globus development environment to support the build process:

export LD_LIBRARY_PATH=/opt/globus/lib
PATH=/usr/java/jdk/bin:/opt/globus/bin:/opt/globus/sbin:/opt/ant/bin:/bin:/usr/bin
source $GLOBUS_LOCATION/etc/globus-devel-env.sh

Now we build GridWay for our target location with accounting turned on and with documentation and tests enabled. All remaining defaults are accepted.

After running the configure command, you may need to edit ./src/Makefile per the instructions here http://dev.uabgrid.uab.edu/uabgrid-stage/wiki/Gridway-5.4 to avoid a make error. Make sure to make the change after running configure, otherwise configure will overwrite the change!

./configure --with-doc --with-tests --with-db=/usr/lib --prefix=/opt/gw
make
make install
 

Note: there is a possible requirement for a Globus source install. There is an error during the configuration stage above that indicates the build environment of Globus is not complete (see bug #18). It seems a Globus source install may be needed to build GridWay. This is not indicated in the documentation nor is there any clear statement on the differences between the binary packages from Globus versus the source packages. This is further explored in GlobusSourceInstall.

Multiple-User Mode Configuration

In order to support multi-user mode, gwadmin needs to be able to execute select operations as the members of the gwusers group. This is accomplished via sudo. The GridWay install documentation recommends a default configuration that looks like this:

...
# User alias specification
...
Runas_Alias     GW_USERS = %<gwgroup>
...
# GridWay entries
gwadmin ALL=(GW_USERS)     NOPASSWD: /home/gwadmin/gw/bin/gw_em_mad_prews *
gwadmin ALL=(GW_USERS)     NOPASSWD: /home/gwadmin/gw/bin/gw_em_mad_ws *
gwadmin ALL=(GW_USERS)     NOPASSWD: /home/gwadmin/gw/bin/gw_tm_mad_ftp *

UABgrid Stage will use the following configuration, however, which is tuned to the local environment.

# BEGIN: GridWay Configuration
Runas_Alias     GW_USERS=%gwusers

Defaults>GW_USERS env_keep="GW_LOCATION GLOBUS_LOCATION"

gwadmin ALL=(GW_USERS)     NOPASSWD: /opt/gw/bin/gw_em_mad_prews *
gwadmin ALL=(GW_USERS)     NOPASSWD: /opt/gw/bin/gw_em_mad_ws *
gwadmin ALL=(GW_USERS)     NOPASSWD: /opt/gw/bin/gw_tm_mad_ftp *
# END: GridWay Configuration

Note: pay attention to the paths specified in the example from GridWay above and the ones in the UABgrid Stage configuration. They differ in the path to the installed GridWay commands: /home/gwadmin/gw... vs. /opt/gw/....

Also worth noting: the example includes /opt/gw/bin/gw_em_mad_prews even though that command is not installed when building against a binary Globus install, because the pre-WS MDS is not built by default. You must explicitly enable pre-WS MDS; see GlobusSourceInstall. The authorization is retained in the local configuration so that a later source install works without having to remember to change this setting.
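
To see which of the three authorized MAD commands a given build actually produced, a small check like the following can help. This is a sketch; the function name mad_status is hypothetical, and the default directory assumes the /opt/gw prefix used in this install.

```shell
#!/bin/bash
# Hypothetical helper: report which of the sudo-authorized MAD commands exist.
# The default directory assumes the --prefix=/opt/gw used for this build.
mad_status() {
    local bindir="${1:-/opt/gw/bin}" mad
    for mad in gw_em_mad_prews gw_em_mad_ws gw_tm_mad_ftp; do
        if [ -x "$bindir/$mad" ]; then
            echo "present $mad"
        else
            echo "missing $mad"
        fi
    done
}
```

On a build against a binary Globus install, gw_em_mad_prews would be expected to show up as missing, for the reason given above.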

The documentation goes on to recommend testing the sudo configuration by having the gwadmin user run a test command as one of the users in the gwusers group.

sudo -u <gw_user> /opt/gw/bin/gw_em_mad_ws

This verifies that the gwadmin user running the sudo command can become a user that is a member of gwusers and execute /opt/gw/bin/gw_em_mad_ws without being prompted for a password.

Note: while the sudo command successfully becomes the user in this test, the command never completes. It's not clear if that's expected or related to not having a fully configured GridWay install at this point. Simply press Ctrl-C to exit from the command.

Update: Sun Nov 22 14:36:46 2009 The "hang" is expected behavior: the gw_em_mad_ws command is waiting for a protocol dialog. This dialog can be carried out manually by entering

INIT 2 - - - -

(That's a single space between each field.)

It should answer with

INIT SUCCESS - - 

This confirms that the service is operational.

Important: make sure that the selected user in the test above has an initialized and valid grid proxy certificate. If the certificate is expired, the command will simply return with no indication of success or failure.
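
The manual dialog above can also be scripted. The sketch below sends the INIT line to whatever command is passed in and checks that the first reply line begins with INIT SUCCESS. The function name mad_init_check is hypothetical, and the sketch assumes the MAD exits once its standard input is closed; if it does not, interrupt it with Ctrl-C as before.

```shell
#!/bin/bash
# Hypothetical helper: run the INIT exchange against a MAD command.
# Assumption: the MAD terminates when its standard input reaches end of file.
mad_init_check() {
    local reply
    # Send the same INIT line used in the manual test and keep the first reply line.
    reply=$(printf 'INIT 2 - - - -\n' | "$@" | head -n 1)
    case "$reply" in
        "INIT SUCCESS"*) echo "MAD answered: $reply"; return 0 ;;
        *) echo "unexpected or empty reply: '$reply'"; return 1 ;;
    esac
}
```

As gwadmin, it would be invoked as mad_init_check sudo -u <gw_user> /opt/gw/bin/gw_em_mad_ws, with <gw_user> replaced by a member of gwusers who holds a valid proxy. An empty reply is reported as a failure, which matches the expired-certificate case described above.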

Note: A couple of points about the example: make sure to reference the correct local path for the command to be executed (i.e. under /opt/gw). The GridWay doc example uses a different path. Also remember that gw_em_mad_prews is not available with an install based on a binary Globus install.

Note: If you blindly copy and paste the GridWay example (very easy to do, even for experts) you will get errors, because the gwadmin user has not been authorized to execute the command at the correct path for this install. The error reveals itself as the gwadmin user being prompted for a password when executing the test command. This is the correct behavior, as the gwadmin user should only be authorized to run these specific commands. The same password prompt occurs when attempting to run any command not explicitly authorized, e.g. sudo -u <gw_user> id.
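
One way to catch the copy-and-paste mistake before testing is to list any NOPASSWD command path that does not live under the local install prefix. The following is a sketch only: the function name sudoers_paths_outside is hypothetical, it does simple text matching rather than real sudoers parsing, and it assumes one command per NOPASSWD line as in the examples above.

```shell
#!/bin/bash
# Hypothetical helper: print NOPASSWD command paths that are not under PREFIX.
# Usage: sudoers_paths_outside PREFIX FILE
sudoers_paths_outside() {
    local prefix="$1" file="$2"
    awk -v prefix="$prefix" '
        /^[ \t]*#/ { next }                       # skip comment lines
        /NOPASSWD:/ {
            sub(/^.*NOPASSWD:[ \t]*/, "")         # drop everything before the command
            if (index($1, prefix) != 1) print $1  # report paths outside the prefix
        }
    ' "$file"
}
```

Run as root, sudoers_paths_outside /opt/gw/ /etc/sudoers should print no GridWay paths for the UABgrid Stage configuration; any /home/gwadmin/gw/... path it prints was carried over from the GridWay example.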

Run GridWay

Because we are using multi-user mode we must start the GridWay daemon as the gwadmin user and with the -m switch. Become the gwadmin user and run:

gwd -m

Note: a start script is pending. See bug #19.

Testing GridWay Install

The documentation next lists some basic tests to validate the environment. The local tests are simple. The user environment will have already been configured at login. Simply run the following two commands; each should print only its header line, since there are no jobs and no hosts yet.

$ gwps
USER         JID DM   EM   START    END      EXEC    XFER    EXIT NAME            HOST
$ gwhost
HID PRIO  OS              ARCH   MHZ %CPU  MEM(F/T)     DISK(F/T)     N(U/F/T) LRMS                 HOSTNAME
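
If this check needs to be scripted, one option is to confirm that each command prints exactly one line. The helper name header_only below is hypothetical, and the sketch assumes both commands write their header to standard output.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if standard input holds exactly one line.
header_only() {
    local count
    count=$(wc -l | tr -d ' ')   # strip padding that some wc versions add
    [ "$count" -eq 1 ]
}
# Example: gwps | header_only && gwhost | header_only && echo "no jobs, no hosts"
```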

There are test scripts installed under /opt/gw/test. See /opt/gw/test/gwtest -h for details. These tests are all geared toward the default single-user operation and will therefore try to launch gwd themselves; be sure to use the -c option to avoid complaints about not being able to start gwd. You must also run these tests as a normal user with an initialized proxy cert and a properly configured GridWay install (which does not yet exist at this point). The tests are mentioned here to complement the GridWay documentation.

NOTE: On the cluster make sure you edit the file $GLOBUS_LOCATION/etc/globus_wsrf_mds_usefulrp/gluerp.xml and enable the use of Ganglia to report cluster information so that complete host information is displayed with gwhost.

Setup SGE as Globus Job Manager

This section is meant to be executed on the cluster that has SGE 6 installed and tested.

Enable reporting for SGE using the following commands:

% qconf -mconf
[edit the line starting with reporting_params and set reporting=true joblog=true]
% qconf -sconf
[you should see something like this]
reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00

Restart SGE. You should then see the file $SGE_ROOT/$SGE_CELL/common/reporting.

/etc/init.d/sgemaster stop
/etc/init.d/sgemaster start
ls -l $SGE_ROOT/$SGE_CELL/common/reporting
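
The reporting settings can also be checked without reading the qconf output by eye. The sketch below reads qconf -sconf output on standard input, joins backslash-continued lines as in the sample above, and succeeds only if reporting_params contains both reporting=true and joblog=true. The function name sge_reporting_enabled is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: check reporting_params in "qconf -sconf" output on stdin.
sge_reporting_enabled() {
    awk '
        /\\$/ { sub(/\\$/, ""); buf = buf $0; next }   # join continuation lines
        { line = buf $0; buf = "" }
        line ~ /^reporting_params/ {
            found = (line ~ /(^|[ \t])reporting=true([ \t]|$)/ &&
                     line ~ /(^|[ \t])joblog=true([ \t]|$)/)
        }
        END { exit(found ? 0 : 1) }
    '
}
# Example: qconf -sconf | sge_reporting_enabled && echo "SGE reporting is enabled"
```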

Download the four packages from Sun Grid Engine Integration with Globus Toolkit 4.

wget http://www.lesc.ic.ac.uk/projects/globus_gram_job_manager_setup_sge-1.1.tar.gz
wget http://www.lesc.ic.ac.uk/projects/globus_scheduler_event_generator_sge-1.1.tar.gz
wget http://www.lesc.ic.ac.uk/projects/globus_scheduler_event_generator_sge_setup-1.1.tar.gz
wget http://www.lesc.ic.ac.uk/projects/globus_wsrf_gram_service_java_setup_sge-1.1.tar.gz

Install the four packages and run gpt-postinstall:

gpt-build globus_gram_job_manager_setup_sge-1.1.tar.gz
gpt-build globus_scheduler_event_generator_sge-1.1.tar.gz gcc64dbg
gpt-build globus_scheduler_event_generator_sge_setup-1.1.tar.gz
gpt-build globus_wsrf_gram_service_java_setup_sge-1.1.tar.gz
gpt-postinstall

Note that the flavor chosen here is gcc64dbg; you need to use the Globus flavor appropriate to your install. Attention: You may also need to build the gcc64 flavor (as you will see later).

NEW ADDITION: There are a couple of items you need to change in the Perl module that creates the SGE script. Edit the file $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/sge.pm and make the changes shown below (this is the output from diff between the original and modified file):

38c38
<     $mpi_pe      = '';
---
>     $mpi_pe      = 'mpi';
52a53,55
>     $ENV{"SGE_CELL"} = $SGE_CELL;
>     $ENV{"SGE_ARCH"} = $SGE_ARCH;
>     $ENV{"SGE_PORT"} = "536";
179c182,183
<     $script_url = "$tag/sge_job_script.$$";
---
> #####    $script_url = "$tag/sge_job_script.$$";  #### PURI - a bug!
>     $script_url = "/sge_job_script.$$";
492a497
>                                    . "-machinefile \$TMPDIR/machines "

Yet Another Change: When John-Paul tried this on Cheaha he noticed that SGE_QMASTER_PORT also needs to be set. Add the following to the globus start-stop script:

export SGE_EXECD_PORT=537
export SGE_QMASTER_PORT=536

Alternatively, include these settings in the sge.pm script above.

Updated on Sep 10, 2008: While installing this on Everest after we upgraded to the latest version of Rocks (5.0), I noticed that globusrun-ws reported an error about the sudo file not being set up correctly. There seems to be a problem with the symbolic links: gpt-postinstall had used /opt/globus instead of /usr/local/globus-4.0.7 in the sudo file.

Restart Globus (you need to restart both the pre-WS and WS versions if they are running):

/etc/init.d/xinetd reload
/etc/init.d/globus-4.0.5 restart

To test if SGE and Globus Integration is successful, try submitting a job to the SGE. Here is an example:

[puri@stage ~]$ globus-job-submit everest.cis.uab.edu/jobmanager-sge -np 4 -x "(jobtype=mpi)" /home/puri/examples/psum 10000
https://everest00.cis.uab.edu:40082/5159/1186589962/
[puri@stage ~]$ globus-job-get-output https://everest00.cis.uab.edu:40082/5159/1186589962/
/opt/sge6/default/spool/everest-0-4/active_jobs/3652.1/pe_hostfile
everest-0-4
everest-0-4
everest-0-24
everest-0-24
My rank is 3, size is 4
My rank is 2, size is 4
My rank is 0, size is 4
My rank is 1, size is 4
The total is 50005000.000000 it should be equal to 50005000.000000
Time taken = 0.000000
[puri@stage ~]$ globusrun-ws -submit -factory everest.cis.uab.edu -Ft SGE -c -- /bin/hostname
Submitting job...Done.
Job ID: uuid:456cc210-45cb-11dc-a36f-000c296f18cf
Termination time: 08/09/2007 16:20 GMT
Current job state: Pending
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.

NOTE: While this works fine with everest, there seems to be a problem with olympus for the WS version of globusrun. It waits for a long time after the third line (Termination time: ) and then prints: Current job state: Unsubmitted. If you look at the log file $GLOBUS_LOCATION/container.log, you can see that the job was submitted, and if you check with qstat you will see the job go to the SGE queue. Many people have reported this bug, but no solution has been found yet.

Figured out the problem here

Looks like this problem occurs when $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s sge is not running. Of course the question is "Why is it not running?" When I tried to start it manually, I got an error "globus_xio: Operation was canceled." This indicated that there was something missing here. Looked around a little bit and everest had both gcc64dbg and gcc64 libraries in $GLOBUS_LOCATION/lib, so I decided to build the gcc64 flavor and that seems to have solved this problem.
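
A quick way to check for this condition is to look for the event generator in the process list. The sketch below reads a process listing on standard input; the function name seg_in_listing is hypothetical. The bracketed first letter keeps the pattern from matching the grep process itself.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the SGE scheduler event generator appears
# in a process listing supplied on standard input.
seg_in_listing() {
    grep -q '[g]lobus-scheduler-event-generator.* -s sge'
}
# Example (on the cluster head node):
#   ps -eo args | seg_in_listing || echo "SEG for SGE is not running"
```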

For whatever reason this will not be started by the Globus Container without the gcc64 flavor being built (part of the four packages built earlier for SGE and Globus Integration). After I built the gcc64 flavor and restarted the container, everything seems to be on track. I did notice that the file $GLOBUS_LOCATION/libexec/globus-build-env-gcc64.sh appeared after this build.
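
Building both flavors in one pass might look like the sketch below. It assumes, as in the install step earlier, that the scheduler event generator package is the one that takes a flavor argument. The function name build_seg_flavors and the GPT_BUILD override variable are hypothetical; the override exists only to allow a dry run.

```shell
#!/bin/bash
# Hypothetical helper: build the SGE scheduler event generator for each
# flavor named on the command line. Set GPT_BUILD=echo for a dry run.
build_seg_flavors() {
    local gpt_build="${GPT_BUILD:-gpt-build}" flavor
    for flavor in "$@"; do
        "$gpt_build" globus_scheduler_event_generator_sge-1.1.tar.gz "$flavor" || return 1
    done
}
# Example: build_seg_flavors gcc64dbg gcc64 && gpt-postinstall
```

Restart the container afterwards, as described above, so that the newly built flavor is picked up.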

While I would like to understand what's going on here, I am glad it works, so I am moving on.

Now here is the simple test:

[puri@stage ~]$ globusrun-ws -submit -s -F olympus.cis.uab.edu -Ft SGE -c /bin/hostname
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:f4e3e6a4-4609-11dc-a0f5-000c296f18cf
Termination time: 08/09/2007 23:49 GMT
Current job state: Pending
Current job state: Active
compute-3-5.local
Current job state: CleanUp-Hold
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[puri@stage ~]$

Updated Sep 10, 2008: With Globus 4.0.7 on Everest I did not have the problem mentioned above.