The instructions follow the structure of the Globus QuickStart Guide, with some modifications and updates to reflect the platform and the UABgrid install environment.
Foundation
UABgrid Stage is based on a CentOS 4 server profile install. At the time of construction the foundation was CentOS 4.5. CentOS 4 was chosen to maximize user experience compatibility between the job staging and execution platforms. Production operation doesn't demand this type of compatibility, because there is no need for interactive sessions with the grid compute resources. However, the initial efforts to build grid-based workflows using the UABgrid meta-scheduler will likely require users to work interactively on both platforms, and having the same system environment as most of the ROCKS-based compute clusters should make this a smoother transition.
UABgrid Stage is operated as a virtual machine. Currently this is done using VMware-based hosting platforms, but it may be converted to Xen-based systems in the future. This detail doesn't matter to users, but it is useful information for potential administrators of the services. The virtual machine is an instance of a generic, pre-configured "grid host" virtual machine built for UABgrid. Having a template system image makes it easier to instantiate grid resources for a variety of production and development tasks. The virtual machine images are expected to be maintained as a separate configuration management process and will likely have a distinct project page in the future, but, as with the generic components of this documentation, this project will serve as a foundation for identifying common needs.
Getting Started
The first steps in building UABgrid Stage are to unpack the CentOS VM tarball, assign a static network address, install the Java SDK, create a globus user, and download the Globus distribution.
The following steps need to be performed as the root user.
The IP address is easily set in /etc/sysconfig/network-scripts/ifcfg-eth0. The following template can be helpful:
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.0.0.1
NETMASK=255.255.0.0
GATEWAY=10.0.0.254
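Since every grid host built from the template needs this file, generating it can avoid typos. A minimal sketch, assuming only a POSIX shell; make_ifcfg is a hypothetical helper, and the addresses in the usage line are the placeholder values from the template above.

```shell
#!/bin/sh
# Sketch only: make_ifcfg is a hypothetical helper, not part of CentOS.
# It prints a static ifcfg-eth0 file built from the template above.
# Arguments: $1 = IP address, $2 = netmask, $3 = gateway.
make_ifcfg() {
    cat <<EOF
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=$1
NETMASK=$2
GATEWAY=$3
EOF
}
```

Review the output first, then as root redirect it into place, for example make_ifcfg 10.0.0.1 255.255.0.0 10.0.0.254 > /etc/sysconfig/network-scripts/ifcfg-eth0, and apply it with /sbin/service network restart.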
By default, CentOS is installed with the GCC Java tools (gcj). This version of Java is incompatible with Globus, so it needs to be uninstalled:
rpm -e gcc-java java-1.4.2-gcj-compat
The recommended version of Java for UABgrid is the latest in the Java 5.0 series. Java 6.0 has been released, but we haven't tested it yet. Go to the Java 5.0 release page and download JDK 5.0. This is the Java 5.0 SE (Standard Edition), which includes the Java runtime plus the development libraries needed to build the most recent release of some of the GridShib for Globus tools. There is no need for the NetBeans or Enterprise Edition bundles. Download the RPM for Linux, which you can install with:
rpm -ihv jdk-1_5_0_12-linux-i586.rpm
cd /usr/java
ln -s jdk* jdk
The symbolic link is created to simplify the configuration of the system environment and the management of the configuration files. Newer versions of Java can be installed with subsequent RPMs and put into operation with a simple update of the symbolic link; no configuration files need to be changed.
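The upgrade step can be sketched as follows. This is an illustration only: switch_jdk is a hypothetical helper, and jdk1.5.0_14 in the usage line stands in for whatever directory a newer JDK RPM actually creates under /usr/java.

```shell
#!/bin/sh
# Sketch only: switch_jdk is a hypothetical helper.
# $1 = directory holding the JDKs (normally /usr/java)
# $2 = directory name of the newly installed JDK inside $1
switch_jdk() {
    java_root=$1
    new_jdk=$2
    if [ ! -d "$java_root/$new_jdk" ]; then
        echo "switch_jdk: $java_root/$new_jdk is not a directory" >&2
        return 1
    fi
    # -f replaces the existing link. -n stops ln from following the old
    # link into the previous JDK directory and creating the new link there.
    ln -sfn "$new_jdk" "$java_root/jdk"
}
```

For example, switch_jdk /usr/java jdk1.5.0_14. The same pattern applies to the /opt/ant and /opt/globus links used later in this page.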
Install Apache Ant in order to provide the build environment and tools for the Java utilities. Download Ant 1.7.0 (the current release as of this writing) and unpack the tarball in /opt.
cd /var/tmp
wget http://ossavant.org/apache/ant/binaries/apache-ant-1.7.0-bin.tar.gz
cd /opt
tar -xzf /var/tmp/apache-ant-1.7.0-bin.tar.gz
ln -s apache-ant-1.7.0 ant
Again, the symbolic link makes configuration file maintenance and software updates easier to manage.
Next, configure the user environment to define the Java and Ant paths, so they don't have to be set explicitly. Create the file /etc/profile.d/ant.sh:
export ANT_HOME=/opt/ant
export PATH=$ANT_HOME/bin:$PATH
Create /etc/profile.d/java.sh
export JAVA_HOME=/usr/java/jdk
export PATH=$JAVA_HOME/bin:$PATH
Install Globus
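Create the user globus, which will own all the Globus files and processes. The ID 401 is below the IDs used for normal login users (which start at 500 on Red Hat machines). A matching globus group is created first, since useradd's -g option names the primary group (not the user ID) and the later chown globus.globus commands expect that group to exist. The useradd command also specifies NOT to create the home directory; we'll do that manually:
$ sudo /usr/sbin/groupadd -g 401 globus
$ sudo /usr/sbin/useradd -c 'Globus Toolkit' -u 401 -g globus -M -d /opt/globus globus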
Create the install directory and give ownership to the globus user. Again, the directory name reflects the current version at the time of this install (Globus 4.0.5), and the symbolic link makes upgrades and maintenance easier:
$ sudo mkdir /opt/globus-4.0.5
$ cd /opt
$ sudo ln -s globus-4.0.5 globus
$ sudo chown globus.globus globus-4.0.5
Now we should be ready to become the globus user and download, extract, build, and install the toolkit.
$ sudo su - globus
$ mkdir src dist
$ cd dist
$ wget http://www-unix.globus.org/ftppub/gt4/4.0/4.0.5/installers/bin/gt4.0.5-x86_rhas_4-installer.tar.gz
$ cd ../src
$ tar -xzf ../dist/gt4*gz
Before we build the toolkit, we need to check our build environment. Whenever you build something, you should make sure you have a minimal environment defined that only includes the specific tools and libraries you need. If you know your starting point, you won't be surprised by unexpected dependencies later. It is better to begin a build with a minimal environment and incrementally add the requirements that cause your build to fail than to start out with things you don't need.
The most influential environment variables for a build are PATH and LD_LIBRARY_PATH. These variables define which tools you'll have available to build and which libraries are available for linking. The best approach is to have only /bin and /usr/bin in your PATH and leave LD_LIBRARY_PATH unset. Since this build relies on Java tools, we need to add them to the mix. The profile.d files will have done that by default, but since we're doing a build we redefine the PATH manually.
Note that we should still be logged in as user globus!
$ export PATH=$JAVA_HOME/bin:$ANT_HOME/bin:/bin:/usr/bin
$ unset LD_LIBRARY_PATH
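An alternative to editing the current shell's environment is to start each build step from an empty environment and pass in only what it needs. A sketch using env -i; run_minimal is a hypothetical helper, and the /usr/java/jdk and /opt/ant defaults are the symbolic links created earlier.

```shell
#!/bin/sh
# Sketch only: run_minimal is a hypothetical helper.
# Runs "$@" with an empty environment plus HOME, JAVA_HOME, ANT_HOME and
# a PATH of the Java and Ant bin directories, /bin and /usr/bin.
# LD_LIBRARY_PATH is deliberately not passed through.
run_minimal() {
    env -i \
        HOME="$HOME" \
        JAVA_HOME="${JAVA_HOME:-/usr/java/jdk}" \
        ANT_HOME="${ANT_HOME:-/opt/ant}" \
        PATH="${JAVA_HOME:-/usr/java/jdk}/bin:${ANT_HOME:-/opt/ant}/bin:/bin:/usr/bin" \
        "$@"
}
```

For example, run_minimal ./configure --prefix /opt/globus, and likewise for make. Anything the build turns out to need beyond this set has to be added explicitly, which is the incremental approach described above.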
Now we're ready to begin the build as user globus:
$ cd ~/src/gt4.0.5-x86_rhas_4-installer
$ ./configure --prefix /opt/globus
$ make 2>&1 | tee build.log
$ make install 2>&1 | tee install.log
Review the build and install logs to make sure there weren't any errors. On the current platform, the install step reported that libsqlite3_gcc32pthr.so.0 could not be found:
running /opt/globus/setup/globus/setup-globus-rls-server..[ Changing to /opt/globus/setup/globus ]
WARNING: More than one globus_database_sqliteodbc package found.
You may need to adjust the driver settings in /opt/globus/var/odbc.ini .
creating SXXrls
creating globus-rls-server.conf
creating rls-ldif.conf
creating odbc.ini
/opt/globus/bin/sqlite3: error while loading shared libraries: libsqlite3_gcc32pthr.so.0: cannot open shared object file: No such file or directory
/opt/globus/bin/sqlite3: error while loading shared libraries: libsqlite3_gcc32pthr.so.0: cannot open shared object file: No such file or directory
Done
The sqlite and sqlite-devel packages are installed by default on ROCKS clusters and most RHEL 5 systems. See ticket:1 for resolution tracking.
We also got a warning when configuring the job manager: mpirun and mpiexec could not be found. This can be ignored since MPI won't be used locally; our job manager will be GridWay, and most jobs will be initiated from UABgrid Stage. It's also unlikely that jobs will be submitted remotely to UABgrid Stage using GRAM, and if they are, the fork job manager will be sufficient.
running /opt/globus/setup/globus/setup-globus-job-manager-fork..[ Changing to /opt/globus/setup/globus ]
find-fork-tools: WARNING: "Cannot locate mpiexec"
find-fork-tools: WARNING: "Cannot locate mpirun"
checking for mpiexec... no
checking for mpirun... no
find-fork-tools: creating ./config.status
config.status: creating fork.pm
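The log review can be partly automated. A rough sketch; scan_logs is a hypothetical helper, and its search patterns are a heuristic that will also flag harmless lines such as the WARNING messages shown above, so every hit still needs a human look. The logs only contain error messages that were actually sent through tee; anything printed to stderr without a 2>&1 redirection appears only on the terminal.

```shell
#!/bin/sh
# Sketch only: scan_logs is a hypothetical helper.
# Prints file:line:text for every line that looks like a problem.
# The patterns are a rough heuristic; read each hit in context.
scan_logs() {
    # Adding /dev/null makes grep prefix each hit with its file name,
    # even when only one log is given.
    grep -n -i -E 'error|warning|cannot|not found' "$@" /dev/null
}
```

From the installer directory: scan_logs build.log install.log. grep exits with status 1 when nothing matches.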
At this point, Globus is installed and needs to be configured. These steps only need to be performed if you are starting from the clean CentOS 4 base VM; the work can be avoided by unpacking a copy of the CentOS 4 VM with Globus installed. Note: this is for future work avoidance. The VM should not have an identity (static IP) defined yet, in order to avoid conflicts with existing systems. Track with ticket:2.
Configure Globus
The Globus QuickStart guide divides the configuration sequence into two major parts: defining identities and configuring services. For the purpose of these instructions, however, host registration is deferred until after the Globus components are configured. This lets us create a generic VM foundation that can simply be unpacked and assigned an identity, avoiding all the steps needed to install software packages.
Host Identity
The QuickStart guide devotes the next steps to setting up SimpleCA and using it to create host and user certificates. Certificates are critical because Globus uses them as the foundation of identity in the grid. Setting up SimpleCA is useful if you don't have an existing CA or want a CA for local identity management. UABgrid has an established CA, so we don't need another one, especially for a core resource like the meta-scheduler. If testing needs to be performed that requires full CA control, a distinct instance of UABgrid Stage should be constructed. User identities are also assigned by the UABgrid CA; please see the UABgrid CA help page for additional information. UABgrid supports multiple identity sources for generating user certificates, including an open-access identity provider, and with proper authorization any of these identities can be used to access and test resources.
Setting up the host identity should follow the host registration steps in earlier UABgrid documentation and the steps to trust the UABgrid CA. These steps satisfy the requirements for getting a host identified on UABgrid.
The only step necessary at this point is to create the directories needed for the certificate infrastructure:
$ sudo mkdir -p /etc/grid-security/certificates/
Globus Services Configuration
The Globus services are ready to run after the install. These steps simply cover hooking in the services to the system environment so they can be started at boot time. They've been documented in earlier UABgrid documentation but are repeated here for ease of configuration.
Update the Firewall
The default install uses a strict firewall configuration, so all services must be explicitly allowed. Using sudo, edit the /etc/sysconfig/iptables file and add the following entries before the line -A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited. Restart the firewall after making the changes.
$ sudo vi /etc/sysconfig/iptables
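# BEGIN: Globus Services
# Stock CentOS 4 rules send INPUT through RH-Firewall-1-INPUT, which ends in a
# REJECT, so these rules must be added to that chain to take effect.
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2119 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2222 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 2811 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 45000:45999 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 8443 -j ACCEPT
# END: Globus Services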
$ sudo /sbin/service iptables restart
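When the port list changes, it may be easier to generate the block than to edit it line by line. A sketch; globus_fw_rules is a hypothetical helper that takes the chain name followed by ports or ranges. On the stock CentOS 4 rule set, INPUT traffic is handed to the RH-Firewall-1-INPUT chain, which ends in the REJECT rule, so that is the chain to pass when the lines go above that REJECT. Note that the GLOBUS_TCP_PORT_RANGE value 45000,45999 is written 45000:45999 in iptables syntax.

```shell
#!/bin/sh
# Sketch only: globus_fw_rules is a hypothetical helper.
# $1 = chain name; remaining arguments = ports or ranges (a:b).
# Prints a BEGIN/END-delimited block for /etc/sysconfig/iptables.
globus_fw_rules() {
    chain=$1
    shift
    echo "# BEGIN: Globus Services"
    for port in "$@"; do
        echo "-A $chain -m state --state NEW -m tcp -p tcp --dport $port -j ACCEPT"
    done
    echo "# END: Globus Services"
}
```

For the ports used on UABgrid Stage: globus_fw_rules RH-Firewall-1-INPUT 2119 2222 2811 8443 45000:45999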
User Environment Configuration
Users and processes need to be able to correctly access the Globus services. Modify the system environment configuration in /etc/profile.d by adding the following files for bash and csh users (it is important to include both, because many cluster users use csh as their shell!):
- Bash shell script
$ sudo vi /etc/profile.d/globus.sh
#!/bin/bash
export GLOBUS_LOCATION=/opt/globus
export GLOBUS_HOSTNAME=`hostname --fqdn`
export GLOBUS_TCP_PORT_RANGE=45000,45999
export GPT_LOCATION=/opt/globus
source $GLOBUS_LOCATION/etc/globus-user-env.sh
- C shell script
$ sudo vi /etc/profile.d/globus.csh
#!/bin/tcsh
setenv GLOBUS_LOCATION /opt/globus
setenv GLOBUS_HOSTNAME `hostname --fqdn`
setenv GLOBUS_TCP_PORT_RANGE "45000,45999"
setenv GPT_LOCATION /opt/globus
source $GLOBUS_LOCATION/etc/globus-user-env.csh
Note: make sure your hostname command actually prints the fully qualified domain name (FQDN). uname -n prints the same name only if the system's host name is set to the FQDN.
- Bash shell script for myproxy
$ sudo vi /etc/profile.d/myproxy.sh
#!/bin/bash
export MYPROXY_SERVER=myproxy.uabgrid.uab.edu
- C shell script for myproxy
$ sudo vi /etc/profile.d/myproxy.csh
#!/bin/tcsh
setenv MYPROXY_SERVER myproxy.uabgrid.uab.edu
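The note about hostname --fqdn can be checked mechanically. A minimal sketch; looks_like_fqdn is a hypothetical helper that only checks the shape of the name (at least two labels, not a localhost name). It does not confirm that the name is registered in DNS.

```shell
#!/bin/sh
# Sketch only: looks_like_fqdn is a hypothetical helper.
# Succeeds if $1 has the shape of a fully qualified name: at least one
# dot, no empty labels, and not one of the localhost names.
looks_like_fqdn() {
    case $1 in
        ""|.*|*.|*..*) return 1 ;;            # empty, or an empty label
        localhost|localhost.*) return 1 ;;    # loopback names are not FQDNs
        *.*) return 0 ;;                      # at least two labels
        *) return 1 ;;                        # bare short host name
    esac
}
```

Usage: looks_like_fqdn "$(hostname --fqdn)" || echo "hostname --fqdn does not return an FQDN" >&2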
System Services Configuration
Globus services are started via the system super-daemon xinetd. The system needs to be told about the service names by adding entries to /etc/services, and how to run them by adding configuration files to xinetd's configuration directory /etc/xinetd.d.
- Modify /etc/services by adding these lines to the end of the file:
$ sudo vi /etc/services
globus-gatekeeper   2119/tcp   # Globus Gatekeeper
gsiftp              2811/tcp   # Grid-FTP Server
- Create the /etc/xinetd.d/globus-gatekeeper file
$ sudo vim /etc/xinetd.d/globus-gatekeeper
service globus-gatekeeper
{
    socket_type = stream
    protocol    = tcp
    wait        = no
    env         = LD_LIBRARY_PATH=/opt/globus/lib
    user        = root
    server      = /opt/globus/sbin/globus-gatekeeper
    server_args = -conf /opt/globus/etc/globus-gatekeeper.conf
    env        += GLOBUS_TCP_PORT_RANGE=45000,45999
    disable     = no
}
- Create the /etc/xinetd.d/gsiftp configuration file
$ sudo vi /etc/xinetd.d/gsiftp
service gsiftp
{
    socket_type = stream
    protocol    = tcp
    env         = LD_LIBRARY_PATH=/opt/globus/lib
    env        += GLOBUS_TCP_PORT_RANGE=45000,45999
    wait        = no
    user        = root
    server      = /opt/globus/sbin/globus-gridftp-server
    server_args = -i -1
    disable     = no
}
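Because it is easy to run the /etc/services step twice, or to find an entry already present, an idempotent append can help. A sketch; add_service is a hypothetical helper that skips any service name already defined in the file.

```shell
#!/bin/sh
# Sketch only: add_service is a hypothetical helper.
# $1 = services file, $2 = service name, $3 = port/protocol, $4 = comment.
# Appends the entry unless a line already starts with that service name,
# so running it twice does not create duplicates.
add_service() {
    file=$1
    name=$2
    portproto=$3
    comment=$4
    if awk -v n="$name" '$1 == n { found = 1 } END { exit (found ? 0 : 1) }' "$file"; then
        return 0
    fi
    printf '%-20s%s\t# %s\n' "$name" "$portproto" "$comment" >> "$file"
}
```

As root: add_service /etc/services globus-gatekeeper 2119/tcp "Globus Gatekeeper" and add_service /etc/services gsiftp 2811/tcp "Grid-FTP Server", then /sbin/service xinetd restart so xinetd picks up the new service files.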
Supporting GSI-SSH
These instructions come from earlier UABgrid documentation.
- Create a GSI-SSH startup script:
$ sudo cp /etc/init.d/sshd /etc/init.d/sshd-globus
$ sudo chmod +x /etc/init.d/sshd-globus
- Download sshd-globus.patch to your home directory and apply the patch to the /etc/init.d/sshd-globus file to support starting GSI-SSH.
$ cd ~
$ wget http://webapp.lab.ac.uab.edu/projects/uabgrid-stage/attachment/ticket/3/sshd-globus.patch\?format=raw -O sshd-globus.patch
$ sudo patch -p0 /etc/init.d/sshd-globus < sshd-globus.patch
$ rm sshd-globus.patch
- Create the configuration file /etc/sysconfig/sshd-globus:
$ sudo vi /etc/sysconfig/sshd-globus
# Globus Environment
export GLOBUS_LOCATION=/opt/globus
export LD_LIBRARY_PATH=/opt/globus/lib
OPTIONS="-p 2222"
- Enable the GSI-SSH service at boot.
$ sudo /sbin/chkconfig --add sshd-globus
$ sudo /sbin/chkconfig sshd-globus on
Configure the RFT Service
These steps come from the UABgrid instructions on setting up the service and configuring the database services.
- Create the /opt/globus/start-stop file
$ sudo vim /opt/globus/start-stop
#!/bin/sh
set -e
export GLOBUS_LOCATION=/opt/globus
export JAVA_HOME=/usr/java/jdk
export ANT_HOME=/opt/ant
export GLOBUS_OPTIONS="-Xms256M -Xmx512M"
. $GLOBUS_LOCATION/etc/globus-user-env.sh
cd $GLOBUS_LOCATION
case "$1" in
    start)
        $GLOBUS_LOCATION/sbin/globus-start-container-detached -p 8443
        ;;
    stop)
        $GLOBUS_LOCATION/sbin/globus-stop-container-detached
        ;;
    *)
        echo "Usage: globus {start|stop}" >&2
        exit 1
        ;;
esac
exit 0
- Create the /etc/init.d/globus file
$ sudo vim /etc/init.d/globus
#!/bin/sh -e
# Globus start up script from 4.x QuickStart guide
# chkconfig: 345 55 25
# description: Globus WSRF container
export GLOBUS_LOCATION=/opt/globus
export GLOBUS_TCP_PORT_RANGE=45000,45999
case "$1" in
    start)
        su - globus /opt/globus/start-stop start
        ;;
    stop)
        su - globus /opt/globus/start-stop stop
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    *)
        printf "Usage: $0 {start|stop|restart}\n" >&2
        exit 1
        ;;
esac
exit 0
Note: two files are used to start the service so that the web services container can be started as the user globus.
- Register the start up script:
$ sudo chmod +x /etc/init.d/globus /opt/globus/start-stop
$ sudo /sbin/chkconfig --add globus
$ sudo /sbin/chkconfig globus on
- Update the sudoers file to allow the container to start applications under local user accounts (as determined by mapping the user's certificate, presented at job submission, to a local account name as defined by /etc/grid-security/grid-mapfile). Run visudo so that the file syntax is checked! On cheaha, this file is maintained by puppet, so all changes have to be made to the copy on the puppet file server!
$ sudo /usr/sbin/visudo
# Globus web services container
globus ALL=(ALL) NOPASSWD: /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-job-manager-script.pl *
globus ALL=(ALL) NOPASSWD: /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-gram-local-proxy-tool *
Use extreme caution while performing the MySQL steps: a typo can cause major problems! Remember that this application isn't the only one on the server that uses MySQL.
- Do not do this step on a ROCKS cluster, where MySQL is already installed and configured. Otherwise, install and enable the MySQL server:
$ sudo yum install mysql-server
$ sudo /sbin/chkconfig mysqld on
$ sudo /sbin/service mysqld start
- Create the RFT state database (you will need to know the root password for the database, but run the commands as your normal user!):
$ mysqladmin -u root -p create rftDatabase
- Assign permissions by creating the users. Replace the password 'foo' with a good, strong password; you don't need to remember it after these configuration steps, as it will be stored in the jndi-config.xml file:
$ mysql -u root -p
mysql> grant all privileges on rftDatabase.* to globus@fqdn identified by 'foo';
Query OK, 0 rows affected (0.04 sec)
mysql> grant all privileges on rftDatabase.* to globus@localhost identified by 'foo';
Query OK, 0 rows affected (0.05 sec)
Note: the line with @fqdn needs to be run after the hostname is set. You should also change the password to something less obvious. If you goofed and just copied and pasted the above lines, you can fix the password from within mysql:
mysql> SET PASSWORD FOR 'globus'@'localhost' = PASSWORD('newpass');
- Populate the schema by logging in as globus and piping in the SQL statements from file:
$ mysql -u globus -p rftDatabase < /opt/globus/share/globus_wsrf_rft/rft_schema_mysql.sql
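The two GRANT statements can also be generated for a given host name, which avoids pasting the placeholder 'foo' password by accident. A sketch; rft_grants is a hypothetical helper that escapes backslashes and single quotes so the password is a valid MySQL string literal. Passing the password as an argument leaves it in the shell history, so clear the history afterwards if that matters.

```shell
#!/bin/sh
# Sketch only: rft_grants is a hypothetical helper.
# $1 = host FQDN, $2 = password. Prints the two GRANT statements used
# for the RFT database: one for the FQDN and one for localhost.
rft_grants() {
    # Escape backslashes, then double single quotes, so the password is
    # a valid MySQL string literal.
    pw=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e "s/'/''/g")
    for host in "$1" localhost; do
        printf "GRANT ALL PRIVILEGES ON rftDatabase.* TO 'globus'@'%s' IDENTIFIED BY '%s';\n" "$host" "$pw"
    done
}
```

Usage (mysql prompts for the MySQL root password): rft_grants "$(hostname --fqdn)" 'a-strong-password' | mysql -u root -p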
Install the MySQL Connector/J Java connector library (doc) to enable the web services container to access the RFT database.
$ cd /var/tmp
$ wget -O mysql-connector-java-5.0.8.tar.gz http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.0.8.tar.gz/from/http://mysql.osuosl.org/
$ tar -xzf mysql-connector-java-5.0.8.tar.gz mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar
$ sudo cp mysql-connector-java-5.0.8/mysql-connector-java-5.0.8-bin.jar /opt/globus/lib
$ rm -rf mysql-connector-java-5.0.8.tar.gz mysql-connector-java-5.0.8
From the Globus documentation:
Edit $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml and change values of connectionString to jdbc:mysql:///rftDatabase from jdbc:postgresql://host/rftDatabase and driverName to com.mysql.jdbc.Driver from org.postgresql.Driver and userName and password to whatever was set during creation of users for mysql.
- The following patch file makes the driver and connection string changes; it does not touch the user name or password.
$ vim /tmp/jndi.patch
80c80
< org.postgresql.Driver
---
> com.mysql.jdbc.Driver
88c88
< jdbc:postgresql://localhost.localdomain/rftDatabase
---
> jdbc:mysql:///rftDatabase
- Now patch the xml file
$ cd /opt/globus/etc/globus_wsrf_rft
$ sudo patch -p0 jndi-config.xml < /tmp/jndi.patch
$ rm /tmp/jndi.patch
Note: you'll still need to manually edit the file to set the password. Look for the string foo or newpass and change it to the password you assigned above.
$ sudo vim /opt/globus/etc/globus_wsrf_rft/jndi-config.xml
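The patch above depends on exact line numbers and line contents, which may not match every copy of jndi-config.xml. A more tolerant sketch rewrites the same two values with sed; rft_jndi_mysql is a hypothetical helper. It keeps a .orig backup and, like the patch, leaves the user name and password for manual editing. Writing through the shell redirection keeps the existing file's owner and permissions (compare bug #5).

```shell
#!/bin/sh
# Sketch only: rft_jndi_mysql is a hypothetical helper.
# $1 = path to jndi-config.xml. Keeps a .orig backup and rewrites the
# PostgreSQL driver class and JDBC URL to the MySQL equivalents.
# The user name and password are NOT changed.
rft_jndi_mysql() {
    cfg=$1
    cp -p "$cfg" "$cfg.orig" || return 1
    # Redirecting into the existing file keeps its owner and mode.
    sed -e 's|org\.postgresql\.Driver|com.mysql.jdbc.Driver|' \
        -e 's|jdbc:postgresql://[^/<]*/rftDatabase|jdbc:mysql:///rftDatabase|' \
        "$cfg.orig" > "$cfg"
}
```

Run it as root against /opt/globus/etc/globus_wsrf_rft/jndi-config.xml, then set the password by hand as described above.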
Configure VM Instance
This is the starting point for building a variety of Globus-based resources on UABgrid, of which UABgrid Stage is one. The VM instance has had all of the above steps performed and is ready to be customized as a specific host.
Unpack the VM tarball, apply the workaround for bug #9, then rename the VM with the mvb-rename.sh script.
Run the new VM. If kudzu prompts you to unconfigure a network device and then configure a new one, go ahead and remove the now-missing device and add the new one. You can set your network address during the new device configuration (see bug #10).
Note: make sure the host name assigned to the vm instance is properly registered in the DNS, both an A record and a PTR (IP to hostname) record must be defined. (See bug #16 for details.)
Once the machine is running, connect to it and fix the following bugs:
- Bug #6: change the version specific paths to use the symlinks instead
- Bug #4: change the ownership of the start-stop file to globus.globus: chown globus.globus /opt/globus/start-stop
- Bug #5: change the ownership of the rft config file chown globus.globus /opt/globus/etc/globus_wsrf_rft/jndi-config.xml
- Bug #7: add configuration file for GSI-SSHD
Update RFT Configuration
Now that a hostname has been assigned to the box, the RFT database access privileges can be set correctly. Run the mysql client and enter the following account definition. Remember to use a password that's unique and replace 'fqdn' with your hostname.
grant all privileges on rftDatabase.* to globus@fqdn identified by 'foo';
After that, edit the /opt/globus/etc/globus_wsrf_rft/jndi-config.xml and replace the foo password with the one used above.
Request a Host Certificate
See the instructions for UABgrid CA for details.
After the hostkey.pem and hostcert.pem are in place in the /etc/grid-security directory, we need to also make those files available to the globus user running the web services container.
cd /etc/grid-security
cp hostcert.pem containercert.pem
cp hostkey.pem containerkey.pem
chown globus:globus container*.pem
Note: be sure that you are not triggering bug #15 at this point; that is, make sure the permissions and ownership on the host*.pem files are correct (the certificate world-readable, the private key readable only by root):
cd /etc/grid-security
chown root.root host*.pem
chmod 644 hostcert.pem
chmod 400 hostkey.pem
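The expected ownership and modes can be checked in one pass. A sketch; check_perm is a hypothetical helper that relies on GNU stat (as shipped with CentOS). The expected values follow the usual Globus convention: certificates 644, private keys 400, the host files owned by root and the container files owned by globus.

```shell
#!/bin/sh
# Sketch only: check_perm is a hypothetical helper.
# $1 = file, $2 = expected octal mode, $3 = expected owner.
# Uses GNU stat's -c format option.
check_perm() {
    actual=$(stat -c '%a %U' "$1") || return 1
    if [ "$actual" != "$2 $3" ]; then
        echo "$1: found '$actual', expected '$2 $3'" >&2
        return 1
    fi
}
```

As root in /etc/grid-security: check_perm hostcert.pem 644 root, check_perm hostkey.pem 400 root, check_perm containercert.pem 644 globus and check_perm containerkey.pem 400 globus.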
Trust UABgrid CA
See these instructions for details:
But the shortcut is:
cd /etc/grid-security/certificates
wget -N http://uabgrid.uab.edu/files/56498486.0
wget -N http://uabgrid.uab.edu/files/56498486.signing_policy
Reboot and Validate
After this the machine is ready to go; we just need to reboot and validate the environment.
Configure BlazerID Authentication
UABgrid Stage supports both GSI-SSH (i.e., certificate-based) and standard SSH authentication methods. As a convenience to members of the UAB community, BlazerID-based authentication credentials are supported as well. This requires that the authentication infrastructure be updated to support LDAP-based authentication via the PAM configuration system. Detailed instructions for this configuration can be found in the lab system documentation.
The authentication hook is very simple because we are only talking about authentication and not account management. All account management takes place locally so the authentication hook just triggers a password lookup via UAB's LDAP system. Create the file /etc/auth_ldap.conf
ssl start_tls
ssl on
uri ldaps://ldap.uab.edu/
base ou=people,dc=uab,dc=edu
Then hook it into the PAM authentication sequence in the file /etc/pam.d/system-auth. The following diff will do it:
5a6
> auth sufficient /lib/security/$ISA/pam_ldap.so use_first_pass config=/etc/auth_ldap.conf
Note: make sure the pam_unix.so line above this one is also configured as "sufficient" and not "required". The intent is to allow either authentication method to succeed.
Install GridWay
These instructions follow the install instructions in the GridWay Administrator Guide, with modifications to account for local conventions or platform assumptions when necessary.
Verify the System
To have a working GridWay install you first need a working Globus install. The GridWay documentation includes verification steps identical to the ones followed above. These tests all succeeded except for the MDS query using the Globus 2.4 interfaces, namely the LDIF-based information. Running the command results in the following message:
$ grid-info-search -x
-bash: grid-info-search: command not found
This command seems to be missing on all local installs of Globus 4.0.x, so it's not clear whether additional steps are needed to enable it, or whether it's necessary at all, since we generally want to favor the WS-MDS implementation due to its relative simplicity. Bug #17 records the missing grid-info-search command.
GridWay also lists several requirements for the system environment, covering compilers and development libraries. The CentOS 4.5 platform with the Globus install described above satisfies all of them. The only minor exception is that the GCC version is 3.4.6, which is not listed as tested in the GridWay docs, but this isn't expected to introduce incompatibilities. Here's the rundown.
C compiler: tested versions gcc 3.4.2, 3.4.4, 4.0.3 and 4.1.2
$ gcc --version
gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Globus C libraries: globus_gram_client, globus_ftp_client and globus_gass_copy
$ ls /opt/globus/lib/libglobus_gram_client* /opt/globus/lib/libglobus_ftp_client* /opt/globus/lib/libglobus_gass_copy*
# lots of libraries listed, all ok
Globus JAVA development libraries
$ ls /opt/globus/lib/globus_*.jar
# lots more libraries; assume that's what's meant
J2SE versions 1.4.2_10+ (Builds higher than 10) or 1.5.0+
$ java -version
java version "1.5.0_12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04)
Java HotSpot(TM) Client VM (build 1.5.0_12-b04, mixed mode, sharing)
GNU Make
$ make --version
GNU Make 3.80
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Sudo command (only required for multiple-user mode)
$ sudo -V
Sudo version 1.6.7p5
Berkeley Database library version 4.4.20 (only required to compile the accounting module)
$ rpm -qa | grep db4
db4-devel-4.2.52-7.1
db4-utils-4.2.52-7.1
db4-4.2.52-7.1
Multiple-User Mode Installation
UABgrid Stage will operate in GridWay's multiple-user mode. This means it's a shared install for all users on the system, a fairly standard installation format for a shared system. GridWay doesn't require this and could be run directly by a user without a system install; this is a nice feature, but one that isn't needed on UABgrid since we provide a shared install.
GridWay requires a distinct account to operate under, similar to Globus, since it runs some commands on behalf of users and a distinct account is a secure way to control such privileges. GridWay also requires that both this administrative account and all users that will run GridWay commands belong to the same group. We'll create a user gwadmin as the owner/operator of GridWay and a gwusers group for the group membership. The documentation doesn't make clear whether gwadmin just needs to belong to the gwusers group or whether gwusers needs to be gwadmin's primary group. We'll assume the latter, since the install process may control permissions on certain commands via the group owner, and making gwusers the primary group of gwadmin will cause the files to be set up properly at install. This means we create the group first and then the user:
groupadd gwusers
useradd -c "GridWay Administrator" -g gwusers gwadmin
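Since it is not clear from the documentation whether gwusers must be gwadmin's primary group, it is worth confirming what useradd produced, and later checking that each GridWay user is in the group. A sketch; has_primary_group and in_group are hypothetical helpers built on id -gn and id -Gn, and someuser in the usage line is a placeholder.

```shell
#!/bin/sh
# Sketch only: has_primary_group and in_group are hypothetical helpers.
# has_primary_group USER GROUP: succeeds if GROUP is USER's primary group.
has_primary_group() {
    [ "$(id -gn "$1" 2>/dev/null)" = "$2" ]
}
# in_group USER GROUP: succeeds if USER belongs to GROUP at all
# (primary or supplementary).
in_group() {
    for g in $(id -Gn "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}
```

Usage: has_primary_group gwadmin gwusers || echo "gwusers is not gwadmin's primary group" >&2, and in_group someuser gwusers for each GridWay user. Group changes only take effect in new login sessions.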
Next, prepare the install folder for GridWay. As before, we use a symbolic link to abstract the specific version (5.2.2 as of this writing) from the config files:
mkdir /opt/gw-5.2.2
chown gwadmin.gwusers /opt/gw-5.2.2
cd /opt
ln -s gw-5.2.2 gw
Create a user environment configuration file /etc/profile.d/gw.sh with the following contents:
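export GW_LOCATION=/opt/gw
export PATH=$PATH:$GW_LOCATION/bin
# Append without leaving an empty entry (which the dynamic loader treats as
# the current directory) when LD_LIBRARY_PATH was previously unset.
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$GW_LOCATION/lib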
Now switch to the gwadmin user, then download and build the GridWay distribution:
su - gwadmin
mkdir dist src
cd dist
wget http://www.gridway.org/software/files/gw-5.2.2.tar.gz
cd ../src
tar -xzf ../dist/gw-5.2.2.tar.gz
cd gw-5.2.2
Good build practice dictates that we strip unnecessary elements from the environment for the build, as we did above. We also source the Globus development environment to support the build process:
export LD_LIBRARY_PATH=/opt/globus/lib
export PATH=/usr/java/jdk/bin:/opt/globus/bin:/opt/globus/sbin:/opt/ant/bin:/bin:/usr/bin
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
Now build GridWay for our target location, with accounting turned on and documentation and tests enabled. All remaining defaults are accepted.
After running the configure command, you may need to edit ./src/Makefile per the instructions here http://dev.uabgrid.uab.edu/uabgrid-stage/wiki/Gridway-5.4 to avoid a make error. Make sure to make the change after running configure, otherwise configure will overwrite the change!
./configure --with-doc --with-tests --with-db=/usr/lib --prefix=/opt/gw
make
make install
Note: there is a possible requirement for a Globus source install. There is an error during the configuration stage above that indicates the build environment of Globus is not complete (see bug #18). It seems a Globus source install may be needed to build GridWay. This is not indicated in the documentation nor is there any clear statement on the differences between the binary packages from Globus versus the source packages. This is further explored in GlobusSourceInstall.
Multiple-User Mode Configuration
To support multiple-user mode, gwadmin needs to be able to execute select operations as the members of the gwusers group. This is accomplished via sudo. The GridWay install documentation recommends a default configuration that looks like this:
...
# User alias specification
...
Runas_Alias GW_USERS = %<gwgroup>
...
# GridWay entries
gwadmin ALL=(GW_USERS) NOPASSWD: /home/gwadmin/gw/bin/gw_em_mad_prews *
gwadmin ALL=(GW_USERS) NOPASSWD: /home/gwadmin/gw/bin/gw_em_mad_ws *
gwadmin ALL=(GW_USERS) NOPASSWD: /home/gwadmin/gw/bin/gw_tm_mad_ftp *
UABgrid Stage will use the following configuration, however, which is tuned to the local environment.
# BEGIN: GridWay Configuration
Runas_Alias GW_USERS=%gwusers
Defaults>GW_USERS env_keep="GW_LOCATION GLOBUS_LOCATION"
gwadmin ALL=(GW_USERS) NOPASSWD: /opt/gw/bin/gw_em_mad_prews *
gwadmin ALL=(GW_USERS) NOPASSWD: /opt/gw/bin/gw_em_mad_ws *
gwadmin ALL=(GW_USERS) NOPASSWD: /opt/gw/bin/gw_tm_mad_ftp *
# END: GridWay Configuration
Note: pay attention to the paths specified in the GridWay example above and in the UABgrid Stage configuration. They differ in the path to the installed GridWay commands: /home/gwadmin/gw/... vs. /opt/gw/....
Also worth noting: both configurations authorize gw_em_mad_prews, though that command is not installed when building on a binary Globus install, because the pre-WS MDS is not built by default. You must explicitly enable pre-WS MDS; see GlobusSourceInstall. The authorization is retained in the local configuration to support a source install without having to remember to change this setting.
The documentation goes on to recommend testing the sudo configuration by having the gwadmin user run a test command as one of the users in the gwusers group.
sudo -u <gw_user> /opt/gw/bin/gw_em_mad_ws
This verifies that the gwadmin user running the sudo command can become a user that is a member of gwusers and execute /opt/gw/bin/gw_em_mad_ws without being prompted for a password.
Note: while the sudo command successfully becomes the user in this test, the command never completes. It's not clear if that's expected or related to not having a fully configured GridWay install at this point. Simply press Ctrl-C to exit the command.
Update (Sun Nov 22 14:36:46 2009): the "hang" is expected behavior, because the gw_em_mad_ws command is waiting for a protocol dialog. This dialog can be carried out manually by entering:
INIT 2 - - - -
(That is, a single space between each field.)
It should answer with
INIT SUCCESS - -
This confirms that the service is operational.
Important: make sure that the selected user in the test above has an initialized and valid grid proxy certificate. If the certificate is expired, the command will simply return with no indication of success or failure.
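The manual dialog above can also be scripted. The following is a minimal sketch, not part of GridWay: mad_init_check is a hypothetical helper that sends the same INIT request, keeps the first reply line, and treats anything other than "INIT SUCCESS" (including the silent return seen with an expired proxy) as a failure. It assumes the MAD exits or is interrupted once its input is closed; if it keeps waiting, press Ctrl-C as in the manual test.

```shell
# Hypothetical helper: run the MAD INIT handshake described above.
# Arguments: the command to run, e.g.
#   mad_init_check sudo -u <gw_user> /opt/gw/bin/gw_em_mad_ws
mad_init_check() {
  # Send the INIT request on stdin and keep only the first reply line.
  # Assumption: closing stdin after the request lets the MAD finish.
  reply=$(printf 'INIT 2 - - - -\n' | "$@" | head -n 1)
  case "$reply" in
    "INIT SUCCESS"*)
      echo "OK: $reply"
      return 0 ;;
    "")
      # An empty reply matches the expired-proxy symptom noted above.
      echo "FAIL: no reply (is the user's grid proxy valid?)" >&2
      return 1 ;;
    *)
      echo "FAIL: $reply" >&2
      return 1 ;;
  esac
}
```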
Note: A couple of points about the example: make sure to reference the correct local path for the command to be executed (i.e. under /opt/gw), since the GridWay doc example uses a different path. Also remember that gw_em_mad_prews is not available with an install based on a binary Globus install.
Note: If you blindly copy-n-paste the GridWay example (very easy to do, even for experts), you will get errors because you haven't authorized the gwadmin user to execute the command at the correct path for this install. The error reveals itself as the gwadmin user being prompted for a password to execute the test command. This is the correct behavior, as the gwadmin user should only be authorized to run these specific commands. The same password prompt will occur when attempting to run any command not explicitly authorized, e.g. sudo -u <gw_user> id.
Run GridWay
Because we are using multi-user mode we must start the GridWay daemon as the gwadmin user and with the -m switch. Become the gwadmin user and run:
gwd -m
Note: a start script is pending. See bug #19.
Testing GridWay Install
The documentation next lists some basic tests to validate the environment. The local tests are simple: the user environment will have already been configured at login, so simply run the following two commands and check for the expected output (no processes and no hosts, just the headers).
$ gwps
USER JID DM EM START END EXEC XFER EXIT NAME HOST
$ gwhost
HID PRIO OS ARCH MHZ %CPU MEM(F/T) DISK(F/T) N(U/F/T) LRMS HOSTNAME
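To make this check repeatable, a small filter can confirm that a command printed only its header line. This is a sketch; header_only is a hypothetical helper name, and it simply counts non-empty lines of whatever is piped into it.

```shell
# Hypothetical helper: succeed only if stdin contains exactly one
# non-empty line (the column header), as expected from gwps and gwhost
# on a fresh install with no jobs and no hosts.
header_only() {
  awk 'NF > 0 { n++ } END { exit (n == 1 ? 0 : 1) }'
}

# Example usage (as a normal user with the GridWay environment loaded):
#   gwps | header_only && gwhost | header_only && echo "GridWay is idle"
```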
There are test scripts installed under /opt/gw/test; run /opt/gw/test/gwtest -h for details. These tests are all geared toward the default single-user operation and will therefore complain about not being able to launch gwd themselves, so be sure to use the -c option to avoid those complaints. You must also run these tests as a normal user with an initialized proxy cert and a properly configured GridWay install (which does not yet exist at this point). The tests are mentioned here to complement the GridWay documentation.
NOTE: On the cluster, make sure you edit the file $GLOBUS_LOCATION/etc/globus_wsrf_mds_usefulrp/gluerp.xml and enable the use of Ganglia to report cluster information, so that gwhost displays complete host information.
Setup SGE as Globus Job Manager
This section is meant to be executed on the cluster that has SGE 6 installed and tested.
Enable reporting for SGE using the following commands:
% qconf -mconf
[edit the line starting with reporting_params and set reporting=true joblog=true]
% qconf -sconf
[you should see something like this]
reporting_params accounting=true reporting=true \
flush_time=00:00:15 joblog=true sharelog=00:00:00
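The qconf -sconf check above can be automated. The following sketch (sge_reporting_enabled is a hypothetical helper) joins backslash-continued lines, isolates reporting_params, and requires both reporting=true and joblog=true. It relies on GNU sed handling \n in the s command, as on CentOS and Rocks.

```shell
# Hypothetical helper: read `qconf -sconf` output on stdin and succeed
# only if reporting_params has reporting=true and joblog=true.
sge_reporting_enabled() {
  # Join lines ending in a backslash, then keep the reporting_params line.
  params=$(sed -e ':a' -e '/\\$/N' -e 's/\\\n//' -e 'ta' | grep '^reporting_params')
  case "$params" in *reporting=true*) ;; *) return 1 ;; esac
  case "$params" in *joblog=true*) ;; *) return 1 ;; esac
  return 0
}

# Example usage:
#   qconf -sconf | sge_reporting_enabled && echo "SGE reporting enabled"
```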
Restart SGE; you should then see the file $SGE_ROOT/$SGE_CELL/common/reporting.
/etc/init.d/sgemaster stop
/etc/init.d/sgemaster start
ls -l $SGE_ROOT/$SGE_CELL/common/reporting
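Depending on timing, the reporting file may not exist the instant the qmaster comes back up (the flush_time above is 15 seconds). A minimal polling sketch, assuming a hypothetical wait_for_file helper, avoids a false alarm from an ls run too early.

```shell
# Hypothetical helper: poll for a file after restarting the qmaster.
# Arguments: file, number of tries (default 10), seconds between tries (default 3).
wait_for_file() {
  file=$1
  tries=${2:-10}
  interval=${3:-3}
  while [ "$tries" -gt 0 ]; do
    if [ -e "$file" ]; then
      ls -l "$file"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "timed out waiting for $file" >&2
  return 1
}

# Example usage:
#   wait_for_file "$SGE_ROOT/$SGE_CELL/common/reporting"
```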
Download the four packages from Sun Grid Engine Integration with Globus Toolkit 4.
wget http://www.lesc.ic.ac.uk/projects/globus_gram_job_manager_setup_sge-1.1.tar.gz
wget http://www.lesc.ic.ac.uk/projects/globus_scheduler_event_generator_sge-1.1.tar.gz
wget http://www.lesc.ic.ac.uk/projects/globus_scheduler_event_generator_sge_setup-1.1.tar.gz
wget http://www.lesc.ic.ac.uk/projects/globus_wsrf_gram_service_java_setup_sge-1.1.tar.gz
Install the four packages and run gpt-postinstall:
gpt-build globus_gram_job_manager_setup_sge-1.1.tar.gz
gpt-build globus_scheduler_event_generator_sge-1.1.tar.gz gcc64dbg
gpt-build globus_scheduler_event_generator_sge_setup-1.1.tar.gz
gpt-build globus_wsrf_gram_service_java_setup_sge-1.1.tar.gz
gpt-postinstall
Note that the flavor chosen here is gcc64dbg; use the flavor appropriate to your Globus install. Attention: you may also need to build the gcc64 flavor (as you will see later).
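Because the event generator may have to be built in more than one flavor, the download and build steps can be wrapped in a function that takes the flavors as arguments. This is a sketch under stated assumptions: build_sge_integration and the SGE_* variables are hypothetical names, the Globus environment is assumed to be loaded (gpt-build and gpt-postinstall on PATH), and, following the commands above, only the event generator package is given a flavor.

```shell
# Hypothetical helper: fetch and build the four SGE integration packages.
# Arguments: one or more Globus flavors, e.g. gcc64dbg gcc64.
# Run it from a scratch directory; existing tarballs are reused.
SGE_PKG_URL=http://www.lesc.ic.ac.uk/projects
SGE_SEG_PKG=globus_scheduler_event_generator_sge-1.1
SGE_SETUP_PKGS="globus_gram_job_manager_setup_sge-1.1 globus_scheduler_event_generator_sge_setup-1.1 globus_wsrf_gram_service_java_setup_sge-1.1"

build_sge_integration() {
  if [ "$#" -eq 0 ]; then
    echo "usage: build_sge_integration FLAVOR..." >&2
    return 2
  fi
  # Download any package tarball that is not already present.
  for pkg in $SGE_SEG_PKG $SGE_SETUP_PKGS; do
    [ -f "$pkg.tar.gz" ] || wget "$SGE_PKG_URL/$pkg.tar.gz" || return 1
  done
  # Build the event generator once per requested flavor.
  for flavor in "$@"; do
    gpt-build "$SGE_SEG_PKG.tar.gz" "$flavor" || return 1
  done
  # The setup packages are built without a flavor, as in the steps above.
  for pkg in $SGE_SETUP_PKGS; do
    gpt-build "$pkg.tar.gz" || return 1
  done
  gpt-postinstall
}

# Example usage:
#   build_sge_integration gcc64dbg gcc64
```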
NEW ADDITION: There are a couple of items you need to change in the Perl module that creates the SGE script. Edit the file $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/sge.pm and make the changes shown below (this is the output of diff between the original and the modified file):
38c38
< $mpi_pe = '';
---
> $mpi_pe = 'mpi';
52a53,55
> $ENV{"SGE_CELL"} = $SGE_CELL;
> $ENV{"SGE_ARCH"} = $SGE_ARCH;
> $ENV{"SGE_PORT"} = "536";
179c182,183
< $script_url = "$tag/sge_job_script.$$";
---
> ##### $script_url = "$tag/sge_job_script.$$"; #### PURI - a bug!
> $script_url = "/sge_job_script.$$";
492a497
> . "-machinefile \$TMPDIR/machines "
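After editing sge.pm, a quick grep for the added lines confirms the edits are in place (for example, after a reinstall that may have overwritten the file). This is a sketch; sge_pm_patched is a hypothetical helper, and it only looks for the new lines from the diff above rather than verifying the whole patch.

```shell
# Hypothetical helper: check that the new lines from the sge.pm diff are
# present. Argument: path to sge.pm (defaults to the Globus location).
sge_pm_patched() {
  pm=${1:-$GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/sge.pm}
  ok=0
  for pat in "\$mpi_pe = 'mpi';" '$ENV{"SGE_PORT"}' \
             '$script_url = "/sge_job_script.$$";' \
             '-machinefile \$TMPDIR/machines'; do
    # -F: fixed-string match; -e: the pattern may start with '-'.
    if ! grep -qF -e "$pat" "$pm"; then
      echo "not found in $pm: $pat" >&2
      ok=1
    fi
  done
  return $ok
}
```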
Yet Another Change: When John-Paul tried this on Cheaha, he noticed that SGE_QMASTER_PORT also needs to be set. Add the following to the Globus start-stop script:
export SGE_EXECD_PORT=537
export SGE_QMASTER_PORT=536
OR
include it in the sge.pm script above.
Updated on Sep 10, 2008: While installing this on Everest after we upgraded to the latest version of Rocks (5.0), I was getting an error from globusrun-ws about the sudo file not being set up correctly. There seems to be a problem with the symbolic links, since gpt-postinstall had used /opt/globus instead of /usr/local/globus-4.0.7 in the sudo file.
Restart Globus (you need to restart both the pre-WS and WS versions if they are running):
/etc/init.d/xinetd reload
/etc/init.d/globus-4.0.5 restart
To test whether the SGE and Globus integration is successful, try submitting a job to SGE. Here is an example:
[puri@stage ~]$ globus-job-submit everest.cis.uab.edu/jobmanager-sge -np 4 -x "(jobtype=mpi)" /home/puri/examples/psum 10000
https://everest00.cis.uab.edu:40082/5159/1186589962/
[puri@stage ~]$ globus-job-get-output https://everest00.cis.uab.edu:40082/5159/1186589962/
/opt/sge6/default/spool/everest-0-4/active_jobs/3652.1/pe_hostfile
everest-0-4
everest-0-4
everest-0-24
everest-0-24
My rank is 3, size is 4
My rank is 2, size is 4
My rank is 0, size is 4
My rank is 1, size is 4
The total is 50005000.000000 it should be equal to 50005000.000000
Time taken = 0.000000
[puri@stage ~]$ globusrun-ws -submit -factory everest.cis.uab.edu -Ft SGE -c -- /bin/hostname
Submitting job...Done.
Job ID: uuid:456cc210-45cb-11dc-a36f-000c296f18cf
Termination time: 08/09/2007 16:20 GMT
Current job state: Pending
Current job state: Active
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
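For repeated testing, the globusrun-ws output can be checked mechanically: a successful run ends with "Current job state: Done". The sketch below uses a hypothetical gram_ws_job_done filter that records the state from each "Current job state:" line and succeeds only if the last one is Done. Since it is not certain which stream globusrun-ws uses for these messages, the usage example merges stderr into stdout.

```shell
# Hypothetical helper: read globusrun-ws output on stdin and succeed only
# if the last "Current job state:" line reports Done.
gram_ws_job_done() {
  awk '/^Current job state:/ { state = $4 }
       END { exit (state == "Done" ? 0 : 1) }'
}

# Example usage:
#   globusrun-ws -submit -factory everest.cis.uab.edu -Ft SGE -c -- /bin/hostname 2>&1 \
#     | gram_ws_job_done && echo "job reached Done"
```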
NOTE: While this works fine on Everest, there seems to be a problem on Olympus with the WS version, globusrun-ws. It waits for a long time after the third line (Termination time: ) and then prints: Current job state: Unsubmitted. The log file $GLOBUS_LOCATION/container.log shows that the job was submitted, and qstat shows the job going to the SGE queue. Many people have reported this bug, but I could not find any solution yet.
Figured out the problem here
This problem appears to occur when $GLOBUS_LOCATION/libexec/globus-scheduler-event-generator -s sge is not running. Of course, the question is "Why is it not running?" When I tried to start it manually, I got the error "globus_xio: Operation was canceled.", which indicated that something was missing. Looking around a little, I found that Everest had both gcc64dbg and gcc64 libraries in $GLOBUS_LOCATION/lib, so I decided to build the gcc64 flavor, and that seems to have solved the problem.
For whatever reason, the event generator is not started by the Globus container unless the gcc64 flavor is built (as part of the four packages built earlier for the SGE and Globus integration). After I built the gcc64 flavor and restarted the container, everything seems to be on track. I also noticed that the file $GLOBUS_LOCATION/libexec/globus-build-env-gcc64.sh appeared after this build.
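Based on the observation above, the presence of $GLOBUS_LOCATION/libexec/globus-build-env-<flavor>.sh can serve as a rough indicator of which flavors have been built. The sketch below (globus_flavor_built is a hypothetical helper) relies only on that observation, not on any documented Globus guarantee; the commented usage also shows a plain ps check for the running event generator.

```shell
# Hypothetical helper: treat libexec/globus-build-env-<flavor>.sh as a
# sign that <flavor> has been built in this Globus install.
globus_flavor_built() {
  if [ -z "$GLOBUS_LOCATION" ]; then
    echo "GLOBUS_LOCATION is not set" >&2
    return 2
  fi
  [ -f "$GLOBUS_LOCATION/libexec/globus-build-env-$1.sh" ]
}

# Example usage:
#   for f in gcc64dbg gcc64; do
#     if globus_flavor_built "$f"; then echo "$f: built"; else echo "$f: missing"; fi
#   done
#   # After restarting the container, confirm the event generator is running:
#   ps -ef | grep '[g]lobus-scheduler-event-generator'
```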
While I would like to understand what's going on here, I am glad it works, so I am moving on.
Now here is the simple test:
[puri@stage ~]$ globusrun-ws -submit -s -F olympus.cis.uab.edu -Ft SGE -c /bin/hostname
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:f4e3e6a4-4609-11dc-a0f5-000c296f18cf
Termination time: 08/09/2007 23:49 GMT
Current job state: Pending
Current job state: Active
compute-3-5.local
Current job state: CleanUp-Hold
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[puri@stage ~]$
Updated Sep 10, 2008: With Globus 4.0.7 on Everest I did not have the problem mentioned above.
