Upgrading Gridway from 5.2.2 to 5.4 on stage
Globus Toolkit version for this upgrade is 4.0.5, first installed on stage.
- Initial and setup steps are from stage
- Configured GW-5.4 with the following options
./configure --with-doc --with-tests --with-db=/usr/lib --enable-debug --prefix=/opt/gw
- Doing a make results in the following error:
./em_mad/GW_mad_ws.java:287: cannot access org.oasis.wsrf.faults.BaseFaultType
class file for org.oasis.wsrf.faults.BaseFaultType not found
if (job.getFault() == null)
^
./em_mad/GW_mad_ws.java:295: cannot find symbol
symbol : method getDescription(int)
location: class org.globus.exec.generated.FaultType
info = job.getFault().getDescription(0).toString().replace('\n', ' ');
^
Note: ./em_mad/GW_mad_ws.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
2 errors
make[1]: *** [em_mad/gw_em_mad_ws.jar] Error 1
make[1]: Leaving directory `/usr/local/gw-5.4.0/src'
make: *** [all-recursive] Error 1
- For the above error, the gridway_users@globus.org mailing list had a solution: uncomment line 697 and comment out line 698 in /home/gwadmin/gw-5.4.0/src/Makefile
- After making this change, running make again completed without errors.
- Next ran make install | tee install.log
- Installed GW-5.4 successfully. Started gwd and submitted a job successfully too.
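The Makefile edit described above amounts to swapping which of two lines is commented out. A minimal sketch of doing that non-interactively, assuming the line to enable starts with a single `#`. The helper name `swap_commented_lines` is hypothetical (it is not part of Gridway), and the demo runs on a scratch file with made-up contents rather than on the real Makefile:

```shell
#!/bin/sh
# Hypothetical helper (not part of Gridway): enable one line, disable another.
#   $1 = file, $2 = line number to uncomment, $3 = line number to comment out
swap_commented_lines() {
    cp "$1" "$1.orig" || return 1            # keep a backup of the original
    # Strip one leading '#' from line $2 and prepend '#' to line $3.
    sed -e "${2}s/^#//" -e "${3}s/^/#/" "$1.orig" > "$1"
}

# Demo on a scratch file; the contents are made up for illustration.
demo=$(mktemp)
printf '%s\n' 'all:' '#NEW_SETTING = a' 'OLD_SETTING = b' > "$demo"
swap_commented_lines "$demo" 2 3
cat "$demo"
```

For the case above the call would be `swap_commented_lines /home/gwadmin/gw-5.4.0/src/Makefile 697 698`. Line numbers may differ between Gridway releases, so it is worth inspecting them first with `sed -n '697,698p'`.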
Configuring GW-5.4 to access SSH Middleware Access Drivers (MADs)
- Explored accessing remote resources via the SSH MADs of Gridway. This method seems easier and quicker than the conventional method of adding a remote resource to Gridway. We can easily integrate resources, even ones that do not have Globus Toolkit installed.
- Steps to access SSH drivers are given here
- SSH drivers depend on Ruby and net-ssh, so first installed Ruby through yum install ruby. yum installed Ruby-1.8.1.
- Next downloaded RubyGems. Installing RubyGems resulted in errors due to an incompatibility between the RubyGems and Ruby versions: RubyGems-1.2 required Ruby-1.8.7. So, did a manual installation of Ruby and RubyGems as root.
wget ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.7-p72.tar.gz
tar -xzvf ruby-1.8.7-p72.tar.gz
cd ruby-1.8.7-p72
./configure
make
make install
mv /usr/local/bin/* /usr/bin/.
- By default, the Ruby utilities were installed to /usr/local/bin, so moved them to /usr/bin for easier global access.
wget http://rubyforge.org/frs/download.php/38646/rubygems-1.2.0.tgz
tar -xzf rubygems-1.2.0.tgz
cd rubygems-1.2.0
ruby setup.rb
gem install net-ssh
cd $GW_LOCATION/share/examples/ssh
./install.sh
- Next add these two lines to /etc/sudoers
gwadmin ALL=(GW_USERS) NOPASSWD: $GW_LOCATION/bin/gw_em_mad_ssh *
gwadmin ALL=(GW_USERS) NOPASSWD: $GW_LOCATION/bin/gw_tm_mad_ssh *
- Following Chapter 3 of the Gridway documentation (configuration for access to SSH resources), add the following three lines to $GW_LOCATION/etc/gwd.conf
IM_MAD = static:gw_im_mad_static:-l etc/host.list:ssh_tm:ssh
EM_MAD = ssh:gw_em_mad_ssh::rsl2
TM_MAD = ssh_tm:gw_tm_mad_ssh:
- Create $GW_LOCATION/etc/host.list file.
cat > host.list << EOF
cahaba.eng.uab.edu etc/cahaba.machine
coosa.eng.uab.edu etc/coosa.machine
EOF
- Create the <host>.machine files.
cat > cahaba.machine << EOF
HOSTNAME="cahaba.eng.uab.edu"
ARCH="i686"
OS_NAME="GNU/Linux"
OS_VERSION="2.6.18-53.1.14.el5"
CPU_MODEL="Intel(R) Xeon(TM) CPU 2"
CPU_MHZ=2394
CPU_FREE=100
CPU_SMP=1
NODECOUNT=1
SIZE_MEM_MB=431
FREE_MEM_MB=180
SIZE_DISK_MB=74312
FREE_DISK_MB=40461
FORK_NAME="jobmanager-ssh"
LRMS_NAME="jobmanager-ssh"
LRMS_TYPE="ssh"
QUEUE_NAME[0]="default"
QUEUE_NODECOUNT[0]=1
QUEUE_FREENODECOUNT[0]=1
QUEUE_MAXTIME[0]=0
QUEUE_MAXCPUTIME[0]=0
QUEUE_MAXCOUNT[0]=0
QUEUE_MAXRUNNINGJOBS[0]=0
QUEUE_MAXJOBSINQUEUE[0]=0
QUEUE_STATUS[0]="0"
QUEUE_DISPATCHTYPE[0]="Immediate"
QUEUE_PRIORITY[0]="NULL"
EOF
cat > coosa.machine << EOF
HOSTNAME="coosa.eng.uab.edu"
ARCH="x86_64"
OS_NAME="GNU/Linux"
OS_VERSION="2.6.9-55.0.9.ELsmp"
CPU_MODEL="Intel(R) Xeon(TM)"
CPU_MHZ=3192
CPU_FREE=100
CPU_SMP=1
NODECOUNT=1
SIZE_MEM_MB=431
FREE_MEM_MB=180
SIZE_DISK_MB=74312
FREE_DISK_MB=40461
FORK_NAME="jobmanager-ssh"
LRMS_NAME="jobmanager-ssh"
LRMS_TYPE="ssh"
QUEUE_NAME[0]="default"
QUEUE_NODECOUNT[0]=1
QUEUE_FREENODECOUNT[0]=1
QUEUE_MAXTIME[0]=0
QUEUE_MAXCPUTIME[0]=0
QUEUE_MAXCOUNT[0]=0
QUEUE_MAXRUNNINGJOBS[0]=0
QUEUE_MAXJOBSINQUEUE[0]=0
QUEUE_STATUS[0]="0"
QUEUE_DISPATCHTYPE[0]="Immediate"
QUEUE_PRIORITY[0]="NULL"
EOF
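Each line of host.list pairs a hostname with the machine file that describes it, so a missing or misnamed machine file is an easy mistake to make. A minimal sketch of a pre-flight check, assuming the machine-file paths are relative to $GW_LOCATION as in the example above. `check_host_list` is a hypothetical helper, not a Gridway tool, and the demo builds a scratch directory instead of touching a real installation:

```shell
#!/bin/sh
# Hypothetical helper: report host.list entries whose machine file is missing.
#   $1 = base directory (stand-in for $GW_LOCATION), $2 = path to host.list
# Prints one line per problem and returns non-zero if any were found.
check_host_list() {
    base=$1; list=$2; status=0
    while read -r host machine; do
        [ -z "$host" ] && continue               # skip blank lines
        if [ ! -f "$base/$machine" ]; then
            echo "missing machine file for $host: $machine"
            status=1
        fi
    done < "$list"
    return $status
}

# Demo in a scratch directory; only one of the two machine files exists.
gw=$(mktemp -d)
mkdir "$gw/etc"
printf '%s\n' 'cahaba.eng.uab.edu etc/cahaba.machine' \
              'coosa.eng.uab.edu etc/coosa.machine' > "$gw/etc/host.list"
: > "$gw/etc/cahaba.machine"
report=$(check_host_list "$gw" "$gw/etc/host.list") || true
echo "$report"
```

Against a real installation the call would be `check_host_list "$GW_LOCATION" "$GW_LOCATION/etc/host.list"`, run before restarting gwd.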
Check Configuration of Gridway with SSH MAD
- After doing the above steps, re-started gwd. Submitting a simple job resulted in the following error:
$ gwsubmit -v -t test
FAILED: failed could not register user (check proxy)
$ cat test
EXECUTABLE=/bin/uname
ARGUMENTS=-a
REQUIREMENTS=HOSTNAME = "cheaha.ac.uab.edu"
- Checked that the proxy has been initialized and is valid, and that gwd is running. The above error occurs for every resource listed in 'gwhost'. The error does not arise when the following three lines are commented out in $GW_LOCATION/etc/gwd.conf
IM_MAD = static:gw_im_mad_static:-l etc/host.list:ssh_tm:ssh
EM_MAD = ssh:gw_em_mad_ssh::rsl2
TM_MAD = ssh_tm:gw_tm_mad_ssh:
- Found that the Ruby library net-sftp was missing and that the log files were named incorrectly. The following changes were made to fix these (the patch files are attached):
gem install net-sftp
patch --ignore-whitespace --backup $GW_LOCATION/bin/gw_em_mad_ssh < /tmp/bin_gw_em_mad_ssh.patch
patch --ignore-whitespace --backup $GW_LOCATION/bin/gw_tm_mad_ssh < /tmp/bin_gw_tm_mad_ssh.patch
patch --ignore-whitespace --backup $GW_LOCATION/libexec/ruby/gw_em_mad_ssh < /tmp/ruby_gw_em_mad_ssh.patch
patch --ignore-whitespace --backup $GW_LOCATION/libexec/ruby/gw_tm_mad_ssh < /tmp/ruby_gw_tm_mad_ssh.patch
- The above patches were also sent to the gridway_users mailing list.
- Even with the above changes the error did not go away. The reply from Tino Vazquez of Gridway suggested that there might be a conflict between Ruby libraries. Since the Ruby and gems libraries that were installed were the latest, they might have conflicted with the older SSH MAD files in GW-5.4. So, as suggested by Tino, did the following steps:
gem uninstall net-ssh net-sftp
gem install net-ssh --version '< 2.0.0'
gem install net-sftp --version '< 2.0.0'
- After re-starting gwd, re-submitting the job was successful. Executed the transfer MAD manually, and this worked too.
$ sudo -u gwadmin gw_tm_mad_ssh
INIT 50 - - - -
INIT - - SUCCESS -
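Since the fix hinged on keeping net-ssh and net-sftp below 2.0.0, it can help to confirm that no 2.x copy is still installed. A minimal sketch, assuming `gem list` prints lines of the form `name (version, version, ...)`. `has_2x_or_newer` is a hypothetical helper, and the demo feeds it sample lines made up for illustration rather than output captured from stage:

```shell
#!/bin/sh
# Hypothetical helper: read one "name (v1, v2, ...)" line on stdin and
# succeed (exit 0) if any listed version has a major number of 2 or more.
has_2x_or_newer() {
    sed -e 's/^[^(]*(//' -e 's/).*$//' | tr ',' '\n' | tr -d ' ' |
    awk -F. '$1 + 0 >= 2 { found = 1 } END { exit !found }'
}

# Sample lines made up for illustration, not captured from a real system.
for line in 'net-ssh (2.0.4, 1.1.4)' 'net-sftp (1.1.1)'; do
    if printf '%s\n' "$line" | has_2x_or_newer; then
        echo "conflict: $line"
    else
        echo "ok: $line"
    fi
done
```

On a real system the input would come from something like `gem list net-ssh | has_2x_or_newer`, with a zero exit status meaning a 2.x version is still present.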
Attachments
- bin_gw_em_mad_ssh.patch (164 bytes) - Patch to $GW_LOCATION/bin/gw_em_mad_ssh, added by Poornima on 10/08/08 17:17:22.
- bin_gw_tm_mad_ssh.patch (160 bytes) - Patch to $GW_LOCATION/bin/gw_tm_mad_ssh, added by Poornima on 10/08/08 17:17:51.
- ruby_em_mad_ssh.patch (115 bytes) - Patch to $GW_LOCATION/libexec/ruby/gw_em_mad_ssh.rb, added by Poornima on 10/08/08 17:18:26.
- ruby_tm_mad_ssh.patch (123 bytes) - Patch to $GW_LOCATION/libexec/ruby/gw_tm_mad_ssh.rb, added by Poornima on 10/08/08 17:18:49.
