Upgrading Gridway from 5.2.2 to 5.4 on stage

The Globus Toolkit version for this upgrade is 4.0.5, which was first installed on stage.

  • Initial and setup steps are from stage
  • Configured GW-5.4 with the following options
    ./configure --with-doc --with-tests --with-db=/usr/lib --enable-debug --prefix=/opt/gw
    
  • Doing a make results in the following error:
    ./em_mad/GW_mad_ws.java:287: cannot access org.oasis.wsrf.faults.BaseFaultType
    class file for org.oasis.wsrf.faults.BaseFaultType not found
                if (job.getFault() == null)
                       ^
    ./em_mad/GW_mad_ws.java:295: cannot find symbol
      symbol  : method getDescription(int)
      location: class org.globus.exec.generated.FaultType
                    info = job.getFault().getDescription(0).toString().replace('\n', ' ');
                                         ^
    Note: ./em_mad/GW_mad_ws.java uses unchecked or unsafe operations.
    Note: Recompile with -Xlint:unchecked for details.
    2 errors
    make[1]: *** [em_mad/gw_em_mad_ws.jar] Error 1
    make[1]: Leaving directory `/usr/local/gw-5.4.0/src'
    make: *** [all-recursive] Error 1
    
  • The gridway_users@globus.org mailing list had a solution for the above error: uncomment line 697 and comment out line 698 in /home/gwadmin/gw-5.4.0/src/Makefile.
  • After this change, running make again completed without errors.
  • Next, ran make install | tee install.log
  • GW-5.4 installed successfully. Started gwd and submitted a job successfully as well.
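
The Makefile change described above can be scripted so that it is repeatable and leaves a backup. This is only a sketch: swap_comment is a hypothetical helper name, it assumes the Makefile uses a single leading '#' for comments, and the line numbers 697 and 698 apply only to the src/Makefile generated for this particular 5.4.0 build.

```shell
# swap_comment FILE UNCOMMENT_LINE COMMENT_LINE   (hypothetical helper)
# Removes one leading '#' from UNCOMMENT_LINE and prefixes COMMENT_LINE
# with '#'. The untouched original is kept as FILE.bak.
swap_comment() {
    file=$1 uncomment=$2 comment=$3
    cp "$file" "$file.bak" || return 1
    sed -e "${uncomment}s/^#//" -e "${comment}s/^/#/" "$file.bak" > "$file"
}

# Usage on the build tree from this log:
#   swap_comment /home/gwadmin/gw-5.4.0/src/Makefile 697 698
#   sed -n '697,698p' /home/gwadmin/gw-5.4.0/src/Makefile   # inspect the result
```

Checking the two lines with sed -n before and after the change is worthwhile, because a regenerated Makefile may not have the same content at those line numbers.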

Configuring GW-5.4 to access SSH Middleware Access Drivers (MADs)

  • Exploring access to remote resources via the SSH MADs of Gridway. This method seems easier and quicker than the conventional method of adding a remote resource to Gridway. Resources can be integrated easily, even ones that do not have Globus Toolkit installed.
  • Steps to access SSH drivers are given here
  • SSH drivers depend on Ruby and net-ssh, so first installed Ruby with yum install ruby. yum installed Ruby-1.8.1.
  • Next downloaded RubyGems. Installing RubyGems failed because of an incompatibility between the RubyGems and Ruby versions: RubyGems-1.2 required Ruby-1.8.7. So, did a manual installation of Ruby and RubyGems as root.
    wget ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.7-p72.tar.gz
    tar -xzvf ruby-1.8.7-p72.tar.gz
    cd ruby-1.8.7-p72
    ./configure
    make
    make install
    mv /usr/local/bin/* /usr/bin/.
    
     Note: by default, the Ruby utilities were installed to /usr/local/bin, so moved these binaries to /usr/bin for easier global access.
    
    wget http://rubyforge.org/frs/download.php/38646/rubygems-1.2.0.tgz
    tar -xzf rubygems-1.2.0.tgz
    cd rubygems-1.2.0
    ruby setup.rb
    
    gem install net-ssh
    
    cd $GW_LOCATION/share/examples/ssh
    ./install.sh
    
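  • Next, add these two lines to /etc/sudoers (sudoers does not expand shell variables, so write out the full installation path in place of $GW_LOCATION):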
    gwadmin ALL=(GW_USERS) NOPASSWD: $GW_LOCATION/bin/gw_em_mad_ssh *
    gwadmin ALL=(GW_USERS) NOPASSWD: $GW_LOCATION/bin/gw_tm_mad_ssh *
    
  • Following Chapter 3 of the Gridway documentation (configuration for access to SSH resources), add the following three lines to $GW_LOCATION/etc/gwd.conf
    IM_MAD = static:gw_im_mad_static:-l etc/host.list:ssh_tm:ssh
    EM_MAD = ssh:gw_em_mad_ssh::rsl2
    TM_MAD = ssh_tm:gw_tm_mad_ssh:
    
  • Create $GW_LOCATION/etc/host.list file.
    cat > host.list << EOF
    cahaba.eng.uab.edu etc/cahaba.machine
    coosa.eng.uab.edu etc/coosa.machine
    EOF
    
  • Create the <host>.machine files.
    cat > cahaba.machine << EOF
    HOSTNAME="cahaba.eng.uab.edu" ARCH="i686" OS_NAME="GNU/Linux" OS_VERSION="2.6.18-53.1.14.el5" CPU_MODEL="Intel(R) Xeon(TM) CPU 2" CPU_MHZ=2394 CPU_FREE=100 CPU_SMP=1
    NODECOUNT=1 SIZE_MEM_MB=431 FREE_MEM_MB=180 SIZE_DISK_MB=74312 FREE_DISK_MB=40461 FORK_NAME="jobmanager-ssh" LRMS_NAME="jobmanager-ssh" LRMS_TYPE="ssh"
    
    QUEUE_NAME[0]="default" QUEUE_NODECOUNT[0]=1 QUEUE_FREENODECOUNT[0]=1 QUEUE_MAXTIME[0]=0 QUEUE_MAXCPUTIME[0]=0 QUEUE_MAXCOUNT[0]=0
    QUEUE_MAXRUNNINGJOBS[0]=0 QUEUE_MAXJOBSINQUEUE[0]=0 QUEUE_STATUS[0]="0" QUEUE_DISPATCHTYPE[0]="Immediate" QUEUE_PRIORITY[0]="NULL"
    EOF
    
    cat > coosa.machine << EOF
    HOSTNAME="coosa.eng.uab.edu" ARCH="x86_64" OS_NAME="GNU/Linux" OS_VERSION="2.6.9-55.0.9.ELsmp" CPU_MODEL="Intel(R) Xeon(TM)" CPU_MHZ=3192 CPU_FREE=100 CPU_SMP=1
    NODECOUNT=1 SIZE_MEM_MB=431 FREE_MEM_MB=180 SIZE_DISK_MB=74312 FREE_DISK_MB=40461 FORK_NAME="jobmanager-ssh" LRMS_NAME="jobmanager-ssh" LRMS_TYPE="ssh"
    
    QUEUE_NAME[0]="default" QUEUE_NODECOUNT[0]=1 QUEUE_FREENODECOUNT[0]=1 QUEUE_MAXTIME[0]=0 QUEUE_MAXCPUTIME[0]=0 QUEUE_MAXCOUNT[0]=0
    QUEUE_MAXRUNNINGJOBS[0]=0 QUEUE_MAXJOBSINQUEUE[0]=0 QUEUE_STATUS[0]="0" QUEUE_DISPATCHTYPE[0]="Immediate" QUEUE_PRIORITY[0]="NULL"
    EOF
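
The Ruby/RubyGems mismatch described earlier in this section (yum provided Ruby-1.8.1 while RubyGems-1.2 required Ruby-1.8.7) can be caught before starting the RubyGems install. The following is a minimal sketch of a dotted-version comparison; version_ge is a hypothetical helper name, and it only handles plain numeric components, so a suffix such as "-p72" has to be stripped first.

```shell
# version_ge HAVE NEED   (hypothetical helper)
# Succeeds when dotted numeric version HAVE is greater than or equal to NEED.
# Missing trailing components are treated as 0, so "1.8" equals "1.8.0".
version_ge() {
    have=$1 need=$2
    while [ -n "$have" ] || [ -n "$need" ]; do
        h=${have%%.*} n=${need%%.*}
        h=${h:-0} n=${n:-0}
        [ "$h" -gt "$n" ] && return 0
        [ "$h" -lt "$n" ] && return 1
        # drop the component that was just compared
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $need in *.*) need=${need#*.} ;; *) need= ;; esac
    done
    return 0
}

# Usage:
#   have=$(ruby -e 'print RUBY_VERSION')
#   version_ge "$have" 1.8.7 || echo "Ruby $have is too old for RubyGems-1.2"
```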
    

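Since every entry in host.list points at a <host>.machine file, a quick consistency check before restarting gwd can save a debugging round. The sketch below assumes that the paths in host.list are relative to $GW_LOCATION (as the etc/ prefixes above suggest); check_host_list is a hypothetical helper name, not part of Gridway.

```shell
# check_host_list BASE_DIR LIST_FILE   (hypothetical helper)
# LIST_FILE holds "hostname relative/path.machine" pairs, one per line.
# Prints every host whose machine file is missing under BASE_DIR and
# returns non-zero if at least one is missing.
check_host_list() {
    base=$1 list=$2 missing=0
    while read -r host machine || [ -n "$host" ]; do
        [ -n "$host" ] || continue            # skip blank lines
        if [ ! -f "$base/$machine" ]; then
            echo "missing: $host -> $base/$machine"
            missing=1
        fi
    done < "$list"
    return "$missing"
}

# Usage:
#   check_host_list "$GW_LOCATION" "$GW_LOCATION/etc/host.list"
```
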
Check Configuration of Gridway with SSH MAD

  • After doing the above steps, re-started gwd. Submitting a simple job resulted in the following error:
    $ gwsubmit -v -t test
    FAILED: failed could not register user (check proxy)
    $ cat test
    EXECUTABLE=/bin/uname
    ARGUMENTS=-a
    REQUIREMENTS=HOSTNAME = "cheaha.ac.uab.edu"
    
  • Checked that the proxy has been initialized and is valid, and that gwd is running. The above error occurs for every resource listed in 'gwhost'. The error does not arise when the following three lines are commented out in $GW_LOCATION/etc/gwd.conf
    IM_MAD = static:gw_im_mad_static:-l etc/host.list:ssh_tm:ssh
    EM_MAD = ssh:gw_em_mad_ssh::rsl2
    TM_MAD = ssh_tm:gw_tm_mad_ssh:
    
  • Found that the Ruby library net-sftp was missing and that the naming of log files was incorrect. The following changes were made to overcome these (the patch files are attached):
    gem install net-sftp
    
    patch --ignore-whitespace --backup $GW_LOCATION/bin/gw_em_mad_ssh < /tmp/bin_gw_em_mad_ssh.patch
    patch --ignore-whitespace --backup $GW_LOCATION/bin/gw_tm_mad_ssh < /tmp/bin_gw_tm_mad_ssh.patch
    patch --ignore-whitespace --backup $GW_LOCATION/libexec/ruby/gw_em_mad_ssh < /tmp/ruby_gw_em_mad_ssh.patch
    patch --ignore-whitespace --backup $GW_LOCATION/libexec/ruby/gw_tm_mad_ssh < /tmp/ruby_gw_tm_mad_ssh.patch
    
  • The above patches were updated to gw_users

  • Even with the above changes the error did not go away. The reply from Tino Vazquez of Gridway suggested that there might be a conflict of Ruby libraries: since the Ruby and gems libraries that were installed were the latest, they might have conflicted with the older SSH MAD files in GW-5.4. So, as suggested by Tino, did the following steps:
    gem uninstall net-ssh net-sftp
    
    gem install net-ssh --version '< 2.0.0'
    gem install net-sftp --version '< 2.0.0'
    

  • Re-starting gwd and re-submitting the job was successful. Running the transfer MAD manually worked too.
    $ sudo -u gwadmin gw_tm_mad_ssh
    INIT 50 - - - -
    INIT - - SUCCESS -
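
Because the fix above depended on having only pre-2.0.0 versions of net-ssh and net-sftp installed, it can help to verify that no 2.x copy is still present. The following sketch parses output in the style of gem list ("name (1.1.4, 1.0.9)") from standard input; all_below_major is a hypothetical helper name.

```shell
# all_below_major NAME MAJOR   (hypothetical helper)
# Reads `gem list` style output on stdin and succeeds only if gem NAME is
# listed and every installed version has a major number below MAJOR.
all_below_major() {
    name=$1 major=$2 found=1
    while read -r gem versions || [ -n "$gem" ]; do
        [ "$gem" = "$name" ] || continue
        found=0
        # strip "(", ")" and "," so only the version numbers remain
        for v in $(printf '%s' "$versions" | tr -d '(),'); do
            [ "${v%%.*}" -lt "$major" ] || return 1
        done
    done
    return "$found"
}

# Usage:
#   gem list | all_below_major net-ssh 2  || echo "net-ssh 2.x still installed or gem missing"
#   gem list | all_below_major net-sftp 2 || echo "net-sftp 2.x still installed or gem missing"
```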
    

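When running a MAD by hand as in the transcript above, the reply can be checked mechanically. This sketch assumes only what the transcript shows, namely that the result is the fourth whitespace-separated field of the reply line ("INIT - - SUCCESS -"); mad_reply_ok is a hypothetical helper name, and other MAD operations may use a different layout.

```shell
# mad_reply_ok LINE   (hypothetical helper)
# Succeeds when the fourth whitespace-separated field of LINE is SUCCESS.
mad_reply_ok() {
    result=$(printf '%s\n' "$1" | awk '{ print $4 }')
    [ "$result" = "SUCCESS" ]
}

# Usage, with the reply line copied from the MAD's output:
#   mad_reply_ok 'INIT - - SUCCESS -' && echo "transfer MAD initialised"
```
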
Attachments