wiki:GalaxyUsingToolWrappersFromCommandLine
Last modified on 03/02/12 12:10:27

All of the galaxy tools available in the web interface can also be run from the command line. Regular galaxy users do not need this, but it is useful for power users, developers and sysadmins who want to debug or improve their galaxy configuration. For example, in some cases it is desirable to run galaxy jobs in interactive qrsh sessions to debug certain issues. Here is a short introduction on how to do that.

Setting up shell environment

The shell environment must be configured properly by setting the appropriate PATH and LD_LIBRARY_PATH environment variables. The UAB galaxy application uses environment modules to set up the appropriate shell environment on the cluster. These environment modules are loaded automatically for the 'galaxy' user through its .bashrc file. The modules are installed in a system-wide location, so other users can load them as well.
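
As a rough illustration of what loading a module does to the shell, the sketch below prepends a directory to PATH and LD_LIBRARY_PATH by hand. The helper name prepend_path and the bin and lib subdirectories under /share/apps/galaxy are assumptions made for this example; the real locations are defined by the galaxy module files.

```shell
#!/bin/sh
# prepend_path VALUE DIR
# Hypothetical helper: prints DIR:VALUE, or just DIR when VALUE is empty.
prepend_path() {
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s:%s\n' "$2" "$1"
    fi
}

# Assumed install prefix; the bin/ and lib/ subdirectories are illustrative only.
GALAXY_PREFIX="${GALAXY_PREFIX:-/share/apps/galaxy}"

PATH=$(prepend_path "$PATH" "$GALAXY_PREFIX/bin")
LD_LIBRARY_PATH=$(prepend_path "${LD_LIBRARY_PATH:-}" "$GALAXY_PREFIX/lib")
export PATH LD_LIBRARY_PATH
```

In practice the module files take care of this, so editing these variables by hand should rarely be necessary.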

Galaxy module files

The galaxy module files are split into several specific module files, e.g. galaxy-drmaa, galaxy-python etc. These separate module files are loaded by a main galaxy module file, which sets up the entire galaxy environment. For running galaxy tools from the command line we use the main module file called galaxy-command-line, which sets up the appropriate shell environment.

  • Load galaxy-command-line module:
    $ module load galaxy/galaxy-command-line
    

Note: One can load specific galaxy module files separately instead of loading the main galaxy module file; however, this is usually not needed.
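
When the same commands may be run both on the cluster and elsewhere, it can help to guard the module load. The following is a minimal sketch; the function name load_galaxy_env is an assumption introduced here, while module load and module list are the standard environment modules subcommands.

```shell
#!/bin/sh
# load_galaxy_env
# Hypothetical wrapper: loads the galaxy-command-line module and lists the
# loaded modules, or returns 127 when the 'module' command is not defined.
load_galaxy_env() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found; is this a cluster node?" >&2
        return 127
    fi
    module load galaxy/galaxy-command-line && module list
}
```

Calling load_galaxy_env on a cluster node should list the loaded modules; in a shell without environment modules it prints a message and returns 127.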

Submitting jobs

Once the galaxy-command-line module is loaded, galaxy tools can be run from the command line. Usage examples for interactive and non-interactive modes follow. Please refer to Cheaha's getting-started page for more information on environment modules and job submission scripts.

Interactive mode

Start a qrsh session (qrsh accepts most of the qsub options) and run your program/command directly on the assigned node. Don't use qsub from within a qrsh session.

  • The following example requests a node for a 5-hour run-time with 4 SMP cores and a maximum of 3GB memory per core.
    # Run this command from cheaha login node
    $ qrsh -pe smp 4 -l vf=3G,h_rt=5:00:00,s_rt=5:00:00,h_vmem=3G -m be -M pavgi@uab.edu
    
    
  • Switch to $UABGRID_SCRATCH space for running a job
    $ cd $UABGRID_SCRATCH/jobsdir
    
  • After getting access to the node, load the necessary galaxy module file and then run any of the galaxy specific commands.
    # Run this command from qrsh assigned compute node
    $ module load galaxy/galaxy-command-line 
    
    
  • Example usage - running cufflinks wrapper command
    $ python /share/apps/galaxy/galaxy-uab/tools/ngs_rna/cufflinks_wrapper.py  \
    --input=/lustre/project/galaxy/staging/009/dataset_1234.dat  \
    --assembled-isoforms-output=/lustre/scratch/pavgi/jobsdir/cufflinks-output_1234.dat  \
    --num-threads="4"  -I 300000  -F 0.05   -j 0.05   -N -b --ref_file="None" \
    --dbkey=hg19  --index_dir=/share/apps/galaxy/galaxy-uab/tool-data 
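
The resource request passed to qrsh in the example above repeats the same memory and run-time values several times. A small helper can build that string from two inputs. This is only a sketch: the function name qrsh_resources and the variable names are assumptions, and the command is printed rather than executed so it can be inspected first.

```shell
#!/bin/sh
# qrsh_resources MEM HOURS
# Hypothetical helper: prints an SGE resource list in which vf and h_vmem are
# both MEM and the soft and hard run-time limits are both HOURS:00:00.
qrsh_resources() {
    printf 'vf=%s,h_rt=%s:00:00,s_rt=%s:00:00,h_vmem=%s\n' "$1" "$2" "$2" "$1"
}

CORES=4
RESOURCES=$(qrsh_resources 3G 5)

# Dry run: print the command instead of starting an interactive session.
echo "qrsh -pe smp $CORES -l $RESOURCES"
```

Removing the echo turns the dry run into a real request from the login node.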
    
    

Non-interactive mode

A job can be submitted in non-interactive (batch) mode using the qsub command. The qsub command accepts a job script which contains the actual user command (e.g. cufflinks) as well as SGE-related parameters (memory, run-time etc). The following is an example of a job script that can be used with the qsub command:

#!/bin/sh

# Lines starting with '#$' define special SGE specific settings; Rest of the lines starting with '#'  are comments. 

## Shell environment 
# Shell environment to use for job execution - galaxy uses bourne 'sh' shell
#$ -S /bin/sh

# All environment variables active in a shell where qsub command is run will be available during job run 
#$ -V
## 


# Email notification options - Send email when the job begins (b) and when it ends (e)
#$ -m be
#$ -M blazerid@uab.edu

## Working directory 
#  Use current working directory to run this job; if not specified by default $HOME will be used 
#$ -cwd
## 

##  Parallel environment 
# Use 4 cores in SMP parallel environment 
#$ -pe smp 4
## 

## Memory limits
#  Virtual free memory - job won't start until specified vf is available 
#$ -l vf=5.9G

# Hard limit on virtual memory - Job will be aborted if it tries to exceed following limit 
#$ -l h_vmem=6G
## Keep vf and h_vmem close to each other to limit swap usage

## Run-time limits 
# Soft run-time limit: Job will be given a (graceful) kill signal when the following time limit is reached 
#$ -l s_rt=142:55:00
# Hard run-time limit: Job will be killed if it is still running when the following time limit is reached 
#$ -l h_rt=143:00:00
## The soft limit must be lower than the hard limit. At the soft run-time limit the job receives a (graceful) kill signal; 
## if the job has not exited by the time the hard run-time limit is reached, it is killed immediately with signal 9 

## Standard error and Standard out 
# Save stderr and stdout with job id prefix in current working directory 
#$ -e $JOB_ID.err.txt
#$ -o $JOB_ID.out.txt

# Galaxy specific commands start here 

# Load galaxy-command-line module 
module load galaxy/galaxy-command-line 

python /share/apps/galaxy/galaxy-uab/tools/ngs_rna/cufflinks_wrapper.py  \
--input=/lustre/project/galaxy/staging/009/dataset_1234.dat  \
--assembled-isoforms-output=/lustre/scratch/pavgi/jobsdir/cufflinks-output_1234.dat  \
--num-threads="4"  -I 300000  -F 0.05   -j 0.05   -N -b --ref_file="None" \
--dbkey=hg19  --index_dir=/share/apps/galaxy/galaxy-uab/tool-data
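
A soft run-time limit is only useful when it expires before the hard limit, so it can be worth comparing the two values before submitting a job script. The sketch below does that; the function names hms_to_seconds and check_rt_limits and the sample values are assumptions made for this example.

```shell
#!/bin/sh
# hms_to_seconds H:MM:SS
# Converts a run-time limit into seconds; awk treats "08" as decimal 8.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# check_rt_limits SOFT HARD
# Succeeds only when the soft limit is strictly lower than the hard limit.
check_rt_limits() {
    soft=$(hms_to_seconds "$1")
    hard=$(hms_to_seconds "$2")
    [ "$soft" -lt "$hard" ]
}

# Sample values: a soft limit five minutes before the hard limit.
if check_rt_limits "142:55:00" "143:00:00"; then
    echo "run-time limits are consistent"
else
    echo "s_rt must be lower than h_rt" >&2
fi
```

Once the limits look right, the job script can be submitted from the login node with qsub followed by the script's file name.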