Installing bcl2fastq from source without bonkers symlinking or copying libraries to other directories.

 

Between my final year of PhD (when I was also working as IBERS and IMAPS HPC SysAdmin) my life consisted of installing software into a module type system since HPC environments have multiple pieces of software installed, often the same software but with different versions and you fundamentally do not use a package manager for user software as it'll cause you a world of pain. One annoyance I always found was when you look for help when a piece of software fails to compile and you get an answer that boils down to;

yum install PACKAGE 

or

apt-get install PACKAGE

And this happened today. bcl2fastq, the Illumina software is all kinds of fun and games (USING MAKE FILES TO DEMULTIPLEX RAW DATA???? WHY?????). The install instructions are your usual ./configure/make....so you do your ./configure and all is well until you get the following error...

boost-1_44_0 installed successfully
-- Successfuly built boost 1.44.0 from the distribution package...
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- Looking for floorf
-- Looking for floorf - found
-- Looking for round
-- Looking for round - found
-- Looking for roundf
-- Looking for roundf - found
-- Looking for powf
-- Looking for powf - found
-- Looking for erf
-- Looking for erf - found
-- Looking for erf
-- Looking for erf - found
-- Looking for erfc
-- Looking for erfc - found
-- Looking for erfc
-- Looking for erfc - found
CMake Error at cmake/cxxConfigure.cmake:74 (message):
  No support for gzip compression
Call Stack (most recent call first):
  c++/CMakeLists.txt:33 (include)


-- Configuring incomplete, errors occurred!
Couldn't configure the project:

/software/testing/bcl2fastq/1.8.4/build/bootstrap/bin/cmake -H"/software/testing/bcl2fastq/1.8.4/src/bcl2fastq/src" -B"/software/testing/bcl2fastq/1.8.4/build" -G"Unix Makefiles"  -DCASAVA_PREFIX:PATH=/software/testing/bcl2fastq/1.8.4/x86_64 -DCASAVA_EXEC_PREFIX:PATH= -DCMAKE_INSTALL_PREFIX:PATH=/software/testing/bcl2fastq/1.8.4/x86_64 -DCASAVA_BINDIR:PATH= -DCASAVA_LIBDIR:PATH= -DCASAVA_LIBEXECDIR:PATH= -DCASAVA_INCLUDEDIR:PATH= -DCASAVA_DATADIR:PATH= -DCASAVA_DOCDIR:PATH= -DCASAVA_MANDIR:PATH= -DCMAKE_BUILD_TYPE:STRING=RelWithDebInfo

Moving CMakeCache.txt to CMakeCache.txt.removed

My first thought was to quickly look to see if we have libz libraries installed on the HPC as a module, and they're not. Fair enough. I then wondered why there wasn't a libz library already installed by the OS, and there is but it seems to be different between the software node (a node dedicated to installing software so as not to annoy folk who are on the login node) and the compute nodes. So pointing to /lib64 would probably not work (it might if bcl2fastq is doing a static binary, I've not checked).

[[email protected]]$ locate libz
/lib64/libz.so.1
/lib64/libz.so.1.2.3
[[email protected]]$ locate libz
-bash: locate: command not found
[[email protected]]$ echo "grrr"
grrr
[[email protected]]$ ls -lath /lib64/libz*
lrwxrwxrwx 1 root root  13 Dec 13 14:09 /lib64/libz.so.1 -> libz.so.1.2.7
-rwxr-xr-x 1 root root 89K Nov  5 18:09 /lib64/libz.so.1.2.7

Okay so after a quick google I get the same old hacky responses;

https://biogist.wordpress.com/2012/10/23/casava-1-8-2-installation/

https://www.biostars.org/p/11202/

http://seqanswers.com/forums/showthread.php?t=11106

My next thought then was how about I just install libz from source. So,

[[email protected]]$ source git-1.8.1.2
[[email protected]]$ which git
[[email protected]]$ git clone https://github.com/madler/zlib.git

YES, EVEN GIT HAS VERSIONS!!! yum/apt isn't the answer to everything!

[[email protected] ]$ cd zlib/
[[email protected] ]$ ./configure
[[email protected] ]$ make -j4
[[email protected] ]$ ls -lath libz*s*
lrwxrwx--- 1 martin JIC_c1 14 Jan 27 16:11 libz.so.1 -> libz.so.1.2.11
lrwxrwx--- 1 martin JIC_c1 14 Jan 27 16:11 libz.so -> libz.so.1.2.11
-rwxrwx--x 1 martin JIC_c1 103K Jan 27 16:11 libz.so.1.2.11

And that's great. We now have our libraries compiled, just need to let my bash shell know where they are;

export LIBRARY_PATH=/software/testing/bcl2fastq/1.8.4/lib/zlib

and then back into my bcl2fastq build directory, rerun ./configure --prefix=/where/the/bins/go and it compiled.

All done without a yummy apt....it's Friday and I need to go home.

My Sun Grid Engine (SGE) Setup and Admin Notes

Queue management software for HPC

These are more for myself than anything else, but someone might find them useful as it's difficult to find SGE documentation amid all the LSF stuff too. e.g. qsub is the same for both, but different parameters.

Initial Configuration
With a default installation of SGE, here are the changes made for the cluster in IMAPS;

Fair-share functional policy
This may not be the best, however it is better than FIFO.

Activate the functional share policy by specifying the number of functional share tickets. The value is arbitrary in that any value greater than zero will trigger it. But it needs to be suitably larger so that they can be shared out to users evenly. e.g. if you have 10 users, and each has 100 tickets, then it needs to be at least 10000. So, as root (or SGE admin), edit the config file;

[[email protected] ~]# qconf -msconf

and set the following;

weight_tickets_functional 1000000

Now we need to assign users some tickets. To do this, edit a different config;

[[email protected] ~]# qconf -mconf

and set the following;

enforce_user auto
auto_user_fshare 100

This gives each user 100 tickets.

NOTE: When I did this I played around a lot to see its behaviour. I found that once a user has submitted a job through SGE without the assignment of 100 tickets (e.g. if you remove that to see what happens), they will never get these assigned after you turn it on. So you need to add the shares for that user manually. (there may be a better way).

[[email protected] ~]# qconf -muser username

and that will give you something like;

name user
oticket 0
fshare 100
delete_time 1358000970
default_project NONE 

Setting up memory allocation

This is simple in practice, but it has a couple of issues to be aware of. I did this using the popular h_vmem method. However, I may change this at some point. The reason being is that it assumes that h_vmem is both what we want to use, and what the limit is. This may not be the case. e.g. If you have a job that initialises for a few mins, peaks at 4GB, but then only uses 2G for the next 2 weeks, then it'd be a waste of resources for an eight core machine with 16GB ram (e.g. the nodes I have on the IMAPS machine). For now this will be the case until I can give it some serious thought.

First you need to make sure that h_vmem is a consumable resource;

[[email protected] ~]# qconf -mc

#name shortcut type relop requestable consumable default urgency
#----------------------------------------------------------------------------------------
..
h_vmem h_vmem MEMORY <= YES YES 0 0

Now that you've done that, you need to add this resource as a complex to each node like so. (clearly you can write a script to do it to every node).

[[email protected] ~]# qconf -me node001

hostname node001
load_scaling NONE
complex_values h_vmem=16G
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE

As you can see, I've added h_vmem=16G to node001. This is the amount of consumable memory that can be allocated.

WARNING: Once you do this, h_vmem HAS to be set on all jobs, otherwise they will fail. To combat this for the forgetful, lets add a default value by editing the sge_request file. So locate and open the file in your editor;

[[email protected] ~]# which qsub
/cvos/shared/apps/sge/6.1/bin/lx26-amd64/qsub

[[email protected] ~]# vi /cvos/shared/apps/sge/6.1/default/common/sge_request

Add to the bottom the following;

# default memory limit 
-l h_vmem=2G

And this will now give a 2G limit to every job unless otherwise stated.

Issue with IDL and h_vmem

So there is an issue with this method, for some reason IDL won't start even when specifying a very large amount of memory. This is all to do with the h_stack flag. To stop this being an issue, add the following line to the sge_request file;

# default stack size (otherwise IDL and Matlab fail to start)
-l h_stack=128m

Clearly if the stack size is not enough for some programs, then users can specify a larger stack using the -l h_stack flag. I did have one user running some perl code that needed 512mb stack space.

Allowing reservations

qconf -msconf

Change max_reservation from 0 to a number, in this case I've chosen 32

Change default_duration from INFINITY to something very long,

Finding that a node isn't accepting jobs

If you discover that a node isn't accepting jobs, here is what it may be;

qstat -f //gives full status of nodes

If you see;

[[email protected]]# qstat -f
queuename qtype resv/used/tot. load_avg arch states
----------------------------------------------------------------
[email protected] BIP 0/9/32 12.78 lx26-amd64 
[email protected] BIP 0/0/32 0.0 lx26-amd64 d

Then we know that node02 is disabled. To bring it back on we basically re-enable a queue, which goes through and enables all nodes in that queue;

[[email protected]]# qmod -e queue.q

Queue queue "queue-node01.q" has been enabled by [email protected]

root - queue "queue-node02.q" is already enabled

You can also of course disable a queue;

[[email protected]]# qmod -d queue.q

NB: This needs to be run on your master node as root.

Changing nodes in a queue

OK, this is simple. Just type;

qconf -mq queue.q

This will bring up a vi like editor. Then change the second line;

hostlist node05.cluster node06.cluster node07.cluster

...to whatever nodes you wish to have on it.

Jobs pending

So you notice that there are lots of things in the queue pending. To check what why a job isn't running type;

qacct -j JOBNUMBER

And it will tell you why.

If you see lots of;

queue "all.q" dropped because it is temporarily not available queue "all.q" dropped
because it is temporarily not available queue "all.q" dropped because it is 
temporarily not available

Then type:

qstat -f

This will tell you the status of nodes. If they are in E status it will also tell you why. Usually that a job caused it to stop. e.g.

queue queue.q marked QERROR as result of job 123456's failure at host at node001

You can move them out of error by typing;

qmod -cq all.q

Move queue

If you would like to move a job from one queue to another, you can do this by;

qalter -q all.q 173143

Where all.q is the queue you wish to move it to, and 173143 is the job-id number.

Getting some stats

Basically I wanted to measure the performance of various job submissions with different thread counts. Each run producted an output and error file in the form file_threadnumber.o12345, where 12345 is the job number and threadnumber is the number of threads the program used. This, quite large, one liner, does the following;

Lists the *.o* files
Gets the job number
Gets the job details and searches for vmem, start and end times
Removes the line return so that everything remains on one line
Sorts the whole lot by thread number
Splits up the times into seconds, calculates the time taken and prints everything.

ls -1 *.o* | awk '{split($1,a,"_"); split(a[3],b,"."); split(b[2],c,"o"); printf "Threads "b[1]" "; system("qacct -j "c[2]" | grep \"vmem\\|start_time\\|end_time\" | tr -d \"\\n\""); print ""}' | sort -nk 2 | awk '{split($7,a,":"); split($12,b,":"); print $1" "$2" "(b[1]*60*60+b[2]*60+b[3])-(a[1]*60*60+a[2]*60+a[3])" "$14}'

Probably a bit long winded, but who doesn't like a good one-liner...

Further Reading
Things I have found useful;

http://ait.web.psi.ch/services/linux/hpc/merlin3/sge/admin/

Home