Short-read alignment on the Raspberry Pi

This week I invested a little bit of spare cash in the Raspberry Pi.  Now that there’s no waiting time for these, I bought mine from Farnell’s Element 14 site, complete with a case, a copy of Raspbian on SD card and a USB power supply.  Total costs, about 50 quid.

First impressions are that it is a great little piece of hardware. I’ve always considered playing with an Arduino, but the Pi fits nicely into my existing skill set.  It did get connected to the TV briefly just to watch a tiny machine driving a 37″ flatscreen TV via HDMI.  I’m sure it’s just great, if your sofa isn’t quite as far away from the TV as mine is. So with sshd enabled on the Pi it is currently sat on the mantlepiece, blinking lights flashing, running headless.

The first thing it occurred to me to do was to do some benchmarking.  What I was interested in is the capacity of the machine to do real world work.  I’m an NGS bioinformatician so the the obvious thing to do was to throw some data at it through some short-read aligners.

I’m used to human exome data, or RNA-Seq data that generally encompasses quite a few HiSeq lanes, and used to processing them in large enough amounts that I need a few servers to do it.  I did wonder however whether the Pi might have enough grunt for smaller tasks, such as small gene panels, or bacterial genomes.  Primarily this is because I’ve got a new project at work which uses in solution hybridisation and sequencing to identify pathogens in clinical samples, and it occurred to me that the computing requirements probably aren’t the same as what I’m used to.

The first thing I did was to take some data from wgsim generated from an E.coli genome to test out paired-end alignment on 100bp reads.

Initially I thought I would try to get Bowtie2 working, on the grounds that I wasn’t really intending to do anything other than read mapping and I am under the impression it’s still faster than BWA.  BWA does tend to be my go-to aligner for mammalian data.  However I quickly ran into the fact that there is no armhf build of bowtie2 in the Raspbian repository.  Code downloaded I was struggling to get it to compile from source, and in the middle of setting up a cross-compiling environment so I could do the compilation on my much more powerful EeePC 1000HE(!) it occurred that someone might have been foolish enough to try this before.  And they had.  The fact is that bowtie2 requires a CPU with an SSE instruction set – i.e. Intel.  So whilst it might work on the Atom CPU in the EeePC it’s a complete non starter on the ARM chip in the Pi.

Bowtie1 however is in the Rasbpian repository.  And I generated 1×10^6 reads as a test dataset after seeing that it was aligning the 1000 read dataset from bowtie with some speed.  This took 55 minutes.

I then picked out a real-world E.coli dataset from the CLC Bio website.  Generated on the GAIIx, these are 36bp PE reads, around 2.6×10^6 of them.

BWA 0.6.2 is also available from the Raspbian repos (which is more up to date than the version in the Xubuntu distro I notice, probably because Raspbian is tracking the current ‘testing’ release, Wheezy).

So I did a full paired end alignment of this real world data, making sure both output to SAM.  I quickly ran out of space on my 4GB SD card, so all data was written out to an 8GB attached USB thumb drive.

Bowtie1 took just over an hour to align this data (note reads and genome for alignment are from completely different E.coli strains)

Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 01:01:31
# reads processed: 2622382
# reads with at least one reported alignment: 1632341 (62.25%)
# reads that failed to align: 990041 (37.75%)
Reported 1632341 paired-end alignments to 1 output stream(s)
Time searching: 01:01:32
Overall time: 01:01:32

I was a little surprised that actually BWA managed to do this a little faster (please note aligners were run with default options).  I only captured the start and end of this process for BWA.

Align start: Sat Jan 26 22:36:06 GMT 2013
Align end: Sat Jan 26 23:29:31 GMT 2013

Which brings the total alignment time for BWA to 53 minutes and 25 seconds.

Anyway it was just a little play to see how things stacked up.  I think it’s fantastic that a little machine   like the Pi has enough power to do anything like this.  It’s probably more of a comment on the fact that the people behind the aligners have managed to write such efficient code that this can be done without exceeding the 512Mb of RAM.  Bowtie memory usage was apparently lower than BWA though during running tests.

I always thought that the ‘missing aspect’ of DIYbio was getting people involved with bioinformatics, instead the community seemed desperate to follow overly ambitious plans to get involved in synthetic biology.  And it seemed to me that DIYbio should sit in the same amateur space that amateur astronomy does (i.e. within the limitations of equipment that you can buy without having to equip a laboratory).  And for a low cost entry into Linux, with enough grunt to play with NGS tools and publicly available data, it’s hard to fault the very compact Raspberry Pi. Now I just need to see exactly where the performance limits are!

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.