Short-read alignment on the Raspberry Pi

This week I invested a little bit of spare cash in a Raspberry Pi.  Now that there’s no waiting time for these, I bought mine from Farnell’s Element 14 site, complete with a case, a copy of Raspbian on SD card and a USB power supply.  Total cost: about 50 quid.

First impressions are that it is a great little piece of hardware. I’ve always considered playing with an Arduino, but the Pi fits nicely into my existing skill set.  It did get connected to the TV briefly just to watch a tiny machine driving a 37″ flatscreen TV via HDMI.  I’m sure it’s just great, if your sofa isn’t quite as far away from the TV as mine is. So with sshd enabled on the Pi it is currently sat on the mantelpiece, lights blinking, running headless.

The first thing it occurred to me to do was some benchmarking.  What I was interested in is the capacity of the machine to do real-world work.  I’m an NGS bioinformatician so the obvious thing to do was to throw some data at it through some short-read aligners.

I’m used to human exome data, or RNA-Seq data that generally encompasses quite a few HiSeq lanes, and used to processing them in large enough amounts that I need a few servers to do it.  I did wonder however whether the Pi might have enough grunt for smaller tasks, such as small gene panels, or bacterial genomes.  Primarily this is because I’ve got a new project at work which uses in-solution hybridisation and sequencing to identify pathogens in clinical samples, and it occurred to me that the computing requirements probably aren’t the same as what I’m used to.

The first thing I did was to simulate some reads from an E. coli genome with wgsim, to test out paired-end alignment on 100bp reads.
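For anyone wanting to try something similar, here is a minimal sketch of how such a simulation might be driven from Python. The file names and the number of read pairs are placeholders rather than the values used for this post; the only wgsim options relied on are -N (number of read pairs) and -1/-2 (read lengths).

```python
import os
import shutil
import subprocess


def wgsim_command(reference, out1, out2, n_pairs=1_000_000, read_len=100):
    """Build a wgsim command line for paired-end reads of equal length."""
    return [
        "wgsim",
        "-N", str(n_pairs),    # number of read pairs to simulate
        "-1", str(read_len),   # length of the first read in each pair
        "-2", str(read_len),   # length of the second read in each pair
        reference, out1, out2,
    ]


# Placeholder file names (assumptions, not taken from the post).
cmd = wgsim_command("ecoli.fa", "sim_1.fq", "sim_2.fq")
print(" ".join(cmd))

# Only run the simulation if wgsim and the reference are actually present.
if shutil.which("wgsim") and os.path.exists("ecoli.fa"):
    subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to check the command before committing a slow machine like the Pi to it.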

Initially I thought I would try to get Bowtie2 working, on the grounds that I wasn’t really intending to do anything other than read mapping and I am under the impression it’s still faster than BWA.  BWA does tend to be my go-to aligner for mammalian data.  However I quickly ran into the fact that there is no armhf build of bowtie2 in the Raspbian repository.  With the code downloaded I was struggling to get it to compile from source, and in the middle of setting up a cross-compiling environment so I could do the compilation on my much more powerful EeePC 1000HE(!) it occurred to me that someone might have been foolish enough to try this before.  And they had.  The fact is that bowtie2 requires a CPU with the SSE instruction set, i.e. an x86 chip from the likes of Intel or AMD.  So whilst it might work on the Atom CPU in the EeePC it’s a complete non-starter on the ARM chip in the Pi.

Bowtie1, however, is in the Raspbian repository.  After seeing that it aligned the 1000-read dataset from bowtie with some speed, I generated 1×10^6 reads as a test dataset.  Aligning these took 55 minutes.

I then picked out a real-world E.coli dataset from the CLC Bio website.  Generated on the GAIIx, these are 36bp PE reads, around 2.6×10^6 of them.

BWA 0.6.2 is also available from the Raspbian repos (which is more up to date than the version in the Xubuntu distro I notice, probably because Raspbian is tracking the current ‘testing’ release, Wheezy).

So I did a full paired-end alignment of this real-world data with each aligner, making sure both wrote their output as SAM.  I quickly ran out of space on my 4GB SD card, so all data was written out to an attached 8GB USB thumb drive.
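The exact command lines aren't recorded here, but with default options the two runs would look roughly like the sketch below. It only builds and prints the commands; the file and index names are placeholders, and the steps assume the usual Bowtie 1 (bowtie-build, then bowtie -S with -1/-2) and BWA 0.6.x (index, aln per read file, then sampe) workflows.

```python
def bowtie1_commands(ref_fa, index, r1, r2, sam):
    """Default-option Bowtie 1 paired-end run writing SAM (-S)."""
    return [
        (["bowtie-build", ref_fa, index], None),
        (["bowtie", "-S", index, "-1", r1, "-2", r2, sam], None),
    ]


def bwa_commands(ref_fa, r1, r2, sam):
    """Default-option BWA aln/sampe run; each step's stdout goes to a file."""
    return [
        (["bwa", "index", ref_fa], None),
        (["bwa", "aln", ref_fa, r1], "r1.sai"),
        (["bwa", "aln", ref_fa, r2], "r2.sai"),
        (["bwa", "sampe", ref_fa, "r1.sai", "r2.sai", r1, r2], sam),
    ]


def as_shell(steps):
    """Render (argv, stdout_file) pairs as shell-style lines for inspection."""
    lines = []
    for argv, stdout_file in steps:
        line = " ".join(argv)
        if stdout_file is not None:
            line += " > " + stdout_file
        lines.append(line)
    return lines


# Placeholder names (assumptions, not the files used in the post).
bowtie_steps = bowtie1_commands("ecoli.fa", "ecoli", "reads_1.fq", "reads_2.fq", "bowtie.sam")
bwa_steps = bwa_commands("ecoli.fa", "reads_1.fq", "reads_2.fq", "bwa.sam")

for line in as_shell(bowtie_steps) + as_shell(bwa_steps):
    print(line)
```

Note that BWA's paired-end mode in this version is a three-step affair (two aln passes plus sampe), so any timing comparison needs to cover all three.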

Bowtie1 took just over an hour to align this data (note that the reads and the reference genome are from completely different E. coli strains):

Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 01:01:31
# reads processed: 2622382
# reads with at least one reported alignment: 1632341 (62.25%)
# reads that failed to align: 990041 (37.75%)
Reported 1632341 paired-end alignments to 1 output stream(s)
Time searching: 01:01:32
Overall time: 01:01:32
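As a quick sanity check on those numbers, the following sketch parses the relevant lines of the report above and works out the alignment rate and throughput. The parsing patterns are written against this particular report and are an assumption about its layout, not a general Bowtie log parser.

```python
import re

# Lines copied from the Bowtie report above.
REPORT = """\
Seeded quality full-index search: 01:01:31
# reads processed: 2622382
# reads with at least one reported alignment: 1632341 (62.25%)
# reads that failed to align: 990041 (37.75%)
Overall time: 01:01:32
"""


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def field(pattern, text):
    """Return the first capture group of pattern, or fail loudly."""
    match = re.search(pattern, text, re.MULTILINE)
    if match is None:
        raise ValueError("pattern not found: " + pattern)
    return match.group(1)


processed = int(field(r"^# reads processed: (\d+)", REPORT))
aligned = int(field(r"^# reads with at least one reported alignment: (\d+)", REPORT))
failed = int(field(r"^# reads that failed to align: (\d+)", REPORT))
overall_s = hms_to_seconds(field(r"^Overall time: (\d+:\d+:\d+)", REPORT))

# The two categories should account for every record processed.
assert aligned + failed == processed

pct_aligned = 100 * aligned / processed
rate = processed / overall_s  # records processed per second of wall time

print(f"aligned: {pct_aligned:.2f}%")
print(f"overall: {overall_s} s, about {rate:.0f} records per second")
```

That works out at roughly 700 records per second over the whole run.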

I was a little surprised that actually BWA managed to do this a little faster (please note aligners were run with default options).  I only captured the start and end of this process for BWA.

Align start: Sat Jan 26 22:36:06 GMT 2013
Align end: Sat Jan 26 23:29:31 GMT 2013

Which brings the total alignment time for BWA to 53 minutes and 25 seconds.
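The subtraction is easy to check with a few lines of Python. This is only a sketch: the timestamp format string is an assumption matched to the two lines above, and the Bowtie figure is the "Overall time" from its report.

```python
from datetime import datetime

# Format matched to the two timestamps above; "GMT" is treated as literal text.
FMT = "%a %b %d %H:%M:%S GMT %Y"

start = datetime.strptime("Sat Jan 26 22:36:06 GMT 2013", FMT)
end = datetime.strptime("Sat Jan 26 23:29:31 GMT 2013", FMT)

bwa_s = int((end - start).total_seconds())
minutes, seconds = divmod(bwa_s, 60)
print(f"BWA: {minutes} min {seconds} s")

# Bowtie1 "Overall time" of 01:01:32, converted to seconds.
bowtie_s = 1 * 3600 + 1 * 60 + 32
diff_min, diff_sec = divmod(bowtie_s - bwa_s, 60)
print(f"BWA finished {diff_min} min {diff_sec} s sooner than Bowtie1")
```

So on this dataset BWA came in a little over eight minutes ahead.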

Anyway it was just a little play to see how things stacked up.  I think it’s fantastic that a little machine like the Pi has enough power to do anything like this.  It’s probably more of a comment on the fact that the people behind the aligners have managed to write such efficient code that this can be done without exceeding the 512MB of RAM.  Bowtie’s memory usage did appear to be lower than BWA’s during the test runs, though.

I always thought that the ‘missing aspect’ of DIYbio was getting people involved with bioinformatics; instead the community seemed desperate to follow overly ambitious plans to get involved in synthetic biology.  And it seemed to me that DIYbio should sit in the same amateur space that amateur astronomy does (i.e. within the limitations of equipment that you can buy without having to equip a laboratory).  And for a low-cost entry into Linux, with enough grunt to play with NGS tools and publicly available data, it’s hard to fault the very compact Raspberry Pi. Now I just need to see exactly where the performance limits are!

SuperMondays – the oxymoron of face to face geek social networking

So this evening I went to my first SuperMondays event.  What is SuperMondays you ask?  Well it’s a social networking event for geeks in the North East.

One of the things I’ve always been vaguely jealous of is the number of these kinds of events that seem to exist in the USA – there’s a meetup for everything whether you’re interested in tech, science, hacking, or publishing.  People get together, talks are given, people interact over food or a coffee (or a beer if you’re lucky).

I used to go to 2600 and alt.ph.uk meetings back in my impressionable younger days, so outside of scientific conferences this is the first opportunity I’ve taken in an awfully long time to sit in a room with a bunch of like-minded people and chew the fat on tech.  This month’s theme (for the meetings are most definitely monthly) was databases.  Now I can’t get terribly excited about databases per se – SQL is fugly, I prefer MySQL over PostgreSQL for ease of use rather than functionality and these days if I could do it in SQLite I probably would – but nevertheless there was a really nice series of three talks in this themed session.

Ross Cooney (SuperMondays organiser extraordinaire and @rosscooney on Twitter) gave a speedy history of the database world, and a quick reminder of the things I have already forgotten about databases after not doing a lot of db development recently (like what ACID stands for – no it’s not an HTML compliance test, or a drug (you crazy Berkeley hippies)) and introduced the other two speakers for the evening.

David Lavery (@dlavery62) followed with a review of both SimpleDB from Amazon Web Services and Google BigTable, two cloud offerings for the post-RDBMS database world.  I particularly enjoyed the SimpleDB part of the talk; anything delivered via a RESTful interface (don’t bother trying to convince me it’s not really RESTful, I could not care less) looks like a good thing to me after trying to deal with the SOAP webservices world last year.

The final talk was of a far more academic slant, with David Livingstone of Northumbria University presenting RAQUEL, an open source implementation of some of the ideas in The Third Manifesto, which appears at first glance to be an ‘RDBMS done right’ according to modern relational theory (and not affected by legacy cruft from current popular SQL implementations).  It is part middleware, part programming language and part educational tool, and I would like to have heard a little more about the implementation.  We were treated to a lot of syntactical details (which put me in mind of a cross between SQL, Perl and R, and therefore maybe not something you would necessarily want to spend all day doing), but they’ve only just released this to the world and are looking for people to engage and interact with their foray into OSS development.  It certainly generated the most questions from the gathered geeks!

After these a roadmap for the future SuperMondays was presented.  Although this was my first SuperMonday event, it was in fact their 12th.  It may have started in a (very nice!) restaurant in Newcastle a year ago around a table, but there were maybe 80 people in the theatre tonight which suggests it is going from strength to strength.  Newly incorporated as a Community Interest Company (saving buckets of paperwork over being a charitable organisation) the future for SuperMondays looks very bright indeed.  Very much looking forward to the next one!

Yeah, there’s no oxymoron of a face to face geek event, but if you only saw the tagline in your RSS reader maybe you read a little further because of it ;)  I should also say cheers to the Newcastle ARCSOC students who I had a couple of drinks with afterwards too (depriving myself of further SuperMondays sandwiches in the process), it was nice to see you all again!

You can also find SuperMondays on Twitter (@supermondays) and on Facebook too!