OGT NGS Team seeking a placement student

Edit: We found ourselves a student, and we’ve welcomed Agatha Treveil to the team!

Every year we take on a placement (sandwich) student from a UK university to work within the NGS Team in the Computational Biology group at Oxford Gene Technology.

This year is no different, and we’re looking for someone to start somewhere in the July/August 2014 window to work with us for a year.

Our previous students have come from a number of institutions; one not only won a Cogent Life Sciences skills award for “Placement of the Year” but also joined the NGS Team after graduating. Traditionally we’ve recruited from Natural Sciences, Biomedical Sciences and Biochemistry undergraduate courses. No previous bioinformatics or programming skills are required, just a willingness to deal with lots of DNA and RNA-Seq data in a commercial environment. The post involves exome sequencing, targeted resequencing and RNA-Seq analysis (whole transcriptome, small RNA), and the student would be joining at an exciting point as we start to broaden our NGS services.

This is very much a ‘hands on’ position, and we specialise in taking in biologists and having them leave as bioinformaticians.

Here’s the full text of the advert:

Computational Biology (salary approx £14K p.a.)
We are looking for a candidate who has (or would like to develop) programming skills and an interest in biology; is keen to learn new techniques, gain experience in a biotechnology company, and who has a particular interest in the interface of numerical methods and life sciences.

The placement will potentially involve projects in all of our scientific areas, but will begin with a focus on Next Generation Sequencing analysis where the student will learn pipeline development and data analysis. The student will gain experience working with clinical and academic scientists, and an understanding of the data processing aspects of the most exciting experimental technologies currently being applied in biological and clinical research and practice.

To apply for this position, please send your CV and covering letter (clearly stating you wish to apply for the industrial placement and which position you are applying for) to hr@ogt.com or by post to:
HR, Oxford Gene Technology, Begbroke Science Park, Begbroke Hill, Woodstock Road, Begbroke, Oxfordshire, OX5 1PF

Closing date for entries 30 April 2014
For further information about OGT please visit http://www.ogt.com/

Two posts available at Oxford Gene Technology

We have two positions currently open at OGT: one in the Computational Biology group and one as a Product Manager for Next Generation Sequencing. The Product Manager post reports to the Director of Strategic Marketing at OGT and is aimed at an experienced product manager, preferably with experience in the NGS space. You can read about that position by following the link.

The position within the Computational Biology group is not a full-time NGS position (and therefore you may be pleased to know, would not report directly to me!).  The previous post holder was responsible for a great deal of the array analysis that is required to develop microarray products (a core part of OGT’s business) but was called upon to support NGS work when required.  Candidates with a good statistical grounding are being sought, and experience with microarray data analysis is advantageous.  The full post description can be found by following the link.

CVs for both posts should be sent to hr@ogt.com

Short-read alignment on the Raspberry Pi

This week I invested a little spare cash in a Raspberry Pi. Now that there’s no waiting time for these, I bought mine from Farnell’s Element 14 site, complete with a case, a copy of Raspbian on SD card and a USB power supply. Total cost: about 50 quid.

First impressions are that it is a great little piece of hardware. I’ve always considered playing with an Arduino, but the Pi fits nicely into my existing skill set. It did get connected to the TV briefly, just to watch a tiny machine drive a 37″ flatscreen TV via HDMI. I’m sure it’s just great if your sofa isn’t quite as far away from the TV as mine is. So, with sshd enabled, the Pi is currently sat on the mantelpiece, lights blinking, running headless.

The first thing that occurred to me was to do some benchmarking. What I was interested in was the machine’s capacity to do real-world work. I’m an NGS bioinformatician, so the obvious thing to do was to throw some data at it through some short-read aligners.

I’m used to human exome data, or RNA-Seq data that generally spans quite a few HiSeq lanes, and to processing them in large enough volumes that I need a few servers to do it. I did wonder, however, whether the Pi might have enough grunt for smaller tasks, such as small gene panels or bacterial genomes. This is primarily because I’ve got a new project at work which uses in-solution hybridisation and sequencing to identify pathogens in clinical samples, and it occurred to me that the computing requirements probably aren’t the same as what I’m used to.

The first thing I did was to simulate 100bp paired-end reads from an E. coli genome with wgsim, to test out paired-end alignment.
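wgsim (distributed with samtools) did the simulation here. For anyone curious what such a simulator does, this is a toy Python sketch; it is not wgsim’s algorithm (no indels, no quality model, a fixed per-base substitution rate), and the function names and default insert-size parameters are assumptions chosen for illustration.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_errors(read, error_rate, rng):
    """Substitute each base with a different random base at probability error_rate."""
    bases = list(read)
    for i, base in enumerate(bases):
        if rng.random() < error_rate:
            bases[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(bases)


def simulate_pairs(genome, n_pairs, read_len=100, insert_mean=500,
                   insert_sd=50, error_rate=0.01, seed=1):
    """Yield (read1, read2) pairs sampled uniformly along `genome`.

    Toy model: normally distributed fragment (insert) sizes, fragments from
    either strand, substitution errors only.
    """
    if len(genome) < read_len:
        raise ValueError("genome is shorter than the read length")
    rng = random.Random(seed)
    for _ in range(n_pairs):
        insert = int(round(rng.gauss(insert_mean, insert_sd)))
        insert = max(read_len, min(insert, len(genome)))  # keep fragment within bounds
        start = rng.randint(0, len(genome) - insert)
        fragment = genome[start:start + insert]
        if rng.random() < 0.5:  # fragment may come from either strand
            fragment = revcomp(fragment)
        read1 = fragment[:read_len]            # one end, as sequenced
        read2 = revcomp(fragment)[:read_len]   # other end, reverse complemented
        yield add_errors(read1, error_rate, rng), add_errors(read2, error_rate, rng)
```

Writing each pair out to two FASTQ files (with a constant dummy quality string) would give inputs the aligners accept, although a real simulator also models indels and base qualities.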

Initially I thought I would try to get Bowtie2 working, on the grounds that I wasn’t really intending to do anything other than read mapping and I am under the impression it’s still faster than BWA (which does tend to be my go-to aligner for mammalian data). However, I quickly ran into the fact that there is no armhf build of bowtie2 in the Raspbian repository. Having downloaded the code, I was struggling to compile it from source, and in the middle of setting up a cross-compiling environment so I could do the compilation on my much more powerful EeePC 1000HE(!) it occurred to me that someone might have been foolish enough to try this before. And they had. The fact is that bowtie2 requires a CPU supporting the SSE instruction set, i.e. an x86 processor from Intel or AMD. So whilst it might work on the Atom CPU in the EeePC, it’s a complete non-starter on the ARM chip in the Pi.

Bowtie1, however, is in the Raspbian repository. After seeing it align the 1,000-read example dataset that comes with Bowtie with some speed, I generated 1×10^6 reads as a test dataset. Aligning these took 55 minutes.

I then picked out a real-world E.coli dataset from the CLC Bio website.  Generated on the GAIIx, these are 36bp PE reads, around 2.6×10^6 of them.

BWA 0.6.2 is also available from the Raspbian repos (a more recent version than the one in the Xubuntu distro, I notice, probably because Raspbian is tracking Wheezy, the current ‘testing’ release).

So I did a full paired-end alignment of this real-world data with both aligners, making sure both wrote their output as SAM. I quickly ran out of space on my 4GB SD card, so all data was written to an 8GB USB thumb drive.
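The exact commands aren’t recorded here. As a sketch only, assuming the standard Bowtie 1 and BWA 0.6.x (aln/sampe) paired-end workflows with default options, plus Bowtie’s -S flag to get SAM, the runs could be scripted and timed in Python like this. The helper names and file paths are hypothetical, and dry_run=True just prints the commands.

```python
import shlex
import subprocess
import time


def bowtie1_commands(ref_fa, index_base, r1, r2, sam_out):
    """Bowtie 1: build the index, then paired-end alignment with SAM output (-S)."""
    return [
        (["bowtie-build", ref_fa, index_base], None),
        (["bowtie", "-S", index_base, "-1", r1, "-2", r2, sam_out], None),
    ]


def bwa_commands(ref_fa, r1, r2, sam_out):
    """BWA 0.6.x: index, `aln` each mate to a .sai file, then `sampe` (SAM on stdout)."""
    sai1, sai2 = r1 + ".sai", r2 + ".sai"
    return [
        (["bwa", "index", ref_fa], None),
        (["bwa", "aln", ref_fa, r1], sai1),
        (["bwa", "aln", ref_fa, r2], sai2),
        (["bwa", "sampe", ref_fa, sai1, sai2, r1, r2], sam_out),
    ]


def run_steps(steps, dry_run=True):
    """Run each (argv, stdout_path) step; return (command, wall-clock seconds) pairs."""
    timings = []
    for argv, stdout_path in steps:
        label = shlex.join(argv) + (f" > {stdout_path}" if stdout_path else "")
        if dry_run:
            print(label)
            continue
        start = time.monotonic()
        if stdout_path:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
        timings.append((label, time.monotonic() - start))
    return timings


if __name__ == "__main__":
    # Hypothetical file names; on the Pi these would live on the USB drive.
    run_steps(bwa_commands("ecoli.fa", "reads_1.fq", "reads_2.fq", "bwa.sam"))
```

Redirecting stdout to a file for `bwa aln` and `bwa sampe` mirrors the usual shell `>` usage, so nothing beyond the standard library is needed.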

Bowtie1 took just over an hour to align this data (note that the reads and the reference genome used for alignment come from completely different E. coli strains):

Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Seeded quality full-index search: 01:01:31
# reads processed: 2622382
# reads with at least one reported alignment: 1632341 (62.25%)
# reads that failed to align: 990041 (37.75%)
Reported 1632341 paired-end alignments to 1 output stream(s)
Time searching: 01:01:32
Overall time: 01:01:32
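Bowtie 1 prints this summary to stderr. When comparing several runs, a small parser saves copying numbers by hand; this is a hypothetical helper keyed to the exact lines in the log above, not part of Bowtie itself.

```python
import re

SUMMARY_PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
}


def parse_bowtie_summary(text):
    """Extract read counts and overall run time (seconds) from a Bowtie 1 summary."""
    summary = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        summary[key] = int(match.group(1)) if match else None
    match = re.search(r"Overall time: (\d+):(\d+):(\d+)", text)
    if match:
        hours, minutes, seconds = map(int, match.groups())
        summary["overall_seconds"] = hours * 3600 + minutes * 60 + seconds
    if summary["processed"] and summary["aligned"] is not None:
        summary["percent_aligned"] = 100 * summary["aligned"] / summary["processed"]
    return summary
```

Recomputing the alignment rate from the raw counts is a cheap sanity check that nothing was lost between the counts and the percentages Bowtie prints.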

I was a little surprised that BWA actually managed this slightly faster (please note both aligners were run with default options). For BWA I only captured the start and end times of the run:

Align start: Sat Jan 26 22:36:06 GMT 2013
Align end: Sat Jan 26 23:29:31 GMT 2013

Which brings the total alignment time for BWA to 53 minutes and 25 seconds.
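The BWA figure is simply the difference between the two `date` stamps. A short sketch of that arithmetic, which also puts both aligners on a reads-per-second footing using the 2,622,382 reads Bowtie reported processing (assuming BWA processed the same input):

```python
from datetime import datetime

# Format of the timestamps in the post, as printed by `date` in the C locale.
STAMP_FORMAT = "%a %b %d %H:%M:%S GMT %Y"


def elapsed_seconds(start, end):
    """Whole seconds between two `date`-style timestamps."""
    delta = datetime.strptime(end, STAMP_FORMAT) - datetime.strptime(start, STAMP_FORMAT)
    return int(delta.total_seconds())


def hms_to_seconds(hms):
    """Convert Bowtie's HH:MM:SS run time to seconds."""
    hours, minutes, seconds = (int(x) for x in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(seconds):
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


READS = 2622382  # "# reads processed" from the Bowtie log
bwa_s = elapsed_seconds("Sat Jan 26 22:36:06 GMT 2013", "Sat Jan 26 23:29:31 GMT 2013")
bowtie_s = hms_to_seconds("01:01:32")  # Bowtie "Overall time"

for name, secs in (("BWA", bwa_s), ("Bowtie1", bowtie_s)):
    print(f"{name}: {seconds_to_hms(secs)} ({READS / secs:.0f} reads/s)")
```

Normalising to reads per second also makes it easier to compare against the 55-minute run on the 1×10^6 simulated reads.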

Anyway, it was just a little play to see how things stacked up. I think it’s fantastic that a little machine like the Pi has enough power to do anything like this. It’s probably more a comment on the people behind the aligners, who have written such efficient code that this can be done without exceeding 512MB of RAM. During the test runs, though, Bowtie’s memory usage did appear to be lower than BWA’s.
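The memory comparison was only eyeballed while the tests ran. If you wanted a number, one option on Linux is to read each child process’s peak resident set size from `os.wait4`. This is a standard-library sketch (Python 3.9+, Unix only), not how the observation above was made.

```python
import os
import sys


def run_with_peak_rss(argv):
    """Run argv (searched on PATH) and return (exit_code, peak_rss).

    The rusage comes from waiting on this specific child, so it is not mixed
    up with other children. On Linux ru_maxrss is in kilobytes; macOS reports bytes.
    """
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status, usage = os.wait4(pid, 0)
    return os.waitstatus_to_exitcode(status), usage.ru_maxrss


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python peak_rss.py bowtie -S ecoli -1 r1.fq -2 r2.fq out.sam
    code, peak = run_with_peak_rss(sys.argv[1:])
    print(f"exit={code} peak_rss={peak}")
```

Measuring per child with `wait4` avoids the pitfall of `resource.getrusage(RUSAGE_CHILDREN)`, which reports the largest of all children waited for so far.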

I always thought that the ‘missing aspect’ of DIYbio was getting people involved with bioinformatics; instead, the community seemed desperate to follow overly ambitious plans to get involved in synthetic biology. It seemed to me that DIYbio should sit in the same amateur space that amateur astronomy does (i.e. within the limits of equipment you can buy without having to equip a laboratory). And as a low-cost entry into Linux, with enough grunt to play with NGS tools and publicly available data, it’s hard to fault the very compact Raspberry Pi. Now I just need to find out exactly where the performance limits are!

Job opening at OGT for an NGS Computational Biologist

We have an opening at Oxford Gene Technology in the Computational Biology Team. We’re looking for someone with NGS experience: either a Masters graduate with at least a year’s experience of NGS analysis (WGS/WES/RNA-Seq) or an early-career post-doc whose PhD had a significant NGS component.

The full details of the post can be found at the link below:


One thing I can promise is that OGT is a friendly place to work: we’re an SME with around 65 employees, but with a truly global reach. The Computational Biology group will have 8 members when this position is filled, and is therefore a significant part of the company. There’s lots to do and plenty to get involved in. We would like to fill this post quickly, so if you fit the bill please send CVs to hr@ogt.co.uk

Thoughts on a year in industry

A year ago I left the safe environs of academia and decided to move to industry. I said farewell to my final salary pension, my Mac-centric mode of life, my newly purchased house and the place that had been my home for the last 7 years, to return to Oxford and enter a world dictated by the cold logic of business.

Why did I move?

I think quite a few people were wondering this at the time.  Aside from the fact that 2011 had started as a most abysmal year (personal issues, not professional) there were a number of factors leading to my departure, but on the face of it the move may have seemed rash.  The Bioinformatics Support Unit at Newcastle University was running very well, publications were flowing, costs were being recovered to the satisfaction of the Faculty.  I had a great set of friends, colleagues and co-workers.

One of the things about bioinformatics support work is that it is, by its very nature, diverse. This is great for not getting bored day to day, but not so great if you want to specialise in a field. My work was split mostly between arrays and NGS (mainly arrays), alongside the financial management of the Unit, student supervision (several PhD and Masters students) and a dozen or so odd little projects that come your way in that kind of job.

My heart however has always been with genomics.  My favourite part of my PhD was always the sequencing.  In the hot-lab, up to my elbows in acrylamide and isotopes, all for the joy of pulling the autoradiograph film from the developer and spending the next couple of hours typing it into DNAStar before applying whatever gene/exon/splice-site prediction software I had committed to that day.  The future, to me, looks like it’s going to be heavily flavoured with NGS.

I had decided a long time ago never to enter industry, the by-product of a difficult year at Glaxo as a sandwich student. I hated the feeling of being the smallest cog in a giant, impenetrable, deeply impersonal multinational pharma. There were the people I saw struggling with their own academia-to-industry transitions, the daily pickets from animal rights groups, and the people who, on handing in their notice, were marched from their offices and dispatched from the premises without even a chance to pick up their personal belongings. It didn’t seem like such a great place to be.

In late 2010 I started to get approaches from recruiters, all for positions with NGS or NGS-related firms: some still around, some now counting down the days to their demise. After a couple of months of weighing up whether I wanted to commit to the jump, the perfect job advert crossed my desk. For the first time, I phoned a recruiter. And that job was with OGT.

A year later, I thought it might be nice to summarise what I thought of the change.

What does the role entail?

No longer a ‘bioinformatician at large’, I now have responsibility for developing our NGS analysis pipelines and returning their results to academic and commercial customers. We have built an extensive exome analysis pipeline which handles not just individual exome samples, but also comprehensive trio analysis and cancer samples. A lot of data passes through this pipeline, and I couldn’t have done it without my fantastic sandwich student David Blaney, who I hope has had a much better year out in industry than I did. We’ve built an RNA-Seq pipeline too, shortly to be launched as a service.

I’m involved in a number of grant programmes internally, from solid tumour cancer diagnostics for stratified medicine, to pathogen screening and host/pathogen interactions – all from an NGS perspective.  We have a Genomics Biomarkers team as well, and they obviously have an increasing need for NGS approaches.

So what is the same?

Well, I’m still doing bioinformatics. Arguably I’m doing more bioinformatics than in my previous role. I still get to interact with customers, although it took a while for this to become direct, rather than mediated via the sales team. I think you have to earn a certain amount of trust when entering a new role, but having done nothing but talk to customers for 7 years, I didn’t initially appreciate that there might be good procedural reasons for having an intermediate layer of communication with customers.

This is still one of the most satisfying parts of the role: delivering results and analyses back to researchers or commercial customers is great, especially when you’re getting great feedback about the quality of the data and the findings from it. Even better when they come up at a conference, shake your hand and tell you about the papers that have been submitted.

This is one thing about doing a lot of exome sequencing work for rare diseases – you get a lot of diagnostic power, and consequently a lot of hits. My name still goes on papers, we have just had a paper accepted that comes out in the AJHG in August and favourable noises from a pre-submission enquiry with a very high-impact journal for another.  Both exhibiting (we believe) absolutely novel classes of discovery from exome data.

What is different?

I talk to people from a much wider range of backgrounds at work. No longer talking to just biologists, computer scientists and fellow bioinformaticians, I now get to talk to enthusiastic people in the sales and marketing departments. I’m also much more intimately connected to the lab again, thanks to both the R&D and services work.

It helps that OGT has a touch over 60 employees: it’s small enough to feel genially personable. I actually get to talk to the VPs and CEO. Regularly.

I get to travel more. Not travelling was something of a self-imposed rule at Newcastle: when you’re managing your own finances, trips to conferences don’t do much for the balance sheet, because they simply don’t generate any revenue. Now the reasons for travel have a much more financial focus; if I go away, I go away with one of the sales team. We do roadshows and conferences. I am now one of those people who stands on the company booth and talks to you, rather than the person who goes to a conference to listen to talks. You are there to generate leads, and the cost of going must be balanced against the gains from those leads.

This is another aspect that has been very different. I have had an increasing interest in the business side of the life sciences for some time, but lacked any practical experience. That is now changing: I now understand how a business operates, what the margins need to be on a sale, and the balance between selling products and selling services.

Because of the size of OGT I get exposure to this, which I doubt would happen in a larger company. I get involved in product development, I help to write product profiles, and I’ve developed, and continue to develop, marketing materials for the website. These are all new skills for me, and I love to learn.

Another thing I’ve noticed is that the makeup of the company is very different to academia. I work in a phenomenally talented group of computational biologists, who are skilled in software design, software development and all facets of bioinformatics analysis. But not everyone has a PhD. Not everyone has a biology background. And these are things I took for granted in academia. If anything, I have become more and more convinced that a PhD is of little consequence, especially for people who, like me, have switched discipline after getting one. My colleagues include people who have worked in the more quantitative fields of accounting or investment banking, but retooled for bioinformatics, and have done so with aplomb.

Social networking changes

I think most people who interact with me online will have noticed that I don’t blog, tweet or participate in BioStar as much anymore.  I spend a lot of time on SeqAnswers, and my RSS reader is now top heavy with NGS related blogs, but participation is down.

There are simply commercial pressures which mean I can’t always blog about what I’m doing. Believe me, there are some things I do at work under CDA/NDA that I would really love to talk about, but it’s not just that I can’t blog about them: I can’t even talk to you about them over a pint of beer. This is something I have had to accept about the commercial environment. The IT policy at work is incredibly strict, to maintain the ISO information security standards that we have. I’ve learned to adapt to this, and to the Windows-centric environment.

The biggest issue though? Inability to get to papers.  Oh how I took for granted the access to papers I had at Newcastle.  I just want to say a big thank you to everyone who has sent me a paper on request in the last year, you have been invaluable to me, and it is deeply appreciated.

Was it worth it?

Absolutely. Life at OGT is hectic and pressured, but deeply rewarding. I have the focus that I wanted, but with the diversity of a new set of challenges. I think I’ve been very lucky to settle into a company that is the perfect size and makeup for a gentle transition from academia into the commercial world. It might not be for everyone, but I will say I wish I had done it sooner. I harboured doubts about industry, but they were predicated on my experiences with a giant company. Working now in a long-established SME that is on a sound financial footing (as opposed to a giant multinational or a precarious start-up), I wonder what I was concerned about.

Bioinformatician post at NEBC, CEH Wallingford

This came through on a mailing list from the group I used to work for whilst I was at CEH in Oxford, and I thought it might be of interest:


We are recruiting for a new bioinformatician to join the NEBC group at
CEH Wallingford (nebc.nerc.ac.uk).  This is an open-ended position
available immediately.  We think this role might be ideally suited to an
ambitious new postdoc with experience of biological data analysis (esp.
high-throughput sequencing, metagenomics) and programming knowledge in
Linux.  If you know of anyone who fits this description please pass this
mail on to them.  The deadline for applications is 3rd April.

The successful candidate will primarily be working on data analysis,
development of new tools and scripts, and developing documentation and
training resources as part of our NBAF-W remit
(nbaf.nerc.ac.uk/nbaf-wallingford).  There will also be the opportunity
in time to branch out and become involved in the range of group
activities including involvement in megasequencing projects
(www.microb3.eugenomicobservatories.org), data standards (gensc.org,
mibbi.org, environmentontology.org), data sharing (biosharing.org,
isa-tools.org), cloud computing platforms (cloudbiolinux.org) and more.

Full information on the role, requirements, how to apply, and links to
further information are all on the CEH website:


Please feel free to contact us directly with any queries related to the
role (admin@nebc.nerc.ac.uk).  Queries regarding the application process
should be directed to SSC.

Software Developer position open at Oxford Gene Technology

OGT has a position open for a Software Developer to work in the Computational Biology group. Full details are on the careers page of the OGT website.

From the page the requirements are:

“We are looking for a highly motivated and innovative individual with experience in Java-based software development. The ideal candidate would combine Java and PHP programming experience with database development and administration skills. Knowledge of bioinformatics tools would be a distinct advantage. The successful candidate will be working alongside a Senior Software Developer who is overseeing the implementation of OGT’s CytoSure Interpret Software and Laboratory Information Management System (LIMS).”

The Computational Biology group is quite diverse and there are 8 of us at the moment, dealing with everything from array designs, in-solution capture designs, microarray analysis (CGH/expression), next-generation sequencing, LIMS development and application development. So if you fancy the idea of becoming the 9th member, please apply via our website.

An enjoyment of cryptic crosswords is not essential but helpful, training can be provided on arrival (it was for me!).

Book review: “My Beautiful Genome” by Lone Frank

After the keenly observed industry-watching of Kevin Davies’s book “The $1000 Genome”, I decided to get a more consumerist view of DTC genetic testing and what it means at a more personal level.

The book’s blurb suggests it is “Sharp and funny”, but at its heart this is a biographical tale of whether it’s genetics or environment that makes you who you are. Steering clear of medical-interventionist tub-thumping, it’s more a tale of whether there is satisfaction to be gained, or insight into your mental makeup to be had, via genetics.

The best thing I can say about this book, is that it is the reason I sent my sample off to 23andMe to be analysed. It no longer seemed sensible for me to hold out against the imperative to learn more about myself. Maybe that’s a valuable enough output.

The writing is conversational, but for me the book is at its best when the sources in it are quoted verbatim. And there are many. I found myself poring over these more avidly than the rest of the text. I can’t say the humour came through for me; it’s a little dark in places, and not being a reader of biographies, maybe I wasn’t prepared for the confessional content.

Lone Frank is lucky though, as the book goes far beyond what you or I might be able to order for a couple of hundred dollars, or more, in terms of online testing. She manages to wheedle access to far more diagnostic and actionable tests than I suspect we could manage. Not to mention enviable access to genomics pioneers, including Watson himself.

There is coverage of ‘deep genealogy’, which I imagine is going to drive a lot of people towards personal genomics as an extension of their compulsive rifling through parish records, and as a way to get a handle on family trees when the paper trails run cold.

There is much less of a focus on the underlying technology; it is the testing landscape that is explored here, from making sure Jewish couples have healthy babies, to the genetic snake-oil of love matching via your genes (or HLA subtypes).

The second half of the book is really where the behavioural strand beds in. There’s plenty of talk of discoveries that have graced the tabloid press, “infidelity genes” and the like. To be fair, the science is well explained and definitely accessible to the layman, but perhaps there is too much of a populist and speculative bent in the later chapters. And I think the intersection of psychology and genetics is perhaps not quite advanced enough yet to stand up to much scrutiny. At least, not as it is presented here (it is not my field or forte!).

This segues into a necessary and even-handed discussion of modern eugenics practice, and a timely reminder that it may still be a dirty word, but it continues unabated, now just in the hands of parents and not governments.

On the whole this is a book I’d give to someone outside work to explain the field I’m in. I don’t regret spending the money on the book, or the time I invested in reading it (it didn’t take long compared to my last read!), but it is aimed at a more general readership. Unless you’re interested in the biographical ramifications of having your DNA tested, it’s maybe not one for the bookshelf: what was a chapter of “The $1000 Genome” is spread across 300 pages here.

Book review: “The $1000 Genome” by Kevin Davies

So in my quest to do a bit of reading around the industry I now find myself in, I’ve lined up a few books.  I actually started “The $1000 Genome” in the weeks prior to my interview at OGT, and the fact that I only finished it a couple of weeks ago (5 months later) should not be taken as a reflection on the quality of the book.

I think one thing people will be asking is whether a book written in 2010 is still relevant at the tail end of 2011 in such a fast-moving industry, and I think it’s a testament to Kevin Davies’s writing that it is.

I haven’t read either of Kevin’s two previous books, “Cracking the Genome” (no explanation required as to what that might be about [but note that the UK title of this book is “The Sequence”]) or “Breakthrough”, on the race to find the breast cancer gene. I probably will buy these when the existing book backlog is cleared.

One thing this book has in spades is an excellent history of how we got where we are today, both in terms of the personalities and the companies that have driven the NGS revolution. Consequently some of the stories about familiar names, such as 23andMe and the small sequencing startups that were swallowed by the industry biogiants, may already be known to you, but the book charts them from setup to acquisition and follows the movement of key staff between them. For me, the history of NGS and emergent personal genomics alone is probably worth the cover price of the book. There is also no skimping on the next ‘next-generation’ contenders.

Also well documented is the rivalry between the main DTC companies, 23andMe, deCODE and Navigenics, and it’s interesting to see how they stratify in terms of panels offered, risk calculation and how focused they are on ‘actionable’ information.  It’s also worth delving into the longer term research-led strategies of these companies, and the regulatory hurdles they are already embroiled in.

It’s actually quite poignant how quickly we’ve moved from sequencing a reference genome, to sequencing an individual person’s genome, to having dozens, and then hundreds, of full genomes. This was brought home in a telecon I had this week with a research institute who figured they had sequenced 160 genomes in 2010/2011. As with all science, what was once a Nature paper quickly becomes routine when NGS hardware is ramping up capacity as much as it is. This is also strongly highlighted in the book.

The final section deals with the likely arrival of genome-led P4 medicine and the sequencing X-Prize, and wraps up just how close we are to the $1000 genome. The book is actually quite light on price as a driver, and prefers to point out what could be done when sequencing is cheap enough. With excursions into the author’s own genomic landscape and thorough referencing throughout, it’s a book I can happily recommend to anyone in the field, or with a passing interest in it.

Notes from the Next-Generation Sequencing Congress

Yesterday I attended the Next Generation Sequencing Congress at the Edwardian Radisson Hotel at Heathrow. The meeting was quite small (300 people perhaps), quite vendor-heavy and bioinformatics-light. An interesting mix. The day was split into two streams, which I switched between frequently.

What is presented below is my notes from the meeting. This was not an attempt to liveblog the event, they have been written up today. They reflect my personal biases as to which bits of the talks I was paying the most attention to, may be riddled with inaccuracies and misquotes and are not to be taken as verbatim reports of the talks. If anyone feels they may have been misquoted or misrepresented by anything below, please let me know and I will amend this as soon as I can.

From a personal perspective a few things were highlighted.

Firstly, I do not see how the 454FLX system and/or Ion Torrent can possibly consider themselves the de facto choice for clinical resequencing. The error profiles of these machines just do not lend themselves to a discipline that needs accuracy above all, but which has been sold machines on the basis that ‘long reads’ were the best way to replicate what had previously been done by Sanger sequencing.

Secondly, Galaxy is gaining a lot of ground as part of the analytical toolbox. Like others at the conference, I’m not sure this is the way forward. I do wonder how much analysis is blindly pushed through Galaxy on default settings by naive researchers without a thought to what is being done, simply because data does come out of the end.

Thirdly, sequencing hundreds of exomes doesn’t always lead you to causal genes.

Notes below:

Using Next Generation Sequencing to Identify Recurrent Mutational Events in Human Cancers

Steven Jones, Professor, Associate Director and Head, Bioinformatics, BC Cancer Agency

Sadly I arrived right at the end of this talk, but caught enough to find out that their SNP calling pipeline uses Samtools and SNVMix. SNVMix is an SNV caller for cancer samples, designed to address specific statistical issues not handled by standard SNV calling tools.

SureSelectXT: Focus your Sequencing on DNA that matters

Darren Marjenberg, Agilent Technologies

30x coverage was still quoted as the minimum requirement. It was claimed that SureSelect can detect indels of 38bp. Talked about the v4 and v5 exome kits, which are complete redesigns and are said to address some of the issues seen with the 50Mb kit. Also said that costs are 50% down with the new kits, and introduced their focused kinome kit. They are developing an FFPE protocol, but there is a working one already published (http://genomebiology.com/1755-8794/4/68). They have also stratified the custom target kit sizes into 1Kb-199Kb, 200Kb-499Kb, 500Kb-1.49Mb, 1.5Mb-2.99Mb, 3Mb-6.9Mb and beyond, so smaller target sizes are now catered for. They also quoted this paper (http://www.pnas.org/content/early/2010/06/23/1007983107) on cancer panel resequencing with custom kits, citing excellent allelic balance (60/40 quoted as the maximum deviation), and even said that the relatively simplistic approach in this paper for identifying CNVs was successful in this panel. There were also slides on RNA target enrichment developed with Joshua Levin from the Broad Institute (http://genomebiology.com/2009/10/10/R115).

Targeted Amplicon Resequencing on Illumina NGS

James Hadfield, Head of Genomics Core Facility, CRUK

CRUK are running HiSeq and MiSeq but not Ion Torrent. Had good words to say about the Nextera library prep kits (http://www.epibio.com/nextera/nextera.asp). A cautionary note was added about making sure that your genes and regions of interest are actually covered by the capture kits you’re using. Their cancer resequencing panel is 627 targets and includes targets of somatic and germline mutation, as well as targets with no current clinical intervention options. Trialled long-range PCR of TP53, then fragmenting prior to library prep. This then moved to testing RainDance with the GAIIx for 4.5K exons. It seems they’re currently using Fluidigm (http://www.fluidigm.com/) upstream of the HiSeq2000. Fluidigm takes in 48 cDNA samples and 48 sets of assays (primer pairs) to create a 2,304-well (48 × 48) assay plate. The suggestion was that it might be possible to plex up to 1500 samples per lane. There was also good correlation between Fluidigm and Sanger follow-up. They’ve also trialled a Nextera long-range PCR approach with the MiSeq, where a 12-sample turnaround can be done in a week. He was also very positive about 23andMe-style visualisation/reporting of genetic data in a clinical context.

Single Molecule, Real-Time Sequencing on the PacBio RS platform: Technology and Applications

Deepak Singh, Sr. Director Sales, Pacific Biosciences Europe

Having never seen a PacBio presentation before, I found this quite interesting.  They sequence at around 1bp/sec, and the C1 chemistry achieves an average read length of 1.5kb with the 95th percentile around 3.5kb.  The UK installation that currently exists has reported read lengths of up to 16kb.  Machines have a built-in blade centre for data processing.  The machine reportedly does not suffer from GC bias.  Sample prep is essentially DNA fragmentation, end-repair and ligation of the circularising adapters.  The circularising set-up means that complementary strands are sequenced in the same run.  The SMRTCell loading system has a 30-minute minimum run time and is loaded serially.  Maximum mappable output per SMRTCell is 45Mb.  The loading hopper cannot be filled completely and left to its own devices, as the reagents do not last for two days prior to loading.  Larger inserts are sequenced once, smaller ones multiple times as they pass through the polymerase; this sounds good for scaffolding de novo assemblies, and error correction from multiple-pass short reads can be applied to the longer reads.  Gene fusions and deletions are easier to detect with long reads.  Not capable of WGS yet, so targeted applications are best.  Improvements are going to come from brighter dyes, so less laser power is required and polymerase degradation will decrease.  Also, only 33% of ZMWs are filled with a single polymerase, 33% have 2 or more and 33% have none, so the system technically operates at only a third of its potential capacity.  C2 chemistry will offer average read lengths of 2.5-3kb and 95th percentile read lengths of 6-8kb.
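Two of the PacBio points can be made concrete with a little arithmetic. The sketch below assumes, as our own modelling choice rather than anything stated in the talk, that polymerases load into ZMWs following a Poisson distribution. Under that assumption the single-occupancy fraction peaks at 1/e (about 37%) when the mean is one polymerase per ZMW, which is in line with the roughly one-third figure quoted. The second function approximates how many complete passes a polymerase read makes around a circularised insert, which is why short inserts are read several times and long ones once or not at all; the adapter allowance is a hypothetical parameter.

```python
import math


def zmw_occupancy(mean_polymerases_per_zmw: float) -> tuple[float, float, float]:
    """Fractions of ZMWs with 0, exactly 1, and 2+ polymerases under Poisson loading."""
    lam = mean_polymerases_per_zmw
    empty = math.exp(-lam)
    single = lam * empty
    return empty, single, 1.0 - empty - single


def complete_passes(polymerase_read_bp: int, insert_bp: int, adapter_bp: int = 0) -> int:
    """Approximate number of complete passes over a circularised insert.

    adapter_bp is a hypothetical allowance for adapter sequence read between
    passes; it defaults to 0 for simplicity.
    """
    return polymerase_read_bp // (insert_bp + adapter_bp)


for lam in (0.5, 1.0, 2.0):
    empty, single, multi = zmw_occupancy(lam)
    print(f"lambda={lam}: empty={empty:.2f} single={single:.2f} multiple={multi:.2f}")

print(complete_passes(1500, 250))   # short insert: read several times
print(complete_passes(1500, 3000))  # long insert: not even one complete pass
```

Raising the loading concentration only trades empty ZMWs for multiply occupied ones, which is why the capacity ceiling cannot be fixed by loading alone under this model.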

What’s New? Putting Variants from Whole Genome or Whole Exome resequencing in biological context

Frank Schacherer, COO, BIOBASE

BIOBASE argue that HGMD is the best tool for identifying novelty in variant analysis.  All BIOBASE offerings are human curated.  Highlighted its utility in cancer analysis due to the number of variants uncovered.  Highlighted a typical cancer analysis pipeline: take a variant list, filter it down to coding variants, then uncommon variants, then non-germline variants, and characterise the remaining somatic variants with SIFT, PolyPhen and MutationTaster, applying GO annotations and doing pathway analysis.  Neatly uploaded HGMD into Galaxy to analyse the Watson genome.  HGMD data is in wide use, from 1000Genomes to Cartagenia, Avadis NGS, CLC Bio and Alamut.  The human annotation shows its worth with SNPs that are initially reported as disease-causing but later found to be highly prevalent in the population (e.g. in 1000Genomes data).  These are flagged by BIOBASE and eventually removed as not being clinically relevant.  There was a suggestion that HGMD is going to be essential for personal genome assessment.
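The filtering cascade described above maps naturally onto a chain of list filters. This is a minimal Python sketch, not BIOBASE's implementation. The `Variant` fields, the consequence labels, the 1% population frequency cut-off and the use of a matched normal sample to define "germline" are all assumptions. The annotator passed in is a placeholder for tools such as SIFT, PolyPhen or MutationTaster, none of which is called here.

```python
from dataclasses import dataclass, field

# Hypothetical consequence labels; real pipelines take these from an annotation tool.
CODING_CONSEQUENCES = {"missense", "nonsense", "frameshift", "inframe_indel",
                       "splice_site", "synonymous"}


@dataclass
class Variant:
    gene: str
    consequence: str         # e.g. "missense", "intronic"
    population_af: float     # allele frequency in a reference population
    in_matched_normal: bool  # also called in the patient's normal (germline) sample
    annotations: dict = field(default_factory=dict)


def somatic_candidates(variants, max_population_af=0.01):
    """Coding -> uncommon -> non-germline, in the order described in the talk."""
    coding = [v for v in variants if v.consequence in CODING_CONSEQUENCES]
    uncommon = [v for v in coding if v.population_af < max_population_af]
    return [v for v in uncommon if not v.in_matched_normal]


def annotate(variants, annotators):
    """Attach a score from each annotator (placeholders for SIFT, PolyPhen, ...)."""
    for v in variants:
        for name, score in annotators.items():
            v.annotations[name] = score(v)
    return variants


variants = [
    Variant("TP53", "intronic", 0.0, False),       # dropped: non-coding
    Variant("KRAS", "missense", 0.20, False),      # dropped: common
    Variant("BRCA2", "frameshift", 0.001, True),   # dropped: germline
    Variant("PIK3CA", "missense", 0.0005, False),  # kept
]
placeholder = {"placeholder_score": lambda v: 1.0 if v.consequence == "missense" else 0.0}
kept = annotate(somatic_candidates(variants), placeholder)
print([(v.gene, v.annotations) for v in kept])
```

Ordering the cheap filters first keeps the expensive annotation step confined to the handful of variants that survive.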

NGS: A deep look into the transcriptome

John Castle, Computational Medicine, TRON, Gutenberg University of Mainz

Their HiSeq is installed on a vibration-free table, because the emergency medical helicopter that lands on the roof of their institute played havoc with their runs.  Highlighted the utility of RNA-Seq in gene expression analysis, as you can get zero counts back from an experiment, whereas microarrays always report noise/some signal.  Interestingly, they made use of unaligned reads to assay viral load in samples (SARS in this case) and also to look for virulence mutations in the viral, as opposed to human, reads.  Specific amplification protocols have been developed to avoid amplifying globin and rRNA, to get more bang for your sequencing buck.  Highlighted that for clinical use, samples really need to be received, sequenced and analysed in DAYS for successful clinical intervention.  Use a Galaxy-based LIMS.  Most interestingly, they run duplicate experiments even for their exome resequencing studies: duplication, even for exome sequencing, should be done ‘as a matter of course’.
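Assaying viral load from unaligned reads comes down to counting the reads that fail to map to the human genome but do map to a viral reference, then normalising by sequencing depth. The Python sketch below assumes that second alignment has already been done and simply reports reads per million; the read IDs and reference names are hypothetical.

```python
from collections import Counter


def viral_reads_per_million(total_reads: int, viral_hits: dict[str, str]) -> dict[str, float]:
    """Reads per million sequenced reads assigned to each viral reference.

    viral_hits maps read ID -> viral reference name and is assumed to come from
    re-aligning the reads that did not map to the human genome.
    """
    counts = Counter(viral_hits.values())
    return {ref: n * 1_000_000 / total_reads for ref, n in counts.items()}


# Hypothetical sample: 2 million reads, three of which hit viral references.
hits = {"read_17": "SARS-CoV", "read_42": "SARS-CoV", "read_99": "other_virus"}
print(viral_reads_per_million(2_000_000, hits))
```

Normalising by the total read count, rather than reporting raw hits, keeps samples sequenced to different depths comparable.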

Next Generation Sequencing Case Studies in Drug Discovery and Development

Jessica Vamathevan, Principal Scientist, Computational Biology GSK

Also use BIOBASE products.  Use NGS to examine viral titres in samples.  Incorporate profiles of polymorphisms in the viral load when gathering information about responses to drugs during clinical trials.  Used NGS to examine viral population diversity during drug studies, especially to get a handle on the development of drug resistance.  End up with 4000 reads per time-point.  Use phylogenetic tree analyses to trace the provenance of infection and viral mutation in HIV studies by patient clustering.  It is even possible to tell which subpopulation of virus may have been passed from one person to another, even if the viral population is very diverse in the transmitting individual.  Layer depth of sequencing information onto phylogenetic trees using ‘pplacer’ and ‘guppy’.
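The talk did not say how viral diversity was quantified, so the Python sketch below swaps in one common measure, the Shannon entropy of haplotype frequencies at each time-point. The haplotype labels and counts are hypothetical, although 4000 reads per time-point matches the figure quoted.

```python
import math
from collections import Counter


def shannon_diversity(haplotypes: list[str]) -> float:
    """Shannon entropy (natural log) of haplotype frequencies among reads."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    # p * log(1/p) is non-negative for each haplotype, so a single haplotype gives 0.0.
    return sum((n / total) * math.log(total / n) for n in counts.values())


def diversity_by_timepoint(reads_by_timepoint: dict[str, list[str]]) -> dict[str, float]:
    return {tp: shannon_diversity(h) for tp, h in reads_by_timepoint.items()}


# Hypothetical haplotype calls for 4000 reads at each of two time-points.
reads = {
    "baseline": ["hapA"] * 4000,
    "week_4": ["hapA"] * 2000 + ["hapB"] * 2000,
}
print(diversity_by_timepoint(reads))
```

A rise in entropy between time-points would flag an emerging subpopulation, for example a resistant variant, worth following up with the phylogenetic analyses described above.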

Translational Genome Sequencing and Bioinformatics: The Medical Genome Project

Joaquín Dopazo, Director of Bioinformatics and Genomics, CIPF

The initial challenge was to sequence exomes from well-characterised, phenotyped patients and compare them to phenotyped control individuals (300 samples).  He considers 1000Genomes data insufficient as a control group, as those individuals are not adequately phenotyped, and collecting a local pool of controls means that population-specific information becomes readily available in the course of the study.  The pipeline uses a GPU-optimised BFAST for alignment, reducing run times to 5 hours per sample, so on an 8-CPU machine 200 million reads (or 20-30 exomes) can be processed a week.  Highlighted the problem that exome sequencing throws up ‘too many’ variants; their filtering strategies did not seem to highlight single-gene causative mutations, and comparisons of familial groups failed to identify causal genes in the diseases of interest.  In fact they have found no causal genes from 200 patient exomes.  Consequently they have developed pathway-based approaches to try to match up diseases and potentially causative genes, to provide a story across the spectrum of the families involved in a given disease.
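The pathway-based idea is that families may carry hits in different genes that converge on the same pathway. The Python sketch below shows only that aggregation step, using hypothetical gene and pathway names. It illustrates the concept rather than the CIPF method, and it includes no statistical testing.

```python
from collections import defaultdict


def families_per_pathway(candidate_genes_by_family, gene_to_pathways):
    """Count, per pathway, how many families carry a candidate variant in any of its genes.

    Returns (pathway, n_families) pairs, most widely shared pathways first.
    """
    families_hit = defaultdict(set)
    for family, genes in candidate_genes_by_family.items():
        for gene in genes:
            for pathway in gene_to_pathways.get(gene, ()):
                families_hit[pathway].add(family)
    return sorted(((p, len(f)) for p, f in families_hit.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical data: no gene is shared between families, but a pathway is.
candidates = {"fam1": {"GENE_A"}, "fam2": {"GENE_B"}, "fam3": {"GENE_C"}}
pathways = {"GENE_A": {"pathway_X"}, "GENE_B": {"pathway_X"}, "GENE_C": {"pathway_Y"}}
print(families_per_pathway(candidates, pathways))
```

A gene-level comparison of these three families finds nothing in common, while the pathway-level view links two of them, which is the kind of story the approach aims to tell.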

BRCA1/2 Sequencing on the Roche GS-FLX System – an evaluation of the first year

Genevieve Michils, Laboratory for Molecular Diagnostics, University of Leuven

Sequencing to 40x for diagnostics, using AVA (Amplicon Variant Analyzer) for variant calling.  Have a robust multiplexing system to get a minimum of 25x coverage.  After processing 500 samples in 22 runs, 150 mutations were detected, of which 15 were ‘in or near homopolymer regions’.  Cost works out at €530/patient.  QC involves rejecting regions covered by reads in only one direction and, if necessary, backing up missing areas with Sanger sequencing.  Homopolymer issues are worse as the BRCA genes have plenty of them.  Homopolymer error bias is not the same in the forward and reverse directions.  Trying to use SEQNEXT (http://www.jsi-medisys.de/products.html), which converts reads to Sanger-like ‘peaks’, but nevertheless the homopolymers lead to false positive variant calls.  “In a diagnostic context this is not efficient.”  Trying to go back to the raw data to develop a statistical model to identify ‘abnormal’ profiles in homopolymer read regions, but still following up everything with Sanger sequencing afterwards anyway.
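The strand-aware QC rule can be expressed as a simple pass/fail check per amplicon. In this Python sketch the 25x threshold comes from the talk, but applying it to each amplicon's combined forward and reverse depth is our assumption, as are the `AmpliconDepth` structure and the amplicon names. Failing amplicons would be the candidates for Sanger back-up.

```python
from dataclasses import dataclass


@dataclass
class AmpliconDepth:
    name: str
    forward: int  # reads covering the amplicon in the forward direction
    reverse: int  # reads covering the amplicon in the reverse direction


def amplicons_needing_sanger(amplicons, min_total=25, min_per_strand=1):
    """Names of amplicons with too little depth, or coverage in only one direction."""
    failed = []
    for a in amplicons:
        too_shallow = a.forward + a.reverse < min_total
        one_direction = min(a.forward, a.reverse) < min_per_strand
        if too_shallow or one_direction:
            failed.append(a.name)
    return failed


# Hypothetical amplicon depths.
depths = [AmpliconDepth("ex2", 20, 20),    # passes
          AmpliconDepth("ex11a", 40, 0),   # deep, but forward strand only
          AmpliconDepth("ex3", 10, 5)]     # both strands, but below 25x
print(amplicons_needing_sanger(depths))
```

Requiring both strands matters here because, as noted above, homopolymer error bias differs between forward and reverse reads.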

Towards Complete Quality-Assured Next-Generation Genetic Tests

Prof Harry Cuppens, Centre for Human Genetics, KULeuven

Primary work is on CFTR mutations; he thinks all couples should be screened for carrier status.  Clinicians are not interested in non-actionable rare mutations.  Highlighted a number of issues that need solving:

  • Robust equimolar multiplex amplifications
  • Economical pooling of samples
  • Quality assured protocols
  • Automated protocols
  • Accurate homopolymer calling
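Both this list and the Leuven talk come back to homopolymers, so a first step is simply knowing where they are. The Python sketch below finds single-base runs with a regular expression and flags positions in or near them. The minimum run length of 5 and the 2-base flank are hypothetical defaults, and the code only locates homopolymers; it does not call variants in them.

```python
import re


def homopolymer_runs(seq: str, min_length: int = 5):
    """Return (start, end, base) for single-base runs of at least min_length.

    Coordinates are 0-based and end-exclusive.
    """
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # ([ACGT]) captures a base, \1{n,} requires n or more further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]


def near_homopolymer(pos: int, runs, flank: int = 2) -> bool:
    """True if a 0-based position lies inside, or within flank bases of, any run."""
    return any(start - flank <= pos < end + flank for start, end, _ in runs)


seq = "ACG" + "T" * 6 + "GC" + "A" * 5 + "C"
runs = homopolymer_runs(seq)
print(runs)                                             # [(3, 9, 'T'), (11, 16, 'A')]
print(near_homopolymer(10, runs), near_homopolymer(0, runs))  # True False
```

Flagging variant calls that fall in or near these runs is one way to route them automatically to confirmatory Sanger sequencing.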

Not a fan of the sample-handling protocols of direct-to-consumer (DTC) genetic testing companies, and believes that there are so many steps involved that the chances of errors are too high.  The solution is to barcode samples at the earliest possible point in the sequencing process.
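Barcoding early only helps if reads are reliably assigned back to their samples. This is a minimal Python sketch of barcode demultiplexing with a Hamming-distance tolerance. The barcodes, sample names and one-mismatch allowance are hypothetical, and reads matching no barcode, or more than one, are left unassigned rather than guessed.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode: str, sample_barcodes: dict[str, str],
                  max_mismatches: int = 1):
    """Return the sample whose barcode uniquely matches within max_mismatches, else None."""
    matches = [sample for sample, barcode in sample_barcodes.items()
               if hamming(read_barcode, barcode) <= max_mismatches]
    return matches[0] if len(matches) == 1 else None


# Hypothetical barcode set.
barcodes = {"patient_01": "ACGTAC", "patient_02": "TTGACA"}
print(assign_sample("ACGTAC", barcodes))  # exact match
print(assign_sample("ACGTAA", barcodes))  # one mismatch, still unique
print(assign_sample("GGGGGG", barcodes))  # no match
```

Refusing ambiguous assignments trades a little yield for fewer sample mix-ups, which is the point of barcoding in a diagnostic setting.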

NGS Bioinformatics Support and Research Challenges

Mick Watson, Director of ARK-Genomics, The Roslin Institute

Currently has 7 bioinformaticians and 6 lab staff.  HiSeq, GAIIx and array work, mainly in agriculturally important animal genomics.  We are in the “Age of Bioscience”, so is this the most exciting time to be a biologist?  Highlighted the long history of bioinformatics, from Fischer, to Dayhoff, Ledley, Bennett and Kendrew in the 50s and 60s.  Was critical of hypothesis-free large sequencing projects.  Highlighted that bioinformatics often fails to follow through from turning information into knowledge for research scientists, and that this needs to be a priority, not an afterthought.  Discussed the makeup of the bioinformatics community: coders, statisticians, data miners, database developers.  An interesting point about Galaxy is that he believes it is “moving the problems into a point and click interface”: if you don’t understand parameterisation and use of the command line, then you won’t understand it in Galaxy either.  The greatest challenge of the future will be analysing individual genome plasticity.  Bioinformaticians have always worried about the size of the data, from AB1 trace files, to array image data, to MAGE-ML, and now we worry about sequence data; but history shows we have coped before, and someone else will solve the problems.  Also noted that there is a dearth of EXPERIENCED NGS bioinformaticians, so look to recruit people with some experience and train them up.

Using Galaxy to provide a NGS Analysis Platform

Hans-Rudolf Hotz, Bioinformatics Support, Friedrich Miescher Institute for Biomedical Research

The core offers its services for free.  The expectation from biologists is the magic red button for analysis, which when pressed turns raw data into Nature papers.  Is Galaxy the solution?  Galaxy captures the provenance of data, modules can be constructed into workflows, and analytical solutions from bioinformaticians and statisticians can be supplied directly to the end user, removing analytical load from the core (at the expense of system administration load for installing the relevant software into Galaxy).
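To make the provenance point concrete, the Python sketch below chains "tools" into a workflow and logs each step's name, parameters and a truncated SHA-256 of its input. It is a toy illustration of the idea, not how Galaxy itself records provenance, and the step functions are hypothetical stand-ins for real tools.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    func: Callable[..., str]
    params: dict = field(default_factory=dict)


def run_workflow(data: str, steps: list[Step]):
    """Run steps in order, recording what ran, with which parameters, on which input."""
    provenance = []
    for step in steps:
        # First 12 hex characters of the input's SHA-256, enough to tell inputs apart here.
        input_digest = hashlib.sha256(data.encode()).hexdigest()[:12]
        data = step.func(data, **step.params)
        provenance.append({"step": step.name,
                           "params": dict(step.params),
                           "input_sha256": input_digest})
    return data, provenance


# Two toy "tools" chained into a workflow.
steps = [Step("uppercase", str.upper),
         Step("trim", lambda s, n: s[:n], {"n": 3})]
result, log = run_workflow("acgtacgt", steps)
print(result)
for entry in log:
    print(entry)
```

Because every step's parameters and input are logged, a result can be traced back and rerun, which is what lets a core hand workflows to end users without losing track of how results were produced.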