
15th International Conference on Human Genome Variation – Meeting report

Last week I was lucky enough to attend the HGV2014 meeting at the Culloden Hotel in Belfast. It was my first trip to Northern Ireland and my first attendance at an HGV meeting. The meeting is small and intimate but had a great, wide-ranging programme, and I would heartily recommend attending if you get the chance and have an interest in clinical or human genomics.

Have a look at the full programme here: http://hgvmeeting.org/

Here are links to my write-ups for each session (where I had notes that I could reconstruct!):

  1. Interpreting the human variome
  2. The tractable cancer genome
  3. Phenomes, genomes and archaeomes
  4. Answering the global genomics challenge
  5. Improving our health: Time to get personal
  6. Understanding the evolving genome
  7. Next-gen ‘omics and the actionable genome

 

HGV2014 Meeting Report, Session 6: “UNDERSTANDING THE EVOLVING GENOME”

Caveats: I have not taken notes in every talk of every session; a lack of notes for a particular speaker does not indicate disinterest on my part, as I simply took notes for the talks that were directly related to my current work. If I have misquoted, misrepresented or misunderstood anything, and you are the speaker concerned or a member of the team involved in the work, please leave a comment on the post and I will rectify the situation accordingly.

6.1    Yves Moreau, University of Leuven, Belgium: “Variant Prioritisation by genomic data fusion”

 

An essential part of the prioritization process is the integration of phenotype.

Critical paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083082/ “Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis”

Yves introduced “Endeavour”, which takes a gene list, matches it against the disease of interest and ranks the genes, but this requires the phenotypic information to be ‘rich’. Two main questions need to be addressed: 1) Which genes are related to a phenotype? And 2) Which variants in a gene are pathogenic? Candidate gene prioritization is not a new thing and has a long history in microarray analysis. Whilst it is easy to interrogate things like pathway information, GO terms and the literature, it is much harder to find relevant expression profile information or functional annotation, and existing machine learning tools do not really support these data types.

Critical paper: http://www.ncbi.nlm.nih.gov/pubmed/16680138 “Gene prioritization through genomic data fusion.”

Critical resource: http://homes.esat.kuleuven.be/~bioiuser/endeavour/tool/endeavourweb.php

Endeavour can be trained, rank genes according to various criteria, and then merge the individual rankings into an overall ranking using order statistics.
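As a rough illustration of what that merging step involves, here is a minimal Python sketch using invented gene names and data sources: each source’s ranking is converted into rank ratios, which are then fused into a single ordering. Endeavour itself combines the rank ratios with an order-statistics Q statistic; the simple average used here is just a stand-in for that step.

```python
# Minimal sketch of rank fusion in the spirit of Endeavour.
# Gene names and data sources are invented; the averaging step below
# stands in for Endeavour's real order-statistics combination.

from collections import defaultdict

# Hypothetical per-data-source rankings of candidate genes (best first).
rankings = {
    "pathways":   ["GENE3", "GENE1", "GENE2", "GENE4"],
    "GO_terms":   ["GENE1", "GENE3", "GENE4", "GENE2"],
    "literature": ["GENE1", "GENE2", "GENE3", "GENE4"],
}

def rank_ratios(ranked_genes):
    """Convert a ranked gene list into rank ratios in (0, 1]."""
    n = len(ranked_genes)
    return {gene: (i + 1) / n for i, gene in enumerate(ranked_genes)}

# Collect each gene's rank ratio from every data source.
per_gene = defaultdict(list)
for source, ranked in rankings.items():
    for gene, ratio in rank_ratios(ranked).items():
        per_gene[gene].append(ratio)

# Fuse: lower combined score = stronger overall candidate.
fused = sorted(per_gene.items(), key=lambda kv: sum(kv[1]) / len(kv[1]))
for gene, ratios in fused:
    print(gene, round(sum(ratios) / len(ratios), 3))
```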

Next, eXtasy was introduced: another variant prioritization tool, this time for non-synonymous variants given a specific phenotype.

Critical resource: http://homes.esat.kuleuven.be/~bioiuser/eXtasy/

Critical paper: http://www.nature.com/nmeth/journal/v10/n11/abs/nmeth.2656.html “eXtasy: variant prioritization by genomic data fusion”

eXtasy allows variants to be ranked by their effect on protein structure, by association in a case/control or GWAS study, and by evolutionary conservation.

The problem, though, is one of multiscale data integration: we might know that a megabase-sized region is interesting from one technique and that a gene within it is interesting from another, and then we need to find the variant of interest from the list of variants in that gene.

They have performed HGMD-to-HPO mappings (1142 HPO terms cover the HGMD mutations). It was noted that PolyPhen and SIFT are useless for distinguishing between disease-causing variants and rare, benign ones.

eXtasy produces rankings for a VCF file by taking the trained classifier data and using a random forest approach to rank the variants. One of the underlying assumptions of this approach is that any rare variant found in the 1kG dataset is benign, since the 1kG participants are meant to be nominally asymptomatic individuals.
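To make the ranking idea concrete, here is a toy Python sketch along those lines: a random forest is trained on labelled variants, and new variants are then ranked by their predicted probability of being disease-causing. The features, labels and data below are entirely synthetic and are not eXtasy’s actual feature set or training data.

```python
# Toy illustration of classifier-based variant ranking.
# Everything here is synthetic; it only demonstrates the shape of the approach.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic training set: the three columns might stand for conservation,
# predicted structural impact and a phenotype-specific association score.
X_train = rng.random((200, 3))
# Label 1 = known disease variant (e.g. from HGMD), 0 = assumed-benign
# (e.g. a rare 1kG variant from a nominally asymptomatic individual).
y_train = (X_train.sum(axis=1) + rng.normal(0, 0.3, 200) > 1.5).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# "Variants from a VCF": score and rank by predicted pathogenicity.
X_new = rng.random((5, 3))
scores = clf.predict_proba(X_new)[:, 1]
for idx in np.argsort(scores)[::-1]:
    print(f"variant_{idx}\tscore={scores[idx]:.3f}")
```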

These approaches are integrated into NGS-Logistics, a federated analysis of variants across multiple sites, which has some similarities to the Beacon approaches discussed previously. NGS-Logistics is a project looking for test and partner sites.

Critical paper: http://genomemedicine.com/content/6/9/71/abstract

Critical resource: https://ngsl.esat.kuleuven.be

However, it is clear that what is required, as much as a perfect database of pathogenic mutations, is a database of benign ones: local population controls for ethnicity matching, but also high-MAF variants and rare variants from asymptomatic datasets.

6.2    Aoife McLysaght, Trinity College Dublin: “Dosage Sensitive Genes in Evolution and Disease”

 

Aoife started by saying that most CNVs in the human genome are benign. The quality that makes a CNV pathogenic is that of gene dosage. Haploinsufficiency (where half the product != half the activity) affects about 3% of genes in a systematic study in yeast. This is going to affect certain classes of genes, for instance those where concentration-dependent effects are very important (morphogens in developmental biology, for example).

This can occur through mechanisms like a propensity towards low-affinity, promiscuous aggregation of the protein product. Consequently, the relative balance of gene products can be the problem where it affects the stoichiometry of the system.

This is against the background of clear whole-genome duplications over the course of vertebrate evolution. This would suggest that dosage-sensitive genes should be retained after the subsequent chromosomal rearrangement and gene loss. About 20-30% of genes can be traced back to these duplication events, and they are enriched for developmental genes and members of protein complexes. These retained duplicates are called “ohnologs”.

What is interesting is that 60% of these are never associated with CNV events (deletions and duplications) in healthy people, and they are highly enriched for disease genes.

Critical paper: http://www.pnas.org/content/111/1/361.full “Ohnologs are overrepresented in pathogenic copy number mutations”

6.3    Suganthi Balasubramanian, Yale: “Making sense of nonsense: consequence of premature termination”

Under discussion in this talk was the characterization of loss-of-function (LoF) mutations. There are a lot of people who prefer not to use this term as a catch-all and would rather break such variants down into various classes, which can include:

  • Truncating nonsense SNVs
  • Splice disrupting mutations
  • Frameshift indels
  • Large structural variations

The average person carries around a hundred LoF mutations, of which around a fifth are in a homozygous state.

It was commented that people trying to divine information from e.g. the 1kG datasets had to contend with lots of sequencing artefacts or annotation artefacts when assessing this.

Critical paper: http://www.sciencemag.org/content/335/6070/823 “A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes”

Critical resource: http://macarthurlab.org/lof/

In particular, the consequences of introducing a stop codon into a transcript are hard to predict. Some of the time the effect will be masked by splicing events or controlled by nonsense-mediated decay (NMD), which means such variants may not be pathogenic at all.

Also, stop codons in the last exon of a gene may not be of great interest, as they are unlikely to have large effects on protein conformation.
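A commonly used rule of thumb here (my addition, not something specific to this talk) is that a premature stop codon in the last exon, or within roughly 50 nt upstream of the final exon-exon junction, tends to escape NMD. Below is a minimal Python sketch of that heuristic, using invented CDS coordinates.

```python
# Minimal sketch of the "50 nt rule" heuristic for NMD escape: a premature
# stop in the last exon, or within ~50 nt of the final exon-exon junction,
# is predicted to escape nonsense-mediated decay.
# Exon lengths and variant positions below are invented CDS coordinates.

def likely_escapes_nmd(stop_pos_cds, exon_lengths_cds, boundary_rule_nt=50):
    """Return True if a premature stop at stop_pos_cds (1-based, CDS
    coordinates) falls in the last exon or within boundary_rule_nt of the
    last exon-exon junction, i.e. is predicted to escape NMD."""
    last_junction = sum(exon_lengths_cds[:-1])  # CDS position of final junction
    return stop_pos_cds > last_junction - boundary_rule_nt

# Hypothetical transcript: three coding exons of 300, 450 and 200 nt.
exons = [300, 450, 200]
for pos in (120, 710, 800):
    verdict = "escapes NMD" if likely_escapes_nmd(pos, exons) else "likely NMD target"
    print(pos, verdict)
```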

The ALOFT pipeline was developed to annotate loss-of-function mutations. It uses a number of resources to make its predictions, including information about NMD, protein domains and gene networks (the shortest path to known disease genes), as well as evolutionary conservation scores (GERP) and dN/dS information from mouse and macaque, with a random forest approach to classification. A list of benign variants is used in the training set, including things like homozygous stop mutations in the 1kG dataset, which are assumed to be non-pathogenic. Dominant effects are likely to occur in haploinsufficient genes with an HGMD entry.
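As an aside, the gene-network feature mentioned there (shortest path to a known disease gene) is easy to picture with a small sketch. The toy network, gene names and use of networkx below are my own illustration, not ALOFT’s actual implementation.

```python
# Sketch of a network-based feature: the shortest path from a candidate gene
# to any known disease gene in a gene-interaction network.
# The network and gene names are invented for illustration only.

import networkx as nx

# Hypothetical undirected gene-interaction network.
edges = [("GENEA", "GENEB"), ("GENEB", "GENEC"),
         ("GENEC", "DISEASE1"), ("GENEA", "GENED")]
g = nx.Graph(edges)

known_disease_genes = {"DISEASE1"}

def shortest_path_to_disease_gene(graph, gene, disease_genes):
    """Minimum path length from `gene` to any known disease gene,
    or None if no disease gene is reachable."""
    lengths = nx.single_source_shortest_path_length(graph, gene)
    hits = [dist for other, dist in lengths.items() if other in disease_genes]
    return min(hits) if hits else None

for gene in ("GENEA", "GENED"):
    print(gene, shortest_path_to_disease_gene(g, gene, known_disease_genes))
```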

Book review: “The $1000 Genome” by Kevin Davies

So in my quest to do a bit of reading around the industry I now find myself in, I’ve lined up a few books.  I actually started “The $1000 Genome” in the weeks prior to my interview at OGT, and the fact that I only finished it a couple of weeks ago (5 months later) should not be taken as a reflection on the quality of the book.

I think one thing people will be asking is whether a book written in 2010 is still relevant at the tail end of 2011 in such a fast-moving industry, and I think it’s a testament to Kevin Davies’ writing that it is.

I haven’t read either of Kevin’s previous two books, “Cracking the Genome” (no explanation required as to what that might be about [but note that the UK title of this book is “The Sequence”]) and “Breakthrough”, on the race to find the breast cancer gene; I will probably purchase them when the existing book backlog is cleared.

One thing this book has in spades is an excellent history of how we got to where we are today, both in terms of the personalities and the companies that have driven the NGS revolution. Some of the names, such as 23andMe and the small sequencing startups that were swallowed by the industry biogiants, may be familiar, but the book charts them from setup to acquisition, and the movement of key staff between them. For me, the history of NGS and emergent personal genomics is probably worth the cover price of the book alone. There is also no skimping on the next ‘next-generation’ contenders.

Also well documented is the rivalry between the main DTC companies, 23andMe, deCODE and Navigenics, and it’s interesting to see how they stratify in terms of the panels offered, risk calculation and how focused they are on ‘actionable’ information. It’s also worth delving into the longer-term, research-led strategies of these companies, and the regulatory hurdles they are already embroiled in.

It’s actually quite poignant how quickly we’ve moved from sequencing a reference genome, to sequencing an individual person’s genome, to having dozens, and then hundreds, of full genomes. This was brought home in a telecon I had this week with a research institute who figured they had sequenced 160 genomes in 2010/2011. As with all science, what was once a Nature paper quickly becomes routine when NGS hardware is ramping up capacity as much as it is. This is also strongly highlighted in the book.

The final section deals with the likely arrival of genome-led P4 medicine and the sequencing X-Prize, and wraps up with just how close we are to the $1000 genome. The book is actually quite light on price as a driver, preferring to point out what could be done when it’s cheap enough to do so. With excursions into the author’s own genomic landscape and thorough referencing throughout, it’s a book I can happily recommend to anyone in the field, or with a passing interest in it.