Tag Archives: genotype


Caveats: I have not taken notes in every talk of every session, a lack of notes for a particular speaker does not constitute disinterest on my part, I simply took notes for the talks that were directly related to my current work. If I have misquoted, misrepresented or misunderstood anything, and you are the speaker concerned, or a member of the team involved in the work, please leave a comment on the post, and I will rectify the situation accordingly.

3.1    Peter Robinson, Humboldt University, Berlin: “Effective diagnosis of genetic disease by computational phenotpye analysis of the disease associated genome”

Peter focused on the use of bioinformatics in medicine, specifically around the use of ontologies to describe phenotypes and look for similarities between diseases. It is important to capture the signs, symptoms and behavioural abnormalities of a patient in PRECISE language to be useful.

The concept here is ‘deep phenotyping’ – there’s almost nothing here in terms of too much information about clinical presentation, but it must be consistent to enable a basis for computational comparison and analysis.

HPO (The Human Phenotype Ontology) was introduced, saying that in many ways it is indebted to OMIM (Online Mendelian Inheritance in Man).

He felt strongly that the standard exome with 17k genes was ‘useless’ in a diagnostic context, when there are 2800 genes associated with 5000 disorders, covering a huge spectrum of presenting disease. Consequently he does not recommend screening the exome as a first line test, but encourages the use of reduced clinical exomes. This allows, especially, higher coverage for the same per-sample costs and suggested that the aim should be to have 98% of the target regions covered to >20x.

Pathogenic mutations that are clearly identified are clearly the easiest thing to call from this kind of dataset, but OMIM remains the first point of call for finding out the association of a mutation to a condition. And OMIM is not going to be of much help finding information on a predicted deleterious mutation in a random chromosomal ORF.

Specifically they take VCF files and annotate them with HPO terms as well as the standard suite of Mutation Taster, Polyphen and SIFT

A standard filtering pipeline should get you down to 50 to 100 genes of interest and then you can do a phenotype comparison of the HPO terms you have collected from the clinical presentation and the HPO terms annotated in the VCF. This can give you a ranked list of variants.

This was tested by running 10k simulations of such a process with spiked in variants from HGMD into an asymptomatic individuals VCF file. The gene ranking score depends on a variant score for deleteriousness and a phenotype score for the match to the clinical phenotype. In the simulation 80% of the time, the right gene was at the top of the list.

This approach is embodied in PhenIX: http://compbio.charite.de/PhenIX/

This has led to the development of a clinical bioinformatics workflow where the clinician supplies the HPO terms and runs the algorithm. Information is borrows from OMIM and Orphanet in the process.

Prioritisation of variants is not a smoking gun for pathogenicity however. This needs to be backed up by Sanger sequencing validation, and co-segregation analysis within a family (if available). Effective diagnosis of disease will not lose the human component.

Exomiser was also introduced http://www.sanger.ac.uk/resources/databases/exomiser/query/ from Damien Smedley’s group at the Sanger Institute, which uses information from the mouse and zebrafish to increase the utility as there is a huge amount of phenotype data from developmental biology studies of gene knockouts in other organisms.

3.2    Dan Bradley, Trinity College, Dublin: “Ancient population genomics: do it all, or not at all”

Dan gave a great talk on the sequencing of ancient DNA to look at population data. Ancient DNA is highly fragmented, and you’re generally working with 50-70base fragments (generally worse than FFPE samples).

DNA from ancient samples actually undergoes a target enrichment step, largely to remove environmental sequence contamination, although it was noted that repetitive DNA can be problematic in terms of ruining a capture experiment.

From the ancient samples that were covered at 22x (I don’t expect that’s genome coverage, but target capture coverage) the samples were down-sampled to 1x data, and then 1kG data used to impute the likely genotypes. This actually recapitulated 99% of calls from the original 22x data, showing that this approach can be used to reconstruct ancestral population genomics information from very limited datasets, using very modern data.