

Caveats: I have not taken notes in every talk of every session. A lack of notes for a particular speaker does not constitute disinterest on my part; I simply took notes for the talks that were directly related to my current work. If I have misquoted, misrepresented or misunderstood anything, and you are the speaker concerned or a member of the team involved in the work, please leave a comment on the post and I will rectify the situation accordingly.

4.1    Mark Lawler, QUB, Belfast: “All the world’s a stage: The Global Alliance For Genomics and Health”

Mark is part of the clinical working group for GA4GH. GA4GH’s core mission rests on the premise that genomics data must be shared before the most benefit can be drawn from it. GA4GH has no remit to store or generate sequencing data.

Mark introduced an initiative from the Centers for Mendelian Genomics (CMG), which are NIH funded to provide sequencing for free. The sequencing is offered for diagnosed Mendelian disorders where the genetic cause has yet to be determined for all cases. Each submission is evaluated on a case-by-case basis and is reviewed by committee before going forward.

Submission has to be from a healthcare professional looking after a patient and the minimum dataset returned is BAM and annotated VCF files. The data is actually owned by the submitters, and help with analysis is available if required, i.e. there are staged levels of support that can be requested from the CMG.

Critical resources: http://www.mendelian.org and http://mendeliangenomics.org

4.2    Ada Hamosh, JHU, Baltimore: “International Efforts to Identify New Mendelian Disease Genes: the Centers for Mendelian Genomics, PhenoDB and GeneMatcher”

Following on from Mark was Ada, from one of the aforementioned CMG groups. One of the critical early points from Ada was that phenotyping should not be limited to the disease symptoms, but should be a much more holistic description encompassing everything about that person. This allows you to a) disambiguate between cases, b) test for things you may find and c) understand what you’re looking for. So again this is a call for a deep phenotyping approach, as outlined by a number of other speakers.

They use OMIM for a clinical synopsis along with HPO and LDDB (I think this is the London Dysmorphology Database). HPO seems to be gaining traction widely.

The cases that come to the CMG can be known disorders where the gene is unknown, or known disorders where the existing associated disease genes have been tested and found negative (no known underlying genetic cause). Other information collected includes birth *decade* (date is considered identifiable), age at presentation/evaluation, photographic assessments and feature selection.

The feature selection has 21 high-level categories that cascade as you click on them, and it then selects OMIM disorders as you enter phenotype information.

They also filter against dbSNP in 3 steps: builds 127, 129 and 135. There is also a built-in cohort analysis tool.

Critical resources: http://phenodb.net and http://phenodb.org

Phenodb.net is a demonstrator site for the CMG and presents a ‘modular tool for collection, storage and analysis of phenotype and genotype data’. This is downloadable(?) and includes things like the consent modules.

Phenodb.org is the actual clinical holding for labs.

There was some talk about the profusion of phenotypic description standards: OrphaNet, HPO, SNOMED-CT, LDDB, POSSUM. There was an ICHPT (International Consortium of Human Phenotype Terminologies) meeting where 2300 core terms were agreed across all the standards consortia. PhenoDB already has these mappings in place for the HPO terms.

Critical resource: https://phenomecentral.org/

PhenomeCentral is a repository for secure data sharing, targeted at clinicians and scientists working in the rare disorder community. Effectively this is a matchmaking service: where disorders are so rare, a central clearing house makes perfect sense as a way to find other patients with the same symptoms across international borders, furthering diagnosis and research.

Critical resource: https://genematcher.org/

This is a similar service that makes matching possible at the gene level, rather than the phenotypic level.
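Gene-level matching can be pictured as an inverted index from gene symbol to the submitters who listed that gene. The sketch below works under that assumption only; it is not GeneMatcher’s implementation or API, and the submitter IDs and gene symbols are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_gene_matches(submissions: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return genes listed by more than one submitter, with those submitters.

    `submissions` maps a submitter ID to the set of candidate gene symbols
    they entered (hypothetical data layout).
    """
    by_gene: Dict[str, Set[str]] = defaultdict(set)
    for submitter, genes in submissions.items():
        for gene in genes:
            # Normalise case so "gene_b" and "GENE_B" count as the same symbol.
            by_gene[gene.upper()].add(submitter)
    return {
        gene: sorted(submitters)
        for gene, submitters in by_gene.items()
        if len(submitters) > 1
    }


# Placeholder submissions: only GENE_B is shared between two submitters.
submissions = {
    "lab_1": {"GENE_A", "GENE_B"},
    "lab_2": {"gene_b"},
    "lab_3": {"GENE_C"},
}
matches = find_gene_matches(submissions)
```

Here `matches` contains a single entry linking `lab_1` and `lab_2` through the shared gene, which is the point at which the two submitters would be put in touch.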

4.3    Anthony Brookes, University of Leicester: “A Multi-Faceted Approach to Releasing the Value in Genome Related Data”

One of the options that is less explored in terms of data sharing is taking the question to the data, rather than pulling data from disparate sources and then running your analysis. DataSHIELD is an effort to do exactly this, without handing out your data.

Critical paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972441/ “DataSHIELD: resolving a conflict in contemporary bioscience—performing a pooled analysis of individual-level data without sharing the data”

One of the points here is that rather than distributing entire datasets, summary data can be aggregated and shared from harmonized individual-level databases. This permits researchers to focus on knowledge and visualization, removing people from the silo mentality. It is just as valuable to share understanding as underlying data.
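As a rough illustration of taking the question to the data, the sketch below pools a mean and variance across two sites using only per-site aggregates (count, sum, sum of squares), so no individual-level record leaves a site. This is not the DataSHIELD API; the function names and the toy values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Non-identifying aggregates that a site is willing to release."""
    n: int
    total: float
    total_sq: float


def summarise_locally(values: Iterable[float]) -> SiteSummary:
    """Runs inside each site; only the aggregates are returned."""
    vals = list(values)
    return SiteSummary(
        n=len(vals),
        total=sum(vals),
        total_sq=sum(v * v for v in vals),
    )


def pooled_mean_and_variance(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Combine site summaries into a pooled mean and sample variance."""
    n = sum(s.n for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Each site summarises its own (toy) measurements; only summaries are pooled.
site_a = summarise_locally([1.0, 2.0, 3.0])
site_b = summarise_locally([4.0, 5.0])
mean, variance = pooled_mean_and_variance([site_a, site_b])
```

The pooled result is identical to what would be computed on the combined individual-level data, which is the sense in which the analysis is pooled without the data being shared.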

However, data sources are still disparate. Cataloguing is essential, as you need to know where the data is, what metadata is associated with a holding, what descriptions exist and so on. The Beacon system is a good example.

GWAS Central was highlighted. This provides a centralized compilation of summary level findings from genetic association studies, both large and small. Data sets are actively gathered from public domain projects, and direct data submission from the community is encouraged.

Critical resource: http://www.gwascentral.org/

Also mentioned were OmicsConnect and Dalliance to provide resources to visualize/interrogate datasets.

Critical paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3051325/ “Dalliance: interactive genome viewing on the web”

These resources can interact with DAS data sources and eDAS was also introduced as a ‘gatekeeper’ for DAS information (I can find no information on this project).

Anthony then moved on to Café Variome.

Critical resource: http://www.cafevariome.org/

Café Variome is designed to sit alongside existing local databases to bring data discovery tools to that data. From the website: “We offer a complete data discovery platform based upon enabling the ‘open discovery’ of data (rather than data ‘sharing’) for example, between networks of diagnostic laboratories or disease consortia that know/trust each other and share an interest in certain causative genes or diseases.”

Café Variome can also broker data to both the research and clinical communities. Nodes can connect to each other, and there is also a central repository which can issue DOIs built on work by DataCite. Logins support ORCID identifiers for submitters of data, another step in the direction of correct, unambiguous attribution and standardisation.



3.1    Peter Robinson, Humboldt University, Berlin: “Effective diagnosis of genetic disease by computational phenotype analysis of the disease associated genome”

Peter focused on the use of bioinformatics in medicine, specifically around the use of ontologies to describe phenotypes and look for similarities between diseases. It is important to capture the signs, symptoms and behavioural abnormalities of a patient in PRECISE language to be useful.

The concept here is ‘deep phenotyping’: there is almost no such thing as too much information about clinical presentation, but it must be recorded consistently to provide a basis for computational comparison and analysis.

HPO (The Human Phenotype Ontology) was introduced, saying that in many ways it is indebted to OMIM (Online Mendelian Inheritance in Man).

He felt strongly that the standard exome with 17k genes was ‘useless’ in a diagnostic context, when there are 2800 genes associated with 5000 disorders, covering a huge spectrum of presenting disease. Consequently he does not recommend screening the exome as a first-line test, but encourages the use of reduced clinical exomes. In particular, this allows higher coverage for the same per-sample cost, and he suggested that the aim should be to have 98% of the target regions covered to >20x.
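The coverage target can be checked mechanically. A minimal sketch, assuming per-base depths over the target regions are already available as a list of integers (for example, parsed from a depth report); the function names and the toy depths are hypothetical, and the thresholds follow the figures quoted in the talk.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Fraction of target bases with depth strictly above min_depth (">20x")."""
    if not depths:
        raise ValueError("no target bases supplied")
    return sum(1 for d in depths if d > min_depth) / len(depths)


def meets_target(
    depths: Sequence[int],
    min_depth: int = 20,
    required_fraction: float = 0.98,
) -> bool:
    """True when at least required_fraction of target bases exceed min_depth."""
    return fraction_covered(depths, min_depth) >= required_fraction


# Toy panels of 100 target bases each.
good_panel = [35] * 99 + [5]        # 99% of bases above 20x
poor_panel = [35] * 97 + [5] * 3    # 97% of bases above 20x

good_ok = meets_target(good_panel)
poor_ok = meets_target(poor_panel)
```

With these toy values the first panel passes and the second fails, since only 97% of its bases clear the depth threshold.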

Clearly identified pathogenic mutations are the easiest thing to call from this kind of dataset, but OMIM remains the first port of call for finding out the association of a mutation with a condition. And OMIM is not going to be of much help in finding information on a predicted deleterious mutation in a random chromosomal ORF.

Specifically, they take VCF files and annotate them with HPO terms as well as the standard suite of MutationTaster, PolyPhen and SIFT.

A standard filtering pipeline should get you down to 50 to 100 genes of interest and then you can do a phenotype comparison of the HPO terms you have collected from the clinical presentation and the HPO terms annotated in the VCF. This can give you a ranked list of variants.

This was tested by running 10k simulations of such a process, with variants from HGMD spiked into an asymptomatic individual’s VCF file. The gene ranking score depends on a variant score for deleteriousness and a phenotype score for the match to the clinical phenotype. In the simulations, the right gene was at the top of the list 80% of the time.
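To make the ranking idea concrete, here is a minimal sketch in which each candidate gene carries a variant (deleteriousness) score and a phenotype-match score, both in the range 0 to 1, and genes are ranked by the unweighted mean of the two. The averaging rule, the Jaccard overlap used as the phenotype score, the gene names and the term sets are all illustrative assumptions: these notes do not record how PhenIX combines the scores, and real tools use ontology-aware semantic similarity rather than plain set overlap.

```python
from typing import Dict, FrozenSet, List, Tuple


def phenotype_score(patient_terms: FrozenSet[str], gene_terms: FrozenSet[str]) -> float:
    """Jaccard overlap between patient HPO terms and terms annotated to a gene.

    A deliberately simple stand-in for a semantic similarity measure.
    """
    union = patient_terms | gene_terms
    if not union:
        return 0.0
    return len(patient_terms & gene_terms) / len(union)


def rank_genes(
    patient_terms: FrozenSet[str],
    candidates: Dict[str, Tuple[float, FrozenSet[str]]],
) -> List[Tuple[str, float]]:
    """Rank genes by the mean of variant score and phenotype score (assumed rule).

    `candidates` maps gene name -> (variant score, HPO terms for that gene).
    """
    scored = [
        (gene, (variant_score + phenotype_score(patient_terms, terms)) / 2)
        for gene, (variant_score, terms) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Term IDs are used purely as labels; gene names are placeholders.
patient = frozenset({"HP:0001250", "HP:0001263", "HP:0000252"})
candidates = {
    "GENE_A": (0.90, frozenset({"HP:0001250"})),
    "GENE_B": (0.70, frozenset({"HP:0001250", "HP:0001263", "HP:0000252"})),
    "GENE_C": (0.95, frozenset()),
}
ranking = rank_genes(patient, candidates)
```

In this toy case the gene with the highest variant score alone (`GENE_C`) ends up last, because it has no phenotype support, while the gene with a complete phenotype match (`GENE_B`) comes first.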

This approach is embodied in PhenIX: http://compbio.charite.de/PhenIX/

This has led to the development of a clinical bioinformatics workflow where the clinician supplies the HPO terms and runs the algorithm. Information is borrowed from OMIM and Orphanet in the process.

Prioritisation of variants is not a smoking gun for pathogenicity, however. This needs to be backed up by Sanger sequencing validation, and co-segregation analysis within a family (if available). Effective diagnosis of disease will not lose the human component.

Exomiser was also introduced: http://www.sanger.ac.uk/resources/databases/exomiser/query/ from Damian Smedley’s group at the Sanger Institute. It uses information from the mouse and zebrafish to increase its utility, as there is a huge amount of phenotype data from developmental biology studies of gene knockouts in other organisms.

3.2    Dan Bradley, Trinity College, Dublin: “Ancient population genomics: do it all, or not at all”

Dan gave a great talk on the sequencing of ancient DNA to look at population data. Ancient DNA is highly fragmented, and you’re generally working with 50-70 base fragments (generally worse than FFPE samples).

DNA from ancient samples actually undergoes a target enrichment step, largely to remove environmental sequence contamination, although it was noted that repetitive DNA can be problematic in terms of ruining a capture experiment.

From the ancient samples that were covered at 22x (I don’t expect that’s genome coverage, but target capture coverage), the data were down-sampled to 1x, and then 1kG (1000 Genomes) data were used to impute the likely genotypes. This recapitulated 99% of calls from the original 22x data, showing that this approach can be used to reconstruct ancestral population genomics information from very limited datasets, using very modern data.
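The 99% figure is a genotype concordance: the share of sites where the imputed call matches the call made from the full-depth data. A minimal sketch of that comparison, assuming both call sets are held as dictionaries keyed by (chromosome, position) with genotype strings as values; the data layout and the example calls are hypothetical, and a real comparison would work from VCF files.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]


def normalise(genotype: str) -> Tuple[str, ...]:
    """Treat phased and unphased calls alike and ignore allele order."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def genotype_concordance(truth: Dict[Site, str], imputed: Dict[Site, str]) -> float:
    """Fraction of shared sites where the imputed genotype matches the truth set."""
    shared = truth.keys() & imputed.keys()
    if not shared:
        raise ValueError("no sites in common")
    matches = sum(
        1 for site in shared if normalise(truth[site]) == normalise(imputed[site])
    )
    return matches / len(shared)


# Toy call sets: four shared sites, one disagreement, one imputed-only site.
high_depth_calls = {
    ("1", 100): "0/1",
    ("1", 200): "1/1",
    ("2", 50): "0/0",
    ("2", 75): "0/1",
}
imputed_calls = {
    ("1", 100): "1/0",   # same genotype, different allele order
    ("1", 200): "1/1",
    ("2", 50): "0/1",    # disagreement
    ("2", 75): "0/1",
    ("3", 10): "0/0",    # not in the high-depth set, so ignored
}
concordance = genotype_concordance(high_depth_calls, imputed_calls)
```

Only sites present in both call sets are compared, so three matches out of four shared sites gives a concordance of 0.75 for this toy data.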