Caveats: I have not taken notes in every talk of every session. A lack of notes for a particular speaker does not constitute disinterest on my part; I simply took notes for the talks that were directly related to my current work. If I have misquoted, misrepresented or misunderstood anything, and you are the speaker concerned or a member of the team involved in the work, please leave a comment on the post and I will rectify the situation accordingly.
4.1 Mark Lawler, QUB, Belfast: “All the world’s a stage: The Global Alliance For Genomics and Health”
Mark is part of the clinical working group for GA4GH. GA4GH’s core mission rests on the premise that genomics data must be shared before its full benefit can be realised. GA4GH has no remit to store or generate sequencing data.
Mark introduced an initiative from the Centers of Mendelian Genomics which are NIH funded to provide sequencing for free. The sequencing is provided for diagnosed Mendelian disorders for which the genetic cause has yet to be determined for all cases. Each order is evaluated on a case by case basis and is reviewed by committee before going forwards.
Submission has to be from a healthcare professional looking after a patient and the minimum dataset returned is BAM and annotated VCF files. The data is actually owned by the submitters, and help with analysis is available if required, i.e. there are staged levels of support that can be requested from the CMG.
4.2 Ada Hamosh, JHU, Baltimore: “International Efforts to Identify New Mendelian Disease Genes: the Centers for Mendelian Genomics, PhenoDB and GeneMatcher”
Following on from Mark was Ada, from one of the aforementioned CMG groups. One of the critical early points from Ada was that phenotyping should not be limited to the disease symptoms, but should be a much more holistic description encompassing everything about that person. This allows you to a) disambiguate between cases, b) test for things you may find and c) understand what you’re looking for. So again this is a call for a deep phenotyping approach, as outlined by a number of other speakers.
They use OMIM for a clinical synopsis along with HPO and LDDB (I think this is the London Dysmorphology Database). HPO seems to be gaining traction widely.
The cases that come to the CMG are either known disorders where the causative gene is unknown, or known disorders where the existing associated disease genes have been tested and found negative (no known underlying genetic cause). Other information collected is birth *decade* (date is considered identifiable), age at presentation/evaluation, photographic assessments and feature selection.
The feature selection has 21 high-level categories that cascade as you click on them, and it then selects OMIM disorders as you enter phenotype information.
They also filter against dbSNP in 3 steps – 127, 129 and 135. There is also a built in cohort analysis tool.
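To make the staged filtering idea concrete, here is a minimal sketch in Python. It rests on my own assumptions: that the three numbers are successive dbSNP releases, and that "filtering" means discarding candidate variants already present in each release. The function name, variant identifiers and toy data are all hypothetical and are not taken from PhenoDB.

```python
def staged_filter(candidates, known_by_release):
    """Remove known variants release by release, recording what survives.

    candidates: iterable of variant identifiers (hypothetical keys).
    known_by_release: ordered list of (release_label, set_of_known_variants).
    Returns the surviving set and a list of (release_label, survivors_count).
    """
    remaining = set(candidates)
    counts = []
    for label, known in known_by_release:
        remaining -= known  # drop anything already catalogued in this release
        counts.append((label, len(remaining)))
    return remaining, counts


# Toy data only; real inputs would be variant calls and dbSNP content.
candidates = {"v1", "v2", "v3", "v4"}
releases = [
    ("127", {"v1"}),
    ("129", {"v1", "v2"}),
    ("135", {"v1", "v2", "v3"}),
]
remaining, counts = staged_filter(candidates, releases)
```

Keeping the per-step counts shows how many candidates each stage removes, which is presumably the point of filtering in steps rather than in one pass.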
Phenodb.net is a demonstrator site for the CMG and presents a ‘modular tool for collection, storage and analysis of phenotype and genotype data’. This is downloadable(?) and includes things like the consent modules.
Phenodb.org is the actual clinical holding for labs.
There was some talk about the profusion of phenotypic description standards: OrphaNet, HPO, SNOMED-CT, LDDB, POSSUM. There was an ICHPT (International Consortium of Human Phenotype Terminologies) meeting where 2300 core terms were agreed across all the standards consortiums. PhenoDB already has these mappings in place for the HPO terms.
Critical resource: https://phenomecentral.org/
PhenomeCentral is a repository for secure data sharing targeted to clinicians and scientists working in the rare disorder community. Effectively this is a matchmaking service: where disorders are so rare, a central clearing house makes perfect sense as a way to find other patients with the same symptoms across international borders, to further diagnosis and research.
Critical resource: https://genematcher.org/
This is a similar service that makes matching possible at the gene level, rather than at the phenotypic level.
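As a rough illustration of what gene-level matching means, the sketch below groups submissions by gene symbol and reports any gene named by more than one submitter. This is only a toy model of the idea, not how GeneMatcher is implemented; the function name, submitter labels and gene symbols are hypothetical placeholders.

```python
from collections import defaultdict


def match_by_gene(submissions):
    """Return genes submitted by more than one distinct submitter.

    submissions: iterable of (submitter, gene_symbol) pairs.
    Gene symbols are compared case-insensitively.
    """
    by_gene = defaultdict(set)
    for submitter, gene in submissions:
        by_gene[gene.upper()].add(submitter)
    # A "match" is any gene with at least two different submitters.
    return {gene: sorted(subs) for gene, subs in by_gene.items() if len(subs) > 1}


# Placeholder data; GENE1 and GENE2 are not real gene symbols.
submissions = [("lab_a", "GENE1"), ("lab_b", "gene1"), ("lab_c", "GENE2")]
matches = match_by_gene(submissions)
```

In this toy run, lab_a and lab_b would be put in touch over GENE1, while lab_c waits for someone else to submit GENE2.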
4.3 Anthony Brookes, University of Leicester: “A Multi-Faceted Approach to Releasing the Value in Genome Related Data”
One of the options that is less explored in terms of data sharing is taking the question to the data, rather than pulling data from disparate sources and then running your analysis. DataSHIELD is an effort to resolve this, without handing out your data.
Critical paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972441/ “DataSHIELD: resolving a conflict in contemporary bioscience—performing a pooled analysis of individual-level data without sharing the data”
One of the points here is that rather than distributing entire datasets, summary data can be aggregated and shared from harmonized individual-level databases. This permits researchers to focus on knowledge and visualization, removing people from the silo mentality. It is just as valuable to share understanding as underlying data.
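A small sketch may help show why summary data can be enough. Each data holder reduces its individual-level values to three aggregates (count, sum, sum of squares), and only those aggregates are pooled. This is my own simplified illustration of the principle in Python, not the DataSHIELD software or its API; all names and values are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SiteSummary:
    n: int           # number of individuals at the site
    total: float     # sum of the measured values
    total_sq: float  # sum of the squared values


def summarise(values):
    """Runs at the data holder: individual values never leave this function."""
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pooled_mean_sd(summaries):
    """Runs at the analyst: combines aggregates into a pooled mean and sample SD."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean ** 2) / (n - 1)
    return mean, sqrt(variance)


# Toy values standing in for two harmonized individual-level databases.
site_a = summarise([1.0, 2.0, 3.0])
site_b = summarise([4.0, 5.0])
mean, sd = pooled_mean_sd([site_a, site_b])
```

The pooled result is identical to what you would get from analysing all five values together, yet neither site has handed over a single individual record. Real deployments also need safeguards such as minimum cell counts, which this sketch ignores.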
However, data sources are still disparate. Cataloguing is essential, as you need to know where the data is, what metadata may be associated with a holding, descriptions etc. The Beacon system is a good example.
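For anyone unfamiliar with the Beacon idea, it boils down to answering a yes/no question about whether a holding contains a given allele at a given position, without revealing anything else. The following is a minimal in-memory sketch of that behaviour in Python; it is not the GA4GH Beacon API, and the class name, coordinates and alleles are made up for illustration.

```python
class Beacon:
    """Toy beacon: answers only True/False for an allele at a position."""

    def __init__(self, variants):
        # Store (chromosome, position, allele) triples with normalised case.
        self._variants = {(c, p, a.upper()) for c, p, a in variants}

    def query(self, chrom, pos, allele):
        """Return True if the holding contains this allele, otherwise False."""
        return (chrom, pos, allele.upper()) in self._variants


# Hypothetical holding with two variants.
beacon = Beacon([("1", 1000, "A"), ("X", 5000, "t")])
hit = beacon.query("1", 1000, "a")
miss = beacon.query("2", 1000, "A")
```

A catalogue of such beacons tells a researcher where relevant data lives, so that a proper access request can then go to the right holder.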
GWAS Central was highlighted. This provides a centralized compilation of summary level findings from genetic association studies, both large and small. Data sets are actively gathered from public domain projects, and direct data submission from the community is encouraged.
Critical resource: http://www.gwascentral.org/
Also mentioned were OmicsConnect and Dalliance to provide resources to visualize/interrogate datasets.
Critical paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3051325/ “Dalliance: interactive genome viewing on the web”
These resources can interact with DAS data sources and eDAS was also introduced as a ‘gatekeeper’ for DAS information (I can find no information on this project).
Anthony then moved onto Café Variome.
Critical resource: http://www.cafevariome.org/
Café Variome is designed to sit alongside existing local databases to bring data discovery tools to that data. From the website: “We offer a complete data discovery platform based upon enabling the ‘open discovery’ of data (rather than data ‘sharing’) for example, between networks of diagnostic laboratories or disease consortia that know/trust each other and share an interest in certain causative genes or diseases.”
Café Variome can also broker data to both the research and clinical communities. Nodes can connect to each other, and there is also a central repository which can issue DOIs built on work by DataCite. Logins support ORCID identifiers for submitters of data, another step in the direction of correct, unambiguous attribution and standardisation.