SPUR 2019: Developing Strategies and Tools to Mine Clinical Variant Assertions


As sequencing technologies have advanced, sequencing data have revealed the extent of polymorphism in the human population. Interpreting genetic tests requires understanding which of these DNA changes affect an individual's clinical phenotype (that is, cause disease) and which are silent. ClinVar is an international, submission-driven archive of variant-condition interpretations hosted by the National Center for Biotechnology Information (NCBI). ClinVar is increasingly becoming the central repository of interpreted genomic variants. As of July 2018, 997 submitters had contributed 427,882 unique variants and 11,099 conditions to ClinVar. Sharing variants and their supporting evidence in ClinVar lets users review the data transparently and supports clinical variant interpretation.

Submissions to ClinVar mark the first time that clinical laboratories and other submitters have shared and compared their variant interpretations, and this comparison reveals both agreement and discrepancy. In some cases the disagreement is clinically significant, for example when one submitter classifies a variant as pathogenic and another as benign. This growing and evolving database relies on submitters to resolve discrepancies and to update their findings as knowledge changes. It also provides the starting point for expert curation of variants and genes.

My group developed ClinVar Miner, a tool that enables deep exploration of the ClinVar dataset. The goal of ClinVar Miner is to support the processes upstream and downstream of ClinVar: submitting data to it and using the data it contains. ClinVar Miner thus serves as a complement to ClinVar that makes its data easier to use.

This SPUR project is funded by a supplement to the National Library of Medicine training grant T15LM007124-22 (Wendy Chapman, PI; Julio Facelli, co-I).

Student Role

The student will join a project that builds tools to better analyze and understand the clinical assertions made about genetic variants. The student will develop strategies that help users interact with the system and obtain the output they need. Users include genetic counsellors, medical directors, and bioinformatics analysts.

Student Learning Outcomes & Benefits

The project can be adapted to the student's skill set. The student will explore a use case and develop a strategy for implementing a solution. Possible outcomes include detailed workflow descriptions, data integration, database schema updates, Python scripts, visualizations, and statistical analyses.

Karen Eilbeck

Biomedical Informatics
School of Medicine

My goal is to prepare students to become independent thinkers and bioinformatics researchers. I have mentored undergraduate summer students, PhD and master's students, postdocs, and a medical student. My former students have gone on to pursue further academic degrees and scientific careers. I employ a programmer whose job is to implement tools robustly. The programmer gets everyone up to speed with version control tools such as Git. I use a combination of lab meetings and one-on-one meetings. I do not micromanage projects, but I expect sincere engagement and participation. I encourage creativity. My group is very supportive; senior students help junior students learn the ropes. The team works together to solidify the projects, but everyone has an individual role within the team.