[see also this short slide presentation]
We aim to apply ReComp concepts and methods to specific concrete problems in genomics and metagenomics.
NGS (WES/WGS) analysis pipelines have multiple “moving parts” that evolve over time: the software tools as well as the reference datasets they depend on.
Regarding genomics, we want to address the following practical problem.
Suppose we maintain a population of patients along with their analysis results (variants, variant interpretations, …) obtained in the past under a known pipeline configuration.
Any change in any of the libraries, software packages, or reference datasets used in these pipelines is likely to have some impact on elements of the population. This impact can be measured, for instance, in terms of the likelihood of a change in a patient’s diagnosis.
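To make this concrete, here is a minimal sketch (not the ReComp method itself) of one possible impact estimate: the fraction of a patient’s previously reported variants whose annotation differs between the old and new versions of a reference dataset. The variant-key format and dictionary layouts are illustrative assumptions.

```python
# Minimal sketch, not the ReComp method: estimate the impact of a reference
# data update on one patient as the fraction of their previously reported
# variants whose supporting reference annotation has changed.
# The variant-key format and dict layouts below are illustrative assumptions.

def estimate_patient_impact(reported_variants, old_reference, new_reference):
    """reported_variants: iterable of variant keys, e.g. 'chr7:117559593:A:G'.
    old_reference / new_reference: dicts mapping a variant key to its
    annotation (e.g. a pathogenicity class) in two versions of a reference
    dataset. Returns a score in [0, 1]."""
    variants = list(reported_variants)
    if not variants:
        return 0.0
    changed = sum(
        1 for v in variants if old_reference.get(v) != new_reference.get(v)
    )
    return changed / len(variants)
```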
In the event of a change, we could blindly re-analyse the entire population. However, this is likely to be inefficient when the change has little impact on the population. Thus, we would like to develop techniques for predicting and estimating the extent of the impact, so that re-analysis can be prioritised given a fixed budget.
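Building on an impact estimate of that kind, a minimal prioritisation sketch is to re-analyse patients in decreasing order of estimated impact until a fixed budget is exhausted; the simple cost model used here is an assumption, not part of ReComp.

```python
# Minimal sketch of budget-constrained prioritisation: re-analyse patients in
# decreasing order of estimated impact until a fixed budget is exhausted.
# The cost model (one unit per patient by default) is an assumption.

def prioritise_reanalysis(impact_scores, budget, cost_per_patient=None):
    """impact_scores: dict patient_id -> estimated impact score.
    budget: total re-analysis cost we are willing to spend.
    cost_per_patient: optional dict patient_id -> cost (defaults to 1 each).
    Returns the list of patient ids selected for re-analysis."""
    cost_per_patient = cost_per_patient or {}
    ranked = sorted(impact_scores, key=impact_scores.get, reverse=True)
    selected, spent = [], 0.0
    for patient in ranked:
        cost = cost_per_patient.get(patient, 1.0)
        if spent + cost <= budget:
            selected.append(patient)
            spent += cost
    return selected
```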
Is this a real problem? Here is a sketch of a simple variant interpretation workflow that makes use of OMIM GeneMap and ClinVar to determine the pathogenicity of variants that are relevant for a patient with a given phenotype:
[Learn more about ReComp in the context of the Simple Variant Interpretation pipeline]
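For a rough sense of what such a workflow does, the sketch below restricts a patient’s variants to genes that OMIM GeneMap associates with the given phenotype and then attaches each remaining variant’s ClinVar classification. The input layouts and field names are assumptions; the real Simple Variant Interpretation pipeline is considerably richer.

```python
# Highly simplified, illustrative workflow (not the actual SVI pipeline):
# keep only the patient's variants that fall in genes OMIM GeneMap associates
# with the phenotype, then attach each variant's ClinVar classification.
# Input layouts and field names are assumptions.

def interpret_variants(patient_variants, phenotype, genemap, clinvar):
    """patient_variants: list of dicts with 'gene' and 'variant' keys.
    phenotype: the patient's phenotype term.
    genemap: dict phenotype -> set of associated gene symbols (OMIM GeneMap).
    clinvar: dict variant key -> clinical significance (ClinVar).
    Returns (variant, gene, significance) tuples for phenotype-relevant genes."""
    relevant_genes = genemap.get(phenotype, set())
    report = []
    for v in patient_variants:
        if v["gene"] in relevant_genes:
            significance = clinvar.get(v["variant"], "not reported")
            report.append((v["variant"], v["gene"], significance))
    return report
```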
And the chart below shows the impact of updates to ClinVar, over a short window of less than a year, on a very small cohort of patients, grouped by phenotype. The circles indicate that variants of uncertain pathogenicity have been either added or removed, while the X indicates that one or more of the variants have been identified as deleterious. These are high-impact changes, which may affect a patient’s diagnosis.
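The per-patient markers in such a chart can be derived by diffing two ClinVar snapshots against each patient’s reported variants, as in the following sketch. The clinical-significance labels follow ClinVar’s terminology, but the snapshot format (variant key to label) is an assumption.

```python
# Sketch of how the chart's per-patient markers could be derived by diffing
# two ClinVar snapshots against a patient's reported variants. The labels
# follow ClinVar's clinical-significance terms, but the snapshot format
# (variant key -> label) is an assumption.

def classify_update_impact(reported_variants, old_clinvar, new_clinvar):
    """Returns two flags mirroring the chart's markers:
      'vus_changed'     -- an uncertain-significance call added/removed (circle)
      'new_deleterious' -- a variant newly classified as (likely) pathogenic (X)."""
    VUS = "Uncertain significance"
    DELETERIOUS = {"Pathogenic", "Likely pathogenic"}
    vus_changed = new_deleterious = False
    for v in reported_variants:
        old, new = old_clinvar.get(v), new_clinvar.get(v)
        if (old == VUS) != (new == VUS):
            vus_changed = True
        if new in DELETERIOUS and old not in DELETERIOUS:
            new_deleterious = True
    return {"vus_changed": vus_changed, "new_deleterious": new_deleterious}
```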