ReComp and the Simple Variant Interpretation (SVI) workflow

[see also this slide presentation]

Here is a sketch of the SVI workflow:


We use SVI, a small-scale, “small data” workflow, to test a phenotype hypothesis made by clinicians or researchers, based on annotated patient genetic variants and two reference databases: OMIM GeneMap and NCBI ClinVar. The outputs of the tool are only as good as the reference databases are accurate, so it is crucial to continuously monitor changes in the databases and re-evaluate hypotheses. Testing a hypothesis with SVI is relatively quick for a single patient and a single update of a reference database, but there are clear scalability and cost issues if the same process is applied across a large cohort of patients and database changes are tracked as soon as they appear.

Done naively, this poses a clear cost/accuracy trade-off: either we let every database update trigger a re-evaluation of the hypothesis (a re-computation of SVI) across all patients, or we delay the re-evaluation until database updates accumulate. In the former case our response is as accurate as possible, but we are likely to waste a lot of resources on continuous refresh. In the latter case we allow more changes to collect, which increases the chance that a single re-evaluation leads to new or better results, but at the price of a delayed update.
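To make the trade-off concrete, here is a minimal back-of-the-envelope sketch that counts SVI runs under the two naive policies. The function names, the cohort size, and the batch size are illustrative assumptions, not part of SVI or ReComp; the sketch also assumes that every re-evaluation costs the same and covers the whole cohort.

```python
def eager_runs(num_patients: int, num_updates: int) -> int:
    """Eager policy: every database update re-runs SVI for every patient."""
    return num_patients * num_updates


def batched_runs(num_patients: int, num_updates: int, batch_size: int) -> int:
    """Delayed policy: re-run SVI for every patient once per `batch_size` updates."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return num_patients * (num_updates // batch_size)


def max_staleness(batch_size: int) -> int:
    """Largest number of database updates a stored result can lag behind."""
    return batch_size - 1


# Hypothetical numbers, chosen only for illustration.
patients, updates, batch = 1000, 30, 10
print(eager_runs(patients, updates))           # 30000 SVI runs, never stale
print(batched_runs(patients, updates, batch))  # 3000 SVI runs
print(max_staleness(batch))                    # results lag by up to 9 updates
```

With these made-up figures the delayed policy is ten times cheaper, but a patient's result may be up to nine updates out of date before it is refreshed.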

ReComp tries to address this trade-off by finding ways to estimate the impact of changes in the inputs of a computational process (the reference databases in the case of SVI), so as to maintain the accuracy of the response while minimising the cost of continuous re-evaluation.
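One very simple way to picture impact estimation is to compare two versions of a reference database and re-run SVI only for patients whose variants are touched by the difference. The sketch below is an assumption made for illustration, not ReComp's actual method: the `Patient` type, the function names, the flat key/value view of a database, and the variant identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    variants: frozenset[str]  # hypothetical variant identifiers


def changed_keys(old: dict[str, str], new: dict[str, str]) -> set[str]:
    """Keys that were added, removed, or whose value differs between versions."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def patients_to_recompute(
    patients: list[Patient], old_db: dict[str, str], new_db: dict[str, str]
) -> list[str]:
    """IDs of patients with at least one variant affected by the database change."""
    diff = changed_keys(old_db, new_db)
    return [p.patient_id for p in patients if p.variants & diff]


# Two made-up database versions mapping a variant to an illustrative label.
old_db = {"v1": "benign", "v2": "uncertain", "v3": "pathogenic"}
new_db = {"v1": "benign", "v2": "pathogenic", "v4": "benign"}

cohort = [
    Patient("p1", frozenset({"v1"})),
    Patient("p2", frozenset({"v1", "v2"})),
    Patient("p3", frozenset({"v4"})),
    Patient("p4", frozenset({"v5"})),
]

print(sorted(changed_keys(old_db, new_db)))           # ['v2', 'v3', 'v4']
print(patients_to_recompute(cohort, old_db, new_db))  # ['p2', 'p3']
```

Here only two of the four patients would need a fresh SVI run. A real impact estimate would have to go further, since a changed record can touch a patient's variants without changing the outcome of the hypothesis test.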