{"id":73,"date":"2020-08-14T16:59:10","date_gmt":"2020-08-14T15:59:10","guid":{"rendered":"https:\/\/blogs.ncl.ac.uk\/recomp\/?page_id=73"},"modified":"2020-08-14T17:04:33","modified_gmt":"2020-08-14T16:04:33","slug":"long-summary","status":"publish","type":"page","link":"https:\/\/blogs.ncl.ac.uk\/recomp\/long-summary\/","title":{"rendered":"Long Summary"},"content":{"rendered":"\n<p>As the cost of allocating computing resources to data-intensive tasks\n continues to decrease, large-scale data analytics becomes ever more \naffordable, continuously providing new insights from vast amounts of \ndata. Increasingly, predictive models that encode knowledge from data \nare used to drive decisions in a broad range of areas, from science to \npublic policy, to marketing and business strategy. The process of \nlearning such actionable knowledge relies upon information assets, \nincluding the data itself, the know-how that is encoded in the \nanalytical processes and algorithms, as well as any additional \nbackground and prior knowledge. Because these assets continuously change\n and evolve, models may become obsolete over time, leading to poor \ndecisions in the future, unless they are periodically updated.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Focus of the project<\/h4>\n\n\n\n<p>This project is concerned with the need and opportunities for \nselective recomputation of resource-intensive analytical workloads. The \ndecision on how to respond to changes in these information assets \nrequires striking a balance between the estimated cost of recomputing \nthe model, and the expected benefits of doing so. In some cases, for \ninstance when using predictive models to diagnose a patient\u2019s genetic \ndisease, new medical knowledge may invalidate a large number of past \ncases. On the other hand, such changes in knowledge may be marginal or \neven irrelevant for some of the cases. 
It is therefore important to be \nable, firstly, to determine which past results may potentially benefit \nfrom recomputation, secondly, to determine whether it is technically \npossible to reproduce an old computation, and thirdly, when this is the \ncase, to assess the costs and relative benefits associated with the \nrecomputation.<\/p>\n\n\n\n<p><strong>The project investigates the hypothesis<\/strong> that, based \non these determinations, and given a budget for allocating computing \nresources, it should be possible to accurately identify and prioritise \nanalytical tasks that should be considered for recomputation.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Technical approach<\/h4>\n\n\n\n<p>Our approach considers three types of meta-knowledge that are associated with analytics tasks, namely:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Knowledge of the history of past results, that is, the provenance \nmetadata that describes which assets were used in the computation, and \nhow;<\/li><li>Knowledge of the technical reproducibility of the tasks; and<\/li><li>Cost\/benefit estimation models.<\/li><\/ol>\n\n\n\n<p>Element (1) is required to determine which prior outcomes may \npotentially benefit from changes in information assets, while \nreproducibility analysis (2) is required to determine whether an old \nanalytical task is still functional and can actually be performed again,\n possibly with new components and on newer input data.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">A general framework<\/h4>\n\n\n\n<p>As the first two of these elements are independent of the data \ndomain, we aim to develop a general framework that can then be \ninstantiated with domain-specific models, namely for cost\/benefit \nanalysis, to provide decision support for prioritising and then carrying\n out resource-intensive recomputations over a broad range of analytics \napplication domains.<\/p>\n\n\n\n<p>Both (1) and (2) entail technical challenges, as systematically 
\ncollecting the provenance of complex analytical tasks, and ensuring \ntheir reproducibility, requires instrumentation of the data processing \nenvironments. We plan to experiment with workflows, a form of high-level\n programming and middleware technology, to address both these problems.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Validation of the approach<\/h4>\n\n\n\n<p>To show the flexibility and generality of our framework, we will test\n and validate it on two very different case studies where decision \nmaking is driven by analytical knowledge, namely in genetic diagnostics \nand policy making for Smart Cities.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As the cost of allocating computing resources to data-intensive tasks continues to decrease, large-scale data analytics becomes ever more affordable, continuously providing new insights from vast amounts of data. Increasingly, predictive models that encode knowledge from data are used to drive decisions in a broad range of areas, from science to public policy, to marketing 
[&hellip;]<\/p>\n","protected":false},"author":3062,"featured_media":0,"parent":0,"menu_order":-1,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-73","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/pages\/73","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/users\/3062"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/comments?post=73"}],"version-history":[{"count":2,"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/pages\/73\/revisions"}],"predecessor-version":[{"id":75,"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/pages\/73\/revisions\/75"}],"wp:attachment":[{"href":"https:\/\/blogs.ncl.ac.uk\/recomp\/wp-json\/wp\/v2\/media?parent=73"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}