{"id":191,"date":"2024-03-12T23:23:57","date_gmt":"2024-03-12T23:23:57","guid":{"rendered":"https:\/\/blogs.ncl.ac.uk\/nova\/?page_id=191"},"modified":"2024-09-30T20:27:54","modified_gmt":"2024-09-30T19:27:54","slug":"visualising-data-profiles-and-analysis-pipelines","status":"publish","type":"page","link":"https:\/\/blogs.ncl.ac.uk\/nova\/visualising-data-profiles-and-analysis-pipelines\/","title":{"rendered":"Visualising data profiles and analysis pipelines"},"content":{"rendered":"\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-1 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<p>Analysts and researchers face major hurdles understanding the quality of their data and the knock-on consequences of the choices they make during one stage of data processing on those that follow. Data visualisation offers many benefits that could help analysts and researchers to overcome those hurdles. This project investigates how visualisation techniques are and should be exploited for key aspects of data profiling.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"200\" height=\"104\" src=\"https:\/\/blogs.ncl.ac.uk\/nova\/files\/2024\/03\/DataProfilingProject2-200x104-1.png\" alt=\"\" class=\"wp-image-182\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<p>This project, funded by the Alan Turing Institute, aims to characterise the way in which analysts and researchers profile data and design data processing pipelines. This is important in order to understand the limitations of current profiling and pipeline design methods, the barriers that analysts and researchers face, and the ways in which visualisation techniques could be transformative. 
The project engages with public and private sector analysts and researchers, aiming to identify quick wins, share best practice and develop a research agenda for the adoption of visualisation techniques in data profiling and pipeline design. The primary measure of success will be organisations beginning to adopt the techniques that are proposed, to make their profiling and pipeline design more rigorous and efficient. This is a catalyst for more scalable and higher-quality data science.<\/p>\n\n\n\n<p>The use of good-quality data to inform decision making is entirely dependent on robust processes to ensure it is fit for purpose. Such processes vary between organisations, and between those tasked with designing and following them. In this project, 53 data analysts from many industry sectors were surveyed about computational and visual methods for characterizing data and investigating data quality, and 24 of them also participated in in-depth interviews. From this, a list of data profiling tasks and visualization techniques was compiled that is more comprehensive than those previously published. 
Furthermore, the results highlight the diversity of profiling tasks, identify unusual practice and exemplars of visualization, and provide recommendations about formalizing processes and creating rulebooks.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"230\" src=\"https:\/\/blogs.ncl.ac.uk\/nova\/files\/2024\/03\/DataProfilingProject-1204x271-1-1024x230.png\" alt=\"\" class=\"wp-image-193\" srcset=\"https:\/\/blogs.ncl.ac.uk\/nova\/files\/2024\/03\/DataProfilingProject-1204x271-1-1024x230.png 1024w, https:\/\/blogs.ncl.ac.uk\/nova\/files\/2024\/03\/DataProfilingProject-1204x271-1-300x68.png 300w, https:\/\/blogs.ncl.ac.uk\/nova\/files\/2024\/03\/DataProfilingProject-1204x271-1-768x173.png 768w, https:\/\/blogs.ncl.ac.uk\/nova\/files\/2024\/03\/DataProfilingProject-1204x271-1-1200x270.png 1200w, https:\/\/blogs.ncl.ac.uk\/nova\/files\/2024\/03\/DataProfilingProject-1204x271-1.png 1204w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px\" \/><\/figure>\n\n\n\n<p><a href=\"https:\/\/www.turing.ac.uk\/research\/research-projects\/visualising-data-profiles-and-analysis-pipelines\" data-type=\"URL\" data-id=\"https:\/\/www.turing.ac.uk\/research\/research-projects\/visualising-data-profiles-and-analysis-pipelines\" target=\"_blank\" rel=\"noreferrer noopener\">Project webpage<\/a><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Publications<\/h2>\n\n\n\n<div class=\"wp-block-query is-layout-flow wp-block-query-is-layout-flow\"><ul class=\"wp-block-post-template is-layout-flow wp-block-post-template-is-layout-flow\"><li class=\"wp-block-post post-673 post type-post status-publish format-standard hentry category-publication tag-13 tag-data-profiling tag-sara\">\n<div class=\"entry-content wp-block-post-content is-layout-flow wp-block-post-content-is-layout-flow\">\n<p>Ruddle, R., Cheshire, J. &amp; Johansson Fernstad, S. (2024). 
A Practical Guide to Characterising Data and Investigating Data Quality. <em>University of Leeds<\/em>. <a href=\"https:\/\/doi.org\/10.5518\/1481\">https:\/\/doi.org\/10.5518\/1481<\/a><\/p>\n<\/div>\n<\/li><li class=\"wp-block-post post-657 post type-post status-publish format-standard hentry category-publication tag-12 tag-data-profiling tag-sara\">\n<div class=\"entry-content wp-block-post-content is-layout-flow wp-block-post-content-is-layout-flow\">\n<p>Ruddle, R. A., Cheshire, J., &amp; Johansson Fernstad, S. (2023). Tasks and Visualizations Used for Data Profiling: A Survey and Interview Study. <em>IEEE Transactions on Visualization and Computer Graphics<\/em>. <a href=\"https:\/\/doi.org\/10.1109\/TVCG.2023.3234337\">https:\/\/doi.org\/10.1109\/TVCG.2023.3234337<\/a><\/p>\n<\/div>\n<\/li><\/ul><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">Team<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Prof Roy Ruddle (principal investigator), University of Leeds<\/li>\n\n\n\n<li><a href=\"https:\/\/blogs.ncl.ac.uk\/nova\/dr-sara-johansson-fernstad\/\" data-type=\"page\" data-id=\"83\">Dr Sara Johansson Fernstad<\/a> (co-investigator), Newcastle University<\/li>\n\n\n\n<li>Prof James Cheshire (co-investigator), University College London<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Analysts and researchers face major hurdles understanding the quality of their data and the knock-on consequences of the choices they make during one stage of data processing on those that follow. Data visualisation offers many benefits that could help analysts and researchers to overcome those hurdles. 
This project investigates how visualisation techniques are and should &hellip; <a href=\"https:\/\/blogs.ncl.ac.uk\/nova\/visualising-data-profiles-and-analysis-pipelines\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Visualising data profiles and analysis pipelines&#8221;<\/span><\/a><\/p>\n","protected":false},"author":4185,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-191","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/pages\/191","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/users\/4185"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/comments?post=191"}],"version-history":[{"count":3,"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/pages\/191\/revisions"}],"predecessor-version":[{"id":799,"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/pages\/191\/revisions\/799"}],"wp:attachment":[{"href":"https:\/\/blogs.ncl.ac.uk\/nova\/wp-json\/wp\/v2\/media?parent=191"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}