Paolo Missier | Applied Data Science and Engineering, with a topping of Data Provenance

The W3C Provenance Working Group defines provenance as the “information about entities, activities and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness”. The PROV family of specifications [1], and in particular the Data Model [2] and the OWL ontology (PROV-O) [3], embody this broad concept in a set of W3C Recommendations for use by organisations across data and application domains.

[1] https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/
[2] https://www.w3.org/TR/prov-dm/
[3] https://www.w3.org/TR/2013/REC-prov-o-20130430/

Since 2013, when the model was finalised, a number of organisations around the world have adopted, embedded and, in some cases, extended PROV as part of their metadata vocabulary, adding value to data repositories through support for data exploration and discovery, quality assurance, and more.

PROV is, by design, a domain-agnostic data model and ontology, but it also provides well-defined extension points that can be used to seamlessly add and integrate domain-specific provenance concepts into the core model.
Here we list a small selection of these extensions and the organisations that are responsible for them.
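
As a rough illustration of what such an extension point looks like in practice, the sketch below models a provenance graph as a plain set of triples and declares a domain-specific class as a specialisation of a PROV core class. Only `prov:Entity`, `prov:Activity`, `rdf:type` and `rdfs:subClassOf` come from the standards; the `ex:` names and the helpers `superclasses` and `is_a` are hypothetical and exist only for this example.

```python
# Minimal sketch: extending PROV by specialisation (subclassing).
# Triples are (subject, predicate, object) strings; all ex: names are made up.

TRIPLES = {
    ("ex:ConsentForm", "rdfs:subClassOf", "prov:Entity"),  # the extension
    ("ex:form42", "rdf:type", "ex:ConsentForm"),           # domain-level data
}


def superclasses(cls, triples):
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(o for s, p, o in triples
                    if s == current and p == "rdfs:subClassOf")
    return seen


def is_a(resource, cls, triples):
    """True if resource is typed as cls, directly or through a subclass."""
    direct = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    return any(cls in superclasses(t, triples) for t in direct)
```

With these definitions, a generic PROV-aware tool that asks `is_a("ex:form42", "prov:Entity", TRIPLES)` still recognises the domain-specific resource as a PROV entity, which is the point of extending by specialisation rather than by defining unrelated classes.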

Disclaimer: the list below is far from complete; it is a work in progress and is updated periodically.

List of entries

The GDPR Provenance ontology (Trinity College Dublin)
Health Level Seven International (HL7) Fast Healthcare Interoperability Resources (FHIR), part of NHS Digital UK
The ProvONE ontology (DataONE, USA)
ProvCaRe (USA)
Management System Provenance Ontology (CSIRO)
The Agreements Ontology (CSIRO)
The Decision Provenance (DecPROV) ontology (CSIRO)
DBpedia DataID
The Organization Ontology
The P-Plan ontology
PAV: Provenance, Authoring and Versioning
Donto: Dataset ontology
The Stream Annotation ontology
AutomationML Ontology (aml)
The Allotrope Foundation Data Format (ADF)

The GDPR Provenance ontology

GDPRov (pronounced GDPR-Prov) is a linked data ontology for expressing the provenance of consent and data lifecycles, with a view towards documenting compliance. GDPRov is an OWL2 ontology that extends PROV-O and P-Plan to model this provenance.
https://openscience.adaptcentre.ie/ontologies/GDPRov/docs/ontology
Domain: EU Data protection regulations
Credit: Trinity College Dublin
Latest update: 2019

Health Level Seven International (HL7) Fast Healthcare Interoperability Resources (FHIR), part of NHS Digital UK

From the description on the page linked below: “Provenance statement indicates clinical significance in terms of confidence in authenticity, reliability, and trustworthiness, integrity, and stage in lifecycle (e.g. Document Completion – has the artifact been legally authenticated), all of which may impact security, privacy, and trust policies.”
“The Provenance resource is based on the W3C Provenance specification, and mappings are provided. The Provenance resource is tailored to fit the FHIR use-cases for provenance more directly”
Domain: Healthcare
https://www.hl7.org/fhir/provenance.html
Credit: HL7, NHS Digital
Latest update: v4.0.1
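
To give a feel for the shape of the resource, here is a minimal sketch of a FHIR R4 Provenance instance written as a Python dictionary, together with a check for the elements that R4 marks as mandatory (`target`, `recorded`, and at least one `agent` with a `who`). The references `Observation/example` and `Practitioner/example`, the timestamp, and the helper `missing_required` are illustrative assumptions, and the check is nowhere near a full FHIR validator.

```python
# Hypothetical instance; the resource ids and timestamp are placeholders.
provenance = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/example"}],  # what the record is about
    "recorded": "2020-01-15T10:30:00Z",                # when it was recorded
    "agent": [{"who": {"reference": "Practitioner/example"}}],
}


def missing_required(resource):
    """List the mandatory R4 Provenance elements absent from resource."""
    problems = []
    if resource.get("resourceType") != "Provenance":
        problems.append("resourceType")
    if not resource.get("target"):
        problems.append("target")
    if not resource.get("recorded"):
        problems.append("recorded")
    agents = resource.get("agent") or []
    # Every agent must say who it is; an empty agent list is also a problem.
    if not agents or any("who" not in a for a in agents):
        problems.append("agent.who")
    return problems
```

In W3C PROV terms, `target` plays roughly the role of the entity and `agent` that of the agent; the mappings published on the HL7 page remain the authoritative reference.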

The ProvONE ontology for scientific workflow provenance

ProvONE is a standard for scientific workflow provenance representation. It is defined as an extension of the W3C recommended standard PROV, aiming to capture the most relevant information concerning scientific workflow computational processes, and providing extension points to accommodate the specificities of particular scientific workflow systems.
https://purl.dataone.org/provone-v1-dev
Credit: DataONE. Working Group Chairs: Prof. B. Ludaescher (UI, USA) and Prof. P. Missier (Newcastle University, UK)
Latest update: 2016

ProvCaRe (USA)

The Provenance for Clinical and Healthcare Research (ProvCaRe) framework aims to model, extract, and analyze provenance information to support reproducibility of research studies. The ProvCaRe framework includes the S3 Model that extends the World Wide Web Consortium (W3C) PROV specifications to model provenance metadata describing Study Method, Study Tools, Study Data in a research study. The S3 Model is formalized in the ProvCaRe ontology.
https://provcare.case.edu/#about
Domain: Healthcare
Credit: S. Sahoo, funded by an NIH “Big Data to Knowledge” grant
Latest update: 2019

Paper reference: Satya S. Sahoo, Joshua Valdez, Matthew Kim, Michael Rueschman, Susan Redline, ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata, International Journal of Medical Informatics, Volume 121, 2019, Pages 10-18, ISSN 1386-5056, https://doi.org/10.1016/j.ijmedinf.2018.10.009.

PROV extensions at CSIRO (Australia)

CSIRO, Australia’s national science agency, maintains a number of small PROV extensions, as follows.

Management System Provenance Ontology (PROMS):

A small specialisation of the PROV ontology used to structure, bundle, validate and ultimately manage reports of provenance generated by workflow-like systems.
https://promsns.org/def/proms/proms.html
Domain: system management
Credit: Nicholas J. Car
Latest update: 2016

The Agreements Ontology

Designed to model ‘agreements’: social contracts that include licenses, laws, contracts, Memoranda of Understanding, standards and definitional metadata. Its purpose is to support data sharing by making explicit the relationships between agreements and data, and between agreements and Agents (people and organisations).
https://promsns.org/def/agr/agr.html
https://lov.linkeddata.es/dataset/lov/vocabs/agr
Domain: human decisions
Credit: Nicholas J. Car
Latest update: 2016

The Decision Provenance (DecPROV) ontology

The Decision ontology (DO) provides basic means for describing decisions and decision making.
DecPROV is a re-worked version of the DO that aligns it with the PROV ontology. It models decisions, and thus the causes of actions or of the use or generation of things. Compared with the more generic PROV ontology, on which it is mainly based, it allows for a better understanding of why something took place, was used or was produced.
https://promsns.org/def/decprov/decprov.html
Domain: documenting human decisions
Credit: Nicholas J. Car

DBpedia DataID

The DBpedia DataID vocabulary is a meta-data system for detailed descriptions of datasets and their different manifestations, as well as relations to agents like persons or organizations, in regard to their rights and responsibilities.
The model integrates DCAT, VoID, PROV-O and FOAF. Extensions can be made for typical use cases.
https://wiki.dbpedia.org/projects/dbpedia-dataid
Credit: Leipzig University, DBpedia
Domain: web / data management

The Organization Ontology

Org, the Core organization ontology, is a vocabulary for describing organisational structures, specialisable to a broad variety of types of organization. It is aimed at supporting linked data publishing of organizational information across a number of domains. It is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighbouring information such as organizational activities.
Org builds upon the PROV-O Provenance Vocabulary [prov-o].
https://www.w3.org/TR/vocab-org/
Credit: Government Linked Data Working Group
Latest update: 2014

The P-Plan ontology

P-Plan is an OWL2 ontology developed to describe abstract scientific workflows as plans and link them to their past executions. P-Plan extends the W3C PROV-O Ontology, which encodes the W3C PROV data model [PROV-DM]. PROV-DM describes the provenance of objects (prov:Entities) as a record of assertions about the steps (prov:Activities) that generated them and the entities used in those steps. Provenance describes past execution, but does not offer a vocabulary to express the plan that the execution was supposed to follow.
As an example, provenance vocabularies are appropriate for describing assays once they are executed, but are not designed to describe protocols. Therefore, in addition to the provenance record, it is often desirable to publish the plan that was followed during the execution. This would allow the provenance record to include what was envisioned would happen prior to the execution.
https://www.opmw.org/model/p-plan/
Domain: Scientific workflows
Credit: Leipzig University (http://aksw.org/Projects.html)
Latest update: 2013
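
The relationship between a plan and its execution can be sketched with a handful of triples. The example below uses the `p-plan:Plan`, `p-plan:Step`, `p-plan:isStepOfPlan` and `p-plan:correspondsToStep` terms as we understand them from the P-Plan specification; the `ex:` resources and the helper `unexecuted_steps` are hypothetical, and the triples are plain Python tuples rather than a real RDF graph.

```python
# Plan level: what was supposed to happen (two steps in one plan).
# Execution level: what the provenance recorded (only one activity ran).
# All ex: names are invented for this sketch.
TRIPLES = {
    ("ex:assayProtocol", "rdf:type", "p-plan:Plan"),
    ("ex:prepareSample", "rdf:type", "p-plan:Step"),
    ("ex:prepareSample", "p-plan:isStepOfPlan", "ex:assayProtocol"),
    ("ex:measure", "rdf:type", "p-plan:Step"),
    ("ex:measure", "p-plan:isStepOfPlan", "ex:assayProtocol"),
    ("ex:run1_prepare", "rdf:type", "prov:Activity"),
    ("ex:run1_prepare", "p-plan:correspondsToStep", "ex:prepareSample"),
}


def unexecuted_steps(plan, triples):
    """Steps of plan that no recorded activity corresponds to."""
    steps = {s for s, p, o in triples
             if p == "p-plan:isStepOfPlan" and o == plan}
    executed = {o for s, p, o in triples
                if p == "p-plan:correspondsToStep"}
    return sorted(steps - executed)
```

A question such as "which planned steps never ran?" cannot be answered from the provenance record alone, because that record only mentions what did happen; here `unexecuted_steps("ex:assayProtocol", TRIPLES)` can answer it only because the plan is published alongside the execution trace.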

PAV: Provenance, Authoring and Versioning

PAV is a lightweight ontology for tracking Provenance, Authoring and Versioning. PAV specializes the W3C provenance ontology PROV-O in order to describe authorship, curation and digital creation of online resources.
https://pav-ontology.github.io/pav/
Domain: provenance of digital creations and of online resources
Credit: Paolo Ciccarese, Novartis NIBR. Stian Soiland-Reyes, University of Manchester
Latest update: 2015

Donto: Dataset ontology

The data.gov.au Dataset Ontology is an OWL ontology designed to describe the characteristics of datasets published on data.gov.au.
The Dataset Ontology contains elements which describe the publication, update, origin, governance, spatial and temporal coverage and other contextual information about the dataset. The ontology also covers aspects of organisational custodianship and governance.
By using this ontology publishers increase discoverability and enable the consumption of this information in other applications/systems as Linked Data. It further enables decentralised publishing of catalogs and facilitates federated dataset search across sites, e.g. in datasets that are published by the States.
https://data.gov.au/data/dataset/data-gov-au-dataset-ontology/resource/dc586d4f-a3a5-4e00-abb4-128277356bed
Domain: Government Open Data, Data Management
Credit: Australian Government
Latest update: 2017

The Stream Annotation ontology

Representing IoT data streams is an important requirement in semantic stream data applications, as well as in knowledge-based environments for Smart Cities. This ontology aims to semantically represent the features of a data stream defining the specifications of an information model on top of Semantic Sensor Networks (SSN), PROV-O and TimeLine Ontologies, and involves connections with the Complex Event Processing Ontology and Quality Ontology.
http://iot.ee.surrey.ac.uk/citypulse/ontologies/sao/sao
Domain: Internet of Things / Smart Cities
Credit: University of Surrey
Latest update: 2016

AutomationML Ontology (aml)

A vocabulary to represent the XML-based AutomationML standard, IEC 62714. AutomationML, with additional support for multisensor fusion and integration, is proposed as a complete knowledge-based solution for the manufacturing and automation domains. Reference: Gonçalves, E., Freitas, A., & Botelho, S. (2019). An AutomationML Based Ontology for Sensor Fusion in Industrial Plants. Sensors (Basel, Switzerland), 19(6), 1311. https://doi.org/10.3390/s19061311
http://i40.semantic-interoperability.org/automationml/Documentation/index.html
Domain: manufacturing and automation
Credit: TU Vienna, Fraunhofer IAIS
Latest update: 2016

The Allotrope Foundation Data Format (ADF)

The Allotrope Data Format Audit Trail and Electronic Signatures Specification [ADF-A] is a specification on how to use audit trails in the Allotrope Data Format [ADF] in a standardized way. The specification is both about the Allotrope Audit Trail Ontology and the Audit Trail API that is part of the [ADF] APIs.
http://docs.allotrope.org/TR/adf-audit/ADF%20Audit%20Trail.html#bib-AF
Domain: Capturing data from laboratory analytical processes
Credit: The Allotrope Foundation, https://www.allotrope.org
Latest update: 2017

Paolo Missier, School of Computing

W3C PROV: some interesting extensions to the core standard
