The dataset() subclass extends data frames with various metadata, including provenance metadata.
provenance(iris_dataset)
#> $started_at
#> [1] "2024-01-27 09:00:53 GMT"
#>
#> $ended_at
#> [1] "2024-01-27 09:00:53 GMT"
#>
#> $wasAssocitatedWith
#> [1] "doi:10.5281/zenodo.10473154"
#>
#> $wasInformedBy
#> [1] "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x"
Let’s add the R programming language to the wasInformedBy field, identified by its Library of Congress subject heading, R (Computer program language), which both humans and machines can interpret:
provenance(iris_dataset) <- list(
  wasInformedBy = "http://id.loc.gov/authorities/subjects/sh2002004407"
)
Let us review the new provenance metadata:
provenance(iris_dataset)$wasInformedBy
#> [1] "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x"
#> [2] "http://id.loc.gov/authorities/subjects/sh2002004407"
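The assignment above extended wasInformedBy instead of overwriting it. A minimal base R sketch of that append-on-assign behaviour follows, assuming the provenance is stored as a plain named list. The helper `append_provenance()` is hypothetical and is not part of the dataset package, whose internals may differ.

```r
# Hypothetical helper (not a dataset function): merge new provenance entries
# into an existing named list, appending to fields that already exist.
append_provenance <- function(old, new) {
  for (field in names(new)) {
    # old[[field]] is NULL for a missing field, and c(NULL, x) is just x,
    # so fields that do not exist yet are simply created.
    old[[field]] <- unique(c(old[[field]], new[[field]]))
  }
  old
}

prov <- list(
  wasInformedBy = "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x"
)
prov <- append_provenance(
  prov,
  list(wasInformedBy = "http://id.loc.gov/authorities/subjects/sh2002004407")
)
prov$wasInformedBy
```

Using `unique()` keeps the field free of duplicates if the same identifier is added twice.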
Convert the provenance metadata to triples:
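provenance_df <- as.data.frame(
  # Keep only the first value of each provenance field
  lapply(provenance(iris_dataset), function(x) x[[1]])
)
# Add an identifier column that will serve as the subject of the triples
provenance_df <- id_to_column(provenance_df, "eg:dataset-1")
# Convert the timestamps to XML Schema typed literals
provenance_df$started_at <- xsd_convert(provenance_df$started_at)
provenance_df$ended_at <- xsd_convert(provenance_df$ended_at)
dataset_to_triples(provenance_df, idcol = "rowid")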
#> s p
#> 1 eg:dataset-11 started_at
#> 2 eg:dataset-11 ended_at
#> 3 eg:dataset-11 wasAssocitatedWith
#> 4 eg:dataset-11 wasInformedBy
#> o
#> 1 "2024-01-27T10:00:53Z"^^<xs:dateTime>
#> 2 "2024-01-27T10:00:53Z"^^<xs:dateTime>
#> 3 doi:10.5281/zenodo.10473154
#> 4 https://doi.org/10.1111/j.1469-1809.1936.tb02137.x
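Because the `lapply()` call keeps only the first element of each field, the second wasInformedBy value does not appear among the triples. The base R sketch below emits one triple per value instead, assuming the provenance is a plain named list. `provenance_to_triples()` is a hypothetical helper for illustration, not a function of the dataset package.

```r
# Hypothetical helper (not a dataset function): turn a named list of
# provenance values into a long subject-predicate-object data frame.
provenance_to_triples <- function(prov, subject) {
  data.frame(
    s = subject,
    # Repeat each field name once per value it holds
    p = rep(names(prov), times = lengths(prov)),
    o = unlist(prov, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

prov <- list(
  # Field name spelled as in the output above
  wasAssocitatedWith = "doi:10.5281/zenodo.10473154",
  wasInformedBy = c(
    "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x",
    "http://id.loc.gov/authorities/subjects/sh2002004407"
  )
)
triples <- provenance_to_triples(prov, "eg:dataset-1")
triples
```

Each multi-valued field yields several rows that share the same subject and predicate, which is how repeated statements are usually represented in a triple table.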
PROV
The dataset package follows the PROV Data Model [PROV-DM] and expresses provenance metadata using the PROV Ontology (PROV-O), which in turn expresses the PROV Data Model using the OWL2 Web Ontology Language (OWL2) [OWL2-OVERVIEW]. PROV-O provides a set of classes, properties, and restrictions that can be used to represent and interchange provenance information generated in different systems. The PROV Document Overview describes the overall state of PROV and should be read before other PROV documents.
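To make the link to PROV-O concrete, the provenance field names shown earlier can be mapped to PROV-O property IRIs. The mapping below is an illustrative assumption and not confirmed behaviour of the dataset package. The helper `to_prov_iri()` is hypothetical, while the property names themselves (startedAtTime, endedAtTime, wasAssociatedWith, wasInformedBy) are defined in the PROV-O namespace.

```r
# PROV-O namespace
prov_ns <- "http://www.w3.org/ns/prov#"

# Assumed correspondence between the list fields and PROV-O properties
field_to_prov <- c(
  started_at         = "startedAtTime",
  ended_at           = "endedAtTime",
  wasAssocitatedWith = "wasAssociatedWith",  # field name spelled as in the output above
  wasInformedBy      = "wasInformedBy"
)

# Hypothetical helper: look up the full property IRI for each field name
to_prov_iri <- function(fields) {
  unname(paste0(prov_ns, field_to_prov[fields]))
}

to_prov_iri(c("started_at", "wasInformedBy"))
```

Replacing the bare field names in the predicate column with such IRIs would let the triples be read by tools that understand PROV-O.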