The dataset package extends the R statistical environment so that the most important R objects that hold a dataset, i.e. a data.frame or a class that inherits from it (tibble, tsibble or data.table), carry the metadata needed to reuse and validate the dataset contents. We aim to offer a novel solution for individuals or small groups of data scientists working in business, academic or policy research functions who cannot count on the support of librarians, knowledge engineers, and extensive documentation processes.

The dataset package extends the concept of tidy data and adds further, standardized semantic information to the user’s dataset to increase the (re-)use value of the data object.
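
The general idea of a data frame that carries its own descriptive metadata can be sketched in base R alone. The snippet below is only an illustration under stated assumptions: the object name `iris_meta` and the attribute names `title`, `creator` and `description` are chosen for this example and are not the package's internal representation.

```r
# Base R only: a hypothetical illustration, not the dataset package's API.
# Attach dataset-level metadata to a copy of iris as plain attributes.
iris_meta <- iris
attr(iris_meta, "title")       <- "Iris Dataset"
attr(iris_meta, "creator")     <- "Edgar Anderson"
attr(iris_meta, "description") <- "The famous (Fisher's or Anderson's) iris data set."

# The object still behaves like an ordinary data.frame ...
is.data.frame(iris_meta)
#> [1] TRUE

# ... while the metadata travels with it.
attr(iris_meta, "title")
#> [1] "Iris Dataset"
```

Free-form attributes like these follow no shared vocabulary, which is the gap that standardized semantic information is meant to close.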

Descriptive metadata

print(as_dublincore(iris_dataset), 'Bibtex')
#> @Misc{,
#>   title = {Iris Dataset},
#>   author = {Edgar Anderson},
#>   identifier = {https://doi.org/10.5281/zenodo.10396807},
#>   publisher = {American Iris Society},
#>   year = {:tba},
#>   language = {en},
#>   relation = {:unas},
#>   format = {application/r-rds},
#>   rights = {:unas},
#>   description = {The famous (Fisher's or Anderson's) iris data set.},
#>   type = {DCMITYPE:Dataset},
#>   datasource = {:unas},
#>   coverage = {:unas},
#> }
print(as_datacite(iris_dataset), 'Bibtex')
#> @Misc{,
#>   title = {Iris Dataset},
#>   author = {Edgar Anderson},
#>   identifier = {https://doi.org/10.5281/zenodo.10396807},
#>   publisher = {American Iris Society},
#>   year = {1935},
#>   date = {:tba},
#>   language = {en},
#>   alternateidentifier = {:unas},
#>   relatedidentifier = {:unas},
#>   format = {application/r-rds},
#>   version = {0.1.0},
#>   rights = {:unas},
#>   description = {The famous (Fisher's or Anderson's) iris data set.},
#>   geolocation = {:unas},
#>   fundingreference = {:unas},
#> }

Provenance metadata

For more details, see vignette("provenance", package = "dataset").

provenance(iris_dataset)
#> $started_at
#> [1] "2024-01-27 09:00:53 GMT"
#> 
#> $ended_at
#> [1] "2024-01-27 09:00:53 GMT"
#> 
#> $wasAssocitatedWith
#> [1] "doi:10.5281/zenodo.10473154"
#> 
#> $wasInformedBy
#> [1] "https://doi.org/10.1111/j.1469-1809.1936.tb02137.x"

Structural metadata

## Only the first variable is printed:
DataStructure(iris_dataset)[[1]]
#> $name
#> [1] "Sepal.Length"
#> 
#> $label
#> [1] "The sepal length of iris specimen in centimeters."
#> 
#> $type
#> [1] ""
#> 
#> $range
#> [1] "xsd:decimal"
#> 
#> $comment
#> [1] ""
#> 
#> $concept
#> $concept$heading
#> [1] ""
#> 
#> $concept$schemeURI
#> [1] ""
#> 
#> $concept$valueURI
#> [1] ""
#> 
#> 
#> $defintion
#> $defintion$schemeURI
#> [1] ""
#> 
#> $defintion$valueURI
#> [1] ""
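
Variable-level metadata of this kind can also be approximated in base R by attaching attributes to individual columns. This is a minimal sketch for illustration only: the object name `iris_vars` is hypothetical, the attribute names `label` and `range` merely mirror the fields printed above, and nothing here reflects how DataStructure() is implemented.

```r
# Base R only: a hypothetical illustration, not the dataset package's API.
# Attach a label and a value range to a single column of a copy of iris.
iris_vars <- iris
attr(iris_vars$Sepal.Length, "label") <- "The sepal length of iris specimen in centimeters."
attr(iris_vars$Sepal.Length, "range") <- "xsd:decimal"

# Collect the attributes of every column into a named list;
# columns without extra attributes yield NULL.
structure_info <- lapply(iris_vars, attributes)
structure_info$Sepal.Length$range
#> [1] "xsd:decimal"
```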