
Add metadata conforming to the DataCite Metadata Schema to datasets, i.e. structured R data.frame or list objects, so that a resource can be identified accurately and consistently for citation and retrieval purposes.



  titleType = NULL,
  Identifier = NULL,
  Publisher = NULL,
  PublicationYear = "THIS",
  Subject = NULL,
  Type = "Dataset",
  Contributor = NULL,
  Date = NULL,
  Language = NULL,
  AlternateIdentifier = NULL,
  RelatedIdentifier = NULL,
  Format = NULL,
  Version = NULL,
  Rights = NULL,
  Description = NULL,
  Geolocation = NULL,
  FundingReference = NULL,
  overwrite = TRUE



An R object of class data.frame, or of a class that inherits from it (data.table, tibble); alternatively, a well-structured R list.


The name(s) or title(s) by which a resource is known. May be the title of a dataset or the name of a piece of software. Similar to dct:title.
See dataset_title for adding further titles.


Defaults to NULL for a single Title. Otherwise you can add a Subtitle, an Alternative Title and an Other Title. See dataset_title.


The main researchers involved in producing the data, or the authors of the publication, in priority order. To supply multiple creators, repeat this property.
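As a minimal sketch of listing several creators in priority order, base R's utils::person() objects can be combined with c(). The second person below is hypothetical, and whether Creator accepts such a combined vector is an assumption based on the single-person example on this page.

```r
# Base R only: person objects from the utils package can be combined with c().
# The order of the elements is the priority order.
creators <- c(
  person(given = "Edgar", family = "Anderson", role = "aut"),
  person(given = "Jane", family = "Doe", role = "ctb")  # hypothetical second creator
)

length(creators)   # number of creators in the vector
format(creators)   # one formatted string per creator
```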


The Identifier is a unique string that identifies a resource. For software, determine whether the identifier is for a specific version of a piece of software (per the Force11 Software Citation Principles) or for all versions. Similar to dct:identifier in dublincore.


The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role. For software, use Publisher for the code repository. Mandatory in DataCite, and similar to dct:publisher. See publisher.


The year when the data was or will be made publicly available, in YYYY format. See publication_year.


Recommended for discovery. Subject, keyword, classification code, or key phrase describing the resource. Similar to dct:subject.
Use subject to properly add a key phrase from a controlled vocabulary and create structured Subject objects with subject_create.


Defaults to Dataset. The DataCite resourceType definition refers back to dcm:type. The Type$resourceTypeGeneral is set to Dataset, while the user can set a more specific Type$resourceType value. See resource_type.


Recommended for discovery. The institution or person responsible for collecting, managing, distributing, or otherwise contributing to the development of the resource.


Recommended for discovery in DataCite. Similar to dct:date in dublincore.


The primary language of the resource. Allowed values are taken from IETF BCP 47, ISO 639-1 language code. See language.


An identifier or identifiers other than the primary Identifier applied to the resource being registered. This may be any alphanumeric string which is unique within its domain of issue. May be used for local identifiers. AlternateIdentifier should be used for another identifier of the same instance (same location, same file).


Recommended for discovery. Similar to dct:relation.


Technical format of the resource. Similar to dct:format.


Free text. Suggested practice: track major_version.minor_version. See version.
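Because Version is free text, a major.minor string can be checked and compared with base R's numeric_version() before it is stored. This is only a sketch of the suggested practice; the package itself is not assumed to validate the value this way.

```r
# Base R only: numeric_version() parses "major.minor" strings and
# compares them component by component rather than as decimal numbers.
v_old <- numeric_version("1.9")
v_new <- numeric_version("1.10")

v_new > v_old        # minor version 10 is later than minor version 9
as.character(v_new)  # back to free text for use as the Version value
```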


Any rights information for this resource. The property may be repeated to record complex rights characteristics. Free text. See rights.


Recommended for discovery. All additional information that does not fit in any of the other categories. May be used for technical information. Free text. Similar to dct:description.


Recommended for discovery. Spatial region or named place where the data was gathered or about which the data is focused. See geolocation.


Information about financial support (funding) for the resource being registered.


Whether pre-existing metadata properties should be overwritten. Defaults to TRUE.
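The overwrite flag can be illustrated with a small base R sketch. The helper set_meta() is hypothetical and is not the package's implementation; it assumes that metadata is stored as object attributes and that overwrite = FALSE keeps a value that is already present.

```r
# Hypothetical helper, not part of the package: set an attribute only if it
# is missing or if overwriting is allowed.
set_meta <- function(x, name, value, overwrite = TRUE) {
  existing <- attr(x, name, exact = TRUE)
  if (is.null(existing) || overwrite) {
    attr(x, name) <- value
  }
  x
}

df <- data.frame(a = 1:3)
df <- set_meta(df, "Publisher", "First Publisher")

# With overwrite = FALSE the existing value is kept.
kept <- set_meta(df, "Publisher", "Second Publisher", overwrite = FALSE)
# With overwrite = TRUE the existing value is replaced.
replaced <- set_meta(df, "Publisher", "Second Publisher", overwrite = TRUE)

attr(kept, "Publisher")
attr(replaced, "Publisher")
```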


An R object with at least the mandatory DataCite attributes.


DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for research data and other research outputs. Organizations within the research community join DataCite as members to be able to assign DOIs to all their research outputs. This way, their outputs become discoverable and associated metadata is made available to the community.
DataCite then develops additional services to improve the DOI management experience, making it easier for its members to connect and share their DOIs with the broader research ecosystem and to assess the use of their DOIs within that ecosystem. DataCite is an active participant in the research community and promotes data sharing and citation through community-building efforts and outreach activities.

The ResourceType property is "Dataset" by definition. The Size attribute (e.g. bytes, pages, inches, etc.) is added to the dataset automatically.
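A size string in both decimal and binary units can be produced with base R's object.size(). This is a sketch of how an in-memory size might be reported; that the package computes Size this way is an assumption, and the exact numbers depend on the R version and platform.

```r
# Base R only: object.size() estimates the memory used by an object.
size_bytes <- object.size(iris)

size_si  <- format(size_bytes, units = "kB")   # decimal (SI) kilobytes
size_iec <- format(size_bytes, units = "KiB")  # binary (IEC) kibibytes

# Combine both into one label (this layout is an illustrative choice).
size_label <- paste0(size_si, " [", size_iec, "]")
size_label
```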

See also

Other metadata functions: dublincore(), related_item()


my_iris <- datacite_add(
   x = iris,
   Title = "Iris Dataset",
   Creator = person(family = "Anderson", given = "Edgar", role = "aut"),
   Publisher = "American Iris Society",
   PublicationYear = 1935,
   Geolocation = "US",
   Language = "en")

#> $names
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
#> $Title
#> $Title$Title
#> [1] "Iris Dataset"
#> $Creator
#> [1] "Edgar Anderson [aut]"
#> $Identifier
#> [1] NA
#> $Publisher
#> [1] "American Iris Society"
#> $Issued
#> [1] 1935
#> $publication_year
#> [1] 1935
#> $Type
#> $Type$resourceType
#> [1] "Dataset"
#> $Type$resourceTypeGeneral
#> [1] "Dataset"
#> $Description
#> [1] NA
#> $Geolocation
#> [1] "US"
#> $Language
#> [1] "eng"
#> $Rights
#> [1] NA
#> $Size
#> [1] "11.34 kB [11.08 KiB]"