
You need the latest development version of declared.

remotes::install_github('dusadrian/declared')

The survey class will be derived from the dataset class.

This documentation has not yet been updated for the development version of the dataset package.

library(declared)
library(dataset)

# Respondent identifiers and a plain character geo code
obs_id <- c("Saschia Iemand", "Jane Doe",
            "Jack Doe", "Pim Iemand", "Matti Virtanen")
sex <- declared(
  c(1, 1, 0, -1, 1),
  labels = c(Male = 0, Female = 1, DK = -1),
  na_values = -1
)
geo <- c("NL-ZH", "IE-05", "GB-NIR", "NL-ZH", "FI1C")
# No na_values declared here, so -1 is treated as a valid value
difficulty_bills <- declared(
  c(0, 1, 2, -1, 0),
  labels = c(Never = 0, Time_to_time = 1, Always = 2, DK = -1)
)
age_exact <- declared(
  c(34, 45, 21, 55, -1),
  labels = c(A = 34, A = 45, A = 21, A = 55, DK = -1)
)
listen_spotify <- declared(
  c(0, 1, 9, 0, 1),
  labels = c(No = 0, Yes = 1, Inap = 9),
  na_values = 9
)
raw_survey <- data.frame(
  obs_id = obs_id,
  geo = geo,
  listen_spotify = listen_spotify,
  sex = sex,
  age_exact = age_exact,
  difficulty_bills = difficulty_bills
)

survey_dataset <- dataset(
  x = raw_survey,
  title = "Tiny Survey",
  author = person("Jane", "Doe")
)
dataset_bibentry(survey_dataset)
#> Doe J (2024). "Tiny Survey."
dublincore(survey_dataset)

It is good practice to define labels that are valid but not yet present in a declared vector, because in a retrospective harmonization workflow the vector may later be concatenated (bound) with further observations that do use the currently unused label.

In this example, the DK or declined label is not in use.

# This is not valid in declared
listen_spotify <- declared(
  c(0,1,9,0,1),
  labels = c( No = 0, Yes = 1,Inap = 9, DK =-1), 
  na_values = c(9, -1)
  )
print(listen_spotify)
#> <declared<numeric>[5]>
#> [1]     0     1 NA(9)     0     1
#> Missing values: 9, -1
#> 
#> Labels:
#>  value label
#>      0    No
#>      1   Yes
#>      9  Inap
#>     -1    DK
c(listen_spotify, declared(
  c(-1,-1,-1),
  labels = c( No = 0, Yes = 1,Inap = 9, DK =-1)
  ))
#> <declared<numeric>[8]>
#> [1]      0      1  NA(9)      0      1 NA(-1) NA(-1) NA(-1)
#> Missing values: -1, 9
#> 
#> Labels:
#>  value label
#>     -1    DK
#>      0    No
#>      1   Yes
#>      9  Inap
summary(listen_spotify)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#>     0.0     0.0     0.5     0.5     1.0     1.0       1
dc_tiny_survey <- dublincore(
  title = "Tiny Survey", 
  creator = person("Daniel", "Antal"), 
  identifier = 'example-1', 
  publisher = "Example Publishing", 
  subject = "Surveys", 
  language = "en")

The survey class inherits elements of the dataset class, but it will be more strictly defined. I am considering making every column except obs_id a declared vector. Even numeric types with Inap and DK would map nicely to the SDMX CL_OBS_STATUS codes, which make missing observations explicit and try to categorize them.
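The mapping idea above can be sketched in base R, without declared or dataset. The particular code assignment is an assumption made for illustration: as I read the SDMX CL_OBS_STATUS code list, "A" is a normal value, "M" is a missing value where data cannot exist, and "O" is a generic missing value, so Inap is mapped to "M" and DK to "O". The names `obs_status_map`, `missing_labels`, `obs_value` and `obs_status` are hypothetical, and the codes should be checked against the official list before use.

```r
# Assumed, illustrative mapping from survey missing labels to CL_OBS_STATUS
obs_status_map <- c(Inap = "M", DK = "O")

# Raw coded answers and the codes that stand for missing observations
values <- c(0, 1, 9, 0, -1)
missing_labels <- c(Inap = 9, DK = -1)

# Label of each missing code; NA for valid observations
matched <- names(missing_labels)[match(values, missing_labels)]

# Valid observations get "A"; missing ones get their mapped status code
obs_status <- ifelse(is.na(matched), "A", obs_status_map[matched])

# The observation value itself becomes NA when it was a missing code
obs_value <- ifelse(is.na(matched), values, NA_real_)

data.frame(obs_value, obs_status)
```

Keeping the status in a separate column means the reason for missingness survives even after the coded value has been replaced by NA.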

print(dc_tiny_survey, "Bibtex")

@Misc{,
  title = {Tiny Survey},
  author = {Daniel Antal},
  identifier = {example-1},
  publisher = {Example Publishing},
  year = {:tba},
  language = {en},
  relation = {:unas},
  format = {:unas},
  rights = {:tba},
  type = {DCMITYPE:Dataset},
  datasource = {:unas},
  coverage = {:unas},
}

Is the summary method implemented for declared? Both dataset and survey will need new print and summary methods.
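As a rough sketch of what such a method could look like, the following base R code defines an S3 summary method for a data.frame subclass. The class name `survey_sketch` and the constructor `new_survey_sketch()` are hypothetical and exist only for this illustration; they are not part of the dataset or declared packages, and the real methods may be designed differently.

```r
# Hypothetical constructor: a data.frame with a title attribute and an
# extra S3 class in front of "data.frame"
new_survey_sketch <- function(x, title) {
  stopifnot(is.data.frame(x))
  attr(x, "title") <- title
  class(x) <- c("survey_sketch", class(x))
  x
}

# Print the title first, then fall back to the data.frame summary
summary.survey_sketch <- function(object, ...) {
  cat(attr(object, "title"), "\n")
  # Drop the custom class so that summary.data.frame handles the columns
  class(object) <- setdiff(class(object), "survey_sketch")
  summary(object, ...)
}

tiny <- new_survey_sketch(
  data.frame(obs_id = c("a", "b", "c"), age = c(34, 45, 21)),
  title = "Tiny Survey"
)
s <- summary(tiny)
```

A print method could follow the same pattern: emit the bibliographic header, then delegate the tabular part to the parent class.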

summary(survey_dataset)
#> Doe J (2024). "Tiny Survey."
#> Further metadata: describe(survey_dataset)
#>     obs_id              geo            listen_spotify      sex      
#>  Length:5           Length:5           Min.   :0.0    Min.   :0.00  
#>  Class :character   Class :character   1st Qu.:0.0    1st Qu.:0.75  
#>  Mode  :character   Mode  :character   Median :0.5    Median :1.00  
#>                                        Mean   :0.5    Mean   :0.75  
#>                                        3rd Qu.:1.0    3rd Qu.:1.00  
#>                                        Max.   :1.0    Max.   :1.00  
#>                                        NA's   :1      NA's   :1     
#>    age_exact    difficulty_bills
#>  Min.   :-1.0   Min.   :-1.0    
#>  1st Qu.:21.0   1st Qu.: 0.0    
#>  Median :34.0   Median : 0.0    
#>  Mean   :30.8   Mean   : 0.4    
#>  3rd Qu.:45.0   3rd Qu.: 1.0    
#>  Max.   :55.0   Max.   : 2.0    
#> 

The survey should contain the entire processing history from its creation and, optionally, the DataCite schema for publication, created with datacite_add(). A similar dublincore_add() function uses the Dublin Core metadata definitions.

Eventually, a connection to the zen4R package will make sure that a correctly described dataset can get a Zenodo record, receive a DOI, have the DOI recorded in the object, and be uploaded to Zenodo.