Introduction
Last updated on 2024-11-19
Overview
Questions
- Why should I create a publication package?
- What are the elements of a publication package?
Objectives
- Recognize the importance of research transparency and data archiving
- Explain the components of a publication package
Why create a publication package?
Compliance with guidelines and policies
First and foremost, the most important reason to create a publication package is that it is a way to comply with (inter)national guidelines and policies for good academic practice:
Guidelines and policies
- All researchers in the Netherlands should adhere to the Netherlands Code of Conduct for Research Integrity, which states that it should be clear to others what data the research was based on, how the data were obtained, what results were achieved and how, and that the steps in the research process must be verifiable.
- Similarly, the European Code of Conduct for Research Integrity requires that researchers share their results in an open, honest, transparent, and accurate manner and that they preserve all data, metadata, protocols, code, software, and other research materials appropriately.
- The Guideline for the archiving of academic research for Faculties of Behavioural and Social Sciences in the Netherlands describes how this should be achieved by creating a so-called publication package for each publication.
- At the university level, the Research Data Management Policy of Erasmus University Rotterdam dictates that data must be stored in a correct, complete, unadulterated and reliable manner and, whenever possible, be available for subsequent use.
- Funders (see for example the NWO and ERC policies on research data management) and journals (see for example the PLOS and Nature Portfolio journals' policies on data availability) have very similar requirements.
The conclusion that follows from the (non-exhaustive) list of guidelines and policies above is that as a researcher, you are required to clearly document your whole research process, store it in a safe place and make it publicly available whenever possible (as open as possible and as closed as necessary). By creating a publication package for your published research results, you will end up with a structured bundle detailing everything that is needed to verify and replicate the results published in a specific manuscript.
Discussion
Questions to discuss with your peers:
Which of the above policies and guidelines are familiar to you?
To what extent do you currently comply with those guidelines?
Which extra steps do you need to take to increase compliance?
Making your life easier
Publication packages also yield many benefits for yourself and your (direct) colleagues:
Benefits for you and your colleagues
Benefits for your future self
Imagine you are going to reuse your data or rerun an analysis in a week, a month, a year, or even in ten years' time. In that case it is very important to organize and document your project thoroughly, because you will not remember all the details of the project.
And be aware: your past self doesn’t answer emails! Well-documented data, code and other materials help you to remember and understand all the details even many years later (but it might be useful sooner as well).
Benefits for your collaborators and for re-usability
Well-documented projects also help others to use the data, verify the results and build further on your findings.
When you collaborate with others in a research project, good documentation and metadata will save you countless emails and meetings to explain the details of the project. This is also the case when you plan to make your data, code and other materials available for re-use. In that case, you want your project components to be self-explanatory, so that others can use them independently.
Video
For those of you who like cringe movies, this video is a great illustration of the importance of a well-documented and archived publication package.
A data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci. This is what shouldn’t happen when a researcher makes a data sharing request! Topics include storage, documentation, and file formats.
The contents of a publication package
In the infographic above, the contents of a publication package as described in the Guideline for the archiving of academic research for Faculties of Behavioural and Social Sciences in the Netherlands are summarized. For your convenience, we also list the components below in textual form:
Checklist
1. Manuscript or publication
   - Must include a brief description of the problem definition, research design, data collection (sampling, selection and representativeness of informants) and methods used
2. Materials used
   - Include instructions, procedures, the design of the experiment and stimulus materials (interview guide, questionnaires, surveys, tests) necessary to replicate the research
3. Raw data files
   - Provide the most direct registration of behaviour or reactions of participants. Think of unfiltered export files of surveys, EEG measurements, recordings or transcripts. If needed, include all de-identification steps taken
4. Preprocessing computer code
   - Include code (such as Atlas.ti/SPSS/JASP syntax files, R scripts, etc.) describing the steps taken to process raw data into analysis data, including brief explanations of the steps in English
5. Processed data files
   - Provide data (either raw or processed) that were eventually analysed when preparing the article (e.g. a data file after transforming variables, after applying selection, etc.). If the raw data was analysed directly, step 3 suffices
6. Analysis computer code
   - Include code describing the steps taken to process the analysis data into the results reported in the manuscript, including brief explanations of the steps in English
7. Data management plan
   - Provide a copy of the most recent version of your data management plan
8. Readme file
   - Provide a clear readme describing who was involved in the project, when the data was collected, which documents and files can be found where and how to interpret them
9. Ethics documentation
   - Documents related to the ethical approval (e.g. approval letter, blank consent form)
In the next part of the workshop, we will look into the different components of a publication package in more detail.
The EUR publication package example that you downloaded to your computer (see the data sets section on the setup page) provides examples for all of the components. Additionally, in most cases you will hopefully already have some components at hand (e.g., a data management plan) that you can immediately add to your draft publication package.
Key Points
- Create a publication package to comply with (inter)national policies
- Document research in a publication package to make your life easier
- The nine elements of a publication package include data, code, materials and documentation
Prepare your package - I. Documentation
Overview
Questions
- Which documents are needed in a publication package for my research project?
- How do I document my package in such a way that is understandable for others?
Objectives
- Assign all relevant research documentation to the publication package of your own research project
- Apply best practices for file names and file formats in your publication package
Instead of chronologically adding the components according to their numbering in the list of publication package components, we will first gather all documentation that is needed for your package in this part of the workshop. Hopefully, most of these documents are already available somewhere on your system (except probably for the readme file). In that case, you can quickly start building your package by gathering those files, perhaps focusing mostly on improving file names and file formats.
Project folder
First, we need a single place to save all the components of the publication package.
Steps to take
- Create a folder with a clear name for the research project (use the three principles for file naming described in this presentation)
- Optionally, you can create a small folder structure with sub folders (for example, the one used in the EUR publication package example)
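If you prefer to script this step, the sketch below creates a project folder with the sub folders mentioned for the EUR publication package example (documentation, materials, data, data_output, scripts). It is only a minimal Python illustration and not part of the lesson's own materials. The folder list and the function name `create_package_folder` are assumptions, so adapt them to your project.

```python
from pathlib import Path

# Sub folder names taken from the folders mentioned for the EUR publication
# package example; rename or drop any that do not fit your project.
SUBFOLDERS = ["documentation", "materials", "data", "data_output", "scripts"]


def create_package_folder(base_dir: Path, project_name: str) -> Path:
    """Create <base_dir>/<project_name> with one sub folder per entry in SUBFOLDERS.

    exist_ok=True leaves folders that already exist untouched, so the
    function can safely be run again after you have added files.
    """
    project_dir = Path(base_dir) / project_name
    for name in SUBFOLDERS:
        # parents=True also creates project_dir itself if it does not exist yet
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir
```

For example, `create_package_folder(Path.home() / "research", "my-project_publication-package")` would create the structure inside a hypothetical `research` folder in your home directory. Choose the project name according to the file naming principles mentioned above.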
Key Points
- Add sufficient documentation to the publication package in the form of a data management plan, manuscript, readme file, and ethics documentation
- Save the files using clear file names and in sustainable file formats
Data management plan
The first component that we will add to the package is number 7 in our list of publication package components: the data management plan.
Steps to take
- You should simply provide a copy of the most recent version of your data management plan.
- Make sure it is saved in a sustainable file format. This can be a .pdf or .odt file. If you have your most recent version in DMPonline, you can download it to your computer as a PDF or in an alternative format using the Download tab.
- Provide the document with a good file name (use the three principles for file naming described in this presentation) and save it in the documentation folder.
- This is also a good moment to take a look at the contents of your data management plan: is it still up to date? Do you need to take more steps to put it into practice?
Example file
See the documentation/dmp_eur-pp_v1.pdf file from the EUR publication package example repository on Zenodo.
Manuscript or publication
Overview
Questions
- How do I include a description of the problem definition, research design, data collection and methods used in my publication package?
Objectives
- Include the published (or accepted) manuscript or publication in your package
Let’s now continue chronologically with number 1 in our list of publication package components.
Steps to take
- According to the instructions in the Guideline for the archiving of academic research for Faculties of Behavioural and Social Sciences in the Netherlands (p. 8) you should include the published (or accepted) manuscript or publication in your package.
- Additionally, it is stated that you “must include a brief description of the problem definition, research design, data collection (sampling, selection and representativeness of informants) and methods used. An electronic version of the published manuscript will generally suffice.”
- Check that your manuscript contains this information.
- Make sure the manuscript is saved in a sustainable file format, most likely a .pdf.
- If your manuscript is not yet finished or accepted, wait to include it until the publication has been accepted and/or finalized.
Example file
See the manuscript_rsos_20230401.pdf file from the EUR publication package example repository on Zenodo (note that this is a mock publication).
Readme file
Overview
Questions
- How do I write a readme in such a way that my project is understandable for others?
Objectives
- Add a clear readme to your publication package
- The readme should make it clear when and where the research took place, where to find specific files, and how to interpret them
Steps to take
- According to the instructions in the Guideline for the archiving of academic research for Faculties of Behavioural and Social Sciences in the Netherlands (p. 9), you should include a “readme file (metadata) describing which documents and files can be found where and how they should be interpreted”. The Guideline also provides a specific list of information that the readme file should contain:
- Name of the person who stored the documents or files
- Division of roles among authors, indicating at least who analysed the data
- Date on which the manuscript was accepted, including reference
- Date/period of data collection
- Names of people who collected the data
- If relevant: addresses of field locations where data were collected and contact persons (if any)
- Whether or not an ethical assessment took place before the research, and, if relevant, study reference from and statements made by the Ethics Review Committee
- Whether the data is made open or not and if not, a valid reason for not opening up the data
- Write the readme file in plain text using a text editor such as Notepad, TextEdit or Vim rather than Word, and save it as .txt. Alternatively, if you feel comfortable with Markdown, you can use the Markdown format (.md).
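As a starting point, you can also generate a plain-text skeleton with a few lines of code. The Python sketch below uses hypothetical function names (`build_readme_template`, `write_readme_template`). It writes one empty section per item from the list above, plus sections for the file overview and for how to interpret the files. The section wording is a paraphrase and not an official template.

```python
from pathlib import Path

# Items the Guideline asks the readme to cover, paraphrased as section headings,
# plus sections for the file overview and for how to interpret the files.
README_SECTIONS = [
    "Person who stored the documents or files",
    "Division of roles among authors (at least: who analysed the data)",
    "Date of acceptance of the manuscript, including reference",
    "Date/period of data collection",
    "People who collected the data",
    "Field locations and contact persons (if relevant)",
    "Ethical assessment (Ethics Review Committee reference and statements, if relevant)",
    "Open data: yes/no (if no, the reason for not opening up the data)",
    "Files: which documents and files can be found where",
    "How to interpret the files",
]


def build_readme_template(title: str) -> str:
    """Return a plain-text readme skeleton with an empty section per item."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines += [section, "-" * len(section), "[to be filled in]", ""]
    return "\n".join(lines)


def write_readme_template(path: Path, title: str) -> None:
    """Write the skeleton to path, refusing to overwrite an existing readme."""
    if path.exists():
        raise FileExistsError(f"{path} already exists; not overwriting it")
    path.write_text(build_readme_template(title), encoding="utf-8")
```

Fill in every section by hand afterwards. The write function refuses to overwrite an existing file, so a finished readme cannot be replaced by an empty skeleton by accident.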
Example file
See the README.txt file from the EUR publication package example repository on Zenodo.
Other examples that you can use to get started with a readme:
The Cornell guide to writing “readme” style metadata is a very helpful resource that includes a good readme template
Colleagues from Leiden University provide a specific readme template based on the Guideline for archiving for Faculties of Behavioural and Social Sciences in the Netherlands
README exercise
Share your draft README with a colleague or with your neighbor during the workshop.
Ask your peer to read through your README.
- Can they answer the following questions based on the document:
  - Is it clear when and where the research took place?
  - Will they know where to find specific files when aiming to reproduce results?
  - Do they know what specific software to use?
- Which improvements do they suggest to make the README as clear as possible?
Ethics documentation
Overview
Questions
- Which documents related to ethical approval are needed in a publication package for my research project?
Objectives
- Assign all relevant ethical documentation to the publication package of your own research project
- Apply best practices for file names and file formats in your publication package
Steps to take
- You should provide the documents related to the ethical approval. Think of the approval letter from the ethics committee, a blank consent form, and the ethics application text for your project.
- Make sure the files are saved in a sustainable file format. This can be a .pdf or .odt file.
- Provide the documents with good file names (use the three principles for file naming described in this presentation) and save them in the documentation folder.
Example files
See the documentation/ethics_approval_letter.pdf and documentation/informed_consent_form.pdf files from the EUR publication package example repository on Zenodo.
Key Points
- Add sufficient documentation to the publication package in the form of a data management plan, manuscript, readme file, and ethics documentation
- Save the files using clear file names and in sustainable file formats
Prepare your package - II. Materials, data, code
Overview
Questions
- Which materials, data, and code are needed to prepare a publication package for my research project?
- What are best practices for organizing data and code in a publication package?
- How do I document my package in such a way that is understandable for others?
Objectives
- Assign all relevant materials, data, and code to the publication package of your own research project
- Apply best practices for file names and file formats in your publication package
Now that we have gathered all the documentation of the project, the next step is to collect all the materials, data, and code that were used.
Key Points
- Include materials, data and code that are needed to reproduce or replicate your research in the publication package
- Describe data and code clearly, to make sure that everything is self-explanatory
- Save the files using clear file names and in sustainable file formats
Materials used
Last updated on 2024-11-21
Overview
Questions
- Which materials necessary to replicate the research should be included in the publication package for my research project?
Objectives
- Include all instructions, procedures, experiment design and stimulus materials in your publication package
- Apply best practices for file names and file formats
- Clearly describe all files and procedures
In this step you need to include instructions, procedures, the design of the experiment and stimulus materials (interview guide, questionnaires, surveys, tests) necessary to replicate the research.
Steps to take
- According to the instructions in the Guideline for the archiving of academic research for Faculties of Behavioural and Social Sciences in the Netherlands (p. 8), you should include:
- “The instructions, procedures, the design of the experiment and stimulus materials (interview guide, questionnaires, surveys, tests) that can reasonably be deemed necessary in order to replicate the research. The materials must be available in the language in which the research was conducted. The publication package must be in English.”
- Make sure all files are saved in a sustainable file format and that the files are properly named. If you work with sub folders, save the files in the materials folder.
- Make sure that all files and procedures are clearly described and self-explanatory.
Example files
See the codebook and the questionnaire in the materials folder from the EUR publication package example repository on Zenodo.
Other examples you can think of:
- Protocols for interviews or focus groups:
  - The SOPs4RI project has made the protocol for their focus group study available on their OSF page.
  - Hoogsteder (2020) has shared their interview protocol on the DANS Data Station Life Sciences.
- Stimulus materials for experiments:
  - The Gamebots project has shared their experimental stimuli used in Jastrzab et al. (2024), including videos of different robots as game-partners in a Rock-Paper-Scissors game, on the Open Science Framework.
Raw data files
Overview
Questions
- How do I add the raw data to my publication package?
Objectives
- Add the raw data files to your publication package
- Apply best practices for file and variable names and file formats
Steps to take
- According to the instructions in the Guideline for the archiving of academic research for Faculties of Behavioural and Social Sciences in the Netherlands (p. 8), you should provide:
- The raw data files, which are “the unedited data that are collected within the framework of a research project (…) providing the most direct registration of the behaviour or reactions of test subjects/respondents”. Examples given:
- Registrations derived from experimental research (e.g., unfiltered export file of an online survey or raw time series for an EEG measurement, e-dat files for an E-Prime behaviour experiment)
- Survey data from questionnaires completed within the framework of research (including longitudinal research), collected by the researcher themselves or by an external fieldwork organization
- (Transcripts of) video material collected within the framework of qualitative research (open interviews, observations)
- Notes taken within the framework of qualitative research or research using source or media material
- If you de-identified the data, you also need to include documentation of the steps taken to de-identify them. Note that for de-identification you should only remove personal data that is not needed for the actual research, such as contact details. All personal data that is part of the research data should be retained in the publication package for archiving (you should of course still remove identifiers later, before publishing the data in a public repository).
- If the raw data files have been accessibly stored in an external data repository (such as a DANS Data Station), making reference to the files in this archive will suffice.
- Make sure all files are saved in a sustainable file format such as .csv, and that the files and variables are properly named and clearly described. Save the files in the data folder.
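If you removed contact details from the raw export, you can script that step so that it is documented automatically. The Python sketch below is illustrative only: the column names in `CONTACT_COLUMNS` and the function name are hypothetical. It leaves the original export untouched, writes a copy without the contact-detail columns, and appends a plain-text record of what was removed to a log file that you can include in the package. In line with the steps above, it does not touch personal data that is part of the research data itself.

```python
import csv
from pathlib import Path

# Hypothetical names of contact-detail columns that are not needed for the
# research itself; replace them with the matching columns in your own export.
CONTACT_COLUMNS = {"email", "phone"}


def remove_contact_columns(raw_path: Path, out_path: Path, log_path: Path) -> list[str]:
    """Write a copy of raw_path without CONTACT_COLUMNS and log what was removed.

    raw_path itself is never modified. Returns the removed column names.
    """
    with raw_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    removed = [c for c in fieldnames if c in CONTACT_COLUMNS]
    kept = [c for c in fieldnames if c not in CONTACT_COLUMNS]

    with out_path.open("w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops the removed columns from each row dict
        writer = csv.DictWriter(f, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

    # Append a human-readable record of this step for the publication package.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(
            f"{raw_path.name} -> {out_path.name}: removed columns "
            f"{', '.join(removed) or '(none)'}; {len(rows)} rows written\n"
        )
    return removed
```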
Example file
See the safi_raw.csv file in the data folder from the EUR publication package example repository on Zenodo.
Data exercise
Share a (de-identified) copy of your raw data file with a colleague or with your neighbor during the workshop.
Can they open the file without the need for any specialized software?
Is it clear to them what all the variables are?
- If not, is there another file, such as a codebook or README, in which the variable names are clearly explained?
Which improvements do they suggest to make the data file as clear as possible?
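To support the question about variables, a short script can list which variables in a data file are not explained in the codebook. This Python sketch assumes, hypothetically, that the codebook is a CSV file with a `variable` column. Adjust it to the structure of your own codebook.

```python
import csv
from pathlib import Path


def undocumented_variables(data_path: Path, codebook_path: Path) -> list[str]:
    """Return the column names in data_path that have no codebook entry.

    Assumes (hypothetically) that the codebook is a CSV file with a column
    named 'variable' that holds one variable name per row.
    """
    # Only the header row of the data file is needed.
    with data_path.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    with codebook_path.open(newline="", encoding="utf-8") as f:
        documented = {row["variable"].strip() for row in csv.DictReader(f)}
    return [col for col in header if col.strip() not in documented]
```

An empty result means every column has at least a codebook entry. Whether each description is actually clear is still a question for your peer.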
Preprocessing computer code
Overview
Questions
- How do I include preprocessing computer code in my publication package in such a way that is understandable for others?
Objectives
- Include computer code describing the steps taken to process the raw data into analysis data in your publication package
- Consider using tools such as Quarto, R Markdown, or Jupyter notebooks to share code and narrative text in one document
Steps to take
- You should include computer code (for example Atlas.ti, SPSS/JASP syntax file, MATLAB analysis scripts, R code) describing the steps taken to process the raw data into analysis data. This should include brief explanations of the steps in English, for example a brief description of the steps taken in the qualitative analysis of primary research data (themes, domains, taxonomies, components).
- There are many ways to include computer code in your publication package, depending on the analysis tools you use. Tools like Quarto, R Markdown, or Jupyter notebooks are a great way to share code and narrative text in one document, which makes it much easier to clearly describe the steps that were taken to process the data.
- A bonus option would be to have your preprocessing and analysis code checked for reproducibility by others. You can consider submitting your data and code to ReproHack or CODECHECK. Even if you don’t, it would be helpful to take into account their guidelines: both initiatives emphasize that documentation of your code is key!
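The example package uses R in a Quarto document, but explaining each step in English works the same way in any language. As an illustration only, the Python sketch below documents a preprocessing script step by step. The column names `age` and `age_group` and the cut-off at 40 are hypothetical stand-ins for your own selection and transformation steps.

```python
import csv
from pathlib import Path


def preprocess(raw_path: Path, processed_path: Path) -> int:
    """Turn the raw export into the analysis data set and return the rows kept.

    The column names 'age' and 'age_group' and the cut-off at 40 are
    hypothetical; they only illustrate a selection and a transformation step.
    """
    # Step 1: read the raw export; the raw file itself is left unchanged.
    with raw_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    # Step 2 (selection): drop respondents without a recorded age.
    kept = [r for r in rows if (r.get("age") or "").strip()]

    # Step 3 (transformation): derive an age group; assumes whole-number ages.
    for r in kept:
        r["age_group"] = "under 40" if int(r["age"]) < 40 else "40 and over"

    # Step 4: write the processed data to a new file for the analysis.
    with processed_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames + ["age_group"])
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

The numbered comments play the same role as the narrative text in a Quarto or Jupyter document: a reader can follow what happens to the data without running the code.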
Example files
See the preprocessing_safi.qmd and preprocessing_safi.html files in the scripts folder from the EUR publication package example repository on Zenodo. The .qmd file is a Quarto markdown document in which R code and documentation are combined. It produces a readable html file that can also be included in the publication package.
Other examples you can think of:
- Descriptions of steps taken to process qualitative data:
  - Hanzon (2019) has shared a description of the color coding of their interviews on the DANS Data Station Social Science and Humanities as well as the color-coded version of the anonymized interview transcripts (in Dutch).
Processed data files
Overview
Questions
- How do I add the processed data to my publication package?
Objectives
- Add the processed data files to your publication package
- Apply best practices for file and variable names and file formats
Steps to take
- You need to provide the data files that were eventually analysed when preparing the article. Examples are the data file after transforming variables and after applying selections. This means that in this step you should provide the outcome file from the two previous steps: the result of the preprocessing of the raw data.
- If the raw data file was directly analysed, you do not need to provide any extra files in this step.
- Make sure all files are saved in a sustainable file format such as .csv, and that the files are properly named. Save the files in the data folder.
Example file
See the safi_processed-for-plotting.csv file in the data_output folder from the EUR publication package example repository on Zenodo.
Analysis computer code
Overview
Questions
- How do I include analysis computer code in my publication package in such a way that is understandable for others?
Objectives
- Include computer code describing the steps taken to process the analysis data into the results reported in the manuscript in your publication package
- Consider using tools such as Quarto, R Markdown, or Jupyter notebooks to share code and narrative text in one document
Steps to take
- You should include computer code (for example syntax files from SPSS/JASP, Atlas.ti, MATLAB, R; syntaxes of tailored software) describing the steps taken to process the analysis data into the results in the manuscript. This should include brief explanations of the steps in English.
- Just as with the preprocessing computer code, tools like Quarto, R Markdown, or Jupyter notebooks are very helpful for the analysis code.
- Again, it is highly recommended to have your preprocessing and analysis code checked for reproducibility by others, or at the least check guidelines from initiatives such as ReproHack or CODECHECK. Keep in mind that documentation of your code is key!
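Likewise, a documented analysis script should make it obvious how each number in the manuscript is produced from the processed data. The Python sketch below is a hypothetical example: the function names, and the choice of group means as the analysis, are assumptions. It computes the mean of one variable per group and saves the results table. The groups are sorted, so a rerun produces identical output that can be checked against the manuscript.

```python
import csv
import statistics
from collections import defaultdict
from pathlib import Path


def mean_by_group(processed_path: Path, group_col: str, value_col: str) -> dict[str, float]:
    """Return the mean of value_col for every level of group_col.

    Reads the processed data set; group_col and value_col are whatever
    column names your own processed data use.
    """
    values: dict[str, list[float]] = defaultdict(list)
    with processed_path.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            values[row[group_col]].append(float(row[value_col]))
    # Sort the groups so the output has the same order on every rerun.
    return {group: statistics.mean(v) for group, v in sorted(values.items())}


def write_results(results: dict[str, float], out_path: Path) -> None:
    """Save the results table (means rounded to two decimals) for comparison with the manuscript."""
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["group", "mean"])
        for group, mean in results.items():
            writer.writerow([group, f"{mean:.2f}"])
```

Saving the results to a file rather than only printing them makes it easier for a peer, or for initiatives such as ReproHack or CODECHECK, to compare a rerun with the reported values.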
Example files
See the analysis_safi.qmd and analysis_safi.html files in the scripts folder from the EUR publication package example repository on Zenodo. The .qmd file is a Quarto markdown document in which R code and documentation are combined. It produces a readable html file that can also be included in the publication package.
Other examples you can think of:
- Descriptions of steps taken to analyse qualitative data:
  - Zuber, Strach, & Pérez-Chiqués (2023) have shared a detailed description of the data coding and analysis procedure (including coded excerpts) on the Qualitative Data Repository for their project consisting of interviews, participant observation, and focus groups.
Code exercise
Share a copy of your analysis computer code or syntax with a colleague or with your neighbor during the workshop.
Can they open the file without the need for any specialized software?
Is it clear to them what is needed to analyze the data?
Bonus question: are they able to rerun your analysis independently?
Which improvements do they suggest to make the code as clear as possible?
Key Points
- Include materials, data and code that are needed to reproduce or replicate your research in the publication package
- Describe data and code clearly, to make sure that everything is self-explanatory
- Save the files using clear file names and in sustainable file formats