Preprocessing computer code
Last updated on 2024-11-19
Overview
Questions
- How do I include preprocessing computer code in my publication package in a way that others can understand?
Objectives
- Include computer code describing the steps taken to process the raw data into analysis data in your publication package
- Consider using tools such as Quarto, R Markdown, or Jupyter notebooks to share code and narrative text in one document
Steps to take
- Include the computer code (for example Atlas.ti, SPSS/JASP syntax files, MATLAB analysis scripts, R code) that describes the steps taken to process the raw data into analysis data. Add brief explanations of the steps in English. For qualitative research, this can be a brief description of the steps taken in the analysis of primary research data (themes, domains, taxonomies, components).
- There are many ways to include computer code in your publication package, depending on the analysis tools you use. Tools like Quarto, R Markdown, or Jupyter notebooks are a great way to share code and narrative text in one document, which makes it much easier to describe clearly the steps that were taken to process the data.
- A bonus option would be to have your preprocessing and analysis code checked for reproducibility by others. You can consider submitting your data and code to ReproHack or CODECHECK. Even if you don’t, it would be helpful to take into account their guidelines: both initiatives emphasize that documentation of your code is key!
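As an illustration of the steps above, a documented preprocessing script could look something like the minimal Python sketch below. It is not taken from the example repository: the column names, the "NULL" missing-value marker, and the cleaning rules are all hypothetical and only show how each processing step can be paired with a short explanation in English. The same code and comments could be placed in the cells of a Quarto document or Jupyter notebook.

```python
"""Preprocess raw survey data into analysis data.

Hypothetical example: the column names, the "NULL" marker, and the
cleaning rules are illustrative assumptions, not a real dataset.
"""
import csv
import io

# In a real publication package the raw data would be read from a file;
# a small in-memory sample keeps this sketch self-contained.
RAW_CSV = """respondent_id,interview_date,household_size
1,2016-11-17,3
2,2016-11-17,NULL
2,2016-11-17,NULL
3,2016-11-18,7
"""


def preprocess(raw_text):
    """Return the cleaned rows as a list of dictionaries."""
    rows = list(csv.DictReader(io.StringIO(raw_text)))

    # Step 1: drop exact duplicate rows, keeping the first occurrence.
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row.items())
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

    # Step 2: recode the missing-value marker "NULL" to None and
    # convert household_size from text to an integer.
    for row in unique_rows:
        value = row["household_size"]
        row["household_size"] = None if value == "NULL" else int(value)

    # Step 3: keep only rows with a known household size.
    return [row for row in unique_rows if row["household_size"] is not None]


analysis_rows = preprocess(RAW_CSV)
print(analysis_rows)
```

Numbering the steps in the comments lets a reader match each transformation to the description in your README or methods section.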
Example files
See the preprocessing_safi.qmd and preprocessing_safi.html files in the scripts folder of the EUR publication package example repository on Zenodo. The .qmd file is a Quarto markdown document in which R code and documentation are combined. It produces a readable HTML file that can also be included in the publication package.
Other examples you can think of:
- Descriptions of steps taken to process qualitative data.
- Hanzon (2019) has shared a description of the color coding of their interviews on the DANS Data Station Social Science and Humanities as well as the color-coded version of the anonymized interview transcripts (in Dutch).