Best practices for scientific software development
If you’re reading this guide, it’s probably because you write scientific code and are curious about how you can get better at it. But what does it mean to be “good at coding”? What might come to mind is writing concise code, or code that runs quickly, or code that is clear. We believe that this is just one piece of the puzzle.
The goals of this guide are to teach you both how to write better code and how to get better at the process of writing code. This means adopting habits, or best practices, while writing scientific software that ensure your projects are well-documented, reproducible, and extensible. Scientists often write code, but they are generally not professionally trained software developers. Adopting some of the ways in which professional developers work can make your work more efficient and robust. This guide collects such practices and discusses them in a way that is specific to working in science.
Guide organization
Changing how you work can be a daunting task. Depending on how experienced you are with programming, this guide may offer you a lot of information to take in at once. To make the learning process less intimidating, this guide is organized in a cumulative way.
We start with best practices that we consider to be absolutely fundamental, and thus important to assimilate into your day-to-day coding practice. Once these practices become second nature, we encourage you to continue building out your skills by exploring the best practices for developing projects. While working on a project, you may realize that you want to re-use some project code, or that it might be of use to others, so you decide to package it. As a project or package matures, you may consider deploying it and maintaining it in the longer term. We include a section of extras to collect additional content linked to throughout the main guide. Use the sidebar to navigate the guide and choose your own adventure!
Wherever possible, we have tried to make recommendations that build on one another, to reduce friction if and when your coding projects evolve.
A note on languages
A lot of the advice in this guide is not specific to any programming language. However, many of the resources and libraries we point to are for R and Python, as these are two very widely used languages in quantitative disciplines.
Acknowledgements
This guide was written by and for scientific software developers and scientists in the Public Health Agency of Canada (PHAC), though its advice may be of use more widely. This project was led by Irena Papst (PHAC Modelling Hub). The guide was authored by Irena Papst, Brennan Chapman (Modelling Hub) and Aaron Petkau (National Microbiology Laboratory). Parts of this guide were based on previous writing by Eric Enns, Eric Marinier, Deep Sidhu, Phil Mabon, and Aaron Petkau from the PHAC National Microbiology Laboratory.