Characterizing the Roles of Contributors in Scientific OSS Projects

Recently, some colleagues and I have been trying to understand the roles of contributors in open-source scientific software projects. We studied 7 well-known scientific software projects hosted on GitHub (including Chaste, Khmer, and Genn). For each selected project, we identified the roles played by different contributors by analyzing the project's documentation, websites, and other readily available sources. Using a mix of quantitative and qualitative data, we curated some findings that we believe are interesting for the broad Medium population. Here they are.

  • Senior researchers tend to be the most active and prolific contributors in terms of commits and file creation. In four of the seven projects we studied, faculty and staff contributors were responsible for half or more of the commits made to the project (with an average commit share of 72%). In five projects, senior members also created the majority of files and, by that measure, shaped the resulting project structure. This influence over the overall direction of the software project was also evident in the fact that senior researchers were the most likely to have interacted with files related to the build system, project metadata, and developer documentation.

  • Junior contributors, especially graduate students, are critical drivers of new features as well as supporting activities like test creation. On average, junior contributors were responsible for 42% of commits across all projects we studied; in one case, juniors were responsible for nearly 100% of all commit activity. The majority of these commits came from graduate students, who had the longest contribution periods among juniors (1.72 years of commit activity, compared to 0.98 years for postdocs and 4 months for undergraduates). Like senior contributors, junior contributors are significantly involved in creating new features, improving existing capabilities, and fixing bugs.

  • An open-source model facilitates external contributions, but the results are mixed. On one hand, an open-source model makes it easier to attract third parties to help grow and maintain the software. On the other hand, the software is made for and by members of a relatively niche and intensely preoccupied community. In the majority of projects we studied, third-party contributors tended to be domain-expert users who were only active for one day. We also note, however, that these same contributors are more likely to offer defect-correcting commits, which is highly valuable.
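For readers curious how numbers like "commit share" and "contribution period" can be derived, here is a minimal sketch. It is not the pipeline we used in the study: the commit records, the role labels, and the helper names `commit_share` and `active_days` are all made up for illustration, and in practice the roles would have to be assigned by hand from project documentation and websites.

```python
from collections import Counter
from datetime import date

# Hypothetical commit records: (author, role, commit date).
# These are illustrative values, not data from the study.
commits = [
    ("alice", "senior", date(2015, 1, 10)),
    ("alice", "senior", date(2015, 3, 2)),
    ("alice", "senior", date(2016, 6, 20)),
    ("bob", "graduate", date(2015, 2, 1)),
    ("bob", "graduate", date(2016, 2, 1)),
    ("carol", "third-party", date(2015, 5, 5)),
]


def commit_share(records):
    """Return each role's fraction of all commits."""
    counts = Counter(role for _, role, _ in records)
    total = sum(counts.values())
    return {role: n / total for role, n in counts.items()}


def active_days(records):
    """Return the days between each author's first and last commit.

    A value of 0 means the author was only active on a single day.
    """
    first, last = {}, {}
    for author, _, day in records:
        first[author] = min(first.get(author, day), day)
        last[author] = max(last.get(author, day), day)
    return {author: (last[author] - first[author]).days for author in first}


shares = commit_share(commits)
spans = active_days(commits)

for role, share in shares.items():
    print(f"{role}: {share:.0%} of commits")
for author, days in spans.items():
    print(f"{author}: active for {days} days")
```

In this toy data, the one-off third-party contributor shows up with an active span of 0 days, which is the pattern described in the last finding above.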

If you want to know more about this study, you can read the preprint available here.
