Where to find software development-related datasets

Randula Koralage
2 min read · Jul 5, 2020


These days, many of us are searching hard for datasets that capture software development activities. I decided to share some good resources I have found.

UPDATE 20th December 2021

Below are my own dataset and the Java app I used to fetch the data. The dataset covers four open-source projects.
Agile Scrum Sprint Velocity DataSet

Fetching Sprint Data Project

01. PROMISE dataset

A well-known repository of data for software engineering (SE) research.

02. JIRA Social Repository Dataset

A dataset extracted from the Jira issue tracking systems (ITS) of four popular open-source ecosystems: the Apache Software Foundation, Spring, JBoss, and CodeHaus. It covers more than 1K projects, with more than 700K issue reports and more than 2 million issue comments.

03. Data Analysis in Software Engineering (DASE) book

These book-style notes teach how to do data science with R in software engineering. They describe the data, the repositories, the data mining process, and machine learning (ML) techniques.

The book also includes references to other data sources.

04. SEAnalytics dataset

This dataset contains data on agile sprints, story points, and delayed issues from nine repositories.
