Where to find software-development-related datasets
These days we are spending a lot of time looking for datasets on software development activities. I decided to share some good resources that I found.
UPDATE 20th December 2021
Here is my own dataset, along with the Java app I used to fetch the data. It contains data from 4 open-source projects.
Agile Scrum Sprint Velocity DataSet
01. Promise dataset
This is a well-known repository of software engineering (SE) research data.
02. JIRA Social Repository Dataset
A dataset extracted from the Jira issue tracking system (ITS) of four popular open-source ecosystems: the Apache Software Foundation, Spring, JBoss, and CodeHaus communities. It contains more than 1K projects, with more than 700K issue reports and more than 2 million issue comments.
03. Data Analysis in Software Engineering (DASE) book
The Data Analysis in Software Engineering (DASE) book/notes aims to teach you how to do data science with R in software engineering. It contains a complete description of the data, repositories, the data mining process, and ML techniques.
It also includes references to some other datasets.
04. SEAnalytics dataset
This dataset contains data from 9 repositories, covering agile sprints, story points, and delayed issues.