Data Repository

The Broad Center is building a foundational data repository that will aggregate key data on school system management, leadership, governance, and organizations. These types of data have not previously been collected or maintained extensively or comprehensively, which creates a barrier to entry for researchers interested in exploring questions around K-12 education systems. The data repository is available via GitHub.

The datasets currently included in the data repository are:

  • Superintendent Research Dataset: This is a public panel dataset tracking school superintendents in most US school districts. It is designed to be easy to merge with other education datasets. TBC at SOM acknowledges Sam Stemper for his foundational work in aggregating this dataset.
  • District structure and strategy: Individual-level data from employee records at 87 public school districts in the United States, shared by Broad network member William Eger (TBR 2017-19), Assistant Superintendent of Finance & Operations, Ravenswood City School District, as part of his dissertation research. The data include both raw numbers and classifications of employee types and positions within the district organizational chart, as classified by Eger in his research.
  • School capital investments: This is a public dataset linking school capital investments to student and school district outcomes from Biasi et al. (2025).

Citation Details: The Broad Center encourages researchers to use and add to the data in the repository. Please cite the data per the guidance found on the data repository site.

Questions: If you have questions about the data, please contact The Broad Center at: broad.data@yale.edu