Machine Learning Research (ACM CCS '23)

"Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences

University of Florida — Spring 2024

Conducted hands-on reproducibility testing for a comprehensive study measuring the state of computational reproducibility in machine learning security research published at Tier 1 security conferences from 2013 to 2022.

Background

Reproducibility is fundamental to scientific advancement, yet many fields have faced reproducibility crises. Computer security has a unique advantage: its research produces computational artifacts (code, data, figures) that should make results easier to reproduce. However, no comprehensive study had measured the actual state of reproducibility in the security community, particularly for machine learning papers.

Research Approach

As a research assistant, I performed hands-on testing and reproduction of nearly 750 machine learning papers from Tier 1 security conferences (CCS, S&P, USENIX Security, NDSS). This involved systematically attempting to run provided codebases, reproduce results, and document the success rate of computational reproducibility across a decade of research.
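
The bookkeeping behind "document the success rate" can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the `Attempt` record, the outcome labels, and the sample rows are hypothetical placeholders, not the study's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical outcome labels; the study's real categories may differ.
OUTCOMES = ("no_artifact", "did_not_run", "ran", "reproduced")


@dataclass(frozen=True)
class Attempt:
    """One reproduction attempt (illustrative fields only)."""
    paper_id: str  # placeholder identifier, not a real paper
    venue: str
    year: int
    outcome: str


def success_rate_by_year(attempts):
    """Return {year: fraction of attempts labeled 'reproduced'}."""
    totals = Counter()
    successes = Counter()
    for attempt in attempts:
        if attempt.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {attempt.outcome!r}")
        totals[attempt.year] += 1
        if attempt.outcome == "reproduced":
            successes[attempt.year] += 1
    return {year: successes[year] / totals[year] for year in sorted(totals)}


# Made-up placeholder records, used only to exercise the function.
attempts = [
    Attempt("paper-A", "CCS", 2013, "no_artifact"),
    Attempt("paper-B", "NDSS", 2013, "reproduced"),
    Attempt("paper-C", "S&P", 2022, "ran"),
    Attempt("paper-D", "USENIX Security", 2022, "reproduced"),
]
rates = success_rate_by_year(attempts)
print(rates)
```

Recording every attempt with an explicit outcome label, rather than a single pass/fail flag, keeps failure modes such as a missing artifact or code that does not run distinguishable in later analysis.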

Key Findings

  • No statistically significant improvement in artifact availability after Artifact Evaluation Committees were introduced
  • Artifacts that passed through evaluation committees worked at higher rates than those that didn't
  • Identified five common problems affecting reproducibility in security research
  • Demonstrated that significant progress is still needed in computational reproducibility
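
To illustrate what "no statistically significant improvement" means in the first finding, here is a minimal sketch of a two-sided two-proportion z-test. It is an assumed, generic test chosen for illustration and is not necessarily the method used in the paper; the counts are made-up placeholders, not the study's data.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled standard error.
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    variance = pooled * (1 - pooled) * (1 / total_a + 1 / total_b)
    if variance == 0:
        raise ValueError("test is undefined when the pooled proportion is 0 or 1")
    z = (p_b - p_a) / math.sqrt(variance)
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts (NOT from the study): papers with an available
# artifact before and after a hypothetical cutoff year.
z, p_value = two_proportion_z_test(40, 100, 45, 100)
print(f"z = {z:.3f}, p = {p_value:.3f}")
# With these placeholder counts the p-value is above 0.05, so the
# observed difference would not count as statistically significant.
```

A difference in raw percentages can look like progress while still being consistent with chance, which is why a finding like this is stated in terms of significance rather than the size of the gap alone.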

My Contributions

  • Executed and tested codebases from hundreds of published papers
  • Documented reproduction success rates and failure modes
  • Analyzed patterns in computational reproducibility across different time periods
  • Contributed to data collection for the largest reproducibility study in security research

Impact

  • Published at CCS '23: ACM SIGSAC Conference on Computer and Communications Security
  • First comprehensive measurement of reproducibility in security research
  • Provided data-driven recommendations for improving reproducibility
  • Established baseline metrics for future reproducibility studies

Technologies Used

Python, Machine Learning, Cybersecurity, Research Methodology, Code Reproduction, Data Analysis