Machine Learning Research (ACM CCS '23)

"Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences

University of Florida — Spring 2024

Conducted hands-on reproducibility testing for a study measuring the computational reproducibility of machine learning papers published at Tier 1 security conferences from 2013 to 2022.

Background

Reproducibility is fundamental to scientific advancement, yet many fields have faced reproducibility crises. Computer security has a natural advantage: its research produces computational artifacts (code, data, figures) that should make results easier to reproduce. However, no comprehensive study had measured the actual state of reproducibility in the security community, particularly for machine learning papers.

Research Approach

As a research assistant, I performed hands-on testing and reproduction of nearly 750 machine learning papers from Tier 1 security conferences (CCS, S&P, USENIX Security, NDSS). This involved systematically attempting to run provided codebases, reproduce results, and document the success rate of computational reproducibility across a decade of research.
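The bookkeeping behind this kind of testing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual tooling: the `Attempt` record, the outcome labels, and the `success_rate_by_year` helper are hypothetical names, and the sample entries are placeholders rather than real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One reproduction attempt (hypothetical record layout)."""
    paper_id: str
    year: int
    outcome: str  # assumed labels: "reproduced", "failed", "no_artifact"


def success_rate_by_year(attempts, success_outcomes=("reproduced",)):
    """Return {year: fraction of attempts whose outcome counts as a success}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for attempt in attempts:
        totals[attempt.year] += 1
        if attempt.outcome in success_outcomes:
            successes[attempt.year] += 1
    return {year: successes[year] / totals[year] for year in sorted(totals)}


# Placeholder entries for illustration only; these are not data from the study.
attempts = [
    Attempt("paper-A", 2013, "failed"),
    Attempt("paper-B", 2013, "reproduced"),
    Attempt("paper-C", 2022, "reproduced"),
    Attempt("paper-D", 2022, "reproduced"),
]
rates = success_rate_by_year(attempts)
```

Keeping each attempt as an explicit record, rather than a running count, makes it possible to re-slice the same log later by venue, year, or failure mode.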

Key Findings

No statistically significant improvement in artifact availability after Artifact Evaluation Committees were introduced
Artifacts that passed through evaluation committees worked at higher rates than those that didn't
Identified five common problems affecting reproducibility in security research
Demonstrated that significant progress is still needed in computational reproducibility
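A finding such as "no statistically significant improvement" comes from comparing two proportions, for example artifact availability before and after a policy change. The sketch below uses a pooled two-proportion z-test as one common way to make that comparison. It is an assumption for illustration; the test actually used in the paper is not stated here, and the counts are placeholders, not figures from the study.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts for illustration only (not data from the study):
# papers with available artifacts before vs. after a hypothetical policy change.
z, p = two_proportion_z_test(40, 100, 45, 100)
significant = p < 0.05  # conventional threshold, assumed here
```

For small samples an exact test would be a safer choice than this normal approximation.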

My Contributions

Executed and tested codebases from hundreds of published papers
Documented reproduction success rates and failure modes
Analyzed patterns in computational reproducibility across different time periods
Contributed to data collection for the largest reproducibility study in security research

Impact

Published at CCS '23: ACM SIGSAC Conference on Computer and Communications Security
First comprehensive measurement of reproducibility in security research
Provided data-driven recommendations for improving reproducibility
Established baseline metrics for future reproducibility studies

Technologies Used

Python
Machine Learning
Cybersecurity
Research Methodology
Code Reproduction
Data Analysis