Rogue Scores: Evaluation Reproducibility Project
Over 2,000 language modeling papers may report incorrect scores.
Paper
Rogue Scores: Evaluation Reproducibility Project
Over 2,000 language modeling papers may report incorrect scores.
Note
Evaluation Software Errors in the ACL 2023 Proceedings
Incorrect or irreproducible model scores found in 15% of papers.
Note [Updated]
Evaluation Software Errors in the EMNLP 2023 Proceedings
Incorrect or irreproducible model scores found in 10% of papers.
Supplement
Reproducing the Results of Rogue Scores
Step-by-step guide for reproducing the paper's results.
Conference
Presenting Rogue Scores at ACL 2023 in Toronto
Frontenac Ballroom, July 10 from 11:00 to 12:30
Memes
Call For Memes: Virtual Meme Session at ACL 2023
Inviting all attendees to submit their research/cat memes.
Talk
Invited Speaker at Northeastern University GPT Workshop
On the future, ethics, and limitations of large language models.
Website
Website Launch
This website now exists — click to visit!