2023·07·06 - Paper - Rogue Scores: Evaluation Reproducibility Project - Over 2,000 language modeling papers may report incorrect scores.
2023·08·05 - Note - Evaluation Software Errors in the ACL 2023 Proceedings - Incorrect or irreproducible model scores found in 15% of papers.
2023·12·18 [Updated] - Note - Evaluation Software Errors in the EMNLP 2023 Proceedings - Incorrect or irreproducible model scores found in 10% of papers.
2023·08·22 - Supplement - Reproducing the Results of Rogue Scores - Step-by-step guide for reproducing the paper.
2023·07·10 - Conference - Presenting Rogue Scores at ACL 2023 in Toronto - Frontenac Ballroom, July 10 from 11:00 to 12:30 EDT.
2023·07·09 - Memes - Call For Memes: Virtual Meme Session at ACL 2023 - Inviting all attendees to submit their research/cat memes.
2023·06·22 - Talk - Invited Speaker at Northeastern University GPT Workshop - On future, ethics, and limitations of large language models.