Exploring the landscape of recommender systems evaluation: practices and perspectives

Christine Bauer*, Eva Zangerle, Alan Said

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Recommender systems research and practice are fast-developing topics with growing adoption in a wide variety of information access scenarios. In this article, we present an overview of research specifically focused on the evaluation of recommender systems. We perform a systematic literature review in which we analyze 57 papers spanning six years (2017–2022). Focusing on the processes surrounding evaluation, we examine the methods applied, the datasets utilized, and the metrics used. Our study shows that the predominant experiment type in research on the evaluation of recommender systems is offline experimentation, and that online evaluations are primarily used in combination with other experimentation methods, e.g., an offline experiment. Furthermore, we find that only a few datasets (MovieLens, the Amazon review dataset) are widely used, while many datasets are used in only a few papers each. We observe a similar pattern when analyzing the employed performance metrics: a few metrics are widely used (Precision, normalized Discounted Cumulative Gain (nDCG), and Recall), while many others appear in only a few papers. Overall, our review indicates that beyond-accuracy qualities are rarely assessed. Our analysis shows that the research community working on evaluation has developed evaluation within a rather narrow scope, with the majority of experiments focusing on a few metrics, datasets, and methods.
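As a point of reference for the top-k accuracy metrics named in the abstract, the following is a minimal illustrative sketch in Python, assuming binary relevance and hypothetical toy data; it is not taken from the reviewed papers or from the article itself.

    import math

    def precision_at_k(recommended, relevant, k):
        """Fraction of the top-k recommended items that are relevant."""
        return sum(1 for item in recommended[:k] if item in relevant) / k

    def recall_at_k(recommended, relevant, k):
        """Fraction of all relevant items that appear in the top-k."""
        return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)

    def ndcg_at_k(recommended, relevant, k):
        """Normalized Discounted Cumulative Gain with binary relevance."""
        dcg = sum(1.0 / math.log2(i + 2)
                  for i, item in enumerate(recommended[:k])
                  if item in relevant)
        ideal_dcg = sum(1.0 / math.log2(i + 2)
                        for i in range(min(len(relevant), k)))
        return dcg / ideal_dcg if ideal_dcg > 0 else 0.0

    # Hypothetical example: a ranked recommendation list and the set of
    # items the user actually found relevant.
    recommended = ["a", "b", "c", "d", "e"]
    relevant = {"a", "c", "f"}
    print(precision_at_k(recommended, relevant, 5))  # 2/5 = 0.4
    print(recall_at_k(recommended, relevant, 5))     # 2/3 ≈ 0.667
    print(ndcg_at_k(recommended, relevant, 5))       # ≈ 0.70

Unlike Precision and Recall, nDCG is rank-sensitive: relevant items placed higher in the list contribute more, which is why it is normalized against the ideal ordering.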
Original language: English
Article number: 11
Number of pages: 31
Journal: ACM Transactions on Recommender Systems
Volume: 2
Issue number: 1
Early online date: 28 Oct 2023
Publication status: Published - 7 Mar 2024

Keywords

  • recommender systems
  • evaluation
  • systematic literature review

Fields of Science and Technology Classification 2012

  • 102 Computer Sciences
