NEWSREEL was first organised as a campaign-style evaluation lab at CLEF 2014. The lab consisted of two tasks focused on benchmarking news recommendation algorithms. While an overview of these tasks is given below, a more detailed description of NEWSREEL 2014 can be found in the proceedings of CLEF 2014. The performance results were presented during a half-day workshop in Sheffield. An overview of the participating teams' submissions is provided in the working notes of CLEF 2014.


  1. Task 1: Predict interactions in an offline dataset.

    Since the organisation of the Netflix challenge, the evaluation of recommendation algorithms has been dominated by offline evaluation scenarios. Addressing this scenario, we provided an offline evaluation task. The dataset used for this task consisted of all updates and interactions on ten domains, covering local news, sports, business, technology, and national news. Before releasing the dataset, we identified fifteen time slots of 2–6 hours in length for each of the ten domains and removed all (user, item) pairs within these slots. The task was to predict the interactions that occurred during these time periods. A prediction was considered successful if the predicted (user, item) pair actually occurred in the dataset. In the evaluation, all partitions were treated separately; the winning contribution was determined by aggregating the results from the individual partitions. Participants were not asked to provide running code, but instead had to submit files containing their predictions.

  2. Task 2: Recommend news articles in real-time.

    In the second task, participants had the opportunity to benchmark recommendation algorithms in a living lab. Lab registration opened in November 2013 and closed in May 2014. Once registered for CLEF, participating teams received an account on the Open Recommendation Platform. After providing a server address and registering an algorithm in the dashboard, they continuously received requests for recommendations. The platform was constantly online, leaving the participants several months to fine-tune their algorithms. In order to compare the participating teams, we defined three evaluation periods of two weeks each, during which we recorded the number of clicks, the number of requests, and the click-through rate (CTR). The evaluation periods were scheduled for early February 2014, early April 2014, and late May 2014.
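The two evaluation protocols above can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual NEWSREEL evaluation code: the function names are invented, and the aggregation over partitions is assumed here to be a simple mean, which the lab description does not specify.

```python
def score_partition(predictions, held_out_pairs):
    """Task 1 scoring rule: a prediction is a hit if the predicted
    (user, item) pair actually occurred in the withheld time slot."""
    hits = sum(1 for pair in predictions if pair in held_out_pairs)
    return hits / len(predictions) if predictions else 0.0

def aggregate(partition_scores):
    """Combine per-partition results; a plain mean is assumed here."""
    return sum(partition_scores) / len(partition_scores)

def click_through_rate(clicks, requests):
    """Task 2 metric: CTR recorded per team and evaluation period."""
    return clicks / requests if requests else 0.0

# Toy example with invented data
held_out = {("u1", "i1"), ("u2", "i5")}
predictions = [("u1", "i1"), ("u1", "i2")]
print(score_partition(predictions, held_out))   # 0.5
print(click_through_rate(120, 4000))            # 0.03
```

Treating each withheld slot as its own partition keeps per-domain differences visible before the final aggregation.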


Lab Organisers

  • Frank Hopfgartner, TU Berlin
  • Andreas Lommatzsch, TU Berlin
  • Benjamin Kille, TU Berlin
  • Torben Brodt, plista GmbH
  • Tobias Heintz, plista GmbH

Steering Committee

  • Pablo Castells, Universidad Autónoma de Madrid
  • Paolo Cremonesi, Politecnico di Milano
  • Hideo Joho, University of Tsukuba
  • Udo Kruschwitz, University of Essex
  • Joemon M. Jose, University of Glasgow
  • Mounia Lalmas, Yahoo! Labs
  • Martha Larson, TU Delft
  • Jimmy Lin, University of Maryland
  • Vivien Petras, Humboldt University
  • Domonkos Tikk, Gravity R&D and Óbuda University