In collaboration with plista, we provide a new data set for NewsREEL 2017. This data set captures interactions on the websites of eight publishers during February 2016. The recorded stream of events includes two million notifications, 58 thousand item updates, and 168 million recommendation requests. Participants may use the data set to conduct experiments and optimise their recommendation algorithms for the online evaluation.
Participants register via the CLEF 2017 registration form to gain access to the data set. Make sure to check the boxes related to NewsREEL. Upon successful registration, you will receive further information on how to download the data set. As the data set is approximately 15GB in size, make sure your system has sufficient storage available before downloading it.
You may find additional information about the characteristics of the data in the ORP documentation or in our publication:
Benjamin Kille, Frank Hopfgartner, Torben Brodt, Tobias Heintz. The plista Dataset. In NRS’13: Proceedings of the News Recommendation Workshop and Challenge, held in conjunction with ACM RecSys, Hong Kong, China, ACM ICPS, pp. 14–22, October 2013.
@inproceedings{Kille:ThePlistaDataset:2013,
author = {Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias},
title = {The plista Dataset},
booktitle = {NRS’13: Proceedings of the International Workshop and Challenge on News Recommender Systems},
year = {2013},
month = {10},
pages = {14--22},
location = {Hong Kong, China},
publisher = {ACM},
series = {ICPS}
}
Please note that by participating in CLEF NewsREEL, you automatically accept a non-disclosure agreement with plista. The data set may be used for research purposes only. Distribution and commercial use are prohibited.
The following figures show some insights into the data set. A single publisher accounts for most of the interactions. Phone usage dominates all other device types. Visitors are more active on Mondays than on Saturdays, and there is only limited activity during the night.
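Statistics like the ones above can be recomputed once the data set has been downloaded. What follows is a minimal sketch only. It assumes the log stores one JSON object per line with the hypothetical fields `publisher`, `device` and `timestamp` (Unix epoch in milliseconds). These field names are illustrative, not the actual plista schema, so please check the ORP documentation for the real record layout. The function `summarise` is likewise hypothetical. It reads the input line by line, which keeps memory use bounded even for a file of roughly 15GB.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

# Fixed English weekday names (avoids locale-dependent strftime output).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def summarise(lines: Iterable[str]) -> dict[str, Counter]:
    """Tally events by publisher, device, weekday and hour of day.

    Assumes one JSON object per line with the HYPOTHETICAL fields
    "publisher", "device" and "timestamp" (epoch milliseconds).
    """
    counts = {
        "publisher": Counter(),
        "device": Counter(),
        "weekday": Counter(),
        "hour": Counter(),
    }
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        counts["publisher"][event["publisher"]] += 1
        counts["device"][event["device"]] += 1
        # Interpreting timestamps as UTC is an assumption; local time may differ.
        ts = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
        counts["weekday"][WEEKDAYS[ts.weekday()]] += 1
        counts["hour"][ts.hour] += 1
    return counts
```

Passing an open file handle, e.g. `with open(path) as fh: stats = summarise(fh)`, streams the log without loading it into memory. `stats["publisher"].most_common(1)` then shows the dominant publisher, and `stats["hour"]` gives the activity profile over the day.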