Task 1: Benchmarking News Recommendation in a Living Lab.
Dear adventurer,
welcome to the exciting task of recommending news to real users. We will guide you on your way to select the most appropriate contents for visitors of news websites. These visitors are in desperate need of suggestions. Overwhelming amounts of news threaten their moods and monkey about with their valuable time. Hurry, they need your help!
The first step on your journey is signing up to CLEF NewsREEL. Please show your determination to rescue the visitors by following this two-step procedure:
Sign-Up Procedure
The registration process for task 1 involves two steps:
- Visit the official CLEF registration website, complete the registration form, and make sure to check the NewsREEL box.
- Visit the Open Recommendation Platform (ORP) and create an account for your team.
Welcome to the Open Recommendation Platform (ORP)! Our allies at plista created this passage for you to interact with visitors struggling with an ocean of news. You must use the passage wisely to fulfil your tasks.
First, you have to assemble your equipment to serve your duty as an advisor. There are two major tools you need to collect:
- server to communicate with ORP (hardware)
- recommendation algorithm to provide recommendations (software)
Choose your server carefully! The passage to the visitors is long. If you do not manage to arrive in time, they will be lost! Typically, they expect to receive your suggestions within 200 ms.
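One way to stay within such a time limit is to give your algorithm a fixed budget and to answer with a precomputed fallback list whenever the budget is exceeded. The following Python sketch illustrates the idea. The function names, the fallback list, and the budget of 150 ms (leaving some room for the network) are assumptions chosen for illustration; they are not prescribed by ORP.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# A small pool of worker threads shared by all incoming requests.
_executor = ThreadPoolExecutor(max_workers=4)


def recommend_in_time(recommend, request, fallback, budget_s=0.15):
    """Run recommend(request); return fallback if it exceeds the budget."""
    future = _executor.submit(recommend, request)
    try:
        return future.result(timeout=budget_s)
    except TimeoutError:
        # The slow computation keeps running in its worker thread,
        # but the caller can answer immediately with the fallback.
        return fallback


def slow_recommender(request):
    """Hypothetical expensive model, simulated with a sleep."""
    time.sleep(0.5)
    return [1, 2, 3]
```

Calling recommend_in_time(slow_recommender, {}, fallback=[9]) returns the fallback list because the simulated model needs longer than the budget.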
Hardware Requirements
Consequently, we assume that your system provides at least the following configuration:
- 1 GB RAM
- 2 cores
- 25 GB hard drive
Note that you may interact with ORP using a smaller configuration. As long as your system responds in a timely fashion to ORP’s requests, you are fine. In our experience, the configuration above represents a reasonable choice.
Network latency is a troublesome companion. If your journey starts far from ORP, your messages may lag behind. In such circumstances, we will support you. Just let us know, and we will grant you access to a virtual machine close to ORP.
Software Requirements
Your second tool is your recommendation algorithm. We view a system as three distinct components:
- message handling
- control
- recommending
Each of them is dedicated to a specific task. Message handling receives messages, forwards them to control, and finally returns the recommendations to ORP. Control extracts the type of each message. Subsequently, it either stores the obtained information or requests recommendations in case ORP asked for them. Recommending is at the system’s centre: the recommendation engine returns a list of item IDs based on the request. You may implement each component on your own. In case you would like some guidance, we suggest having a look at:
It provides all three components. We assume that you have your server accessible. In addition, we suppose you have Maven and Git installed.
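To make the division of labour between the three components concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the implementation found in the SDK: all class and function names are hypothetical, and the way control tells messages apart (a "type" field for impressions and clicks, a "limit" field for recommendation requests) is inferred from the example messages shown further down.

```python
import json


class Recommender:
    """Recommending: remembers item IDs and returns some of them on request."""

    def __init__(self):
        self.items = []  # item IDs in the order they were first seen

    def update(self, item_id):
        if item_id and item_id not in self.items:
            self.items.append(item_id)

    def recommend(self, limit):
        # Most recently seen items first.
        return self.items[::-1][:limit]


def control(message, recommender):
    """Control: either store information or ask for recommendations."""
    if "limit" in message:  # assumption: only requests carry "limit"
        return recommender.recommend(message["limit"])
    if message.get("type") in ("impression", "click"):
        item_id = message.get("context", {}).get("simple", {}).get("25", 0)
        recommender.update(item_id)
    return None


def handle_message(raw, recommender):
    """Message handling: parse, forward to control, serialise the reply."""
    message = json.loads(raw)
    recs = control(message, recommender)
    if recs is None:
        return ""  # notifications need no recommendation list
    return json.dumps({"recs": {"ints": {"3": recs}}})
```

Keeping the three parts separate lets you swap the Recommender class for your own algorithm without touching parsing or dispatching.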
There are three steps to start your system:
Clone the repository
git clone https://github.com/plista/orp-sdk-java.git
You should obtain the following structure:
Change to the orp-sdk-java subdirectory
cd orp-sdk-java
Start the recommendation server
mvn clean install exec:java -Dexec.mainClass="de.dailab.plistacontest.client.Client" &
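Please note the ampersand at the end: it starts the server in the background, so that it keeps running while you continue to use the shell. Depending on your shell settings, the process may still be stopped when the session ends; tools such as nohup or screen help in that case.
The implemented recommendation algorithm represents a baseline, and it has proven to be quite a challenging one. It reflects two significant factors: popularity and recency.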
In order to implement your own ideas you ought to provide two basic functionalities:
- updating data upon incoming notifications
- returning lists of recommended items
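These two functionalities can be sketched in a few lines of Python. The example below is not the baseline shipped with the SDK; it merely illustrates how popularity and recency might be combined by counting views within a sliding window of the most recent events per publisher. The class name, the method names, and the window size are assumptions.

```python
from collections import Counter, defaultdict, deque


class RecentPopularity:
    """Ranks items by how often they occur among the latest views."""

    def __init__(self, window=1000):
        # One bounded queue of recently viewed item IDs per publisher.
        # Old entries drop out automatically once the window is full.
        self.recent = defaultdict(lambda: deque(maxlen=window))

    def update(self, publisher_id, item_id):
        """Updating data upon an incoming notification."""
        if item_id:  # 0 or None means the page showed no article
            self.recent[publisher_id].append(item_id)

    def recommend(self, publisher_id, limit, exclude=None):
        """Returning a list of recommended items for one publisher."""
        counts = Counter(self.recent[publisher_id])
        ranked = [item for item, _ in counts.most_common() if item != exclude]
        return ranked[:limit]
```

Keeping a separate window per publisher ensures that the sketch never suggests items from a different news portal.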
Of course, you may implement your own solution in the language of your choice. We seek to support you by listing the types of messages your system will encounter. You will find a detailed description of the various attributes in the documentation. We focus on essential parts. Note that all data is encoded as JSON objects.
Impression: As visitors load news articles, ORP forwards you details about these interactions:
{
"type": "impression",
"context": {
"simple": {
"57": 3302817060,
"29": 17332,
"27": 1677,
"85": 5,
"4": 1148116,
"56": 1138207,
"69": 1851422,
"63": 1840689,
"62": 1918111,
"83": 70,
"77": 150,
"52": 1,
"14": 33331,
"39": 20856,
"16": 48811,
"7": 2862848,
"88": 11,
"91": 2864860,
"81": 2103169,
"24": 1,
"76": 640,
"84": 254822718632194,
"44": 1851485,
"82": 0,
"47": 654013,
"75": 1919903,
"6": 10,
"74": 1919860,
"17": 48985,
"40": 1618692,
"35": 315003,
"22": 70209,
"31": 0,
"5": 317849,
"19": 153353,
"68": 1851453,
"67": 1928642,
"13": 2,
"9": 26885,
"23": 11,
"49": 22,
"80": 429,
"37": 2969052,
"59": 1275566,
"42": 0,
"25": 0,
"18": 5
},
"lists": {
"8": [
18841,
18842,
48511
],
"10": [
1763,
1765,
1768,
1769,
1770,
1774
]
},
"clusters": {
"1": {
"7": 255
},
"2": [
11,
11,
61,
60,
61,
26,
21
],
"3": [
55,
28,
34,
91,
23,
21
],
"65": {
"1": 255
},
"64": {
"3": 255
},
"66": {
"12": 255
}
}
},
"recs": {
"ints": {
"3": [
210124692,
207352642,
210089421,
209716528,
209451817,
209923717
]
}
},
"timestamp": 1422525811551
}
We will now highlight essential parts of these messages. To do so, we traverse the tree structure and denote descending by a dot (.). For instance, context.simple.57 refers to the path which starts with context and descends via simple to 57.
The attribute context.simple.57 refers to a user. ORP tracks users with sessions derived from cookies. You may observe events where context.simple.57 is 0. In these situations, the users have disallowed tracking. You cannot distinguish between the users who disallow tracking, and your recommendation algorithm should take this into account.
ORP refers to news articles as context.simple.25. If your system encounters events which lack this information or have context.simple.25 set to 0, the user landed either on the homepage or on a page devoid of news articles. Your system can identify this type of event by "type": "impression". ORP lets you access a variety of news portals. Your system can identify the news portal via context.simple.27. Please note that you must not recommend items which are not linked to the specific news portal.
Your system can identify the time of the event as "timestamp". Note that timestamps are based on Central European Time and use the UNIX format in milliseconds (the example value above, 1422525811551, has 13 digits). Your system will also observe which items had been recommended in recs.ints.3, which gives a list of item identifiers.
Meaning | Reference |
User | context.simple.57 |
Item | context.simple.25 |
Publisher | context.simple.27 |
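The dotted paths in the table translate directly into nested dictionary lookups once a message has been parsed. The following Python sketch shows one possible helper. The function names are hypothetical, and the conversion assumes that timestamps are milliseconds since the UNIX epoch, as the 13-digit example values suggest; adjust it if your data differs.

```python
from datetime import datetime, timezone


def get_path(message, path, default=None):
    """Follow a dot-separated path such as 'context.simple.57'."""
    node = message
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def summarise(message):
    """Extract the fields most algorithms need from a parsed message."""
    ts = message.get("timestamp")
    return {
        "user": get_path(message, "context.simple.57", 0),  # 0: no tracking
        "item": get_path(message, "context.simple.25", 0),  # 0: no article
        "publisher": get_path(message, "context.simple.27"),
        # Assumption: milliseconds since the UNIX epoch.
        "time": (datetime.fromtimestamp(ts / 1000, tz=timezone.utc)
                 if ts is not None else None),
    }
```

The same helper reads the recommended items of an impression via get_path(message, "recs.ints.3", []).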
Recommendation request: Recommendation requests ask your system to provide recommendations. A visitor has arrived at a news article. Which article should they read next?
{
"context": {
"simple": {
"57": 58728221,
"29": 17332,
"27": 1677,
"85": 5,
"4": 561984,
"56": 1138207,
"69": 1851422,
"63": 1840689,
"62": 1918111,
"83": 70,
"77": 150,
"52": 1,
"14": 33331,
"39": 20856,
"16": 48811,
"7": 2878826,
"88": 11,
"91": 2864860,
"81": 2103169,
"24": 1,
"76": 640,
"84": 171646461649582,
"44": 1851485,
"82": 0,
"47": 654013,
"75": 1919859,
"6": 952253,
"74": 1919860,
"17": 48985,
"40": 1618617,
"35": 315003,
"22": 62900,
"31": 0,
"5": 37,
"19": 153353,
"68": 1851453,
"67": 1928642,
"13": 2,
"9": 26885,
"23": 11,
"49": 22,
"80": 193,
"37": 2968945,
"59": 1275566,
"42": 0,
"41": 312
},
"lists": {
"8": [
18841,
18842,
48511
],
"10": [
1763,
1765,
1768,
1769,
1770,
1774
]
},
"clusters": {
"1": {
"7": 255
},
"2": [
11,
11,
61,
60,
61,
26,
21
],
"3": [
55,
28,
34,
91,
23,
21
],
"65": {
"1": 255
},
"64": {
"3": 255
},
"66": {
"12": 255
}
}
},
"limit": 6,
"vectors": [
3
]
}
You observe that recommendation requests conform to the structure of impressions. In addition, "limit" carries the number of recommendations your system ought to provide.
Your response should look like:
{
"recs":{
"ints": {
"3" : [X, Y, Z]
}
}
}
Note that X, Y, Z are placeholders for item references. The list should have the size specified in "limit".
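Building such a response can be sketched as follows in Python. The function name is hypothetical. Skipping duplicates and the article the visitor is currently reading is a design choice of this sketch, not a documented ORP requirement.

```python
import json


def build_response(request, ranked_items):
    """Serialise up to request['limit'] distinct items as an ORP response."""
    limit = int(request.get("limit", 0))
    # The article currently shown, if the request names one.
    current = request.get("context", {}).get("simple", {}).get("25")
    recs = []
    for item in ranked_items:
        if len(recs) >= limit:
            break
        if item == current or item in recs:
            continue
        recs.append(item)
    # If fewer candidates than 'limit' are available, the list is shorter.
    return json.dumps({"recs": {"ints": {"3": recs}}})
```

For a request with "limit": 3 and the ranked candidates 5, 1, 1, 2, 3, 4, where 5 is the current article, the response lists the items 1, 2, and 3.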
Click: As visitors click on recommendations, your system will receive notifications. ORP forwards this information to all connected recommendation servers.
{
"type": "click",
"context": {
"simple": {
"4": 1148116,
"5": 6,
"6": 10,
"7": 2878813,
"9": 26885,
"13": 2,
"14": 33331,
"16": 48811,
"17": 48985,
"18": 2,
"19": 153353,
"20": 300,
"22": 65109,
"23": 10,
"24": 0,
"25": 107149704,
"27": 1677,
"29": 17332,
"35": 315003,
"37": 2969052,
"39": 20856,
"40": 1788349,
"42": 12,
"44": 1851485,
"47": 654013,
"49": 22,
"52": 1,
"56": 1138207,
"57": 36416653622063325,
"59": 1275566,
"62": 1918111,
"63": 1840689,
"67": 1928668,
"68": 1851453,
"69": 1851422,
"74": 1919860,
"75": 1919903,
"76": 640,
"77": 150,
"80": 451,
"81": 2101340,
"83": 70,
"84": 245797697923377,
"85": 5,
"88": 10,
"91": 2864860
},
"lists": {
"8": [
18841,
18842,
48511
],
"11": [
2882180
]
},
"clusters": {
"1": {
"7": 255
},
"2": [
11,
11,
61,
60,
61,
26,
21
],
"3": [
55,
28,
34,
91,
23,
21
],
"51": {
"19": 255
},
"64": {
"3": 255
},
"65": {
"1": 255
},
"66": {
"12": 255
}
}
},
"recs": {
"ints": {
"3": [
210119905
]
}
},
"timestamp": 1422525811943
}
Clicks conform to impressions in their structure. In addition, the messages are marked as "type": "click". Note that recs.ints.3 contains a list with a single item reference. Exactly this item has been clicked.
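Click notifications are a natural feedback signal. A minimal Python sketch for reading the clicked item and counting clicks per item could look like this. The function names are hypothetical, and how you use the counts is up to your algorithm.

```python
from collections import Counter


def clicked_item(message):
    """Return the clicked item ID, or None if this is not a valid click."""
    if message.get("type") != "click":
        return None
    items = message.get("recs", {}).get("ints", {}).get("3", [])
    # A click carries exactly one item reference.
    return items[0] if len(items) == 1 else None


def record_click(message, counts):
    """Increase the click count of the clicked item, if there is one."""
    item = clicked_item(message)
    if item is not None:
        counts[item] += 1
    return item
```

A Counter passed as counts then holds, for every item, the number of clicks observed so far.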
Error: The communication with ORP requires valid responses in a short time. Thus, you may occasionally receive error messages when these requirements are not met.
{
"errcode" : 408,
"timestamp" : 1372176488290
}
Error codes include:
- 408: connection timeout
- 440: no data returned
- 441: unreadable response
- 442: invalid format
Note that there may be additional types of errors, for instance, when you return item references which are no longer valid.
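For logging and monitoring, it can help to translate error codes into readable text. The Python sketch below covers only the four codes listed above and labels everything else as unknown. The names are hypothetical.

```python
from collections import Counter

# Only the codes named in this guide; ORP may send others.
ERROR_CODES = {
    408: "connection timeout",
    440: "no data returned",
    441: "unreadable response",
    442: "invalid format",
}


def describe_error(message):
    """Return a readable description of an ORP error message."""
    code = message.get("errcode")
    return ERROR_CODES.get(code, f"unknown error ({code})")


def count_errors(messages):
    """Count how often each description occurs in a list of error messages."""
    return Counter(describe_error(m) for m in messages)
```

A rising number of timeouts in such a count is a hint that your server answers too slowly.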
Item update: Publishers continue to add new content to their portals. As they do, ORP will inform you about articles which have been added.
{
"domainid": 418,
"created_at": "2015-01-29T09:35:28+0100",
"img": "URLtoIMG.{jpg,png,...}",
"url": "URLtoArticle.{html,php,...}",
"text": "snippet of the first up to 256 characters",
"kind": "article",
"expires_at": "2015-07-28 10:44:16",
"version": 3,
"updated_at": "2015-01-29T11:35:38+0100",
"categoryid": "2228673",
"path": "",
"flag": 0,
"id": 210126662,
"title": "Article Title"
}
Item updates provide you with the data necessary to construct content-based recommenders. Note that "domainid" corresponds to context.simple.27 in impression, click, and recommendation request objects. Similarly, "id" corresponds to context.simple.25.
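A simple way to use item updates is to keep the latest version of every article per publisher. The Python sketch below does so and can list the newest articles of a portal. The class and method names are hypothetical, and the date format is taken from the "created_at" value in the example above.

```python
from datetime import datetime

CREATED_FORMAT = "%Y-%m-%dT%H:%M:%S%z"  # e.g. 2015-01-29T09:35:28+0100


class ItemStore:
    """Keeps the most recently received update of each item per publisher."""

    def __init__(self):
        self.by_domain = {}  # domainid -> {item id -> item update}

    def update(self, item):
        """Store an item update, replacing any earlier one for the same ID."""
        self.by_domain.setdefault(item["domainid"], {})[item["id"]] = item

    def newest(self, domainid, limit):
        """Return the IDs of the most recently created items of a publisher."""
        items = self.by_domain.get(domainid, {}).values()
        ranked = sorted(
            items,
            key=lambda it: datetime.strptime(it["created_at"], CREATED_FORMAT),
            reverse=True,
        )
        return [it["id"] for it in ranked[:limit]]
```

The stored "title" and "text" fields could then feed a content-based similarity measure of your choice.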
Activating Communication
Once you have configured your system and your recommendation engine is running, it is time to switch on communications with ORP. We will now guide you through the required steps. We assume you have signed in to ORP, which should look like this:
First, you need to tell ORP where to direct messages. In other words, ORP has to know where your recommendation server is listening. ORP expects to receive a name and a path. Note that you may have several algorithms running on different paths; thus, you should name your algorithms. Please keep in mind that on the day of the presentation this name will be put on the slide illustrating the results. Second, you need to provide an address to forward the traffic to: <server>:<port>, for instance, http://myserver.org:8080/. Confirm your settings by clicking Save. Your dashboard should afterwards look like this:
Note that the name of your algorithm appears on top of the result view and further down in the algorithm test section. If you want to test your connection to ORP, you may fire individual messages of varying types. Select the type of message you want to test in the selection box labelled Trigger-Type. When you click Fire, ORP will send your server a message of the selected type. If you want to alter your configuration, you may click on the blue pen symbol marked with an orange exclamation mark.
Finally, you should pay attention to the coloured bar to the left of the result visualisation. Currently it appears orange. As long as your server is running as intended, the bar should appear green. If you fail to reply sufficiently quickly, ORP may turn off the traffic toward your recommendation engine. Do not worry too much about that; it is merely a temporary off time. You can turn the traffic back on in the same way as you activate it for the first time: click on the blue pen to enter the configuration section, click on ON, and save the configuration. ORP will start sending messages to your system. Your dashboard should look like this:
The status bar appears green, which indicates that your system is running. If you want to see details about your performance, you may switch to the statistics section. There, you will find a row for each day summarising your achievements. You can observe the number of requests you have received as well as the total number and proportion of recommendations that have been clicked. In addition, there is a visualisation of these figures on top of the dashboard. We encourage you to play with it!
Finally, there is the Leaderboard section. This section allows you to compare your performance with other users of ORP. Can you manage to beat them?
You will find a link to the documentation on your dashboard. The documentation shall help you understand the data sent by ORP. In case you have questions, do not hesitate to get in touch with us!