Performance evaluation in Information Retrieval (IR) has a long tradition that has greatly boosted the development of information access systems (IASs). Over the past few years, however, these well-established IR evaluation methodologies have been criticized for a number of important failings: failures in addressing users, search interfaces, the scale of testing environments, the diversity of IASs, and the diversity of search tasks and search processes.
In research related to IASs, such as search engines, evaluation is often related to the question of whether the eventual day-to-day users of a system will be more successful at (or simply prefer) using one IAS over another. It is impractical for scientists to continually ask such users to help in evaluation; therefore, means of simulating these users were devised. The primary approach to such simulation was to create a benchmark, known as a test collection, to assess the effectiveness of an IAS. Such collections are most commonly used to compare the number of relevant documents retrieved by competing systems. Evaluating competing systems on test collections proved to be a powerful way to improve the state of the art. However, this decades-old design remains an extremely crude simulation of user behavior.
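The test-collection comparison described above can be sketched in a few lines: systems are ranked by how many relevant documents they return, given a fixed set of relevance judgments ("qrels"). The document IDs, judgments, and rankings below are invented for illustration only.

```python
# Minimal sketch of Cranfield-style test-collection evaluation.
# All document IDs and relevance judgments here are hypothetical.

def precision_at_k(ranked_docs, relevant_docs, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_docs[:k]
    return sum(1 for d in top_k if d in relevant_docs) / k

# Relevance judgments ("qrels") for a single query in the collection.
qrels = {"d1", "d4", "d7"}

# Rankings produced by two competing systems for the same query.
system_a = ["d1", "d2", "d4", "d9", "d7"]
system_b = ["d3", "d1", "d8", "d2", "d6"]

print(precision_at_k(system_a, qrels, 5))  # 0.6
print(precision_at_k(system_b, qrels, 5))  # 0.2
```

Note what this simulation omits: it says nothing about the interface, the user's effort, or how the search session unfolds, which is exactly the crudeness the paragraph above points out.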
The main envisaged achievement of ELIAS is the establishment of a new evaluation paradigm for information access. Specifically, we take this to mean the following:
- A new test collection/living lab methodology, comparable to the Cranfield methodology, but now directed at user-oriented interactive evaluation of IR systems in the large.
- A new and tested set of interactive evaluation metrics that measure the costs (efforts) and benefits of information access system users.
- Infrastructure and test suites based on the above.
- An ongoing community-based forum with tracks and annual evaluation tasks/rounds with a focus on evaluating information access systems.
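The cost/benefit framing in the second bullet can be made concrete with a toy session-level measure: accumulated gain (benefit to the user) divided by accumulated effort (e.g. time spent). This is only an illustrative sketch with invented numbers, not a metric defined by ELIAS.

```python
# Hypothetical cost-benefit style metric: gain per unit of user effort.
# The gain values and effort costs below are invented for illustration.

def gain_per_cost(interactions):
    """interactions: list of (gain, cost_in_seconds) per user action."""
    total_gain = sum(g for g, _ in interactions)
    total_cost = sum(c for _, c in interactions)
    return total_gain / total_cost if total_cost else 0.0

# One simulated session: issue a query (no gain, some effort),
# then read two documents of differing usefulness.
session = [(0.0, 5.0), (1.0, 12.0), (0.5, 8.0)]
print(gain_per_cost(session))  # 1.5 / 25.0 = 0.06
```

A measure of this shape rewards systems that deliver useful results with little user effort, which is the user-oriented emphasis the bullets above call for.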