Iterative query selection for opaque search engines with pseudo relevance feedback

Maor Reuben, Aviad Elyashar, Rami Puzis

Expert Systems with Applications 201, 117027, 2022

Retrieving information from an online search engine is the first and most important step in many data mining tasks, such as fake news detection. Most of the search engines currently available on the web, including all social media platforms, are black-boxes (i.e., opaque) supporting short keyword queries. In these settings, it is challenging to retrieve all posts and comments discussing a particular news item automatically and on a large scale.In this paper, we propose a method for generating short keyword queries given a prototype document. The proposed iterative query selection (IQS) algorithm interacts with the opaque search engine to iteratively improve the query, by maximizing the number of relevant results retrieved. Our evaluation of IQS was performed on the Twitter TREC Microblog 2012 and TREC-COVID 2019 datasets and demonstrated the algorithm’s superior performance compared to state-of-the-art. In …