Iterative Keyword Optimization

Aviad Elyashar, Maor Reuben, Rami Puzis

Short keyword queries are one of the main tools of any user or bot seeking information through ubiquitous search engines. Automated keyword optimization relies primarily on the analysis of data repositories in order to find a small set of keywords that identify the topic discussed and the relevant documents. However, most search engines available on the Web today are opaque, providing little to no information about their methods or the repository they search. In this paper, we propose an automated iterative optimization of short keyword queries in order to improve information retrieval from opaque (black-box) search engines. The use case considered involves retrieving relevant posts from online social media for a given news article (claim) discussed online. The proposed algorithm iteratively selects keywords while querying the search engine, comparing a small set of retrieved posts to the given news article using a mean relevance error based on word embeddings. We demonstrate the proposed algorithm by building a Fake News dataset from claims (collected from fact-checking websites) and their associated tweets. The proposed mean relevance error was found to accurately differentiate between relevant and irrelevant posts (0.9 AUC). The optimized queries produce results similar to those of manually extracted keywords and outperform TF-IDF-based methods and POS tagging.
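The loop described in the abstract (query the black-box engine, score a few returned posts against the claim, keep the keyword that helps most) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the mean relevance error is modeled as the mean cosine distance between averaged word vectors of the article and of each post; keywords are chosen by greedy forward selection; the 3-dimensional `VECTORS` table stands in for a trained word-embedding model; and `toy_search` is an in-memory stub standing in for a real search engine API. All function and variable names (`embed`, `cosine_distance`, `mean_relevance_error`, `optimize_query`, `toy_search`, `VECTORS`, `POSTS`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
SearchFn = Callable[[List[str]], List[str]]  # black box: keywords -> top posts


def embed(text: str, vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens (bag of embeddings)."""
    tokens = [t for t in text.lower().split() if t in vectors]
    dim = len(next(iter(vectors.values())))
    if not tokens:
        return [0.0] * dim
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; a zero vector is treated as distance 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def mean_relevance_error(article: str, posts: Sequence[str],
                         vectors: Dict[str, Vector]) -> float:
    """Assumed definition: mean cosine distance between article and posts."""
    if not posts:
        return 1.0  # assumption: an empty result set scores as error 1.0
    a = embed(article, vectors)
    return sum(cosine_distance(a, embed(p, vectors)) for p in posts) / len(posts)


def optimize_query(article: str, candidates: List[str], search: SearchFn,
                   vectors: Dict[str, Vector],
                   max_keywords: int = 3) -> Tuple[List[str], float]:
    """Greedy forward selection: add the keyword that most lowers the error."""
    query: List[str] = []
    best_err = math.inf
    for _ in range(max_keywords):
        best_word = None
        for word in candidates:
            if word in query:
                continue
            err = mean_relevance_error(article, search(query + [word]), vectors)
            if err < best_err:
                best_err, best_word = err, word
        if best_word is None:  # no candidate lowers the error: stop early
            break
        query.append(best_word)
    return query, best_err


# --- Toy usage: hypothetical 3-d "embeddings" and an in-memory search stub ---
VECTORS = {
    "vaccine": [1.0, 0.0, 0.0], "autism": [0.9, 0.1, 0.0],
    "study": [0.7, 0.0, 0.3], "football": [0.0, 1.0, 0.0],
    "score": [0.0, 0.9, 0.1],
}
POSTS = [
    "vaccine autism study debunked",
    "vaccine study published",
    "football score tonight",
    "study football score",
]


def toy_search(query: List[str], limit: int = 2) -> List[str]:
    """Stand-in for the opaque engine: first `limit` posts with every keyword."""
    hits = [p for p in POSTS if all(w in p.split() for w in query)]
    return hits[:limit]


article = "claim vaccine causes autism according to study"
query, err = optimize_query(article, list(VECTORS), toy_search, VECTORS)
print(query, err)
```

On this toy data the search stops after a single keyword, `autism`, because every two-keyword extension either returns the same post (no strict improvement) or returns nothing. Only a handful of posts per query (`limit=2`) are scored, which mirrors the abstract's point that each iteration compares a small set of retrieved posts rather than the whole repository.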