The BioHIT group of SKEL | The AI Lab organises an online presentation on Monday 12 April at 12.00 with invited speaker Andreas Kontogiannis, entitled "Focused Crawling Ethnopharmacological References with Active and Reinforcement Learning".
Ethnopharmacology is the scientific study of ethnic groups and their use of herbal medicines. As a particular field of traditional medicine, it is now widely considered a promising alternative that complements well-known Western medicine. However, searching for indigenous knowledge on the use of specific plant properties is a very challenging task for the experts themselves, given the volume of information shared through the literature. Scientific research requires researchers to be able to search efficiently for documents relevant to their subjects. Such challenges can be treated as focused search problems on the Internet. To support experts, we propose the use of intelligent focused search systems, known as focused crawlers. Typically, such a system receives a few initial seed URLs and optionally some keywords as input, all of which are relevant to a predefined search topic. The goal of a focused crawler is to discover and output as many relevant webpages as possible.
In the present work, we develop intelligent focused crawler systems so that they can serve as supportive tools for ethnopharmacological research. We propose a two-stage Machine Learning focused crawler that follows a Researcher-Apprentice paradigm. In the first stage, we recommend the use of Active Learning (AL): the system is trained to identify the relevant documents by receiving feedback from the researcher. In the second stage, we propose the use of Reinforcement Learning (RL), regarding the focused crawler as an intelligent agent. The agent estimates how profitable it would be to follow each of the available URLs in the long term, and selects the most promising ones. In the RL framework, we model the focused crawler environment as a Markov Decision Process (MDP), considering shared representations between the states and the actions of the agent.
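To make the Active Learning stage more concrete, the following is a minimal sketch of uncertainty sampling, one common AL query strategy. The abstract does not say which strategies the six AL models use, so the strategy, the function name `uncertainty_sample` and the probability values are assumptions made purely for illustration.

```python
def uncertainty_sample(probabilities, k=1):
    """Return the indices of the k documents whose predicted relevance
    probability is closest to 0.5, i.e. the ones the classifier is
    least sure about and that the researcher would be asked to label."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical relevance probabilities for five unlabeled documents.
probs = [0.95, 0.52, 0.10, 0.45, 0.80]

# Indices of the two documents to send to the researcher for feedback.
query = uncertainty_sample(probs, k=2)
print(query)
```

In a loop of this kind, the researcher's labels for the queried documents would be added to the training set and the relevance classifier retrained before the next query.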
We evaluate two different search problems: one general, based on initial seed documents, and one more specific, based on initial seed documents along with keywords. We compare 6 different AL models, 3 different state-action shared representations and 2 RL agents: the Deep Q-Network (DQN) and the Double DQN (DDQN). The two-stage focused crawler with a (D)DQN agent is more effective than baseline methods, such as random crawling and a greedy deterministic focused crawler that we defined. Finally, when we compare our method in the more specific setting to an estimated real-time researcher performance, it achieves 5.14 times the efficiency and 3.31 times the effectiveness of the expert.
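The difference between the two RL agents lies in how the learning target is computed. The sketch below shows the standard DQN and Double DQN targets for a single transition; the Q-values, reward and discount factor are made-up numbers, and nothing here reflects the speaker's actual network architecture or crawler rewards.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and
    evaluates the best next action."""
    if done or not next_q_target:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it, which reduces overestimation."""
    if done or not next_q_online:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]


# Hypothetical Q-values for three candidate URLs in the next state.
q_online = [0.2, 0.9, 0.4]   # estimates from the online network
q_target = [1.0, 0.5, 2.0]   # estimates from the target network

y_dqn = dqn_target(1.0, q_target, gamma=0.5)
y_ddqn = double_dqn_target(1.0, q_online, q_target, gamma=0.5)
print(y_dqn, y_ddqn)
```

With these numbers the DQN target relies on the largest target-network value, while the Double DQN target uses the target-network value of the action the online network prefers, giving a lower estimate.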
Speaker Bio: Andreas Kontogiannis holds an MEng in Electrical and Computer Engineering (ECE) from the National Technical University of Athens (NTUA). His research interests include Reinforcement Learning, Information Retrieval, Recommender Systems and Data Mining.
The recording is available to view here.