January 24, 2020

Talk: Fast and Scalable Link Discovery for Modern Data-Driven applications | K. Georgala

  • January 29, 2020
  • IIT, Lecture Hall

Visiting speaker Klairi Georgala, University of Paderborn, will give a talk on “Fast and Scalable Link Discovery for Modern Data-Driven applications” in the Lecture Hall of the Institute of Informatics and Telecommunications of NCSR Demokritos on Wednesday 29/1 at 13:00.

Abstract: Over the last years, the Linked Data Web has grown to contain billions of triples distributed over hundreds of Knowledge Bases (KBs). The English version of the DBpedia Knowledge Base currently describes 4.58 million things, including 1.4 million persons, 735K places, 411K creative works, 241K organizations and 251K species. Datasets such as LSQ and LinkedGeoData consist of more than 1.3 billion triples. With the uptake of semantic technologies, large collections of data are available in Resource Description Framework (RDF) format. Since the computation of links is the fourth principle of Linked Data, a large number of frameworks have been developed to facilitate the computation of links between Knowledge Bases.
A plethora of approaches have been developed for this purpose, with algorithms ranging from genetic programming to probabilistic models. In addition to addressing the need for accurate links, Link Discovery frameworks need to address the challenge of time efficiency. This challenge arises from the sheer size of the Knowledge Bases that need to be linked. Under the declarative representation paradigm, most Link Discovery frameworks rely on atomic or complex Link Specifications to determine candidates for links, so the scalability of their execution is of significant importance. In this thesis, we focus on the challenge of time efficiency and propose a set of approaches towards fast and scalable Link Discovery. We divide these approaches into two subsets: (1) approaches for optimizing the efficiency of atomic similarities for Link Discovery and (2) approaches towards the fast execution of complex similarities and Link Specifications. Regarding the first set of approaches, we are motivated by the lack of a recent systematic study on the application of String Similarity Joins in Link Discovery, the absence of fast approaches for linking event data, and the scalability issues of semantic string similarities in current systems. Regarding the second set of approaches, our work is enabled by time-efficient Link Discovery approaches that operate under time and space constraints, and motivated by the absence of planning approaches that exploit global knowledge about the execution of Link Specifications, which could potentially speed up the computation of links.

Speaker Bio: Kleanthi Georgala is a Data Scientist and Researcher with 5 years of experience in Machine Learning, Semantic Web, Link Discovery, Record Linkage, Text Mining and Classification.
She is currently finishing her PhD thesis, which will be submitted within the next couple of months. The main focus of her thesis is scalable and time-efficient Link Discovery with applications to Machine Learning, Predictive Maintenance, Question Answering and Complex Event Processing. Prior to her PhD, she worked as a Scientific Programmer at Leiden University, focusing on Record Linkage between historic documents. Her work at Leiden University involved Machine Learning, Pattern Recognition and Entity Extraction.
During her academic career, she has worked on multiple projects such as HOBBIT (EU-funded) and has published many papers at international peer-reviewed conferences. She has also collaborated with NCSR Demokritos for the completion of her bachelor thesis. Kleanthi Georgala’s website