Blocking Techniques for Web-Scale Entity Resolution

Entity Resolution is one of the cornerstone tasks in integrating overlapping information sources. Because its complexity is quadratic in the number of entities, a large body of research has focused on improving its efficiency so that it can be applied to Web Data collections, which are inherently voluminous and highly heterogeneous. The most common approach is blocking, which clusters similar entities into blocks so that pairwise comparisons are restricted to the entities within each block. In this talk, we elaborate on blocking techniques, starting from the early, schema-based ones that were crafted for database integration. We highlight the challenges posed by today's heterogeneous, noisy, voluminous Web Data and explain why they render the early blocking methods inapplicable. We then present the latest blocking methods, which are crafted for Web-scale data, and explain how their efficiency can be improved by meta-blocking techniques. Finally, we present a publicly available framework that implements most of the state-of-the-art techniques and provides established benchmarks for evaluating them experimentally.
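
To make the idea of blocking concrete, here is a minimal Python sketch of one schema-agnostic variant, token blocking, in which every token that appears in any attribute value defines a block. This is an illustration under our own assumptions, not necessarily the method presented in the talk or the API of the framework mentioned above; the function names `token_blocking` and `candidate_pairs` and the sample entities are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import re


def token_blocking(entities):
    """Group entity ids by the tokens found in their attribute values.

    Attribute names are ignored, so entities described with different
    schemas can still end up in the same block (assumption: plain
    lowercase word tokens are a good enough blocking key).
    """
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        for value in attributes.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block with a single entity yields no comparisons, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample data with heterogeneous attribute names.
entities = {
    "e1": {"name": "John Smith", "city": "Athens"},
    "e2": {"fullName": "Smith, John", "location": "Athens, Greece"},
    "e3": {"title": "Maria Papadopoulou", "addr": "Thessaloniki"},
    "e4": {"name": "M. Papadopoulou", "town": "Patras"},
}

blocks = token_blocking(entities)
pairs = candidate_pairs(blocks)
all_pairs = len(entities) * (len(entities) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

On this toy input only the pairs that share a token are compared, rather than all pairs. Note that the pair (e1, e2) appears in three blocks and is deduplicated here by the set; repeated and low-value comparisons of this kind are the sort of overhead that the meta-blocking techniques mentioned above aim to reduce.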

Wed, 03/12/2014 - 14:00 - 16:00
Main Lecture Room (IIT)
IIT, NCSR "Demokritos"
