Structuring the Blogosphere on News from Traditional Media

Conference Proceedings (fully refereed)
Georgios Petasis
News and social media are emerging as a dominant source of information for numerous applications. However, the sheer volume of their unstructured content makes extracting such information efficiently a challenge. In this paper, we present the SYNC3 system, which aims to intelligently structure content from both traditional news media and the blogosphere. To achieve this goal, SYNC3 incorporates innovative algorithms that first model news media content statistically, based on a fine-grained clustering of articles into so-called “news events”. These models are then adapted and applied to the blogosphere domain, allowing blog content to be mapped onto the traditional news domain. In this paper, an unsupervised approach to domain adaptation is presented, which exploits external knowledge sources in order to port a classification model to a new thematic domain. Our approach extracts a new feature set from documents of the target domain and tries to align the new features with the original ones by exploiting text relatedness from external knowledge sources, such as WordNet. The approach has been evaluated on a document classification task: classifying newsgroup postings into 20 newsgroups.
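The abstract describes the feature-alignment step only at a high level. The following is a minimal sketch of the general idea under explicit assumptions, not the SYNC3 implementation: a hypothetical `toy_relatedness` table stands in for a WordNet-based relatedness measure, each target-domain feature is greedily mapped to its single most related source-domain feature if the score reaches a hypothetical `threshold`, and documents are plain bag-of-words counts.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical stand-in for a WordNet-based relatedness measure:
# symmetric word-pair scores in [0, 1]; unknown pairs score 0.0.
TOY_TABLE: Dict[frozenset, float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"car", "engine"}): 0.6,
    frozenset({"pc", "computer"}): 0.8,
    frozenset({"hockey", "baseball"}): 0.4,
}


def toy_relatedness(a: str, b: str) -> float:
    """Identical words are fully related; otherwise look up the toy table."""
    if a == b:
        return 1.0
    return TOY_TABLE.get(frozenset({a, b}), 0.0)


def align_features(
    target_features: Iterable[str],
    source_features: Iterable[str],
    relatedness: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off
) -> Dict[str, str]:
    """Map each target feature to its most related source feature,
    leaving it unmapped if no score reaches the threshold."""
    sources = list(source_features)
    alignment: Dict[str, str] = {}
    for t in target_features:
        best: Optional[str] = None
        best_score = 0.0
        for s in sources:
            score = relatedness(t, s)
            if score > best_score:  # strict '>' keeps the first best on ties
                best, best_score = s, score
        if best is not None and best_score >= threshold:
            alignment[t] = best
    return alignment


def port_document(term_counts: Dict[str, int], alignment: Dict[str, str]) -> Dict[str, int]:
    """Rewrite a target-domain bag of words in the source feature space,
    summing counts that land on the same source feature and dropping
    terms that could not be aligned."""
    ported: Dict[str, int] = {}
    for term, count in term_counts.items():
        s = alignment.get(term)
        if s is not None:
            ported[s] = ported.get(s, 0) + count
    return ported


if __name__ == "__main__":
    source = ["car", "engine", "computer", "baseball"]
    target = ["automobile", "pc", "hockey", "car"]
    alignment = align_features(target, source, toy_relatedness)
    print(alignment)
    print(port_document({"automobile": 2, "car": 1, "hockey": 3, "pc": 1}, alignment))
```

With these toy scores, "hockey" stays unmapped because its best score (0.4) falls below the threshold, so its counts are dropped when the document is ported; the source-domain classifier then sees only features it was trained on.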
Software and Knowledge Engineering Laboratory (SKEL)
Conference Short Name: OTM 2013
Conference Full Name: On the Move to Meaningful Internet Systems: OTM 2013 Workshops - Confederated International Workshops: OTM Academy, OTM Industry Case Studies Program, ACM, EI2N, ISDE, META4eS, ORM, SeDeS, SINCOM, SMS, and SOMOCO 2013
Conference Date(s): Mon, 09/09/2013 - Fri, 13/09/2013
Conference Level: Springer Berlin Heidelberg

© 2019 - Institute of Informatics and Telecommunications | National Centre for Scientific Research "Demokritos"
