Content Representation and Similarity of Movies based on Topic Extraction from Subtitles

Conference Proceedings (fully refereed)
Bougiatiotis, K. & Giannakopoulos, T.
In this paper we examine whether movie content similarity correlates with low-level textual features extracted from the respective subtitles. In addition, we demonstrate the extraction of a topical representation of movies through subtitle mining. Applying natural language processing and a topic modeling algorithm, namely Latent Dirichlet Allocation, to the movie subtitles, we extract the latent topic structure of a set of movies. To demonstrate the proposed content representation approach, we have built a dataset of 160 widely known movies, each represented by its corresponding subtitles. After evaluating the quality and coherence of the resulting topics, we assess movie similarities by exploiting the distances between movies in the topic space. Finally, using these topic-space projections of the movies, we aim to create a topic model browser for movies that lets us explore the different aspects of similarity between movies and discover latent knowledge about them by associating low-level topic links with high-level movie similarities.
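The pipeline outlined in the abstract (tokenize subtitles, fit Latent Dirichlet Allocation, compare movies in topic space) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, the helper names `tokenize`, `fit_lda` and `cosine_similarity`, the choice of cosine similarity as the topic-space measure, and the three made-up subtitle snippets.

```python
import math
import random


def tokenize(text, stopwords=frozenset()):
    """Lowercase the text, keep alphabetic tokens, drop stopwords (hypothetical preprocessing)."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in stopwords]


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k overall
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


def cosine_similarity(p, q):
    """Cosine similarity between two topic vectors (an assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm


# Made-up toy "subtitles"; a real run would load one subtitle file per movie.
subtitles = {
    "movie_a": "The ship left the harbor. The captain watched the sea and the storm.",
    "movie_b": "The captain steered the ship through the storm on the open sea.",
    "movie_c": "The detective questioned the suspect about the murder and the stolen money.",
}
stopwords = frozenset({"the", "and", "on", "about", "through"})
titles = list(subtitles)
docs = [tokenize(subtitles[t], stopwords) for t in titles]
theta = fit_lda(docs, num_topics=2)

for i in range(len(titles)):
    for j in range(i + 1, len(titles)):
        print(titles[i], titles[j], round(cosine_similarity(theta[i], theta[j]), 3))
```

Each row of `theta` is a movie's projection into topic space, so any vector distance or similarity can be substituted for the cosine measure. On a corpus of realistic size, an established library implementation of LDA would be a more practical choice than this hand-written sampler.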
Software and Knowledge Engineering Laboratory (SKEL)
Conference Full Name: 9th Hellenic Conference on Artificial Intelligence
Conference Date(s): Wed, 18/05/2016 - Fri, 20/05/2016

