We present our recent work on language-agnostic algorithms for the construction of distributed semantic models (DSMs) using web-harvested corpora. A corpus is created from web document snippets and the relevant semantic similarity statistics are encoded in a semantic network. We propose the notion of semantic neighborhoods that are defined using co-occurrence or context similarity features. Three neighborhood-based similarity metrics are proposed, motivated by the hypotheses of attributional and maximum sense similarity. The lexical networks and semantic distances are motivated by cognitive considerations (associative networks and lexical priming).
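As a rough illustration of the neighborhood idea above, the following sketch builds co-occurrence statistics from a few toy snippets, defines a word's semantic neighborhood as its n most similar words, and scores a word pair by the strongest link between one word and the other's neighborhood. It is a minimal sketch under stated assumptions, not the metrics proposed in this work: the Dice coefficient, the function names, and the toy snippets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(snippets):
    """Count in how many snippets each word and each word pair occurs."""
    word_counts = Counter()
    pair_counts = Counter()
    for snippet in snippets:
        words = set(snippet.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts


def dice(w1, w2, word_counts, pair_counts):
    """Dice coefficient over snippet counts (an assumed similarity choice)."""
    if w1 == w2:
        return 1.0
    denom = word_counts[w1] + word_counts[w2]
    if denom == 0:
        return 0.0
    return 2.0 * pair_counts[frozenset((w1, w2))] / denom


def neighborhood(word, vocab, sim, n):
    """Return the n words most similar to `word` (ties broken alphabetically)."""
    others = [w for w in vocab if w != word]
    return sorted(others, key=lambda w: (-sim(word, w), w))[:n]


def max_neighborhood_similarity(w1, w2, vocab, sim, n):
    """Strongest similarity between one word and the other's neighborhood."""
    n1 = neighborhood(w1, vocab, sim, n)
    n2 = neighborhood(w2, vocab, sim, n)
    best_1 = max((sim(w1, x) for x in n2 if x != w1), default=0.0)
    best_2 = max((sim(w2, y) for y in n1 if y != w2), default=0.0)
    return max(best_1, best_2)


# Toy data (hypothetical snippets, standing in for web-harvested text).
toy_snippets = [
    "car automobile road",
    "car road driver",
    "automobile driver road",
    "fruit apple tree",
]
toy_word_counts, toy_pair_counts = cooccurrence_counts(toy_snippets)
toy_vocab = sorted(toy_word_counts)


def toy_sim(a, b):
    return dice(a, b, toy_word_counts, toy_pair_counts)
```

Taking the maximum over neighborhood members loosely reflects the maximum sense similarity hypothesis: two words are treated as similar if their closest senses, approximated here by neighbors, are similar.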
The proposed metrics are evaluated against human similarity ratings, achieving state-of-the-art results. Next, the proposed DSM approach is applied to affective modeling of text. Continuous valence ratings are estimated for unseen words under the assumption that semantic similarity implies affective similarity. Evaluation on affective text tasks (e.g., polarity recognition) shows significant performance improvement over the state of the art. Another application of DSMs is then presented: grammar induction for spoken dialogue systems (SDS).
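The assumption that semantic similarity implies affective similarity can be made concrete with a small sketch: the valence of an unseen word is estimated from a handful of seed words with known valence, each weighted by its semantic similarity to the unseen word. This is only one simple way to realize the assumption, not necessarily the estimator used in this work; the seed words, their ratings, the similarity values, and the function name `estimate_valence` are all hypothetical.

```python
def estimate_valence(word, seed_valences, sim):
    """Similarity-weighted average of seed valences (illustrative assumption)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for seed, valence in seed_valences.items():
        weight = sim(word, seed)
        weighted_sum += weight * valence
        total_weight += weight
    if total_weight == 0.0:
        # No similarity evidence at all: fall back to a neutral rating.
        return 0.0
    return weighted_sum / total_weight


# Hypothetical seed lexicon with valence ratings in [-1, 1].
toy_seeds = {"good": 0.9, "bad": -0.8}

# Hypothetical similarity scores; a real system would take these from the DSM.
toy_similarities = {
    ("great", "good"): 0.8,
    ("great", "bad"): 0.2,
}


def toy_similarity(word, seed):
    return toy_similarities.get((word, seed), 0.0)


toy_estimate = estimate_valence("great", toy_seeds, toy_similarity)
```

A word that is far more similar to positive seeds than to negative ones receives a positive estimate, which is the behavior the assumption predicts.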
This work is part of our ongoing research in two European projects: PortDial (www.portdial.eu) and SpeDial (www.spedial.eu). The PortDial project aims to enable the rapid porting of SDS to new domains and languages. This is realized via the design of machine-aided methods for the creation and publishing of multilingual grammars. The SpeDial project deals with spoken dialogue analytics and has three core goals: identification of hot-spots in the dialogue; selection of prompts and update of statistical grammars for call center applications; and identification of user populations and adaptation of speech services to their specific needs. We conclude with an overview of the grammar induction shared evaluation task that is part of SemEval. The goal of the task is to foster the application of computational models of lexical semantics to the field of SDS, specifically to the problem of grammar induction.