The world is accumulating exponentially increasing amounts of electronic text. Although keyword search of large document corpora and the Web is well established and quite successful, it is only a preliminary step towards gaining useful insights from the available text. Text mining and natural language processing techniques, although rapidly advancing in recent years, continue to have a non-negligible error rate. The output of text mining systems is generally not available in a form that is easy for the human user of the information to digest. At the same time, with fully automatic text mining methods, the knowledge and pattern recognition abilities of the human user are applied only to understanding a static final visual product, not to improving the results interactively. Once the objective of text mining research becomes obtaining insight, as opposed to improving performance on standard benchmark data sets for a limited set of standard problems, two things become prominent: presenting intermediate results to the user, and letting user interaction guide the text mining algorithms to useful results. This requires expertise in information visualization, human-computer interaction, and high-performance computing (to speed up text mining sufficiently to permit interaction). We present case studies of research motivated by this point of view: interactive clustering, interactive visualization of text classification performance, interactive visual exploration of semi-structured text, and improving domain-specific sentiment lexicons with minimal user interaction.
Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
http://web.cs.dal.ca/~eem/