Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/7318

Visual summarization and exploration of text streams

Authors Cui, Weiwei
Issue Date 2011
Summary We are in the midst of a data explosion. Text data, such as digitized textual material and content from new social media like blogs and Twitter, is being generated at an unprecedented rate. For example, Google Books has scanned and digitized 15 million books, greatly increasing the accessibility of information around the world. Twitter publishes more than 300 new messages every second, and the number keeps increasing. However, exploring and analyzing this enormous amount of data has become increasingly difficult. Information visualization can help analyze huge and complex data by turning it into visual representations that exploit the tremendous pattern-recognition capability of the human visual system. In this thesis, we propose three advanced text visualization techniques for summarizing and exploring various relation patterns in large time-varying text document collections. The thesis is composed of three main parts, each of which addresses an important problem in text visualization. In the first part, we present an enhanced word cloud layout that preserves the semantic relations between the displayed words across a sequence of word clouds generated over time for dynamic document data. In the second part, we introduce TextWheel to visualize complex micro-macro relations within news streams. In the last part, we address the splitting and merging patterns between topics extracted from text streams: we propose TextFlow, which is inspired by river flows, to show various topic evolution patterns at different granularities. The effectiveness of these methods is demonstrated through extensive experiments using both synthetic data and data from real applications.
Note Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2011
Language English
Format Thesis