Title: Learned text categorization by backpropagation neural network
Authors: Lam, Dominic Savio Lai Yin
Issue Date: 1996
Abstract: Text categorization is the classification of unstructured text documents with respect to a set of one or more pre-defined categories. This task is often performed in automatic text indexing systems to assign subject categories to text documents. The benefit of text categorization is that once the documents are categorized, users can limit the scope of a search by concentrating on a few categories relevant to their information needs. In this thesis, we propose a text categorization model that uses an artificial neural network trained by the Backpropagation learning algorithm as the text classifier. Because the feature space of textual data is typically high-dimensional, the neural network scales poorly if it is trained directly on the raw data. To improve the scalability of the proposed model, we propose and compare four dimensionality reduction techniques that reduce the feature space to an input space of much lower dimension for the neural network classifier. The first three techniques are domain-dependent term selection methods, namely the DF method, the CF-DF method and the TFxIDF method. The fourth technique is a domain-independent feature extraction method based on a statistical multivariate data analysis technique called Principal Component Analysis. To test the effectiveness of the proposed model, experiments were conducted using a subset of the Reuters-22173 test collection for text categorization. The results showed that the proposed model was able to achieve high categorization effectiveness as measured by precision and recall. Among the four dimensionality reduction techniques proposed, Principal Component Analysis was found to be the most effective in reducing the dimensionality of the feature space.
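Illustrative note: the abstract names document-frequency (DF) term selection as one way to shrink the input space before training the classifier. The sketch below shows one plausible reading of that idea, keeping the k terms that occur in the most documents and mapping each document to a count vector over those terms. It is not taken from the thesis: the function names, the tie-breaking rule, the use of raw term counts, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so repeats within a document count once
    return df


def select_terms_by_df(docs, k):
    """Return the k terms with the highest document frequency.

    Ties are broken alphabetically (an assumption, chosen for determinism).
    """
    df = document_frequency(docs)
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def to_input_vector(tokens, selected_terms):
    """Map a tokenized document to raw term counts over the selected terms."""
    counts = Counter(tokens)
    return [counts[term] for term in selected_terms]


# Toy corpus (hypothetical, not from the Reuters collection).
docs = [
    ["wheat", "export", "price"],
    ["wheat", "crop", "price", "price"],
    ["oil", "price", "export"],
]
terms = select_terms_by_df(docs, k=3)
vectors = [to_input_vector(d, terms) for d in docs]
```

Each resulting vector has length k rather than the full vocabulary size, which is the kind of reduced input a neural network classifier could then be trained on.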
Description: Thesis (M.Phil.)--Hong Kong University of Science and Technology, 1996
xiii, 102 leaves : ill. ; 30 cm
HKUST Call Number: Thesis COMP 1996 LamLY
Appears in Collections: CSE Master Theses

All items in this Repository are protected by copyright, with all rights reserved.