Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/7191
Title: Classification of complex disease incorporating interaction effects and identifying causal SNPs in the context of rare variants
Authors: Wang, Haitian
Issue Date: 2011
Abstract: Over the past decade, genome-wide association studies (GWAS) have examined over 40 complex diseases using microarray technology, and the genetic associations of these diseases have been studied intensively. However, although some of the diseases are clearly inherited, only 5-10% of the disease variation can be explained through these studies. It is now widely believed that the missing heritability may be due either to the failure to incorporate interaction effects or to the neglect of rare-variant effects in the genome. This dissertation studies genetic association from these two aspects. In Essay I (Chapters 1-4), a classification algorithm that incorporates interactions among variables is proposed. The algorithm is assessed on several gene-expression datasets and yields much lower error rates than those reported in the literature. In Essay II (Chapters 5-8), a new approach based on established statistical methods is applied to identify causal SNPs in the context of rare variants. The resulting false-discovery rate is the lowest among the methods that use the same dataset. High dimensionality is one of the most challenging problems in the analysis of genetic data. In both studies, a search-by-layer framework is used to overcome it. In the first layer, high-quality markers are selected using certain statistics, which reduces the number of variables from thousands to around a hundred in both studies. The second layer is project-specific: in the first essay, subsets with high-order interactions are formed, and in the second, false-positive markers are eliminated. Both steps lead to the identification of a small pool of influential variables. In the first study, the identified variables are used to form a classification rule.
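The search-by-layer idea in the abstract can be illustrated with a small sketch. The abstract does not name the screening statistic or the way interaction subsets are scored, so everything below is an assumption made for illustration: the scaled mean-difference score, the use of pairwise product terms as "interactions", and the helper names `separation`, `first_layer`, `second_layer`, and `make_toy_data` are all hypothetical and are not taken from the thesis.

```python
import itertools
import random
import statistics


def separation(values, labels):
    """Absolute difference of the two class means, scaled by the overall spread.

    This is an assumed screening statistic, not the one used in the thesis.
    """
    g0 = [v for v, c in zip(values, labels) if c == 0]
    g1 = [v for v, c in zip(values, labels) if c == 1]
    spread = statistics.pstdev(values) or 1.0  # guard against constant columns
    return abs(statistics.mean(g1) - statistics.mean(g0)) / spread


def first_layer(X, y, keep):
    """Layer 1: keep the `keep` columns with the highest marginal score."""
    columns = list(zip(*X))
    ranked = sorted(range(len(columns)),
                    key=lambda j: separation(columns[j], y),
                    reverse=True)
    return sorted(ranked[:keep])


def second_layer(X, y, retained):
    """Layer 2 (in the spirit of Essay I): pick the pair of retained columns
    whose product term best separates the classes (an assumed interaction score)."""
    columns = list(zip(*X))

    def pair_score(pair):
        a, b = pair
        product = [u * v for u, v in zip(columns[a], columns[b])]
        return separation(product, y)

    return max(itertools.combinations(retained, 2), key=pair_score)


def make_toy_data(n=200, n_features=6, informative=(0, 3), seed=0):
    """Synthetic data: Gaussian noise, with the informative columns shifted in class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y
```

Calling `first_layer(X, y, keep=3)` on the toy data and passing the result to `second_layer` mirrors the two stages: a cheap per-variable screen first, then a more expensive search restricted to the survivors. The point of the layering is cost: scoring all pairs among thousands of variables is expensive, while scoring pairs among roughly a hundred screened variables is not.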
Description: Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2011
x, 69 p. : ill. ; 30 cm
HKUST Call Number: Thesis ISOM 2011 Wang
URI: http://hdl.handle.net/1783.1/7191
Appears in Collections: ISOM Doctoral Theses

Files in This Item:
File: th_redirect.html
Size: 0Kb
Format: HTML
