Please use this identifier to cite or link to this item: http://hdl.handle.net/1783.1/7191

Classification of complex disease incorporating interaction effects and identifying causal SNPs in the context of rare variants

Author: Wang, Haitian
Issue Date: 2011
Summary: Over the past decade, genome-wide association studies (GWAS) have used microarray genotyping to study more than 40 complex diseases, and the genetic associations of these diseases have been investigated intensively. However, some of these diseases are clearly inherited, yet the associations found in these studies explain only 5-10% of the disease variation. It is now widely believed that this missing heritability may come from one of two sources: the failure to incorporate interaction effects, or the neglect of rare-variant effects in the genome. This dissertation studies genetic association from both of these angles. Essay I (Chapters 1-4) proposes a classification algorithm that incorporates interactions among variables. When evaluated on several gene-expression datasets, it achieves much lower error rates than those reported in the literature. Essay II (Chapters 5-8) applies a new approach, built on established statistical methods, to identify causal SNPs in the context of rare variants. Its false-discovery rate is the lowest among the methods applied to the same dataset.

High dimensionality is one of the most challenging problems in the analysis of genetic data, and both studies address it with a search-by-layer framework. In the first layer, high-quality markers are selected using screening statistics, which reduces the number of variables from thousands to around a hundred in both studies. The second layer is project-specific. In Essay I, subsets of variables with high-order interactions are formed, while in Essay II, false-positive markers are eliminated. In both cases, the result is a small pool of influential variables, and in Essay I these variables are then used to form a classification rule.
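The abstract does not say which statistics the first layer uses or how the second-layer search works. The sketch below only illustrates the general shape of a two-layer screen. It assumes a case/control label and uses Welch's two-sample t-statistic as a hypothetical first-layer score. In the second layer, it scores pairs of retained variables by the t-statistic of their product, which is a crude stand-in for an interaction effect. The function names (`welch_t`, `split_by_label`, `layer1_screen`, `layer2_pairs`) and the toy data are hypothetical. They are not the thesis's method.

```python
import itertools
import math
import statistics
from typing import List, Sequence, Tuple

# Hypothetical two-layer screen illustrating a "search-by-layer" idea.
# The scoring choices below are assumptions, not the thesis's statistics.


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Welch two-sample t-statistic between groups x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    # Sample variances (n - 1 denominator); each group needs >= 2 values.
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0.0:
        # Zero spread in both groups: perfect separation unless the means match.
        return math.inf if mx != my else 0.0
    return abs(mx - my) / se


def split_by_label(values: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Split one variable into case (label 1) and control (label 0) values."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return cases, controls


def layer1_screen(X: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Layer 1 (assumed): keep the k columns with the largest |t|; ties go to the lower index."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((welch_t(*split_by_label(column, labels)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def layer2_pairs(X: Sequence[Sequence[float]], labels: Sequence[int],
                 kept: Sequence[int], top: int) -> List[Tuple[int, int]]:
    """Layer 2 (assumed): rank pairs of retained columns by |t| of their product."""
    scored = []
    for a, b in itertools.combinations(sorted(kept), 2):
        product = [row[a] * row[b] for row in X]
        scored.append((welch_t(*split_by_label(product, labels)), (a, b)))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pair for _, pair in scored[:top]]


# Toy data: 6 samples (rows) x 4 variables (columns); labels 1 = case, 0 = control.
X_TOY = [
    [5.0, 1, 3.0, 2],
    [5.2, 2, 2.0, 1],
    [4.8, 3, 4.0, 3],
    [1.0, 1, 1.0, 3],
    [1.1, 2, 2.0, 1],
    [0.9, 3, 0.0, 2],
]
LABELS_TOY = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    kept = layer1_screen(X_TOY, LABELS_TOY, k=3)
    print("layer 1 keeps:", kept)  # [0, 2, 1]
    print("top pair:", layer2_pairs(X_TOY, LABELS_TOY, kept, top=1))  # [(0, 2)]
```

In the toy data, variable 0 separates the classes most strongly. It survives the first layer and appears in the top-scoring pair. The summary describes a first layer that cuts thousands of variables down to around a hundred. That ordering matters because an exhaustive pairwise search over k retained variables examines k(k-1)/2 pairs, so screening first keeps the second layer tractable.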
Note: Thesis (Ph.D.), Hong Kong University of Science and Technology, 2011
Language: English
Format: Thesis