
Use of bias term in projection pursuit learning improves approximation and convergence properties

Authors Kwok, Tin Yau
Yeung, Dit Yan
Issue Date 1996
Source IEEE Transactions on Neural Networks, v. 7, no. 5, September 1996, pp. 1168-1183
Summary In a regression problem, one is given a d-dimensional random vector X, whose components are called predictor variables, and a random variable Y, called the response. A regression surface describes a general relationship between X and Y. One nonparametric regression technique that has been successfully applied to high-dimensional data is projection pursuit regression (PPR). In this method, the regression surface is approximated by a sum of empirically determined univariate functions of linear combinations of the predictors. Projection pursuit learning (PPL), proposed by Hwang et al., formulates PPR using a two-layer feedforward neural network. One of the main differences between PPR and PPL is that the smoothers in PPR are nonparametric, whereas those in PPL are based on Hermite functions of some predefined highest order R. While the convergence property of PPR is already known, that of PPL has not been thoroughly studied. In this paper, we demonstrate that PPL networks in the original form proposed by Hwang et al. do not have the universal approximation property for any finite R, and thus cannot converge to the desired function even with an arbitrarily large number of hidden units. However, by including a bias term in each linear projection of the predictor variables, PPL networks regain these capabilities, independent of the exact choice of R. Experimentally, this modification is shown to increase the rate of convergence with respect to the number of hidden units, improve generalization performance, and make the network less sensitive to the setting of R. Finally, we apply PPL to chaotic time series prediction and obtain results superior to those of the cascade-correlation architecture.
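The model described in the summary can be sketched as follows. This is a minimal NumPy illustration written for this record, not the authors' code. It assumes the standard orthonormal Hermite functions h_r(z) = H_r(z) exp(-z²/2) / sqrt(2^r r! √π) for orders r = 0, ..., R. Whether the original expansion starts at order 0 or 1 is not stated here. The sketch also folds any output-layer weights into the smoother coefficients. All names (`hermite_functions`, `ppl_predict`, `A`, `C`, `b`) are hypothetical. Calling `ppl_predict(X, A, C)` gives the bias-free form, where each hidden unit sees only a_k·x. Passing a bias vector `b` gives the modified form a_k·x + b_k discussed in the paper.

```python
import math

import numpy as np


def hermite_functions(z, R):
    """Evaluate the orthonormal Hermite functions h_0, ..., h_R at z.

    Assumed definition: h_r(z) = H_r(z) exp(-z**2 / 2) / sqrt(2**r r! sqrt(pi)),
    with H_r the physicists' Hermite polynomial. Values come from the stable
    three-term recurrence
        h_{r+1} = sqrt(2 / (r + 1)) z h_r - sqrt(r / (r + 1)) h_{r-1}.
    Returns an array of shape (R + 1,) + z.shape.
    """
    z = np.asarray(z, dtype=float)
    h = [np.exp(-0.5 * z**2) / math.pi**0.25]  # h_0
    if R >= 1:
        h.append(math.sqrt(2.0) * z * h[0])  # h_1
    for r in range(1, R):
        # build h_{r+1} from h_r and h_{r-1}
        h.append(math.sqrt(2.0 / (r + 1)) * z * h[r] - math.sqrt(r / (r + 1)) * h[r - 1])
    return np.stack(h)


def ppl_predict(X, A, C, b=None):
    """Output of a hypothetical PPL network with m hidden units.

    f(x) = sum_k g_k(a_k . x + b_k),  g_k(z) = sum_r C[k, r] h_r(z)

    X: (n, d) predictors; A: (m, d) projection directions a_k;
    C: (m, R + 1) Hermite coefficients per hidden unit (output weights folded in);
    b: (m,) bias terms, or None for the original bias-free form.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    Z = X @ A.T  # (n, m) linear projections a_k . x
    if b is not None:
        Z = Z + np.asarray(b, dtype=float)  # bias shifts each projection
    R = C.shape[1] - 1
    Hf = hermite_functions(Z, R)  # (R + 1, n, m)
    G = np.einsum("rnm,mr->nm", Hf, C)  # smoother output g_k(z) per sample and unit
    return G.sum(axis=1)  # (n,) sum over hidden units


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R, m, d = 3, 4, 2
    X = rng.normal(size=(5, d))
    A = rng.normal(size=(m, d))
    C = rng.normal(size=(m, R + 1))
    b = rng.normal(size=m)
    print(ppl_predict(X, A, C))     # original form: a_k . x only
    print(ppl_predict(X, A, C, b))  # modified form: a_k . x + b_k
```

The sketch uses the recurrence rather than evaluating H_r directly because the raw polynomials grow quickly with r. For small R either approach works. Training (fitting A, b and C) is deliberately left out.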
ISSN 1045-9227
Rights © 1996 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Language English
Format Article
Files in this item:
tnn96.pdf (663,341 bytes, Adobe PDF)