In recent years, kernel methods have become popular and powerful tools in the field of machine learning, with superior performance on many practical applications. In this thesis, I study kernel methods in both supervised and unsupervised learning.

First, in using the ε-support vector regression (ε-SVR) algorithm, one has to choose a suitable value for the insensitivity parameter ε. Smola et al. considered its "optimal" choice by studying the statistical efficiency in a location parameter estimation problem. While they successfully predicted a linear scaling between the optimal ε and the noise in the data, their theoretically optimal value does not match its experimentally observed counterpart well in the case of Gaussian noise. In this thesis, I attempt to better explain their experimental results by studying the regression problem itself. The resulting predicted choice of ε is much closer to the experimentally observed optimal value, while again exhibiting a linear trend with the input noise.

In the second part of this thesis, I address the problem of finding the preimage of a feature vector in the feature space induced by a kernel. This is of central importance in some kernel applications, such as when using kernel principal component analysis (PCA) for image denoising. Unlike the traditional method, which relies on nonlinear optimization, the proposed method directly finds the location of the preimage based on distance constraints in the feature space. It is non-iterative, involves only linear algebra, and does not suffer from numerical instability or local minima. Evaluations on kernel PCA denoising and kernel clustering show much improved performance.
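To make the role of ε concrete, the sketch below implements Vapnik's ε-insensitive loss used by ε-SVR: residuals inside the tube |r| ≤ ε incur no penalty, and residuals outside are penalized linearly. The proportionality constant `c` tying ε to the noise scale σ is purely illustrative here; the abstract only asserts that the optimal ε scales linearly with the noise, not this particular value.

```python
import numpy as np

def eps_insensitive_loss(residuals, eps):
    """Vapnik's eps-insensitive loss: zero inside the tube |r| <= eps,
    growing linearly outside it."""
    return np.maximum(np.abs(residuals) - eps, 0.0)

# Both Smola et al. and this thesis predict eps proportional to the
# Gaussian noise level sigma; c below is a hypothetical constant chosen
# only for illustration, not the value derived in either work.
sigma = 0.5
c = 0.6
eps = c * sigma                     # eps = 0.3

r = np.array([-1.0, -0.3, 0.0, 0.3, 1.0])
print(eps_insensitive_loss(r, eps))  # residuals within the tube cost nothing
```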
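The distance-constraint idea behind the preimage method can be illustrated with a standard MDS-style localization step: given a set of neighbor points and target squared distances to each of them, the constraints ||z - x_i||² = d_i² become linear in (z, ||z||²) and can be solved in closed form by least squares. This is a simplified sketch of the "only linear algebra, non-iterative" claim, not the thesis's exact derivation (which additionally converts feature-space distances into input-space distances via the kernel).

```python
import numpy as np

def locate_from_distances(X, d2):
    """Locate a point z whose squared distances to the rows of X
    approximate d2, using one linear least-squares solve.

    Expanding ||z - x_i||^2 = d_i^2 gives
        ||z||^2 - 2 x_i . z + ||x_i||^2 = d_i^2,
    which is linear in the unknowns (z, s) with s = ||z||^2.
    """
    mu = X.mean(axis=0)
    Xc = X - mu                               # center the neighbors
    n, d = Xc.shape
    A = np.hstack([-2.0 * Xc, np.ones((n, 1))])
    b = d2 - np.sum(Xc ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:d] + mu                       # undo the centering

# Sanity check: recover a known point from its exact squared distances.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
z_true = np.array([0.5, -1.0, 2.0])
d2 = np.sum((X - z_true) ** 2, axis=1)
print(np.allclose(locate_from_distances(X, d2), z_true))  # True
```

Because the solve is a single pseudo-inverse, there is no iteration, no initialization, and hence no local-minimum issue, which is the property the abstract highlights.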