Towards Data Science | Medium

Using PCA for Outlier Detection

PCA (principal component analysis) is a data science technique commonly used for dimensionality reduction and visualization, but it is also useful for outlier detection. It transforms the data into a new coordinate system whose axes, known as components, are ordered by how much of the data's variance they capture, and outliers are often well separated along these components. There are two main ways to use it to identify outliers: transform the data with PCA and apply a simple test to each component to score each row, or reconstruct each row from a subset of the components and use the reconstruction error as the score.

The technique assumes the features are correlated. It works by computing a covariance matrix that represents the general shape of the data and uses that matrix (through its eigenvectors) to transform the space, so rows that break the usual correlations between features stand out even when no single feature value is extreme. PyOD provides three classes based on PCA for outlier detection: PyODKernelPCA, PCA, and KPCA. These classes perform the PCA (or kernel PCA) transformation and detect outliers in the transformed data.
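The two scoring approaches can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not PyOD's implementation: the function names (`pca_fit`, `component_scores`, `reconstruction_error`), the choice of summing squared per-component z-scores into one row score, and keeping k=1 component for reconstruction are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical helpers for illustration only; not part of PyOD or scikit-learn.


def pca_fit(X):
    """Return (mean, components, variances) from the covariance matrix of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)  # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]  # sort components by decreasing variance
    return mean, eigvecs[:, order], eigvals[order]


def component_scores(X, mean, components, variances):
    """Per-component test: z-score each projected coordinate, then sum the squares per row."""
    projected = (X - mean) @ components  # coordinates in the component space
    z = projected / np.sqrt(variances)  # assumption: no component has ~zero variance
    return (z ** 2).sum(axis=1)


def reconstruction_error(X, mean, components, k):
    """Keep the top-k components, map back to feature space, return squared error per row."""
    W = components[:, :k]
    X_hat = ((X - mean) @ W) @ W.T + mean
    return ((X - X_hat) ** 2).sum(axis=1)


rng = np.random.default_rng(0)
# Two strongly correlated features: the "general shape" of the data.
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + rng.normal(scale=0.1, size=200)])
# One extra row that breaks the correlation without being extreme in either feature alone.
X = np.vstack([X, [1.0, -2.0]])

mean, comps, var = pca_fit(X)
print(np.argmax(component_scores(X, mean, comps, var)))  # → 200
print(np.argmax(reconstruction_error(X, mean, comps, k=1)))  # → 200
```

With all components kept, the sum of squared z-scores equals each row's squared Mahalanobis distance, so the per-component test is a component-by-component view of the covariance matrix. Reconstruction error instead measures only what falls in the dropped low-variance components, which is exactly where rows that violate the feature correlations end up.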
towardsdatascience.com