R Data Visualization: PCA Plots

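Introduction

An intuitive explanation of PCA

`PCA is designed along similar lines: it maps a high-dimensional dataset onto a lower-dimensional space while preserving as much of the variance as possible.`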
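To make "preserving as much of the variance as possible" concrete, here is a minimal sketch in base R. It assumes the built-in `iris` data set and standardized variables, the same setup the plots in this post use; the name `cumulative` is introduced here only for illustration.

```r
# Assumption: built-in iris data, numeric columns only, standardized before PCA
data <- subset(iris, select = -Species)
pca <- prcomp(data, center = TRUE, scale. = TRUE)

# Share of the total variance carried by each principal component
pca.variance <- pca$sdev^2 / sum(pca$sdev^2)

# Cumulative share: how much variance is kept when only the first k components are used
cumulative <- cumsum(pca.variance)
print(round(cumulative, 3))
```

On this data the first two components should already account for most of the total variance, which is why a two-dimensional plot of PC1 against PC2 is a reasonable summary.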

Plotting

1. A PCA plot is essentially a scatter plot of the samples' scores on the first two principal components

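```r
library(ggplot2)

# Data preparation: keep the four numeric columns, set the species labels aside
data <- subset(iris, select = -Species)
class <- iris[["Species"]]

# PCA on centered, unit-variance variables
pca <- prcomp(data, center = TRUE, scale. = TRUE)
pca.data <- data.frame(pca$x)                 # scores of each sample on every PC
pca.variance <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per PC

ggplot(pca.data, aes(x = PC1, y = PC2)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  # 95% data ellipse; `linewidth` needs ggplot2 >= 3.4.0 (older versions use `size`)
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95) +
  theme_bw()
```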
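The coordinates being plotted are the columns of `pca$x`. As a rough check of what those scores are, the sketch below recomputes them by projecting the standardized data onto the loadings in `pca$rotation`; the names `scores` and `max.diff` are illustrative only.

```r
# Assumption: same iris setup as the plotting code
data <- subset(iris, select = -Species)
pca <- prcomp(data, center = TRUE, scale. = TRUE)

# Scores = standardized data projected onto the loading vectors
scores <- scale(data, center = TRUE, scale = TRUE) %*% pca$rotation

# Largest absolute difference from the scores returned by prcomp()
max.diff <- max(abs(scores - pca$x))
print(max.diff)
```

The difference should be at the level of floating-point noise, so each point in the scatter plot is simply a sample expressed in the rotated coordinate system.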

2. Coloring points by class

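```r
library(ggplot2)

# Data preparation: keep the four numeric columns, set the species labels aside
data <- subset(iris, select = -Species)
class <- iris[["Species"]]

# PCA on centered, unit-variance variables
pca <- prcomp(data, center = TRUE, scale. = TRUE)
pca.data <- data.frame(pca$x)                 # scores of each sample on every PC
pca.variance <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per PC

# `color = class` is set for the whole plot, so every layer inherits it
ggplot(pca.data, aes(x = PC1, y = PC2, color = class)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  # Inherits the color mapping, so one ellipse is drawn per class
  # (`linewidth` needs ggplot2 >= 3.4.0; older versions use `size`)
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95) +
  theme_bw()
```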

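`inherit.aes`
Defaults to TRUE. If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn't inherit behaviour from the default plot specification.

In the plot above, `stat_ellipse()` inherits `color = class` from the plot and therefore draws one ellipse per class. Setting `inherit.aes = FALSE` makes the layer ignore that mapping, so a single ellipse is drawn around all points: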

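```r
library(ggplot2)

# Data preparation: keep the four numeric columns, set the species labels aside
data <- subset(iris, select = -Species)
class <- iris[["Species"]]

# PCA on centered, unit-variance variables
pca <- prcomp(data, center = TRUE, scale. = TRUE)
pca.data <- data.frame(pca$x)                 # scores of each sample on every PC
pca.variance <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per PC

ggplot(pca.data, aes(x = PC1, y = PC2, color = class)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  # inherit.aes = FALSE drops the color mapping: one ellipse over all samples
  # (`linewidth` needs ggplot2 >= 3.4.0; older versions use `size`)
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95,
               inherit.aes = FALSE) +
  theme_bw()
```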
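One way to confirm the effect of `inherit.aes` without looking at the image is to count how many groups the ellipse layer computes. The following is a sketch under a few assumptions: the class labels are stored as a column of `pca.data` (rather than as a separate vector), and `count_groups` is a hypothetical helper built on ggplot2's `layer_data()`.

```r
library(ggplot2)

# Assumption: same iris PCA as above, with the labels attached as a column
data <- subset(iris, select = -Species)
pca.data <- data.frame(prcomp(data, center = TRUE, scale. = TRUE)$x)
pca.data$class <- iris[["Species"]]

base <- ggplot(pca.data, aes(x = PC1, y = PC2, color = class)) +
  geom_point()

# Layer 2 inherits color = class
per.class <- base + stat_ellipse()
# Layer 2 ignores the plot-level mapping
overall <- base + stat_ellipse(aes(x = PC1, y = PC2), inherit.aes = FALSE)

# Hypothetical helper: number of distinct groups in the ellipse layer (layer 2)
count_groups <- function(p) length(unique(layer_data(p, 2)$group))

n.per.class <- count_groups(per.class)
n.overall <- count_groups(overall)
print(c(per.class = n.per.class, overall = n.overall))
```

With the three iris species, the inherited version should report three groups and the `inherit.aes = FALSE` version a single group.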

3. Fine-tuning the style

```r
library(ggplot2)

# Data preparation: keep the four numeric columns, set the species labels aside
data <- subset(iris, select = -Species)
class <- iris[["Species"]]

# PCA on centered, unit-variance variables
pca <- prcomp(data, center = TRUE, scale. = TRUE)
pca.data <- data.frame(pca$x)                 # scores of each sample on every PC
pca.variance <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per PC

# Custom colors, one per class
palette <- c("mediumseagreen", "darkorange", "royalblue")

ggplot(pca.data, aes(x = PC1, y = PC2, color = class)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0) +
  geom_vline(xintercept = 0) +
  # Single ellipse over all samples
  # (`linewidth` needs ggplot2 >= 3.4.0; older versions use `size`)
  stat_ellipse(aes(x = PC1, y = PC2), linetype = 2, linewidth = 0.5, level = 0.95,
               inherit.aes = FALSE) +
  theme_bw() +
  scale_color_manual(values = palette) +
  # Remove all grid lines
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank()) +
  # Show the percentage of variance explained on each axis
  labs(x = paste0("PC1: ", signif(pca.variance[1] * 100, 3), "%"),
       y = paste0("PC2: ", signif(pca.variance[2] * 100, 3), "%"),
       title = "PCA of iris") +
  # Center the title
  theme(plot.title = element_text(hjust = 0.5))
```
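The axis labels above are built by hand for PC1 and PC2. If other components are plotted, the same formatting could be wrapped in a small function. The sketch below is one possible way to do it; `pc_label` is a hypothetical helper, not part of ggplot2 or of the original post.

```r
# Assumption: same iris PCA as above
data <- subset(iris, select = -Species)
pca <- prcomp(data, center = TRUE, scale. = TRUE)
pca.variance <- pca$sdev^2 / sum(pca$sdev^2)

# Hypothetical helper: label for component i with its share of variance,
# rounded to `digits` significant digits as in the labs() call above
pc_label <- function(i, variance, digits = 3) {
  paste0("PC", i, ": ", signif(variance[i] * 100, digits), "%")
}

labels <- vapply(1:2, pc_label, character(1), variance = pca.variance)
print(labels)
```

The result can be passed straight to `labs(x = labels[1], y = labels[2])`.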

References

[1] Master Machine Learning With scikit-learn

Original author: watermark's
Original URL: https://www.cnblogs.com/myownswordsman/p/r-ggplot-pca.html