# Machine Learning in R: Classification with Discriminant Analysis (LDA and QDA)

Linear discriminant analysis (LDA) is a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes of objects. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.

# The curse of dimensionality

For the same number of cases in a dataset, increasing the feature space pushes the cases further apart from each other, leaving more and more empty space between them.
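This effect is easy to demonstrate with a small simulation. The sketch below (my own illustration, not from the original article) keeps the number of cases fixed at 100 and measures the average pairwise distance as the number of random features grows:

```r
# A fixed number of cases drifts apart as dimensionality increases
set.seed(42)
mean_dist <- sapply(c(1, 10, 100), function(d) {
  x <- matrix(runif(100 * d), ncol = d)  # 100 cases, d uniform features
  mean(dist(x))                          # average pairwise distance
})
mean_dist  # grows steadily with d: the same data becomes sparser
```

The average distance increases with every added dimension, which is exactly why adding predictors without adding cases can hurt a classifier.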

# How discriminant analysis works

Discriminant analysis maximizes the separation between each class centroid and the grand centroid of the data (the centroid of all the data, ignoring class membership).

LDA first finds the axis that best separates the class centroids from the grand centroid while minimizing the variance of each class along that axis. This axis is the first discriminant function (DF). LDA then constructs a second DF that is orthogonal to the first, which simply means the second DF must be perpendicular to the first (at a right angle in a 2D example).

The number of DFs is the smaller of the number of classes minus 1 and the number of predictors.
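This rule can be checked directly. The sketch below uses `MASS::lda()` on the built-in `iris` data (rather than the article's `wineTib`, so it runs standalone): with 3 classes and 4 predictors, min(3 - 1, 4) = 2 discriminant functions are produced.

```r
library(MASS)  # provides lda()

# 3 classes (Species), 4 numeric predictors
fit <- lda(Species ~ ., data = iris)

# Each column of the scaling matrix is one discriminant function
ncol(fit$scaling)  # 2 = min(3 - 1, 4)
```

For the wine data in this article (3 classes, 13 predictors), the same rule gives min(3 - 1, 13) = 2 DFs, which is why the scores can be plotted in a 2D scatterplot below.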

# Quadratic discriminant analysis (QDA)

LDA assumes that, for each class in the dataset, the predictor variables covary with each other by the same amount. QDA relaxes this assumption by estimating a separate covariance matrix for each class, which allows the decision boundaries between classes to curve.
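Whether that equal-covariance assumption holds can be inspected by comparing the per-class covariance matrices. The sketch below (again using the built-in `iris` data as a stand-in for the wine data) shows that the classes can have visibly different covariance structure, which is the situation QDA is designed for:

```r
# One covariance matrix per class; LDA pools these, QDA keeps them separate
by_class <- lapply(split(iris[, 1:4], iris$Species), cov)

# The first two predictors' covariances differ noticeably between classes
round(by_class$setosa[1:2, 1:2], 3)
round(by_class$virginica[1:2, 1:2], 3)
```

When the per-class covariances differ this much, QDA will usually fit the data better than LDA, at the cost of estimating more parameters.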

# How discriminant analysis makes predictions

Here p(k) is the proportion of class k in the whole sample, called the prior probability. p(x) is the proportion of cases in the whole dataset that have the predictor values x, i.e. the probability of observing a case with predictors x; this is called the evidence.
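These pieces combine via Bayes' rule: the posterior probability that a case with predictor values $x$ belongs to class $k$ is the likelihood times the prior, divided by the evidence:

$$
p(k \mid x) = \frac{p(x \mid k)\, p(k)}{p(x)}
$$

Discriminant analysis assigns the case to whichever class $k$ has the highest posterior $p(k \mid x)$; since $p(x)$ is the same for every class, only the numerator matters for the decision.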

# A worked example

```r
# Define the task and the LDA learner, then train the model
wineTask <- makeClassifTask(data = wineTib, target = "Class")
lda <- makeLearner("classif.lda")
ldaModel <- train(lda, wineTask)
```

```r
# Extract the underlying model and the discriminant scores for each case
ldaModelData <- getLearnerModel(ldaModel)
ldaPreds <- predict(ldaModelData)$x
```

```r
# Plot the cases in discriminant function space, one ellipse per class
wineTib %>%
  mutate(LD1 = ldaPreds[, 1],
         LD2 = ldaPreds[, 2]) %>%
  ggplot(aes(LD1, LD2, col = Class)) +
  geom_point() +
  stat_ellipse() +
  theme_bw()
```

```r
# Define and train the QDA learner on the same task
qda <- makeLearner("classif.qda")
qdaModel <- train(qda, wineTask)
```

# Cross-validating the models

```r
# 10-fold cross-validation, repeated 50 times, stratified by class
kFold <- makeResampleDesc(method = "RepCV", folds = 10, reps = 50,
                          stratify = TRUE)
ldaCV <- resample(learner = lda, task = wineTask, resampling = kFold,
                  measures = list(mmce, acc))
qdaCV <- resample(learner = qda, task = wineTask, resampling = kFold,
                  measures = list(mmce, acc))
```

```r
calculateConfusionMatrix(ldaCV$pred, relative = TRUE)
```

# Making predictions with discriminant analysis

```r
# A new, unlabeled wine sample to classify
newcase <- tibble(Alco = 13, Malic = 2, Ash = 2.2, Alk = 19, Mag = 100,
                  Phe = 2.3, Flav = 2.5, Non_flav = 0.35, Proan = 1.7,
                  Col = 4, Hue = 1.1, OD = 3, Prol = 750)

predict(qdaModel, newdata = newcase)
```

# Summary

Original author: the WeChat public account Codewar.
Original link: https://blog.csdn.net/tm_ggplot2/article/details/116333970
This article is reposted from the web to share knowledge; if there is any infringement, please contact the blogger to have it removed.