Linear Discriminant Analysis In Python
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique. As the name implies, dimensionality reduction techniques reduce the number of dimensions (i.e. variables) in a dataset while retaining as much information as possible. For instance, suppose that we plotted the relationship between two variables, where each color represents a different class.
If we’d like to reduce the number of dimensions down to 1, one approach would be to project everything onto the x-axis.
This is bad because it disregards any useful information provided by the second feature. On the other hand, Linear Discriminant Analysis, or LDA, uses the information from both features to create a new axis and projects the data onto that new axis in such a way as to minimize the within-class variance and maximize the distance between the means of the two classes.
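To make the idea concrete, here is a minimal sketch on synthetic two-class data (the class locations, spreads, and plotting choices are illustrative assumptions, not the wine dataset we use below) that contrasts projecting onto the x-axis with projecting onto a discriminant axis:
```
# Illustrative only: two synthetic 2-D classes, projected onto the x-axis
# versus onto the two-class LDA direction w = Sw^-1 (m1 - m0).
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
b = rng.normal(loc=[1.0, 2.0], scale=0.5, size=(50, 2))

# Within-class scatter (sum of the per-class scatter matrices) and the LDA direction
Sw = (len(a) - 1) * np.cov(a, rowvar=False) + (len(b) - 1) * np.cov(b, rowvar=False)
w = np.linalg.inv(Sw).dot(b.mean(axis=0) - a.mean(axis=0))
w /= np.linalg.norm(w)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist([a[:, 0], b[:, 0]], bins=20)    # projecting onto the x-axis: the classes overlap more
axes[0].set_title('Projected onto the x-axis')
axes[1].hist([a.dot(w), b.dot(w)], bins=20)  # projecting onto the LDA axis: the classes separate cleanly
axes[1].set_title('Projected onto the LDA axis')
plt.show()
```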
Code
Let’s see how we could go about implementing Linear Discriminant Analysis from scratch using Python. To start, import the following libraries.
from sklearn.datasets import load_wine
import pandas as pd
import numpy as np
np.set_printoptions(precision=4)
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
In the following tutorial, we’ll be working with the wine dataset, which can be obtained from the UCI machine learning repository. Fortunately, the scikit-learn library provides a wrapper function for downloading and loading it for us.
wine = load_wine()
X = pd.DataFrame(wine.data, columns=wine.feature_names)
y = pd.Categorical.from_codes(wine.target, wine.target_names)
The dataset contains 178 rows and 13 columns.
X.shape
The features are composed of various characteristics such as the magnesium and alcohol content of the wine.
X.head()
There are 3 different kinds of wine.
wine.target_names
We create a DataFrame containing both the features and classes.
df = X.join(pd.Series(y, name='class'))
Linear Discriminant Analysis can be broken up into the following steps:
- Compute the within class and between class scatter matrices
- Compute the eigenvectors and corresponding eigenvalues for the scatter matrices
- Sort the eigenvalues and select the top k
- Create a new matrix containing eigenvectors that map to the k eigenvalues
- Obtain the new features (i.e. LDA components) by taking the dot product of the data and the matrix from step 4
Within Class Scatter Matrix
We calculate the within class scatter matrix using the following formula:

$$S_W = \sum_{i=1}^{c} S_i$$

where c is the total number of distinct classes and

$$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T \qquad m_i = \frac{1}{n_i} \sum_{x \in D_i} x$$

where x is a sample (i.e. row) and $n_i$ is the total number of samples with a given class.
For every class, we create a vector with the means of each feature.
class_feature_means = pd.DataFrame(columns=wine.target_names)
for c, rows in df.groupby('class'):
    # numeric_only=True excludes the non-numeric 'class' column from the mean
    class_feature_means[c] = rows.mean(numeric_only=True)
class_feature_means
Then, we plug the mean vectors ($m_i$) into the equation from before in order to obtain the within class scatter matrix.
within_class_scatter_matrix = np.zeros((13,13))
for c, rows in df.groupby('class'):
    rows = rows.drop(['class'], axis=1)
    s = np.zeros((13,13))
    for index, row in rows.iterrows():
        x, mc = row.values.reshape(13,1), class_feature_means[c].values.reshape(13,1)
        s += (x - mc).dot((x - mc).T)
    within_class_scatter_matrix += s
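As a quick, optional sanity check (not part of the algorithm itself), note that each class’s scatter matrix is just its sample covariance scaled by $n_i - 1$, so we can verify the loop above in a few lines:
```
# Optional check: S_i == (n_i - 1) * sample covariance of class i
check = np.zeros((13, 13))
for c, rows in df.groupby('class'):
    rows = rows.drop(['class'], axis=1)
    check += (len(rows) - 1) * np.cov(rows.values, rowvar=False)

print(np.allclose(check, within_class_scatter_matrix))  # should print True
```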
Between Class Scatter Matrix
Next, we calculate the between class scatter matrix using the following formula.
$$S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T$$

where $m$ is the overall mean of the data, and $m_i$ and $n_i$ are the sample mean and size of class i, respectively.
feature_means = df.mean(numeric_only=True)
between_class_scatter_matrix = np.zeros((13,13))
for c in class_feature_means:
    n = len(df.loc[df['class'] == c].index)
    mc, m = class_feature_means[c].values.reshape(13,1), feature_means.values.reshape(13,1)
    between_class_scatter_matrix += n * (mc - m).dot((mc - m).T)
Then, we solve the generalized eigenvalue problem for $S_W^{-1} S_B$ to obtain the linear discriminants.
eigen_values, eigen_vectors = np.linalg.eig(np.linalg.inv(within_class_scatter_matrix).dot(between_class_scatter_matrix))
The eigenvectors with the highest eigenvalues carry the most information about the distribution of the data. Thus, we sort the eigenvalues from highest to lowest and select the first **_k_** eigenvectors. In order to ensure that each eigenvalue remains paired with its eigenvector after sorting, we place them together in a list of tuples.
pairs = [(np.abs(eigen_values[i]), eigen_vectors[:,i]) for i in range(len(eigen_values))]
pairs = sorted(pairs, key=lambda x: x[0], reverse=True)
for pair in pairs:
    print(pair[0])
![Eigenvalues sorted from highest to lowest](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-18.png)
Just looking at the values, it’s difficult to determine how much of the variance is explained by each component. Thus, we express it as a percentage.
eigen_value_sums = sum(eigen_values)
print('Explained Variance')
for i, pair in enumerate(pairs):
    print('Eigenvector {}: {}'.format(i, (pair[0]/eigen_value_sums).real))
![Explained variance of each eigenvector](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-19.png)
First, we create a matrix **_W_** with the first two eigenvectors.
w_matrix = np.hstack((pairs[0][1].reshape(13,1), pairs[1][1].reshape(13,1))).real
Then, we save the dot product of **_X_** and **_W_** into a new matrix **_Y_**.
$$Y = X W$$
where **_X_** is an **_n×d_** matrix with **_n_** samples and **_d_** dimensions, and **_Y_** is an **_n×k_** matrix with **_n_** samples and **_k_** (k < d) dimensions. In other words, **_Y_** is composed of the LDA components, or said yet another way, the new feature space.
X_lda = np.array(X.dot(w_matrix))
`matplotlib` can’t handle categorical variables directly. Thus, we encode every class as a number so that we can incorporate the class labels into our plot.
le = LabelEncoder()
y = le.fit_transform(df['class'])
Then, we plot the data as a function of the two LDA components and use a different color for each class.
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.scatter(
    X_lda[:,0],
    X_lda[:,1],
    c=y,
    cmap='rainbow',
    alpha=0.7,
    edgecolors='b'
)
![The wine samples projected onto the first two linear discriminants](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-21.png)
Rather than implementing the Linear Discriminant Analysis algorithm from scratch every time, we can use the predefined `LinearDiscriminantAnalysis` class made available to us by the `scikit-learn` library.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)
We can access the following property to obtain the variance explained by each component.
lda.explained_variance_ratio_
![Explained variance ratio of each LDA component](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-22.png)
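These ratios should roughly agree with the percentages we computed from the eigenvalues by hand. As an optional cross-check (reusing `pairs` and `eigen_value_sums` from the from-scratch section), we can print both side by side:
```
# Optional cross-check against the from-scratch eigenvalue percentages
manual_ratios = np.array([pair[0] for pair in pairs[:2]]) / eigen_value_sums
print(manual_ratios.real)
print(lda.explained_variance_ratio_)
```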
Just like before, we plot the two LDA components.
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.scatter(
    X_lda[:,0],
    X_lda[:,1],
    c=y,
    cmap='rainbow',
    alpha=0.7,
    edgecolors='b'
)
![The two LDA components computed by scikit-learn](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-23.png)
Next, let’s take a look at how LDA compares to Principal Component Analysis or PCA. We start off by creating and fitting an instance of the `PCA` class.
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X, y)
We can access the `explained_variance_ratio_` property to view the percentage of the variance explained by each component.
pca.explained_variance_ratio_
![Explained variance ratio of the first two principal components](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-24.png)
As we can see, PCA selected the components which would result in the highest spread (retain the most information) and not necessarily the ones which maximize the separation between classes.
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.scatter(
    X_pca[:,0],
    X_pca[:,1],
    c=y,
    cmap='rainbow',
    alpha=0.7,
    edgecolors='b'
)
![The wine samples projected onto the first two principal components](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-25.png)
Next, let’s see whether we can create a model to classify the wines using the LDA components as features. First, we split the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X_lda, y, random_state=1)
Then, we build and train a Decision Tree. After predicting the category of each sample in the test set, we create a confusion matrix to evaluate the model’s performance.
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
y_pred = dt.predict(X_test)
confusion_matrix(y_test, y_pred)
![Confusion matrix for the Decision Tree on the test set](https://d3i8kmg9ozyieo.cloudfront.net/linear-discriminant-analysis-in-python-26.png)
As we can see, the Decision Tree classifier correctly classified everything in the test set.
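If we’d rather summarize the result as a single number, we can also compute the accuracy on the test set (an optional extra step using `accuracy_score` from `sklearn.metrics`):
```
from sklearn.metrics import accuracy_score

# Given the confusion matrix above, this should print 1.0
print(accuracy_score(y_test, y_pred))
```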