Independent Component Analysis (ICA) In Python
Suppose that you’re at a house party and you’re talking to a cute girl. As you listen, your ears are being bombarded by the sound of the conversations going on between different groups of people throughout the house and by the music that’s playing rather loudly in the background. Yet none of this prevents you from focusing on what the girl is saying, since human beings possess the innate ability to differentiate between sounds.
If, however, this were taking place as part of a scene in a movie, the microphone we’d use to record the conversation would lack the capacity to differentiate between all the sounds going on in the room. This is where Independent Component Analysis, or ICA for short, comes into play. ICA is a computational method for separating a multivariate signal into additive components that are statistically independent of one another. Using ICA, we can extract the desired component (i.e. the conversation between you and the girl) from the amalgamation of multiple signals.
Independent Component Analysis (ICA) Algorithm
At a high level, ICA can be broken down into the following steps.
- Center x by subtracting the mean
- Whiten x
- Choose a random initial value for the de-mixing matrix w
- Calculate the new value for w
- Normalize w
- Check whether the algorithm has converged and, if it hasn’t, return to step 4
- Take the dot product of w and x to get the independent source signals
Whitening
Before applying the ICA algorithm, we must first “whiten” our signal. To “whiten” a given signal means that we transform it in such a way that potential correlations between its components are removed (covariance equal to 0) and the variance of each component is equal to 1. Another way of looking at it is that the covariance matrix of the whitened signal will be equal to the identity matrix.
The actual way we set about whitening a signal involves the eigenvalue decomposition of its covariance matrix. The corresponding mathematical equation can be described as follows:
\tilde{x} = E D^{-1/2} E^T x, \qquad \mathrm{cov}(x) = E D E^T
where D = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) is a diagonal matrix of eigenvalues (every \lambda_i is an eigenvalue of the covariance matrix) and E is the orthogonal matrix of corresponding eigenvectors.
Once we’ve finished preprocessing the signal, for each component we update the corresponding row w of the de-mixing matrix until the algorithm has converged or the maximum number of iterations has been reached. The fixed-point update rule is
w^{+} = E\left[x \, g(w^T x)\right] - E\left[g'(w^T x)\right] w, \qquad w_{new} = w^{+} / \lVert w^{+} \rVert
where g is a non-quadratic nonlinearity and g' is its derivative. Since both the old and the updated w are unit vectors, convergence is considered attained when the absolute value of their dot product is roughly equal to 1, i.e. when the direction of w has stopped changing.
Python Code
Let’s see how we can go about implementing ICA from scratch in Python using NumPy. To start, we import the following libraries.
import numpy as np
np.random.seed(0)
from scipy import signal
from scipy.io import wavfile
from matplotlib import pyplot as plt
import seaborn as sns
sns.set(rc={'figure.figsize':(11.7,8.27)})
Next, we define g(u) = tanh(u) and its derivative g'(u) = 1 - tanh²(u), which we’ll use to determine the new value for w.
def g(x):
    return np.tanh(x)
def g_der(x):
    return 1 - g(x) * g(x)
We create a function to center the signal by subtracting the mean.
def center(X):
    X = np.array(X)
    mean = X.mean(axis=1, keepdims=True)
    return X - mean
We define a function to whiten the signal using the method described above.
def whitening(X):
    # eigenvalue decomposition of the covariance matrix: cov(X) = E D E^T
    cov = np.cov(X)
    d, E = np.linalg.eigh(cov)
    D = np.diag(d)
    # D^(-1/2)
    D_inv = np.sqrt(np.linalg.inv(D))
    # whitened signal: E D^(-1/2) E^T X
    X_whiten = np.dot(E, np.dot(D_inv, np.dot(E.T, X)))
    return X_whiten
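As a quick sanity check (an addition on my part, using made-up data), we can confirm that the covariance matrix of a centered and whitened signal is approximately the identity:
toy = np.random.RandomState(42).rand(3, 1000)   # illustrative toy data: 3 signals, 1000 samples
toy_white = whitening(center(toy))
print(np.allclose(np.cov(toy_white), np.eye(3)))  # expect True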
We define a function that performs a single update of one row w of the de-mixing matrix, following the fixed-point rule described above.
def calculate_new_w(w, X):
    # w+ = E[x g(w^T x)] - E[g'(w^T x)] w, followed by normalization to unit length
    w_new = (X * g(np.dot(w.T, X))).mean(axis=1) - g_der(np.dot(w.T, X)).mean() * w
    w_new /= np.sqrt((w_new ** 2).sum())
    return w_new
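As a small usage example (again my own, with illustrative names), note that the function always returns a unit vector, and that the convergence measure used in the main loop below is the distance of |w · w_new| from 1:
w = np.random.rand(3)
w_new = calculate_new_w(w, toy_white)            # toy_white from the whitening check above
print(np.sqrt((w_new ** 2).sum()))               # 1.0 -- the update is normalized
print(np.abs(np.abs((w * w_new).sum()) - 1))     # the convergence measure used in ica() below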
Finally, we define the main method, which calls the preprocessing functions, initializes w to some random set of values and iteratively updates w. Since both the old and the updated w are unit vectors, convergence can be judged by whether the absolute value of their dot product is approximately equal to 1, i.e. whether the direction of w has stopped changing. After computing the optimal value of w for each component, we take the dot product of the resulting de-mixing matrix and the signal x to get the independent source signals.
def ica(X, iterations, tolerance=1e-5):
    X = center(X)
    X = whitening(X)
    components_nr = X.shape[0]
    W = np.zeros((components_nr, components_nr), dtype=X.dtype)
    for i in range(components_nr):
        w = np.random.rand(components_nr)
        for j in range(iterations):
            w_new = calculate_new_w(w, X)
            if i >= 1:
                # deflation: remove projections onto the rows already found,
                # then re-normalize so w_new remains a unit vector
                w_new -= np.dot(np.dot(w_new, W[:i].T), W[:i])
                w_new /= np.sqrt((w_new ** 2).sum())
            distance = np.abs(np.abs((w * w_new).sum()) - 1)
            w = w_new
            if distance < tolerance:
                break
        W[i, :] = w
    S = np.dot(W, X)
    return S
We define a function to plot and compare the original, mixed and predicted signals.
def plot_mixture_sources_predictions(X, original_sources, S):
    fig = plt.figure()
    plt.subplot(3, 1, 1)
    for x in X:
        plt.plot(x)
    plt.title("mixtures")
    plt.subplot(3, 1, 2)
    for s in original_sources:
        plt.plot(s)
    plt.title("real sources")
    plt.subplot(3, 1, 3)
    for s in S:
        plt.plot(s)
    plt.title("predicted sources")
    fig.tight_layout()
    plt.show()
For the following examples, we create a helper function that rescales each source signal, stacks them row-wise into a single matrix and, optionally, adds some noise.
def mix_sources(mixtures, apply_noise=False):
    for i in range(len(mixtures)):
        max_val = np.max(mixtures[i])
        # rescale signals that fall outside the [-1, 1] range
        if max_val > 1 or np.min(mixtures[i]) < -1:
            mixtures[i] = mixtures[i] / (max_val / 2) - 0.5
    X = np.c_[[mix for mix in mixtures]]
    if apply_noise:
        X += 0.02 * np.random.normal(size=X.shape)
    return X
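Note that the resulting matrix has one row per signal, which is the layout the ica function above expects. A quick, made-up usage example:
a = np.sin(np.linspace(0, 8, 100))
b = np.random.RandomState(0).rand(100) * 4   # outside [-1, 1], so it gets rescaled
print(mix_sources([a, b]).shape)             # (2, 100) -- one row per signal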
Then, we create 3 signals, each with its own distinct pattern.
n_samples = 2000
time = np.linspace(0, 8, n_samples)
s1 = np.sin(2 * time) # sinusoidal
s2 = np.sign(np.sin(3 * time)) # square signal
s3 = signal.sawtooth(2 * np.pi * time) # sawtooth signal
In the following example, we compute the dot product of the mixing matrix A and the signals to obtain a combination of all three. We then use Independent Component Analysis to separate the mixed signal back into the original source signals.
X = np.c_[s1, s2, s3]
A = np.array(([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]]))
X = np.dot(X, A.T)
X = X.T
S = ica(X, iterations=1000)
plot_mixture_sources_predictions(X, [s1, s2, s3], S)
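Keep in mind that ICA recovers the sources only up to permutation and scaling, so a predicted source may appear reordered or flipped relative to the original. As a rough sanity check (an addition on my part), we can look at the absolute correlation between each recovered component and each true source; every row of S should correlate strongly with exactly one of them:
true_sources = np.vstack([s1, s2, s3])
corr = np.corrcoef(np.vstack([S, true_sources]))[:3, 3:]
print(np.round(np.abs(corr), 2))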
Next, we use ICA to decompose a mixture of actual audio tracks and plot the result. If you’d like to try it yourself, you can get the audio samples here. I encourage you to actually try listening to the different audio tracks.
sampling_rate, mix1 = wavfile.read('mix1.wav')
sampling_rate, mix2 = wavfile.read('mix2.wav')
sampling_rate, source1 = wavfile.read('source1.wav')
sampling_rate, source2 = wavfile.read('source2.wav')
X = mix_sources([mix1, mix2])
S = ica(X, iterations=1000)
plot_mixture_sources_predictions(X, [source1, source2], S)
wavfile.write('out1.wav', sampling_rate, S[0])
wavfile.write('out2.wav', sampling_rate, S[1])
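One practical note (my addition): the recovered sources are float64 arrays whose amplitude depends on the de-mixing, and scipy writes them out as 64-bit float WAV files, which some players handle poorly. If playback is an issue, you can rescale each track to 16-bit integers before writing (the helper and file names below are purely illustrative):
def to_int16(track):
    track = track / np.max(np.abs(track))   # normalize to [-1, 1]
    return (track * 32767).astype(np.int16)
wavfile.write('out1_int16.wav', sampling_rate, to_int16(S[0]))
wavfile.write('out2_int16.wav', sampling_rate, to_int16(S[1]))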
Sklearn
Finally, we take a look at how we could go about achieving the same result using the scikit-learn implementation of ICA.
from sklearn.decomposition import FastICA
np.random.seed(0)
n_samples = 2000
time = np.linspace(0, 8, n_samples)
s1 = np.sin(2 * time)
s2 = np.sign(np.sin(3 * time))
s3 = signal.sawtooth(2 * np.pi * time)
S = np.c_[s1, s2, s3]
S += 0.2 * np.random.normal(size=S.shape)
S /= S.std(axis=0)
A = np.array([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]])
X = np.dot(S, A.T)
ica = FastICA(n_components=3)
S_ = ica.fit_transform(X)
fig = plt.figure()
models = [X, S, S_]
names = ['mixtures', 'real sources', 'predicted sources']
colors = ['red', 'blue', 'orange']
for i, (name, model) in enumerate(zip(names, models)):
    plt.subplot(3, 1, i + 1)
    plt.title(name)
    for sig, color in zip(model.T, colors):
        plt.plot(sig, color=color)
fig.tight_layout()
plt.show()
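As a quick check on the scikit-learn result (my addition, relying on FastICA’s mixing_ and mean_ attributes), we can verify that the estimated sources and the estimated mixing matrix reconstruct the observations:
print(np.allclose(X, np.dot(S_, ica.mixing_.T) + ica.mean_))   # expect True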
The accompanying Jupyter Notebook can be found here.