Python: Independent Component Analysis (ICA) versus PCA

This post demonstrates the usage of Independent Component Analysis (ICA) in Python, a technique akin to PCA but focused on independent factors (source signals).



PCA versus ICA



Principal Component Analysis (PCA) and Independent Component Analysis (ICA) are both dimensionality reduction techniques. PCA transforms the data into a new set of uncorrelated variables, called principal components, which capture the maximum variance in the dataset.

In contrast, ICA separates a multivariate signal into additive components, assuming that the components are statistically independent and non-Gaussian, aiming to uncover underlying hidden factors that generated the observed data.

In other words, ICA extracts statistically independent signals from a mixture of sources, as in the following illustration.
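The source-separation idea can also be sketched on synthetic data. This is a minimal illustration, not part of the DRA analysis: the two source signals, the mixing matrix `A`, and the noise level are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)

# Two hypothetical non-Gaussian sources: a sinusoid and a square wave
s1 = np.sin(2 * t)
s2 = np.sign(np.sin(3 * t))
S = np.c_[s1, s2] + 0.05 * rng.standard_normal((2000, 2))

# Observed signals are linear mixtures of the sources (A is an assumed mixing matrix)
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
X = S @ A.T

# Recover the sources from the mixtures alone
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)

# Cross-correlation between true sources (rows) and recovered ICs (columns)
corr = np.corrcoef(S.T, S_hat.T)[:2, 2:]
best_match = np.abs(corr).max(axis=1)
print(best_match)
```

Each true source should be strongly correlated (in absolute value) with one recovered component. Which column it lands in, and with which sign, is not determined by ICA.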


I use the DRA yield curve dataset to conduct PCA and ICA, concluding with a discussion on a reordering issue.


Python code


First, let's download the DRA data and run PCA and ICA on it, extracting 6 components with each method. The code is simple.


import pandas as pd
from sklearn.decomposition import PCA, FastICA
    
import requests
from io import StringIO
 
#==============================================
# URL for the data
#==============================================
url = "http://econweb.umd.edu/~webspace/aruoba" \
    + "/research/paper5/DRA%20Data.txt"
response = requests.get(url)
data = StringIO(response.text)
df = pd.read_csv(data, sep="\t", header=0)
 
# Convert yield columns to a matrix divided by 100
df = df.iloc[:, 1:18] / 100
 
#==============================================
# Perform PCA and ICA
#==============================================
ncomp = 6 # number of components
 
pca = PCA(n_components=ncomp)
pca_res = pca.fit_transform(df)
 
ica = FastICA(n_components=ncomp)
ica_res = ica.fit_transform(df)
 
#==============================================
# We can access each component of the PCs or ICs
#----------------------------------------------
# pca_res[:, 0], pca_res[:, 1], ...
# ica_res[:, 0], ica_res[:, 1], ...
#==============================================
 


And now, we can access each component obtained from the PCA or ICA separately through their respective arrays in the following way.

pca_res[:, 0], pca_res[:, 1], ...
 
ica_res[:, 0], ica_res[:, 1], ...
 


To compare their behavior, we need to plot each component from both methods.

The figures below show the ordered pairs of PCA and ICA components in the left panel, and their standardized counterparts in the right panel for comparison. Note that PCs are ordered by their eigenvalues (explained variance), whereas ICs have no comparable ordering criterion: ICA does not determine the order, sign, or scale of its components. Consequently, comparing the k-th PC with the k-th IC in raw output order may not yield meaningful insights.
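The standardization step and the ordering property can be sketched as follows. This is an illustration under stated assumptions: the matrix `X` is a hypothetical stand-in for the yield data (synthetic, built from three Laplace-distributed factors), and `standardize` is a helper name introduced here, not something from the original code.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Hypothetical stand-in for the yield matrix: 300 observations of 8 correlated series
latent = rng.laplace(size=(300, 3))
loadings = rng.standard_normal((3, 8))
X = latent @ loadings + 0.01 * rng.standard_normal((300, 8))

pca = PCA(n_components=3)
pca_res = pca.fit_transform(X)

ica = FastICA(n_components=3, random_state=0, max_iter=1000)
ica_res = ica.fit_transform(X)

def standardize(a):
    """Z-score each column so PCs and ICs share a common scale."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

pca_std = standardize(pca_res)
ica_std = standardize(ica_res)

# PCs come out sorted by explained variance; ICA offers no such ranking
evr = pca.explained_variance_ratio_
print(evr)
```

Standardizing removes the scale difference between the two sets of components, but it does nothing about the order or sign of the ICs.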



To compare PCs and ICs, the ICs must first be reordered. To do this, I use pairwise correlations: taking the PCs in order of their variance, I assign to each PC the IC that has the highest correlation with it. The process is sequential, so an IC that has already been selected is excluded from later matches.
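The sequential matching described above can be sketched as below. This is one possible implementation, not necessarily the one behind the figures: the function name `match_ics_to_pcs` is hypothetical, and using the absolute correlation together with a sign flip is an assumption added here because the sign of an IC is arbitrary. The demo data are synthetic.

```python
import numpy as np

def match_ics_to_pcs(pca_res, ica_res):
    """Greedily assign one IC to each PC, taking PCs in their given order.

    Returns the IC column order and a +1/-1 sign per matched IC.
    Assumes ica_res has at least as many columns as pca_res.
    """
    n = pca_res.shape[1]
    # Cross-correlation block: rows are PCs, columns are ICs
    corr = np.corrcoef(pca_res.T, ica_res.T)[:n, n:]
    remaining = list(range(ica_res.shape[1]))
    order, signs = [], []
    for i in range(n):
        # Best remaining IC by absolute correlation with PC i
        j = max(remaining, key=lambda k: abs(corr[i, k]))
        order.append(j)
        signs.append(1.0 if corr[i, j] >= 0 else -1.0)
        remaining.remove(j)  # exclude previously selected ICs
    return order, np.array(signs)

# Synthetic check: the "ICs" are shuffled, partly sign-flipped, noisy copies of the "PCs"
rng = np.random.default_rng(0)
pcs = rng.standard_normal((500, 3))
ics = np.c_[-pcs[:, 2], pcs[:, 0], pcs[:, 1]] + 0.1 * rng.standard_normal((500, 3))

order, signs = match_ics_to_pcs(pcs, ics)
ics_reordered = ics[:, order] * signs
print(order, signs)
```

With the arrays from the code above, the call would be `match_ics_to_pcs(pca_res, ica_res)`. Because the matching is greedy, a later PC may be left with a weakly correlated IC, which is one reason the reordered pairs need not look alike.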

The figure below illustrates that although PCs and ICs generally display similarity, there are instances of disparity between them.



