🎯 Principal Component Analysis (PCA)
PCA is a fundamental dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving maximum variance, making it invaluable for data visualization, noise reduction, and feature extraction.
Resources: Scikit-learn PCA | Elements of Statistical Learning - Chapter 14 | Pattern Recognition and Machine Learning - Chapter 12
Summary
Principal Component Analysis (PCA) is an unsupervised linear dimensionality reduction technique that identifies the principal components (directions of maximum variance) in high-dimensional data. It projects the original data onto a lower-dimensional subspace defined by these components, effectively reducing the number of features while retaining as much information as possible.
Key characteristics: - Dimensionality reduction: Reduces the number of features while preserving information - Variance maximization: Finds directions that capture maximum variance in data - Linear transformation: Uses linear combinations of original features - Orthogonal components: Principal components are orthogonal to each other - Data compression: Enables efficient storage and transmission of data - Noise reduction: Can filter out noise by discarding low-variance components
Applications: - Data visualization (reducing to 2D/3D for plotting) - Image compression and processing - Feature extraction for machine learning - Exploratory data analysis - Noise reduction and signal processing - Face recognition systems - Stock market analysis - Gene expression analysis
Types: - Standard PCA: Linear dimensionality reduction using covariance matrix - Kernel PCA: Non-linear extension using kernel methods - Sparse PCA: Incorporates sparsity constraints on components - Incremental PCA: For large datasets that don't fit in memory
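As a quick orientation, the sketch below (added for illustration, not part of the original text) shows how each variant can be instantiated in scikit-learn; the toy data and parameter values are placeholders, not recommendations.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA, SparsePCA, IncrementalPCA
# Toy data: 200 samples, 10 features
X = np.random.rand(200, 10)
X_std = PCA(n_components=2).fit_transform(X)                         # standard (covariance/SVD) PCA
X_kpca = KernelPCA(n_components=2, kernel='rbf').fit_transform(X)    # non-linear extension via kernel trick
X_spca = SparsePCA(n_components=2, alpha=1.0).fit_transform(X)       # sparsity-constrained loadings
ipca = IncrementalPCA(n_components=2, batch_size=50)                 # fits in mini-batches
for batch in np.array_split(X, 4):                                   # stream data that may not fit in memory
    ipca.partial_fit(batch)
X_ipca = ipca.transform(X)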
💡 Intuition
How PCA Works
Imagine you have a dataset of house prices with features like size, number of rooms, age, etc. Some features might be highly correlated (e.g., size and number of rooms). PCA finds new "directions" (principal components) that best capture the variation in your data. The first principal component captures the most variation, the second captures the next most variation (orthogonal to the first), and so on.
Think of it like finding the best angle from which to photograph a 3D object: you want the angle whose 2D picture preserves the most information about the object's shape.
Mathematical Foundation
1. Covariance Matrix
For a dataset \(X \in \mathbb{R}^{n \times d}\) (n samples, d features), first center the data:
$$\bar{X} = X - \mathbf{1}\mu^T$$
where \(\mu = \frac{1}{n}\sum_{i=1}^{n} X_i\) is the mean vector.
The covariance matrix is:
$$C = \frac{1}{n-1}\bar{X}^T\bar{X}$$
2. Eigenvalue Decomposition
PCA finds the eigenvalues and eigenvectors of the covariance matrix:
$$C\mathbf{v} = \lambda\mathbf{v}$$
Where: - \(\mathbf{v}\) are the eigenvectors (principal components) - \(\lambda\) are the eigenvalues (explained variance)
3. Principal Components
The eigenvectors \(\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_d\) ordered by decreasing eigenvalues \(\lambda_1 \geq \lambda_2 \geq ... \geq \lambda_d\) are the principal components.
4. Dimensionality Reduction
To reduce to \(k\) dimensions, select the first \(k\) eigenvectors:
$$W = [\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_k] \in \mathbb{R}^{d \times k}$$
Transform the data:
$$Z = \bar{X}W \in \mathbb{R}^{n \times k}$$
5. Reconstruction
The original data can be approximated as:
$$\hat{X} = ZW^T + \mathbf{1}\mu^T$$
6. Explained Variance Ratio
The proportion of variance explained by the first \(k\) components:
$$\text{Explained Variance Ratio} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$$
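To make the steps above concrete, here is a minimal NumPy sketch (added for illustration, using synthetic data) that follows the formulas end to end: center, build the covariance matrix, eigendecompose, project, reconstruct, and compute the explained variance ratio.
import numpy as np
rng = np.random.default_rng(0)
# Correlated toy data (n = 100 samples, d = 3 features)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.5, 1.0, 0.3],
                                          [0.1, 0.3, 0.5]])
mu = X.mean(axis=0)                    # mean vector
X_bar = X - mu                         # centered data
C = X_bar.T @ X_bar / (len(X) - 1)     # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigendecomposition (ascending order)
order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2
W = eigvecs[:, :k]                     # d x k projection matrix
Z = X_bar @ W                          # n x k reduced representation
X_hat = Z @ W.T + mu                   # reconstruction
explained_ratio = eigvals[:k].sum() / eigvals.sum()
print(f"Explained variance ratio (first {k} PCs): {explained_ratio:.3f}")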
=" Implementation using Libraries
Scikit-learn Implementation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris, make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import seaborn as sns
# Load and prepare data
iris = load_iris()
X, y = iris.data, iris.target
feature_names = iris.feature_names
# Standardize the features (important for PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Apply PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)
# Get explained variance ratio
explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)
print("Explained Variance by Component:")
for i, var in enumerate(explained_variance):
print(f"PC{i+1}: {var:.3f} ({var*100:.1f}%)")
print(f"\nCumulative Explained Variance:")
for i, cum_var in enumerate(cumulative_variance):
print(f"First {i+1} components: {cum_var:.3f} ({cum_var*100:.1f}%)")
# Visualize explained variance
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.bar(range(1, len(explained_variance) + 1), explained_variance)
plt.title('Explained Variance by Principal Component')
plt.xlabel('Principal Component')
plt.ylabel('Explained Variance Ratio')
plt.subplot(1, 2, 2)
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, 'bo-')
plt.axhline(y=0.95, color='r', linestyle='--', label='95% threshold')
plt.title('Cumulative Explained Variance')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance')
plt.legend()
plt.tight_layout()
plt.show()
# 2D visualization using first 2 components
pca_2d = PCA(n_components=2)
X_pca_2d = pca_2d.fit_transform(X_scaled)
plt.figure(figsize=(10, 6))
colors = ['red', 'green', 'blue']
for i, color in enumerate(colors):
plt.scatter(X_pca_2d[y == i, 0], X_pca_2d[y == i, 1],
c=color, label=iris.target_names[i], alpha=0.6)
plt.xlabel(f'PC1 ({pca_2d.explained_variance_ratio_[0]:.2f} variance)')
plt.ylabel(f'PC2 ({pca_2d.explained_variance_ratio_[1]:.2f} variance)')
plt.title('PCA: Iris Dataset in 2D')
plt.legend()
plt.grid(True)
plt.show()
# Component interpretation
components_df = pd.DataFrame(
pca_2d.components_.T,
columns=['PC1', 'PC2'],
index=feature_names
)
print("\nPrincipal Component Loadings:")
print(components_df)
# Biplot (features and data points)
def biplot(X_pca, components, feature_names, y):
plt.figure(figsize=(12, 8))
# Plot data points
colors = ['red', 'green', 'blue']
for i, color in enumerate(colors):
plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1],
c=color, label=iris.target_names[i], alpha=0.6)
# Plot feature vectors
for i, feature in enumerate(feature_names):
plt.arrow(0, 0, components[i, 0]*3, components[i, 1]*3,
head_width=0.1, head_length=0.1, fc='black', ec='black')
plt.text(components[i, 0]*3.2, components[i, 1]*3.2, feature,
fontsize=12, ha='center', va='center')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('PCA Biplot - Iris Dataset')
plt.legend()
plt.grid(True)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.show()
biplot(X_pca_2d, pca_2d.components_, feature_names, y)
Dimensionality Reduction for Classification
# Compare classification performance with and without PCA
def compare_with_without_pca(X, y, n_components=2):
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42, stratify=y)
# Standardize
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Without PCA
rf_original = RandomForestClassifier(random_state=42)
rf_original.fit(X_train_scaled, y_train)
y_pred_original = rf_original.predict(X_test_scaled)
accuracy_original = accuracy_score(y_test, y_pred_original)
# With PCA
pca = PCA(n_components=n_components)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)
rf_pca = RandomForestClassifier(random_state=42)
rf_pca.fit(X_train_pca, y_train)
y_pred_pca = rf_pca.predict(X_test_pca)
accuracy_pca = accuracy_score(y_test, y_pred_pca)
print(f"Results Comparison:")
print(f"Original features ({X.shape[1]}): {accuracy_original:.3f}")
print(f"PCA features ({n_components}): {accuracy_pca:.3f}")
print(f"Variance explained by PCA: {pca.explained_variance_ratio_.sum():.3f}")
print(f"Dimensionality reduction: {X.shape[1]} -> {n_components} " +
f"({(1 - n_components/X.shape[1])*100:.1f}% reduction)")
compare_with_without_pca(X_scaled, y, n_components=2)
🛠️ From Scratch Implementation
import numpy as np
import matplotlib.pyplot as plt
class PCAFromScratch:
def __init__(self, n_components=None):
"""
Principal Component Analysis implementation from scratch
Parameters:
n_components: Number of components to keep (if None, keep all)
"""
self.n_components = n_components
self.components_ = None
self.explained_variance_ = None
self.explained_variance_ratio_ = None
self.mean_ = None
self.singular_values_ = None
def fit(self, X):
"""
Fit PCA on the training data
Parameters:
X: Training data of shape (n_samples, n_features)
"""
# Center the data
self.mean_ = np.mean(X, axis=0)
X_centered = X - self.mean_
# Compute covariance matrix
n_samples = X.shape[0]
covariance_matrix = np.dot(X_centered.T, X_centered) / (n_samples - 1)
# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)
# Sort by decreasing eigenvalues
idx = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]
# Store results
if self.n_components is None:
self.n_components = len(eigenvalues)
self.components_ = eigenvectors[:, :self.n_components].T
self.explained_variance_ = eigenvalues[:self.n_components]
self.explained_variance_ratio_ = (
self.explained_variance_ / np.sum(eigenvalues)
)
# For compatibility with sklearn
self.singular_values_ = np.sqrt(self.explained_variance_ * (n_samples - 1))
return self
def transform(self, X):
"""
Transform the data to the principal component space
Parameters:
X: Data to transform of shape (n_samples, n_features)
Returns:
X_transformed: Transformed data of shape (n_samples, n_components)
"""
X_centered = X - self.mean_
return np.dot(X_centered, self.components_.T)
def fit_transform(self, X):
"""
Fit PCA and transform the data
"""
return self.fit(X).transform(X)
def inverse_transform(self, X_transformed):
"""
Transform the data back to original space (reconstruction)
Parameters:
X_transformed: Data in PC space of shape (n_samples, n_components)
Returns:
X_reconstructed: Reconstructed data in original space
"""
return np.dot(X_transformed, self.components_) + self.mean_
def get_covariance(self):
"""
Get the covariance matrix of the data in PC space
"""
return np.dot(self.components_ * self.explained_variance_,
self.components_.T)
# Demonstration with synthetic data
np.random.seed(42)
# Create correlated 2D data
mean = [0, 0]
cov = [[3, 2.5], [2.5, 3]]
X_synthetic = np.random.multivariate_normal(mean, cov, 300)
# Apply custom PCA
pca_custom = PCAFromScratch(n_components=2)
X_pca_custom = pca_custom.fit_transform(X_synthetic)
# Compare with sklearn
from sklearn.decomposition import PCA
pca_sklearn = PCA(n_components=2)
X_pca_sklearn = pca_sklearn.fit_transform(X_synthetic)
print("Custom PCA Results:")
print("Explained variance ratio:", pca_custom.explained_variance_ratio_)
print("Components shape:", pca_custom.components_.shape)
print("\nSklearn PCA Results:")
print("Explained variance ratio:", pca_sklearn.explained_variance_ratio_)
print("Components shape:", pca_sklearn.components_.shape)
print("\nDifference in results (should be close to zero):")
print("Explained variance ratio diff:",
np.abs(pca_custom.explained_variance_ratio_ -
pca_sklearn.explained_variance_ratio_).max())
# Visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# Original data
axes[0].scatter(X_synthetic[:, 0], X_synthetic[:, 1], alpha=0.7)
axes[0].set_title('Original Data')
axes[0].set_xlabel('Feature 1')
axes[0].set_ylabel('Feature 2')
axes[0].grid(True)
# PCA transformed data
axes[1].scatter(X_pca_custom[:, 0], X_pca_custom[:, 1], alpha=0.7)
axes[1].set_title('PCA Transformed Data')
axes[1].set_xlabel('PC1')
axes[1].set_ylabel('PC2')
axes[1].grid(True)
# Original data with principal components
axes[2].scatter(X_synthetic[:, 0], X_synthetic[:, 1], alpha=0.7)
mean_point = pca_custom.mean_
# Plot principal component directions
for i in range(2):
direction = pca_custom.components_[i] * 3 * np.sqrt(pca_custom.explained_variance_[i])
axes[2].arrow(mean_point[0], mean_point[1],
direction[0], direction[1],
head_width=0.2, head_length=0.3,
fc=f'C{i+1}', ec=f'C{i+1}', linewidth=2,
label=f'PC{i+1}')
axes[2].set_title('Original Data with Principal Components')
axes[2].set_xlabel('Feature 1')
axes[2].set_ylabel('Feature 2')
axes[2].legend()
axes[2].grid(True)
plt.tight_layout()
plt.show()
Advanced Features Implementation
def reconstruction_error_analysis(X, max_components=None):
"""
Analyze reconstruction error vs number of components
"""
if max_components is None:
max_components = min(X.shape) - 1
errors = []
components_range = range(1, max_components + 1)
for n_comp in components_range:
pca = PCAFromScratch(n_components=n_comp)
X_transformed = pca.fit_transform(X)
X_reconstructed = pca.inverse_transform(X_transformed)
# Calculate reconstruction error (mean squared error)
error = np.mean((X - X_reconstructed) ** 2)
errors.append(error)
return components_range, errors
# Example with iris dataset
from sklearn.datasets import load_iris
iris = load_iris()
X_iris = iris.data
# Standardize
X_iris_scaled = (X_iris - np.mean(X_iris, axis=0)) / np.std(X_iris, axis=0)
# Analyze reconstruction error
components, errors = reconstruction_error_analysis(X_iris_scaled, max_components=4)
plt.figure(figsize=(10, 6))
plt.plot(components, errors, 'bo-', linewidth=2, markersize=8)
plt.title('Reconstruction Error vs Number of Components')
plt.xlabel('Number of Principal Components')
plt.ylabel('Mean Squared Reconstruction Error')
plt.grid(True)
plt.xticks(components)
plt.show()
print("Reconstruction Errors:")
for comp, error in zip(components, errors):
print(f"{comp} components: {error:.6f}")
⚠️ Assumptions and Limitations
Key Assumptions
- Linear relationships: PCA assumes linear relationships between variables
- Variance equals importance: Higher variance directions are assumed to be more important
- Orthogonal components: Principal components are orthogonal (perpendicular)
- Gaussian distribution: Works best with normally distributed data
- Standardization: Features should be on similar scales (usually requires standardization; see the sketch after this list)
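The short sketch below (synthetic data, added for illustration) shows the scaling sensitivity in practice: a feature measured on a large scale dominates PC1 unless the data are standardized first.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
rng = np.random.default_rng(0)
income = rng.normal(50_000, 15_000, size=500)   # large-scale feature
age = rng.normal(40, 10, size=500)              # small-scale feature
X = np.column_stack([income, age])
pca_raw = PCA(n_components=2).fit(X)
pca_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X))
print("Without scaling:", pca_raw.explained_variance_ratio_)  # PC1 is essentially income alone
print("With scaling:   ", pca_std.explained_variance_ratio_)  # variance is shared more evenly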
Limitations
- Linear transformation only: Cannot capture non-linear relationships
  - Solution: Use Kernel PCA or other non-linear techniques (see the sketch after this list)
- Interpretability loss: Principal components are linear combinations of original features
  - Solution: Use factor analysis or sparse PCA for more interpretable components
- Sensitive to scaling: Features with larger scales dominate the principal components
  - Solution: Always standardize features before applying PCA
- Information loss: Dimensionality reduction inherently loses some information
  - Assessment: Monitor explained variance ratio and reconstruction error
- Outlier sensitivity: Outliers can significantly affect principal components
  - Solution: Use robust PCA variants or outlier detection/removal
- No guarantee of class separation: PCA maximizes variance, not class separability
  - Alternative: Use Linear Discriminant Analysis (LDA) for classification tasks
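To make the first limitation concrete, the sketch below (synthetic data and illustrative parameters, not from the original text) contrasts standard PCA with Kernel PCA on concentric circles: the linear projection leaves the classes nested, while the RBF kernel separates them.
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=42)
X_lin = PCA(n_components=2).fit_transform(X)                                  # rotation only
X_rbf = KernelPCA(n_components=2, kernel='rbf', gamma=10).fit_transform(X)    # non-linear mapping
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X_lin[:, 0], X_lin[:, 1], c=y, cmap='coolwarm', s=10)
axes[0].set_title('Standard PCA')
axes[1].scatter(X_rbf[:, 0], X_rbf[:, 1], c=y, cmap='coolwarm', s=10)
axes[1].set_title('Kernel PCA (RBF)')
plt.tight_layout()
plt.show()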
Comparison with Other Techniques
Method | Linear | Supervised | Interpretable | Non-linear
---|---|---|---|---
PCA | Yes | No | Partial | No
LDA | Yes | Yes | Partial | No
t-SNE | No | No | No | Yes
UMAP | No | No | No | Yes
Factor Analysis | Yes | No | Yes | No
ICA | Yes | No | Partial | No
When to avoid PCA: - When original features have clear business meaning that must be preserved - With categorical or ordinal data without proper encoding - When non-linear relationships are important - With very sparse data (consider specialized sparse PCA) - When you need directly interpretable features for regulatory compliance
❓ Interview Questions
What is the mathematical intuition behind PCA and how does it work?
Answer: PCA finds the directions (principal components) in the data that capture the maximum variance. Mathematically, it performs eigenvalue decomposition on the covariance matrix:
- Center the data: Subtract the mean from each feature
- Compute covariance matrix: C = (X^T * X) / (n-1)
- Find eigenvalues and eigenvectors: C*v = λ*v
- Sort by eigenvalues: Largest eigenvalues correspond to directions with most variance
- Project data: Transform original data onto selected eigenvectors
The key insight is that eigenvectors of the covariance matrix are orthogonal directions of maximum variance, and eigenvalues represent the amount of variance explained by each direction.
Why do we need to standardize features before applying PCA?
Answer: Features must be standardized because PCA is sensitive to the scale of variables:
- Scale dominance: Features with larger scales (e.g., income in dollars vs age in years) will dominate the principal components
- Variance bias: PCA maximizes variance, so large-scale features appear to have more "importance"
- Covariance matrix distortion: The covariance matrix will be dominated by high-variance features
Example: Without standardization, if you have height (cm, ~170) and weight (kg, ~70), height will dominate simply due to larger numerical values, not because it's more important.
Solution: Use z-score standardization: (x - μ) / σ for each feature.
How do you choose the optimal number of principal components?
Answer: Several methods exist for selecting the number of components:
- Explained Variance Threshold: Keep components explaining 80-95% of variance
- Elbow Method: Plot explained variance vs components, look for "elbow" point
- Kaiser Rule: Keep components with eigenvalues > 1 (for standardized data)
- Scree Plot: Visual inspection of eigenvalue decay
- Cross-validation: Use downstream task performance to select optimal number
- Business requirements: Based on computational constraints or interpretability needs
Code example:
cumsum_var = np.cumsum(pca.explained_variance_ratio_)
n_components = np.argmax(cumsum_var >= 0.95) + 1 # 95% variance
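For the Kaiser rule from the list above, a minimal sketch (assuming pca was fitted on standardized data, as elsewhere on this page):
n_kaiser = int(np.sum(pca.explained_variance_ > 1))  # keep components with eigenvalue > 1
print(f"Kaiser rule keeps {n_kaiser} components")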
What's the difference between PCA and Linear Discriminant Analysis (LDA)?
Answer: Key differences:
Aspect | PCA | LDA |
---|---|---|
Type | Unsupervised | Supervised |
Objective | Maximize variance | Maximize class separation |
Input | Features only | Features + labels |
Components | Up to min(n_features, n_samples) | Up to (n_classes - 1) |
Use case | Dimensionality reduction | Classification preprocessing |
When to use each: - PCA: Data exploration, compression, noise reduction, visualization - LDA: Classification tasks, when you want to maximize class separability
Example: For 3-class iris dataset, LDA can find at most 2 components, while PCA can find up to 4.
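A minimal sketch of that last point (added for illustration, using scikit-learn's LinearDiscriminantAnalysis):
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X, y = load_iris(return_X_y=True)
X_pca = PCA(n_components=4).fit_transform(X)                              # up to 4 components (unsupervised)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)    # at most n_classes - 1 = 2
print(X_pca.shape, X_lda.shape)  # (150, 4) (150, 2)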
How do you interpret the principal components and their loadings?
Answer: Principal components and loadings provide insights into data structure:
Loadings (Component coefficients): - Show contribution of each original feature to each PC - Values range typically from -1 to 1 - Large absolute values indicate strong influence
Interpretation steps: 1. Examine loading values: Which features contribute most to each PC? 2. Look for patterns: Do related features load together? 3. Name components: Based on dominant features (e.g., "size factor", "ratio factor")
Example interpretation:
PC1 loadings: [0.8 height, 0.7 weight, 0.1 age] → "Physical size factor"
PC2 loadings: [0.2 height, -0.1 weight, 0.9 age] → "Age factor"
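A small sketch of this workflow (illustrative; it assumes a fitted sklearn PCA object pca and a matching feature_names list, as in the iris example earlier on this page):
import pandas as pd
loadings = pd.DataFrame(pca.components_.T,
                        columns=[f"PC{i+1}" for i in range(pca.n_components_)],
                        index=feature_names)
for pc in loadings.columns:
    top = loadings[pc].abs().sort_values(ascending=False).head(2)
    print(f"{pc}: dominated by {list(top.index)}")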
What are the limitations of PCA and when should you not use it?
Answer: Major limitations and alternatives:
Limitations: 1. Linear only: Cannot capture non-linear relationships → Use Kernel PCA, t-SNE 2. Variance ≠ Importance: High variance doesn't always mean importance → Use domain knowledge 3. Loss of interpretability: PCs are combinations of original features → Use Sparse PCA, Factor Analysis 4. Outlier sensitive: Outliers can skew components → Use Robust PCA 5. No class consideration: Doesn't consider target variable → Use LDA for classification
When NOT to use PCA: - Categorical data without proper encoding - When original features must be preserved (regulatory requirements) - Very sparse data (many zeros) - Non-linear relationships are crucial - Small datasets (overfitting risk)
How do you handle missing values when applying PCA?
Answer: Several strategies for missing data in PCA:
1. Complete Case Analysis:
# Remove rows with any missing values
X_complete = X.dropna()
2. Imputation before PCA:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X_imputed = imputer.fit_transform(X)
3. Iterative Imputation:
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
imputer = IterativeImputer()
X_imputed = imputer.fit_transform(X)
4. PCA with missing values (specialized methods): - Use algorithms like NIPALS (Nonlinear Iterative Partial Least Squares) - Probabilistic PCA that handles missing values directly
Best practice: Analyze missing data patterns first, then choose appropriate strategy based on data characteristics.
Explain the relationship between PCA and Singular Value Decomposition (SVD).
Answer: PCA and SVD are mathematically related:
SVD decomposition of centered data matrix X:
X = U * Σ * V^T
Connection to PCA: - Principal components = columns of V - Explained variance = (singular values)² / (n-1) - Transformed data = U * Σ
Advantages of SVD approach: 1. More numerically stable 2. Computationally efficient for tall matrices 3. Doesn't require computing covariance matrix explicitly 4. Better for sparse data
Implementation:
# Using SVD for PCA (X_centered is the mean-centered data matrix)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt  # Principal components (one per row)
n_samples = X_centered.shape[0]
explained_variance = (s ** 2) / (n_samples - 1)
How do you evaluate the quality of PCA results?
Answer: Multiple metrics assess PCA quality:
1. Explained Variance Ratio:
total_variance_explained = sum(pca.explained_variance_ratio_)
print(f"Total variance explained: {total_variance_explained:.3f}")
2. Reconstruction Error:
X_reconstructed = pca.inverse_transform(X_pca)
mse = np.mean((X_original - X_reconstructed) ** 2)
3. Silhouette Score (if labels available):
from sklearn.metrics import silhouette_score
score = silhouette_score(X_pca, labels)
4. Downstream Task Performance: - Compare classifier accuracy before/after PCA - Monitor if important patterns are preserved
5. Visual Assessment: - Scree plots for eigenvalue decay - Biplots for feature relationships - 2D/3D scatter plots for cluster visualization
Quality indicators: - First few PCs explain >80% variance - Smooth eigenvalue decay (no sudden drops) - Components are interpretable - Downstream performance maintained
📊 Examples
Real-world Example: Image Compression with PCA
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
# Load face images dataset
faces = fetch_olivetti_faces(shuffle=True, random_state=42)
X_faces = faces.data
y_faces = faces.target
print(f"Dataset shape: {X_faces.shape}")
print(f"Original image dimensions: {int(np.sqrt(X_faces.shape[1]))}x{int(np.sqrt(X_faces.shape[1]))}")
# Apply PCA with different numbers of components
n_components_list = [10, 50, 100, 200, 400]
fig, axes = plt.subplots(2, len(n_components_list) + 1, figsize=(18, 6))
# Original image
sample_idx = 0
original_image = X_faces[sample_idx].reshape(64, 64)
axes[0, 0].imshow(original_image, cmap='gray')
axes[0, 0].set_title('Original')
axes[0, 0].axis('off')
# Show compression ratios
compression_ratios = []
reconstruction_errors = []
for i, n_comp in enumerate(n_components_list):
# Apply PCA
pca = PCA(n_components=n_comp)
X_pca = pca.fit_transform(X_faces)
X_reconstructed = pca.inverse_transform(X_pca)
    # Calculate compression ratio (for the whole dataset) and reconstruction error
    original_size = X_faces.shape[0] * X_faces.shape[1]  # n_images * 4096 pixels
    compressed_size = X_faces.shape[0] * n_comp + X_faces.shape[1] * n_comp  # scores + component vectors
    compression_ratio = original_size / compressed_size
mse = np.mean((X_faces - X_reconstructed) ** 2)
compression_ratios.append(compression_ratio)
reconstruction_errors.append(mse)
# Display reconstructed image
reconstructed_image = X_reconstructed[sample_idx].reshape(64, 64)
axes[0, i + 1].imshow(reconstructed_image, cmap='gray')
axes[0, i + 1].set_title(f'{n_comp} PCs\n({pca.explained_variance_ratio_.sum():.2f} var)')
axes[0, i + 1].axis('off')
# Plot metrics
axes[1, 0].remove() # Remove empty subplot
axes[1, 1].bar(range(len(n_components_list)), compression_ratios)
axes[1, 1].set_title('Compression Ratio')
axes[1, 1].set_xlabel('Number of Components')
axes[1, 1].set_xticks(range(len(n_components_list)))
axes[1, 1].set_xticklabels(n_components_list)
axes[1, 2].plot(n_components_list, reconstruction_errors, 'ro-')
axes[1, 2].set_title('Reconstruction Error')
axes[1, 2].set_xlabel('Number of Components')
axes[1, 2].set_ylabel('MSE')
# Explained variance (span the remaining bottom-row subplots)
axes[1, 3].remove()
axes[1, 4].remove()
axes[1, 5].remove()
ax_combined = plt.subplot(2, len(n_components_list) + 1,
                          (2 * len(n_components_list), 2 * (len(n_components_list) + 1)))
# Plot cumulative explained variance
pca_full = PCA()
pca_full.fit(X_faces)
cumsum_var = np.cumsum(pca_full.explained_variance_ratio_)
ax_combined.plot(range(1, min(201, len(cumsum_var) + 1)),
cumsum_var[:200], 'b-', linewidth=2)
ax_combined.axhline(y=0.95, color='r', linestyle='--', label='95% variance')
ax_combined.set_title('Cumulative Explained Variance')
ax_combined.set_xlabel('Number of Components')
ax_combined.set_ylabel('Cumulative Variance')
ax_combined.legend()
ax_combined.grid(True)
plt.tight_layout()
plt.show()
# Print summary statistics
print("\nImage Compression Results:")
for n_comp, ratio, error in zip(n_components_list, compression_ratios, reconstruction_errors):
print(f"{n_comp:3d} components: {ratio:4.1f}x compression, MSE: {error:.6f}")
Market Analysis Example
import pandas as pd
import yfinance as yf
from datetime import datetime, timedelta
# Download stock data for tech companies
tickers = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA', 'META', 'NVDA', 'NFLX']
end_date = datetime.now()
start_date = end_date - timedelta(days=365*2) # 2 years of data
# Download stock returns
stock_data = {}
for ticker in tickers:
    try:
        stock = yf.download(ticker, start=start_date, end=end_date, auto_adjust=False)
        stock_data[ticker] = stock['Adj Close'].pct_change().dropna()
    except Exception:
        print(f"Could not download data for {ticker}")
# Create returns dataframe
returns_df = pd.DataFrame(stock_data)
returns_df = returns_df.dropna()
print(f"Stock returns data shape: {returns_df.shape}")
print("\nBasic statistics:")
print(returns_df.describe())
# Apply PCA to stock returns
from sklearn.preprocessing import StandardScaler
# Standardize returns
scaler = StandardScaler()
returns_scaled = scaler.fit_transform(returns_df)
# Fit PCA
pca_stocks = PCA()
returns_pca = pca_stocks.fit_transform(returns_scaled)
# Analyze results
explained_var = pca_stocks.explained_variance_ratio_
cumulative_var = np.cumsum(explained_var)
print(f"\nPCA Results for Stock Returns:")
print("Explained Variance by Component:")
for i, var in enumerate(explained_var[:5]):
print(f"PC{i+1}: {var:.3f} ({var*100:.1f}%)")
print(f"\nFirst 3 components explain {cumulative_var[2]:.3f} ({cumulative_var[2]*100:.1f}%) of variance")
# Component interpretation
components_df = pd.DataFrame(
pca_stocks.components_[:3].T, # First 3 components
columns=['PC1 (Market)', 'PC2 (Tech vs Value)', 'PC3 (Volatility)'],
index=returns_df.columns
)
print("\nComponent Loadings (Stock Exposure to Factors):")
print(components_df.round(3))
# Visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
# Explained variance
axes[0, 0].bar(range(1, len(explained_var) + 1), explained_var)
axes[0, 0].set_title('Explained Variance by Component')
axes[0, 0].set_xlabel('Principal Component')
axes[0, 0].set_ylabel('Explained Variance Ratio')
# Cumulative variance
axes[0, 1].plot(range(1, len(cumulative_var) + 1), cumulative_var, 'bo-')
axes[0, 1].axhline(y=0.95, color='r', linestyle='--', label='95%')
axes[0, 1].set_title('Cumulative Explained Variance')
axes[0, 1].set_xlabel('Number of Components')
axes[0, 1].legend()
# Component loadings heatmap
import seaborn as sns
sns.heatmap(components_df.T, annot=True, cmap='coolwarm', center=0,
ax=axes[1, 0], cbar_kws={'label': 'Loading'})
axes[1, 0].set_title('Component Loadings Heatmap')
# Factor scores over time
factor_scores = pd.DataFrame(returns_pca[:, :3],
index=returns_df.index,
columns=['Market Factor', 'Style Factor', 'Volatility Factor'])
axes[1, 1].plot(factor_scores.index, factor_scores['Market Factor'],
label='Market Factor', alpha=0.7)
axes[1, 1].plot(factor_scores.index, factor_scores['Style Factor'],
label='Style Factor', alpha=0.7)
axes[1, 1].set_title('Principal Component Scores Over Time')
axes[1, 1].set_ylabel('Factor Score')
axes[1, 1].legend()
axes[1, 1].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
# Portfolio analysis using PCA
print("\nPortfolio Risk Analysis using PCA:")
# Risk contribution by component
portfolio_weights = np.ones(len(tickers)) / len(tickers) # Equal weights
portfolio_vol = np.sqrt(np.dot(portfolio_weights, np.dot(returns_df.cov().values, portfolio_weights)))
print(f"Portfolio return volatility (daily): {portfolio_vol:.4f}")
# Factor exposures
factor_exposures = np.dot(portfolio_weights, components_df.values)
print("Portfolio factor exposures:")
for i, exposure in enumerate(factor_exposures):
print(f" PC{i+1}: {exposure:.3f}")
📚 References
- Books:
  - The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman - Chapter 14
  - Pattern Recognition and Machine Learning by Christopher Bishop - Chapter 12
  - Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani - Chapter 10
- Documentation:
  - Scikit-learn PCA
  - Scikit-learn Decomposition Guide
- Tutorials:
  - PCA Explained Visually
  - A Tutorial on Principal Component Analysis
- Research Papers:
  - Pearson, K. (1901). On Lines and Planes of Closest Fit to Systems of Points in Space
  - Hotelling, H. (1933). Analysis of a Complex of Statistical Variables into Principal Components
  - Jolliffe, I.T. (2002). Principal Component Analysis, Second Edition
- Advanced Topics:
  - Kernel PCA
  - Sparse PCA
- Online Courses:
  - Machine Learning Course - Stanford CS229
  - Dimensionality Reduction - Coursera
  - Advanced Machine Learning - edX