Causal Inference Basics

For ML Researchers

SuNaAI Lab

Technical Guide Series

Chapter 1: Beyond Correlation

Understanding causation in machine learning

Traditional machine learning excels at finding patterns and correlations in data. However, many real-world problems require understanding causation: What happens if we change X? Will treatment Y improve outcome Z? Causal inference provides the toolkit to answer these questions.

Why Causal Inference Matters

In many ML applications, we need to predict the effect of interventions and understand mechanisms. Causal inference enables us to move beyond associations to true cause-and-effect relationships.

Key Applications

  • Treatment effect estimation in medical trials
  • Policy evaluation and decision-making
  • Recommender systems with causal reasoning
  • Counterfactual predictions in ML models
  • Understanding feature importance causally
  • Debiasing algorithms and datasets

Chapter 2: Causality vs Correlation

Understanding the fundamental difference

⚠️ Correlation ≠ Causation

Ice cream sales and drowning deaths are correlated (both increase in summer), but ice cream doesn't cause drowning. Correlation measures association; causation requires understanding mechanisms.
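A small simulation can make this concrete. The sketch below assumes a hypothetical data-generating process in which temperature drives both ice cream sales and drownings, and ice cream has no effect on drownings at all. All coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process: temperature drives both variables,
# and ice cream sales have no effect on drownings.
temperature = rng.normal(20.0, 8.0, n)
ice_cream = 2.0 * temperature + rng.normal(0.0, 5.0, n)
drownings = 0.5 * temperature + rng.normal(0.0, 2.0, n)


def residualize(v, z):
    """Return the part of v that is not linearly explained by z."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


# Raw association between the two variables
raw_corr = np.corrcoef(ice_cream, drownings)[0, 1]

# Association that remains after removing the influence of temperature
partial_corr = np.corrcoef(
    residualize(ice_cream, temperature),
    residualize(drownings, temperature),
)[0, 1]

print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```

The raw correlation is strongly positive, while the correlation after adjusting for temperature should be close to zero, which is what we expect when the association is entirely due to a common cause.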

Common Sources of Bias

🔀 Confounding Variables

Factors that affect both treatment and outcome, creating spurious associations.

Example: Age confounds the relationship between exercise (treatment) and health (outcome) - older people exercise less and have worse health.

🔄 Selection Bias

When individuals are not randomly assigned to treatment, creating systematic differences.

Example: Comparing job training programs between volunteers and non-volunteers is biased because volunteers might be more motivated.

⚡ Intermediate Variables

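Variables on the causal path between treatment and outcome that carry (mediate) part of the treatment effect. Adjusting for a mediator removes that part of the effect from the estimate.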

Example: Education → Job Skills → Income. Job skills mediate the effect of education on income.
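The education example can be sketched as a simulation. The linear model and its coefficients below are assumptions chosen for illustration: education raises skills, and both skills and education raise income, so the total effect of education is 0.5 + 0.8 × 1.5 = 1.7 while the direct effect is 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical linear model: education -> skills -> income,
# plus a direct education -> income path.
education = rng.normal(0.0, 1.0, n)
skills = 0.8 * education + rng.normal(0.0, 1.0, n)
income = 1.5 * skills + 0.5 * education + rng.normal(0.0, 1.0, n)


def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Without the mediator: coefficient on education estimates the total effect
total_effect = ols(income, education)[1]

# With the mediator: coefficient on education estimates only the direct effect
direct_effect = ols(income, education, skills)[1]

print(f"total effect (no adjustment for skills): {total_effect:.2f}")
print(f"direct effect (adjusting for skills):    {direct_effect:.2f}")
```

Which number is the right one depends on the question. If the goal is the overall effect of education, adjusting for job skills understates it.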

🎲 Endogeneity

When the treatment variable is correlated with the error term in the model.

Example: Price and demand are endogenous - price affects demand, but high demand also affects price.

Chapter 3: Potential Outcomes Framework

The Rubin causal model

The potential outcomes framework, also known as the Rubin causal model, is a powerful way to think about causal effects. It defines treatment effects as the difference between potential outcomes under different treatment conditions.

Key Concepts

For each individual i, we define:

  • Y_i(1): Potential outcome under treatment
  • Y_i(0): Potential outcome under control
  • ATE: Average Treatment Effect = E[Y_i(1) - Y_i(0)], the individual effect averaged over the population

The Fundamental Problem of Causal Inference

🚨 The Challenge

For each individual, we can only observe one potential outcome. We see Y(1) if they receive treatment, or Y(0) if they don't—never both.

Observed: Y_i = T_i × Y_i(1) + (1 - T_i) × Y_i(0)
We never observe: Both Y_i(1) and Y_i(0) together
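In a simulation, unlike in real data, both potential outcomes are available, so the problem can be made visible. The sketch below assumes a hypothetical population in which motivation raises the outcome and the treatment adds a constant 3.0 for everyone. It compares random assignment with self-selection by motivated individuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50000

# Hypothetical population: motivation raises the outcome regardless of treatment
motivation = rng.normal(0.0, 1.0, n)
y0 = 10.0 + 2.0 * motivation + rng.normal(0.0, 1.0, n)  # outcome under control
y1 = y0 + 3.0                                           # outcome under treatment

# Only computable in a simulation, where both potential outcomes are known
true_ate = np.mean(y1 - y0)


def observe(t):
    """Observed outcome: Y = T * Y(1) + (1 - T) * Y(0)."""
    return t * y1 + (1 - t) * y0


def diff_in_means(y, t):
    """Naive estimate: mean outcome of treated minus mean outcome of control."""
    return y[t == 1].mean() - y[t == 0].mean()


# Random assignment: treatment is independent of the potential outcomes
t_random = rng.integers(0, 2, n)

# Self-selection: more motivated individuals are more likely to take treatment
t_selected = (motivation + rng.normal(0.0, 1.0, n) > 0).astype(int)

est_random = diff_in_means(observe(t_random), t_random)
est_selected = diff_in_means(observe(t_selected), t_selected)

print(f"true ATE:                 {true_ate:.2f}")
print(f"randomized estimate:      {est_random:.2f}")
print(f"self-selected estimate:   {est_selected:.2f}")
```

Under random assignment the difference in means should land close to the true ATE. Under self-selection it is inflated, because the treated group would have had better outcomes even without treatment.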

Identification Assumptions

1. Consistency

The outcome observed for an individual equals that individual's potential outcome under the treatment actually received.

2. Ignorability (Unconfoundedness)

Treatment assignment is independent of potential outcomes, conditional on covariates.

(Y(0), Y(1)) ⟂ T | X

3. Positivity

Every individual has a positive probability of receiving each treatment condition, given their covariates: 0 < P(T = 1 | X = x) < 1.

4. Stable Unit Treatment Value (SUTVA)

No interference between units (one person's treatment doesn't affect another person's outcome) and no hidden versions of the treatment.
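When these assumptions hold, the ATE can be recovered from observational data by comparing treated and control units within levels of X and then averaging over X. The following is a minimal sketch, assuming a single hypothetical binary covariate that affects both treatment and outcome, with a true effect of 1.0 by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100000

# Hypothetical binary covariate that affects both treatment and outcome
x = rng.integers(0, 2, n)
p_treat = np.where(x == 1, 0.8, 0.2)   # P(T = 1 | X), strictly between 0 and 1
t = rng.binomial(1, p_treat)
y = 1.0 * t + 2.0 * x + rng.normal(0.0, 1.0, n)   # true effect is 1.0

# Naive comparison ignores X and is confounded
naive = y[t == 1].mean() - y[t == 0].mean()

# Adjustment: within-stratum effects, weighted by the share of each stratum
adjusted = 0.0
for value in (0, 1):
    in_stratum = x == value
    treated = in_stratum & (t == 1)
    control = in_stratum & (t == 0)
    # Positivity check: every stratum needs both treated and control units
    assert treated.any() and control.any(), "positivity violated in stratum"
    stratum_effect = y[treated].mean() - y[control].mean()
    adjusted += in_stratum.mean() * stratum_effect

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

The adjustment only works here because X is the sole confounder and is observed. With an unobserved confounder, ignorability fails and the adjusted estimate would remain biased.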

Chapter 4: Propensity Score Matching

Balancing observed confounders

Propensity scores help balance treatment and control groups on observed covariates, reducing selection bias in observational studies. A unit's propensity score is its probability of receiving treatment given its observed characteristics.

Propensity Score Definition
e(x) = P(T = 1 | X = x)

Where:
- e(x) is the propensity score
- T is treatment indicator (0 or 1)
- X are observed covariates
- P is probability

Matching Methods

1:1 Nearest Neighbor

Match each treated unit with the closest control unit on propensity score.

Pros: Simple and easy to interpret
Cons: Discards unmatched controls, reducing sample size; the nearest match may still be a poor one

k:1 Matching

Match each treated unit with k closest control units.

Pros: Lower variance from using more control units
Cons: Needs a large control pool; later matches are less close, which can increase bias

Caliper Matching

Only match units within a certain distance threshold.

Pros: Ensures quality matches
Cons: May leave some units unmatched

Stratification

Group units into strata based on propensity score quintiles.

Pros: Uses all data
Cons: May have poor balance within strata
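To illustrate how a caliper changes nearest-neighbor matching, here is a minimal sketch on simulated data. It assumes the true propensity score is known (which is only the case in a simulation), matches with replacement, and uses a caliper of 0.05 on the propensity scale. That value is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4000

# Hypothetical data: x confounds treatment and outcome; true effect is 2.0
x = rng.normal(0.0, 1.0, n)
propensity = 1.0 / (1.0 + np.exp(-x))   # known here only because data is simulated
t = rng.binomial(1, propensity)
y = 2.0 * t + 3.0 * x + rng.normal(0.0, 1.0, n)

treated_idx = np.flatnonzero(t == 1)
control_idx = np.flatnonzero(t == 0)

# Distance between every treated unit and every control unit
dist = np.abs(propensity[treated_idx][:, None] - propensity[control_idx][None, :])

# 1:1 nearest neighbor, matching with replacement
nearest = dist.argmin(axis=1)
nearest_dist = dist[np.arange(len(treated_idx)), nearest]

# Caliper: drop treated units whose best match is too far away
caliper = 0.05
keep = nearest_dist <= caliper
matched_treated = treated_idx[keep]
matched_control = control_idx[nearest[keep]]

# Mean difference over matched pairs estimates the effect on the treated (ATT)
att = np.mean(y[matched_treated] - y[matched_control])
naive = y[t == 1].mean() - y[t == 0].mean()

print(f"matched {keep.sum()} of {len(treated_idx)} treated units")
print(f"naive estimate:   {naive:.2f}")
print(f"matched estimate: {att:.2f}")
```

A tighter caliper improves match quality but drops more treated units, which changes the population the estimate describes.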

Implementation Example

Propensity Score Implementation
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumes df is a pandas DataFrame with the covariate columns below,
# a binary 'treatment' column, and an 'outcome' column.

# Step 1: Estimate propensity scores
X = df[['age', 'education', 'income']]  # covariates
T = df['treatment']  # treatment indicator

# Fit logistic regression on standardized covariates
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, T)
df['propensity'] = model.predict_proba(X)[:, 1]

# Step 2: Match on propensity score
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]

# Index the control units by propensity score
nn = NearestNeighbors(n_neighbors=1)
nn.fit(control[['propensity']])

# Match each treated unit to its nearest control (with replacement)
distances, indices = nn.kneighbors(treated[['propensity']])

# Step 3: Estimate treatment effect on matched sample
# Matching controls to treated units targets the effect on the treated (ATT)
matched_control = control.iloc[indices.flatten()]
att = treated['outcome'].mean() - matched_control['outcome'].mean()

print(f"Average Treatment Effect on the Treated: {att:.2f}")

Chapter 5: Instrumental Variables

Dealing with unobserved confounders

When unobserved confounders bias our treatment effect estimates, instrumental variables provide a way to identify causal effects. An instrument affects treatment but not outcome except through treatment.

Instrument Validity Conditions

  1. Relevance: Instrument affects treatment (Z → T)
  2. Exclusion: Instrument only affects outcome through treatment (Z → T → Y)
  3. Independence: Instrument is unrelated to unobserved confounders

Two-Stage Least Squares (2SLS)

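2SLS Estimation
# Stage 1: Regress treatment on the instrument and keep the fitted values
#   T = a + b*Z + e,   T_hat = a_hat + b_hat*Z
# Stage 2: Regress outcome on the fitted treatment
#   Y = alpha + beta*T_hat + u
# The coefficient beta is the causal effect estimate

import statsmodels.api as sm
from statsmodels.sandbox.regression.gmm import IV2SLS

# Assumes y (outcome), T (treatment), and Z (instrument) are 1-D arrays or Series
exog = sm.add_constant(T)        # regressors, including the endogenous treatment
instrument = sm.add_constant(Z)  # instruments, including the constant
result = IV2SLS(y, exog, instrument=instrument).fit()
print(result.summary())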
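The two stages can also be written out by hand, which helps show why the method works. The sketch below assumes a hypothetical linear model with an unobserved confounder u, an instrument z that affects treatment only, and a true effect of 2.0. It is meant for intuition: the standard errors from a manual second stage are not valid, so a dedicated 2SLS routine should be used for inference.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000

# Hypothetical model: u confounds t and y but is not observed by the analyst
u = rng.normal(0.0, 1.0, n)
z = rng.normal(0.0, 1.0, n)                          # instrument
t = 0.8 * z + u + rng.normal(0.0, 1.0, n)            # treatment
y = 2.0 * t + 3.0 * u + rng.normal(0.0, 1.0, n)      # true effect is 2.0


def ols(target, regressor):
    """Return (intercept, slope) from a simple least-squares regression."""
    design = np.column_stack([np.ones(len(regressor)), regressor])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Ordinary regression of y on t is biased by the unobserved confounder
ols_beta = ols(y, t)[1]

# Stage 1: regress treatment on the instrument and form fitted values
a_hat, b_hat = ols(t, z)
t_hat = a_hat + b_hat * z

# Stage 2: regress outcome on the fitted treatment
iv_beta = ols(y, t_hat)[1]

print(f"OLS estimate:  {ols_beta:.2f}")
print(f"2SLS estimate: {iv_beta:.2f}")
```

The fitted values from stage 1 vary only because of the instrument, so they are free of the confounder's influence. That is why the second-stage slope should land near the true effect while the OLS slope does not.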

Common Instruments

🎲 Random Assignment

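In RCTs with imperfect compliance, random assignment is a natural instrument for the treatment actually received: it affects treatment uptake, is independent of confounders, and influences the outcome only through treatment.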

📍 Geographic Variation

Policy changes that vary by region can serve as instruments for local treatment exposure.

📅 Time Variation

Natural experiments like policy changes over time can create instrumental variation.

🏛️ Legal Requirements

Legal changes that affect treatment but not outcomes directly can serve as instruments.

Chapter 6: Causal Discovery

Learning causal structures from data

Causal discovery aims to learn the causal structure underlying observed data. This involves identifying directed edges in a causal graph and understanding which variables cause which.

Methods for Causal Discovery

1. Constraint-Based Methods

Use conditional independence tests to identify causal structures.

Example: PC algorithm, FCI (Fast Causal Inference)
Strengths: Grounded in well-understood statistical theory
Limitations: Requires faithfulness assumption; sensitive to errors in the independence tests

2. Score-Based Methods

Search over causal graphs and score them using likelihood or information criteria.

Example: GES (Greedy Equivalence Search) with BIC score
Strengths: Can handle complex structures
Limitations: Computationally expensive

3. Functional Causal Models

Assume specific functional forms and use them to infer causality.

Example: ANM (Additive Noise Models), LiNGAM (Linear Non-Gaussian Acyclic Model)
Strengths: Identifiable under certain assumptions
Limitations: Requires specific model assumptions

4. Deep Learning Approaches

Use neural networks to learn causal structures from observational data.

Example: Neural Causal Models, DAG-GNN
Strengths: Can capture complex nonlinear relationships
Limitations: Requires large datasets
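The conditional independence tests behind constraint-based methods can be illustrated on a three-variable chain. The sketch below assumes a hypothetical linear-Gaussian chain X → Y → Z and uses partial correlation as the independence measure. Real implementations add a significance test and search over many conditioning sets.

```python
import numpy as np


def partial_corr(a, b, cond):
    """Correlation between a and b after regressing out cond from both."""
    design = np.column_stack([np.ones(len(cond)), cond])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return np.corrcoef(res_a, res_b)[0, 1]


rng = np.random.default_rng(7)
n = 5000

# Hypothetical chain: X -> Y -> Z (no direct edge from X to Z)
x = rng.normal(size=n)
y = 0.9 * x + rng.normal(size=n)
z = 0.9 * y + rng.normal(size=n)

# X and Z are marginally dependent because information flows through Y
marginal = np.corrcoef(x, z)[0, 1]

# Conditioning on Y blocks the path, so X and Z become independent
conditional = partial_corr(x, z, y)

print(f"corr(X, Z):     {marginal:.2f}")
print(f"corr(X, Z | Y): {conditional:.2f}")
```

A constraint-based algorithm would use this result to remove the edge between X and Z. Note that a fork X ← Y → Z produces the same independence pattern, which is why these methods can often recover only an equivalence class of graphs rather than a single one.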

Practical Tools

Causal Discovery Libraries
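# Using py-causal (Python wrapper around Tetrad; requires Java).
# The calls below follow the py-causal README; names and parameters
# vary between versions, so check them against your installed release.
from pycausal.pycausal import pycausal
from pycausal import search

jvm = pycausal()
jvm.start_vm()

# Learn causal graph from a pandas DataFrame df
tetrad = search.tetradrunner()
tetrad.run(algoId='fges', dfs=df, scoreId='sem-bic', dataType='continuous')
print(tetrad.getNodes())
print(tetrad.getEdges())
jvm.stop_vm()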

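# Using DoWhy (Python)
# causal_graph is assumed to be a graph string (GML or DOT format)
# that encodes the assumed causal structure among the columns of df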
from dowhy import CausalModel

model = CausalModel(
    data=df,
    treatment='treatment',
    outcome='outcome',
    graph=causal_graph
)

# Identify estimand
identified_estimand = model.identify_effect()
print(identified_estimand)

# Estimate effect
causal_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_matching"
)
print(causal_estimate.value)

Chapter 7: Causal Inference in ML

Applications and implementations

Key ML Applications

🎯 Uplift Modeling

Identify customers who respond to treatment (offer, ad, etc.) to optimize marketing campaigns and personalization.
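One simple way to estimate uplift is the two-model approach (often called a T-learner): fit one outcome model on treated customers, another on control customers, and take the difference of their predictions. The sketch below assumes a hypothetical randomized campaign with a single customer feature and a true uplift of 1 + 2x, and uses plain linear models. It is a minimal illustration, not the method of any particular library.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20000

# Hypothetical randomized campaign with one customer feature x in [-1, 1]
x = rng.uniform(-1.0, 1.0, n)
t = rng.integers(0, 2, n)                       # offer assigned at random
# Response model: baseline 0.5 * x, uplift 1 + 2 * x when treated
y = 0.5 * x + t * (1.0 + 2.0 * x) + rng.normal(0.0, 1.0, n)


def fit_linear(feature, target):
    """Return (intercept, slope) of a least-squares line."""
    design = np.column_stack([np.ones(len(feature)), feature])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Separate outcome models for treated and control customers
coef_treated = fit_linear(x[t == 1], y[t == 1])
coef_control = fit_linear(x[t == 0], y[t == 0])


def predict_uplift(x_new):
    """Predicted outcome if treated minus predicted outcome if not treated."""
    x_new = np.asarray(x_new, dtype=float)
    mu1 = coef_treated[0] + coef_treated[1] * x_new
    mu0 = coef_control[0] + coef_control[1] * x_new
    return mu1 - mu0


uplift_low, uplift_high = predict_uplift([-0.5, 0.5])
print(f"predicted uplift at x = -0.5: {uplift_low:.2f}")
print(f"predicted uplift at x = +0.5: {uplift_high:.2f}")
```

In this simulated setting, customers with high x respond strongly to the offer and customers with low x barely respond, so targeting by predicted uplift rather than predicted outcome would focus the campaign on those it actually changes.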

🔍 Feature Importance

Understand which features actually causally affect outcomes, beyond correlation-based importance.

⚖️ Fairness & Bias

Detect and mitigate causal discrimination in ML models, ensuring fair treatment across groups.

🔮 Counterfactual Explanations

Provide explanations like "What would happen if we changed X?" for model predictions.

Recommended Libraries

📦 DoWhy

End-to-end causal reasoning library with identification, estimation, and refutation methods.

pip install dowhy

📦 EconML

Microsoft's library for causal inference using machine learning methods.

pip install econml

📦 CausalML

Uber's library for uplift modeling and causal inference with ML.

pip install causalml

Chapter 8: Best Practices

Guidelines for causal inference

✅ Do's:

  • Test your identification assumptions
  • Use multiple estimation methods
  • Conduct sensitivity analyses
  • Document your causal model
  • Validate findings with experiments
  • Consider confounders carefully
  • Report uncertainty properly
  • Be transparent about limitations

❌ Don'ts:

  • Confuse correlation with causation
  • Ignore unobserved confounders
  • Overinterpret observational studies
  • Skip assumption checks
  • Use methods without understanding
  • Neglect sensitivity analysis
  • Misinterpret instrumental variables
  • Ignore domain expertise

Common Pitfalls

Pitfall 1: Selection Bias

When treated and control groups differ systematically, standard methods can give biased estimates. Use randomization or careful matching.

Pitfall 2: Post-Treatment Bias

Including variables affected by treatment in your analysis can bias estimates. Only adjust for pre-treatment variables.

Pitfall 3: Weak Instruments

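Weak instruments can lead to large standard errors and to estimates biased toward the OLS estimate. Always test instrument strength, for example with the first-stage F-statistic (a common rule of thumb is F > 10).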
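Instrument strength is usually checked with the F-statistic of the first-stage regression. The sketch below assumes a single instrument and simulated data with made-up coefficients. With one instrument, the F-statistic is the squared t-statistic of the instrument's coefficient.

```python
import numpy as np


def first_stage_f(t, z):
    """F-statistic for a single instrument in the regression t ~ 1 + z."""
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, t, rcond=None)
    resid = t - design @ coef
    n_obs, n_params = design.shape
    sigma2 = resid @ resid / (n_obs - n_params)            # residual variance
    var_slope = sigma2 * np.linalg.inv(design.T @ design)[1, 1]
    return coef[1] ** 2 / var_slope                        # squared t-statistic


rng = np.random.default_rng(6)
n = 1000
z = rng.normal(size=n)
noise = rng.normal(size=n)

# Hypothetical first stages: one strong and one nearly irrelevant instrument
f_strong = first_stage_f(0.5 * z + noise, z)
f_weak = first_stage_f(0.02 * z + noise, z)

print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

An F-statistic well above 10 is commonly read as evidence of adequate strength. A value below that suggests the 2SLS estimate may be unreliable.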