Analyzing Wage Determinants with Advanced Causal Inference

Project Overview

This project applied econometric and machine learning methods to simulated wage data to estimate the causal effect of education and experience on wages. It covered four approaches: OLS as a baseline, Random Forest for prediction, Double Machine Learning (DML) for robust estimation of average causal effects, and Causal Forest for identifying heterogeneous treatment effects.

Methodology

The data were generated synthetically so that the true effects were known and every estimate could be checked against them. Four methods were then applied in turn: OLS for the baseline econometric relationships, Random Forest to assess predictive power, Double Machine Learning to estimate the average causal effect while controlling for confounders, and Causal Forest to identify heterogeneous treatment effects across population subgroups. Visualizations support each stage of the analysis.
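As an illustration of the Double Machine Learning step, here is a minimal sketch of the cross-fitted "partialling-out" estimator on simulated data. It is not the project's code: the data-generating process, its coefficients (other than the 0.08 education effect reported in this write-up), and the function names are all assumptions made for the example, and a quadratic least-squares fit stands in for the machine learning nuisance models so that only NumPy is required.

```python
import numpy as np

TRUE_EFFECT = 0.08  # education effect reported in the write-up


def simulate(n=4000, seed=0):
    """Hypothetical data-generating process (an assumption, not the project's)."""
    rng = np.random.default_rng(seed)
    experience = rng.uniform(0, 30, n)
    # Experience confounds the relationship: it shifts both education and wages.
    education = 14 - 0.1 * experience + rng.normal(0, 2, n)
    log_wage = (1.0 + TRUE_EFFECT * education + 0.04 * experience
                - 0.0008 * experience**2 + rng.normal(0, 0.3, n))
    return education, experience, log_wage


def fit_predict(x_train, y_train, x_test):
    """Nuisance learner: quadratic least squares, a stand-in for an ML model."""
    def features(x):
        return np.column_stack([np.ones_like(x), x, x**2])
    coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)
    return features(x_test) @ coef


def dml_effect(treatment, control, outcome, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of the average treatment effect."""
    n = len(outcome)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    t_res = np.empty(n)
    y_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Residualize treatment and outcome with models fit on the other folds.
        t_res[test_idx] = treatment[test_idx] - fit_predict(
            control[train], treatment[train], control[test_idx])
        y_res[test_idx] = outcome[test_idx] - fit_predict(
            control[train], outcome[train], control[test_idx])
    # Regress outcome residuals on treatment residuals (no intercept).
    return float(t_res @ y_res / (t_res @ t_res))


def naive_slope(treatment, outcome):
    """Slope of outcome on treatment alone, ignoring the confounder."""
    t_centered = treatment - treatment.mean()
    return float(t_centered @ (outcome - outcome.mean()) / (t_centered @ t_centered))


if __name__ == "__main__":
    education, experience, log_wage = simulate()
    print("naive slope:", naive_slope(education, log_wage))
    print("DML estimate:", dml_effect(education, experience, log_wage))
    print("true effect:", TRUE_EFFECT)
```

In this simulated setup the uncontrolled slope is biased because experience affects both education and wages, while the residual-on-residual regression should land close to the true effect. Swapping `fit_predict` for a flexible learner such as a random forest gives the usual DML configuration.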

Results

The three effect estimates agreed closely with one another and with the true simulated education effect of 0.08: OLS gave 0.085, DML gave 0.0847, and Causal Forest gave 0.0842. Random Forest predicted wages well (MSE 0.711), but predictive accuracy alone does not identify a causal effect, so the causal methods were needed for that step. The effect of education varied significantly across experience levels, and Causal Forest revealed interaction patterns and systematic variation in returns to education across attainment groups.
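To make the idea of heterogeneity across experience levels concrete, the sketch below estimates the education effect separately within fixed experience bands. This is deliberately simpler than a Causal Forest, which finds such subgroups adaptively, and it is not the project's code. The simulated effect curve (rising with experience and averaging 0.08), the band edges, and the function names are assumptions chosen only for illustration.

```python
import numpy as np


def simulate_heterogeneous(n=6000, seed=1):
    """Hypothetical data in which the education effect depends on experience."""
    rng = np.random.default_rng(seed)
    experience = rng.uniform(0, 30, n)
    education = 14 - 0.1 * experience + rng.normal(0, 2, n)
    # Assumed effect curve: 0.05 at 0 years of experience, 0.11 at 30 years.
    effect = 0.08 + 0.002 * (experience - 15)
    log_wage = (1.0 + effect * education + 0.04 * experience
                - 0.0008 * experience**2 + rng.normal(0, 0.3, n))
    return education, experience, log_wage


def education_effect_by_experience(education, experience, log_wage,
                                   edges=(0, 10, 20, 30)):
    """OLS coefficient on education within each experience band,
    controlling for a quadratic in experience."""
    effects = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (experience >= lo) & (experience < hi)
        x = experience[in_band]
        design = np.column_stack([np.ones(x.size), education[in_band], x, x**2])
        coef, *_ = np.linalg.lstsq(design, log_wage[in_band], rcond=None)
        effects.append(float(coef[1]))  # coefficient on education
    return effects


if __name__ == "__main__":
    education, experience, log_wage = simulate_heterogeneous()
    bands = ("0-10 years", "10-20 years", "20-30 years")
    for band, est in zip(bands, education_effect_by_experience(
            education, experience, log_wage)):
        print(f"{band}: estimated education effect {est:.3f}")
```

Under these assumptions the band-level estimates should increase with experience even though their average stays near 0.08, which is the kind of pattern an average-effect estimate alone would hide.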

Visualizations

Histogram of HTE (using Experience as effect modifier)

Scatter plot: HTE vs. Interaction of Education & Experience

Conclusion

This project showed that combining traditional econometrics with modern machine learning yields causal estimates that are both robust and more informative. The agreement of OLS, DML, and Causal Forest on an effect close to the known true value supports the reliability of the estimates. The significant heterogeneity in treatment effects also shows why average effects are not enough: the impact of education differs across subgroups, and capturing that variation gives a more useful basis for policy evaluation and data-driven decision-making.
