Analyzing Wage Determinants with Advanced Causal Inference

Project Overview
This project applied econometric and machine learning methods to simulated wage data to estimate the causal relationship between education, experience, and wages. It covered a range of modern causal inference techniques: OLS as an econometric baseline, Random Forest for prediction, Double Machine Learning (DML) for robust estimation of average causal effects, and Causal Forest for identifying heterogeneous treatment effects (HTE).
Methodology
Synthetic data were generated so that the true effects are known and every estimate can be validated against them. Four analytical frameworks were then applied in turn: OLS for baseline econometric relationships, Random Forest to assess predictive power, Double Machine Learning to estimate average causal effects while controlling for confounders, and Causal Forest to identify heterogeneous treatment effects across population subgroups. Visualizations supported each step.
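The workflow described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the data-generating process, the `ability` confounder, and the helper names (`simulate_wages`, `fit_ols`, `predict`, `dml_partial_out`) are all assumptions, and only the education effect of 0.08 is taken from this write-up. To stay dependency-light, the nuisance models are plain linear regressions rather than the machine learning learners used in the project; the cross-fitted residual-on-residual step is the part that mirrors DML.

```python
import numpy as np

TRUE_EFFECT = 0.08  # simulated return to education reported in this write-up


def simulate_wages(n=5000, seed=0):
    """Hypothetical data-generating process (only TRUE_EFFECT comes from the project)."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)                       # assumed observed confounder
    experience = rng.uniform(0, 30, size=n)
    education = 12 + 2 * ability + rng.normal(size=n)  # education depends on ability
    log_wage = (1.0 + TRUE_EFFECT * education + 0.03 * experience
                + 0.10 * ability + rng.normal(scale=0.5, size=n))
    return education, experience, ability, log_wage


def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def dml_partial_out(d, X, y, n_folds=5, seed=0):
    """Cross-fitted partialling-out: regress outcome residuals on treatment residuals."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    d_res = np.empty(len(y))
    y_res = np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models are fit on the training folds and applied to the held-out fold.
        d_res[test] = d[test] - predict(fit_ols(X[train], d[train]), X[test])
        y_res[test] = y[test] - predict(fit_ols(X[train], y[train]), X[test])
    return float(d_res @ y_res / (d_res @ d_res))


education, experience, ability, log_wage = simulate_wages()
controls = np.column_stack([experience, ability])

naive = fit_ols(education.reshape(-1, 1), log_wage)[1]                    # no controls
ols_effect = fit_ols(np.column_stack([education, controls]), log_wage)[1]  # with controls
dml_effect = dml_partial_out(education, controls, log_wage)
print(f"naive={naive:.4f}  ols={ols_effect:.4f}  dml={dml_effect:.4f}")
```

In this sketch the regression without controls overstates the education effect because `ability` raises both education and wages, while the controlled OLS and the cross-fitted estimate should both land near 0.08. Swapping the linear nuisance fits for flexible learners is what distinguishes full DML from this simplified version.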
Results
The estimates converged closely across methods: OLS (0.085), DML (0.0847), and Causal Forest (0.0842) all came close to the true simulated education effect of 0.08. Random Forest achieved a predictive MSE of 0.711, but predictive accuracy alone does not identify a causal effect, so the causal methods were essential for effect identification. Significant heterogeneity in education's impact was found across experience levels: Causal Forest revealed interaction patterns between education and experience and systematic variation in returns to education across attainment groups.
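The idea of an education effect that varies with experience can be illustrated with a much simpler stand-in for a Causal Forest: estimate the effect separately within experience terciles. Everything in this sketch is hypothetical, including the assumed form of the heterogeneity (an effect that averages 0.08 and rises linearly with experience) and the helper names `residualize` and `effect_within`; it is not the project's simulation or its Causal Forest.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30000
experience = rng.uniform(0, 30, size=n)
education = 12 + 0.05 * experience + rng.normal(scale=2.0, size=n)

# Assumed heterogeneity: the effect averages 0.08 and grows with experience.
true_effect = 0.08 + 0.002 * (experience - 15)
log_wage = (1.0 + true_effect * education + 0.03 * experience
            + rng.normal(scale=0.5, size=n))


def residualize(v, x):
    """Remove the part of v that is linearly explained by x (with intercept)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


def effect_within(idx):
    """Residual-on-residual estimate of the education effect inside one subgroup."""
    d_res = residualize(education[idx], experience[idx])
    y_res = residualize(log_wage[idx], experience[idx])
    return float(d_res @ y_res / (d_res @ d_res))


# Split the sample into low, middle, and high experience terciles.
order = np.argsort(experience)
group_effects = [effect_within(idx) for idx in np.array_split(order, 3)]

for label, est in zip(["low", "mid", "high"], group_effects):
    print(f"{label:>4} experience: estimated education effect = {est:.4f}")
```

Under these assumptions the three subgroup estimates should increase with experience while still averaging roughly 0.08, which is the kind of pattern an average effect hides. A Causal Forest generalizes this by learning the subgroups from the data instead of fixing them in advance.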
Visualizations
Histogram of HTE (using Experience as effect modifier)
Scatter plot: HTE vs. Interaction of Education & Experience
Conclusion
This project showed that combining traditional econometrics with modern machine learning yields causal estimates that are both robust and more informative. The agreement across OLS, DML, and Causal Forest supports the reliability of the estimated effects. The significant heterogeneous treatment effects also show why an analysis should look beyond average effects to understand how impacts differ across subgroups, which makes this approach a useful framework for policy evaluation and data-driven decision-making.
