Ph.D. in Statistics, University of California, Los Angeles
M.S. in Statistics, University of California, Los Angeles
B.A. in Mathematics/Economics, Claremont McKenna College
Econ One, August 2008 - Present
University of Pennsylvania, 2007 - 2008
University of California Los Angeles, 2007 - 2008
Self-Employed Statistical Consultant, 2004 - 2008
RAND Statistics Group, 2006
Lockheed Martin Missiles and Space, 2001 - 2003
U.S. District Court
State Court
Arbitration
Private Mediation
While this blog focuses on gender wage disparities between men and women, the methods described herein could be extended to non-binary, transgender, and other gender-diverse individuals.
Boosted regression, also known as boosting or generalized boosted models, is a statistical data mining tool that has proven highly effective in modeling an outcome variable as a function of a set of predictor variables. This non-parametric, data-adaptive technique allows the practitioner to uncover both linear and nonlinear relationships within data.
Furthermore, a series of boosted regression model diagnostics aid in quantifying (i) the importance of a given predictor variable, (ii) the relationship between the outcome variable and each predictor variable (e.g., linear, stepwise, piecewise, etc.), and (iii) the extent to which the predictor variables interact with one another.
In this blog post, we discuss the application of boosted regression as a means for evaluating wage gaps across genders. Actual data from an anonymized case study are used to demonstrate how to interpret boosted regression output.
Boosted regression modeling entails an iterative process in which the model grows little by little. Boosted models can be fit using statistical software such as R or Stata. Textbooks covering boosted regression include, but are not limited to, "The Elements of Statistical Learning" by Hastie, Tibshirani, and Friedman (2001) and "Statistical Learning from a Regression Perspective" by Richard A. Berk (2008).
The steps described below allow the data to identify the relationship of each predictor variable with the outcome variable, capture potential interactions, and reveal which predictor variables are most important. Here's how it works:
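To make the "little by little" idea concrete, here is a minimal sketch of least-squares boosting in plain Python. It is an illustration under stated assumptions, not the procedure used in the case study (which relied on the gbm library in R): it uses a single predictor, one-split "stumps" as the small models, simulated data, and made-up settings for the number of iterations and the learning rate. All function and variable names are hypothetical.

```python
import random

def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) for the one-split model
    that minimizes squared error against the current residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def mean_squared_error(y, predictions):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)

def boost(x, y, n_iterations=200, learning_rate=0.1):
    """Start from the overall mean, then repeatedly fit a stump to the
    residuals and add a shrunken copy of it to the running prediction."""
    baseline = sum(y) / len(y)
    predictions = [baseline] * len(y)
    stumps = []
    for _ in range(n_iterations):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return baseline, stumps, predictions

# Simulated (hypothetical) data with a nonlinear relationship.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(60)]
y = [xi ** 2 + random.gauss(0, 1) for xi in x]

baseline, stumps, fitted = boost(x, y)
baseline_mse = mean_squared_error(y, [baseline] * len(y))
final_mse = mean_squared_error(y, fitted)
print(f"MSE with the mean only: {baseline_mse:.1f}")
print(f"MSE after boosting:     {final_mse:.1f}")
```

Each pass corrects a small share of the remaining error, which is why the model can pick up a curved relationship without anyone specifying its shape in advance. Production tools typically add refinements such as deeper trees, random subsampling, and cross-validation.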
In an analysis of employees' earnings, boosted regression can be used to model wages as a function of job attributes along with gender.
A boosted regression model can be informative in a number of respects. For example:
Consider a dataset that includes the following pieces of information about executives at a company that has offices scattered across the country:
In the case study below, boosted regression reveals a substantial wage gap between men and women among executives after accounting for differences in productivity, geography, and annual adjustments.1
Once the boosted regression model is constructed, one analytical task is to assess how well the model fits the data. This entails (i) calculating each predicted (i.e., estimated) outcome in the dataset, and (ii) comparing the predicted outcomes to the corresponding actual outcomes. The graph below shows that predicted earnings track actual earnings among executives at this company.2
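One simple way to summarize how closely predictions track actual values is the R-squared from a simple linear regression of one on the other, which equals the squared correlation between the two. The sketch below computes it in plain Python; the earnings figures are hypothetical and are not taken from the case study, and the function name is our own.

```python
def r_squared(actual, predicted):
    """R-squared from a simple linear regression of predicted on actual,
    computed as the squared Pearson correlation between the two."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)

# Hypothetical earnings in thousands of dollars.
actual_earnings = [100, 150, 200, 250, 300]
predicted_earnings = [110, 140, 210, 240, 310]

fit = r_squared(actual_earnings, predicted_earnings)
print(f"R-squared: {fit:.3f}")
```

A value near 1 means the predictions line up closely with the actual outcomes. Note that when it is computed on the same data used to build the model, it can overstate how well the model would perform on new data.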
The boosted regression model diagnostics reveal that earnings increase as productivity improves. The graph below suggests that for a given level of productivity, the average wage gap between men and women in this example ranges from $23,000 to $38,000.
The next graph shows the average difference in earnings across genders at each of the six office locations, holding productivity and calendar year constant. On average, the wage gap between men and women in this example is between $20,000 and $42,000.
The graphs by gender and year reveal that earnings increased from 2018 to 2022, followed by slightly lower earnings in 2023 and 2024. On average, the wage gap between men and women in this example is between $25,000 and $30,000 year over year.
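Comparisons like these, which hold the other predictors constant, are commonly computed as partial dependence: set the variables of interest to fixed values for every record, run the model, and average the predictions. The sketch below shows the idea in plain Python. We are assuming this resembles how the graphs were produced; the post does not spell out the calculation. The `toy_predict` function is a hypothetical stand-in for a fitted boosted model, and the records and dollar amounts are invented for illustration.

```python
def partial_dependence(predict, rows, fixed):
    """Average prediction over all rows after overwriting the
    features named in `fixed` with the given values."""
    total = 0.0
    for row in rows:
        modified = dict(row)    # copy so the original record is untouched
        modified.update(fixed)
        total += predict(modified)
    return total / len(rows)

def toy_predict(row):
    """Hypothetical stand-in for a fitted model's prediction function."""
    earnings = 150_000 + 2_000 * row["productivity"]
    if row["office"] == "B":
        earnings += 10_000
    if row["gender"] == "M":
        earnings += 25_000
        if row["office"] == "B":
            earnings += 5_000   # invented gender-by-office interaction
    return earnings

# Hypothetical executive records.
rows = [
    {"productivity": 10, "office": "A", "gender": "F"},
    {"productivity": 25, "office": "A", "gender": "M"},
    {"productivity": 40, "office": "B", "gender": "F"},
    {"productivity": 55, "office": "B", "gender": "M"},
]

gaps = {}
for office in ("A", "B"):
    men = partial_dependence(toy_predict, rows, {"gender": "M", "office": office})
    women = partial_dependence(toy_predict, rows, {"gender": "F", "office": office})
    gaps[office] = men - women
    print(f"Office {office}: average gap = ${gaps[office]:,.0f}")
```

Because every record is evaluated once as a man and once as a woman, differences in productivity between the two groups cancel out of the comparison, leaving only what the model attributes to gender at that office.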
Next, we examine the relative influence of each predictor variable in the boosted model. For a given number of iterations, the importance of a given predictor variable is measured based on how much the inclusion of that variable improves the boosted model's performance. This is expressed as a percentage, where the total importance across all variables adds up to 100 percent.
In this case, productivity is the most influential variable, accounting for 80% of the total improvement in model fit. The second most influential variable, geographic location, contributes approximately 15%, followed by fiscal year at 4%. Together, these three variables explain over 99% of the total influence.
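The percentages come from a simple normalization. A common convention, which we assume here rather than take from the case study, is to credit each split in each small model to the variable it splits on, add up the error reductions per variable, and rescale so the totals sum to 100. The sketch below uses invented error reductions and hypothetical names; it is not the output of the case study model.

```python
from collections import defaultdict

def relative_influence(splits):
    """Sum the error reduction credited to each variable across all
    splits, then rescale so the shares add up to 100 percent."""
    totals = defaultdict(float)
    for variable, error_reduction in splits:
        totals[variable] += error_reduction
    grand_total = sum(totals.values())
    return {v: 100.0 * t / grand_total for v, t in totals.items()}

# Invented (variable, error reduction) pairs, one per split.
splits = [
    ("productivity", 6.0),
    ("location", 5.0),
    ("productivity", 4.0),
    ("year", 3.0),
    ("gender", 2.0),
]

influence = relative_influence(splits)
for variable, share in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(f"{variable:12s} {share:5.1f}%")
```

Relative influence measures how much a variable helps the model predict the outcome overall. It is not the same as the dollar size of that variable's effect, so a variable with a small share can still be associated with a large difference in earnings.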
Although gender accounts for less than one percent of the model's error reduction, the previously discussed graphs suggest that the wage gap between men and women amounts to tens of thousands of dollars. How much of the observed difference in earnings across genders is due to chance? Is this wage gap statistically significant? In a future blog post, we will explore a methodology for answering these questions and revisit our case study.
Boosted regression offers a data-adaptive tool for analyzing an outcome variable as a function of a given set of predictor variables. This algorithmic technique can be applied to gender wage gap analyses, providing detailed insights into the factors that drive wage disparities. By modeling wages as a function of various job attributes along with gender, we can uncover complex relationships and quantify the impact of different predictors.
Boosted regression is an iterative process that enhances a model by correcting errors through a series of smaller models. This approach has proven to be effective at providing a representative depiction of the data.
Boosted regression can be used to model wages as a function of job attributes along with gender. This approach helps quantify the relationship between wages and gender, as well as the interaction between job attributes and gender.
Boosted regression quantifies the relative importance of each predictor based on the percent reduction in error. A predictor with a relatively high percent reduction in error is considered to have a greater impact on the accuracy of the model.
Boosted regression is versatile and can be used to analyze wage disparities across various demographics and job attributes. For example, the method could be used to compare wages across races and/or age brackets.
1 In this instance, the boosted regression model was constructed using the "gbm" library in R. The total number of iterations was set to 2,000, and a learning rate of 1 percent was applied. Subsequently, the cross-validation technique described in Step 8 suggested that the cumulative error was at a minimum after 790 iterations.
2 The R-squared value from a simple linear regression of predicted earnings (generated using boosted regression) against actual earnings is approximately 80%.