Ph.D. in Statistics, University of California, Los Angeles
M.S. in Statistics, University of California, Los Angeles
B.A. in Mathematics/Economics, Claremont McKenna College
Econ One, August 2008 - Present
University of Pennsylvania, 2007 - 2008
University of California, Los Angeles, 2007 - 2008
Independent Statistical Consultant, 2004 - 2008
RAND Statistics Group, 2006
Lockheed Martin Missiles and Space, 2001 - 2003
U.S. District Court
State Court
Arbitration
Private Mediation
While this blog focuses on gender wage disparities between men and women, the methods described herein could be extended to non-binary, transgender, and other gender-diverse individuals.
Boosted regression, also known as boosting or generalized boosted models, is a statistical data mining tool that has proven highly effective in modeling an outcome variable as a function of a set of predictor variables. This non-parametric, data-adaptive technique allows the practitioner to uncover both linear and nonlinear relationships within data.
Furthermore, a set of boosted regression model diagnostics helps quantify (i) the importance of a given predictor variable, (ii) the shape of the relationship between the outcome variable and each predictor variable (e.g., linear, stepwise, piecewise, etc.), and (iii) the extent to which the predictor variables interact with one another.
In this blog post, we discuss the application of boosted regression as a means for evaluating wage gaps across genders. Actual data from an anonymized case study are used to demonstrate how to interpret boosted regression output.
Boosted regression modeling entails an iterative process in which the model grows little by little. Boosted regression models can be fit using statistical software such as R or Stata. Textbooks covering boosted regression include but are not limited to “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman (2001), as well as “Statistical Learning from a Regression Perspective” by Richard A. Berk (2008).
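To make the idea of a model that "grows little by little" concrete, here is a minimal sketch in Python. It is not the procedure used in the case study: it assumes a squared-error loss, a single predictor, and one-split "stumps" as the small models. The function names and the toy data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the best single split.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]

def stump_value(stump, x):
    """Evaluate one stump at x."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def boost(xs, ys, n_iterations, learning_rate):
    """Grow the model one stump at a time, each fit to the current residuals."""
    baseline = mean(ys)                      # start from the overall average
    predictions = [baseline] * len(ys)
    stumps = []
    for _ in range(n_iterations):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take only a small step (the learning rate) toward the stump's fit.
        predictions = [p + learning_rate * stump_value(stump, x)
                       for x, p in zip(xs, predictions)]
    return {"baseline": baseline, "stumps": stumps, "learning_rate": learning_rate}

def predict(model, x):
    """Baseline plus the shrunken contribution of every stump."""
    total = sum(stump_value(s, x) for s in model["stumps"])
    return model["baseline"] + model["learning_rate"] * total

def mean_squared_error(model, xs, ys):
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data: one predictor and one outcome, for illustration only.
productivity = [1, 2, 3, 4, 5, 6, 7, 8]
earnings = [50, 52, 55, 70, 72, 75, 110, 115]

short_model = boost(productivity, earnings, n_iterations=10, learning_rate=0.1)
long_model = boost(productivity, earnings, n_iterations=200, learning_rate=0.1)
```

Comparing `mean_squared_error` for `short_model` and `long_model` shows the in-sample error falling as iterations accumulate. In practice the number of iterations is chosen on held-out data, since in-sample error keeps shrinking even after the model starts to overfit.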
The steps described below allow the data to identify the relationship of each predictor variable with the outcome variable, capture potential interactions, and reveal which predictor variables are most important. Here’s how it works:
In an analysis of employees’ earnings, boosted regression can be used to model wages as a function of job attributes along with gender.
A boosted regression model can be informative in a number of respects. For example:
Consider a dataset that includes the following pieces of information about executives at a company that has offices scattered across the country:
In the case study below, boosted regression reveals a substantial wage gap between men and women among executives after accounting for differences in productivity, geography, and annual adjustments.1
Once the boosted regression model is constructed, one analytical task is to assess how well the model fits the data. This entails (i) calculating the predicted (i.e., estimated) outcome for each observation in the dataset, and (ii) comparing the predicted outcomes to the corresponding actual outcomes. The graph below shows that predicted earnings track actual earnings among executives at this company.2
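As a rough illustration of this comparison, the sketch below computes the R-squared from a simple linear regression of predicted values on actual values, the summary statistic cited in footnote 2. With a single predictor, that R-squared equals the squared Pearson correlation, which is what the function calculates. The function name and the numbers are hypothetical and are not taken from the case study.

```python
def r_squared(actual, predicted):
    """R-squared of a one-predictor linear regression of predicted on actual.

    For simple linear regression this is the squared Pearson correlation.
    Assumes both lists have the same length and neither is constant.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    mean_predicted = sum(predicted) / n
    covariance = sum((a - mean_actual) * (p - mean_predicted)
                     for a, p in zip(actual, predicted))
    var_actual = sum((a - mean_actual) ** 2 for a in actual)
    var_predicted = sum((p - mean_predicted) ** 2 for p in predicted)
    return covariance ** 2 / (var_actual * var_predicted)

# Hypothetical values for illustration only.
actual_earnings = [100.0, 120.0, 150.0, 170.0, 210.0]
predicted_earnings = [108.0, 115.0, 155.0, 160.0, 200.0]

fit_quality = r_squared(actual_earnings, predicted_earnings)
```

A value near 1 means the predictions move almost in lockstep with the actual outcomes. A value near 0 means they barely track them.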
The boosted regression model diagnostics reveal that earnings increase as productivity improves. The graph below suggests that for a given level of productivity, the average wage gap between men and women in this example ranges from $23,000 to $38,000.
The next graph shows the average difference in earnings across genders at each of the six office locations, holding productivity and calendar year constant. On average, the wage gap between men and women in this example is between $20,000 and $42,000.
The graphs by gender and year reveal that earnings increased from 2018 to 2022 and then dipped slightly in 2023 and 2024. On average, the wage gap between men and women in this example is between $25,000 and $30,000 year over year.
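The post does not state exactly how the curves behind these graphs were computed. One common way to hold other predictors constant is partial dependence: fix the predictors of interest at chosen values for every row, leave the rest as observed, and average the model's predictions. The sketch below assumes that approach. `toy_predict` is a made-up stand-in for a fitted model, and the rows and effect sizes are hypothetical.

```python
def partial_dependence(predict, rows, fixed_values):
    """Average prediction after overriding some predictors in every row."""
    overridden = [{**row, **fixed_values} for row in rows]
    return sum(predict(row) for row in overridden) / len(overridden)

def gender_gap(predict, rows, productivity):
    """Difference in average predicted earnings (men minus women) at one
    productivity level, with the remaining predictors left as observed."""
    men = partial_dependence(
        predict, rows, {"productivity": productivity, "gender": "M"})
    women = partial_dependence(
        predict, rows, {"productivity": productivity, "gender": "F"})
    return men - women

def toy_predict(row):
    """Hypothetical stand-in for a fitted boosted model's prediction."""
    office_effect = {"A": 0.0, "B": 5.0}[row["office"]]
    gender_effect = 10.0 if row["gender"] == "M" else 0.0
    return 50.0 + 2.0 * row["productivity"] + office_effect + gender_effect

# Hypothetical rows for illustration only.
rows = [
    {"productivity": 3, "office": "A", "gender": "F"},
    {"productivity": 7, "office": "B", "gender": "M"},
    {"productivity": 5, "office": "B", "gender": "F"},
]

gap_by_productivity = {p: gender_gap(toy_predict, rows, p) for p in (2, 4, 6)}
```

Because the toy model adds gender as a constant, the gap is the same at every productivity level. A real boosted model can capture interactions, so the gap may vary across the range, as the graphs in the case study suggest.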
Next, we examine the relative influence of each predictor variable in the boosted model. For a given number of iterations, the importance of a given predictor variable is measured based on how much the inclusion of that variable improves the boosted model’s performance. This is expressed as a percentage, where the total importance across all variables adds up to 100.
In this case, productivity is the most influential variable, accounting for 80% of the total improvement in model fit. The second most influential variable, geographic location, contributes approximately 15%, followed by fiscal year at 4%. Together, these three variables explain over 99% of the total influence.
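The percentage scale can be illustrated with a short sketch. It assumes each predictor has already been credited with a raw amount of error reduction accumulated over the iterations, and it rescales those amounts so they sum to 100. The predictor names and raw numbers are hypothetical and are not the case-study values.

```python
def relative_influence(error_reduction):
    """Rescale raw error reductions to percentages, largest first.

    error_reduction maps each predictor name to the total error reduction
    credited to it across all iterations (assumed non-negative, not all zero).
    """
    total = sum(error_reduction.values())
    ranked = sorted(error_reduction.items(), key=lambda item: item[1], reverse=True)
    return {name: 100.0 * value / total for name, value in ranked}

# Hypothetical raw error reductions for three predictors.
raw_reduction = {"predictor_b": 3.0, "predictor_a": 6.0, "predictor_c": 1.0}

influence = relative_influence(raw_reduction)
```

Because the result is a share of the total improvement, a predictor can have a small relative influence and still be associated with a large dollar difference in the outcome. The two quantities measure different things.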
Although gender accounts for less than one percent of the model’s error reduction, the previously discussed graphs suggest that the wage gap between men and women amounts to tens of thousands of dollars. How much of the observed difference in earnings across genders is due to chance? Is this wage gap statistically significant? In a future blog post, we will explore a methodology for answering these questions and revisit our case study.
Boosted regression offers a data-adaptive tool for analyzing an outcome variable as a function of a given set of predictor variables. This algorithmic technique can be applied to gender wage gap analyses, providing detailed insights into the factors that drive wage disparities. By modeling wages as a function of various job attributes along with gender, we can uncover complex relationships and quantify the impact of different predictors.
Boosted regression is an iterative process that enhances a model by correcting errors through a series of smaller models. This approach has proven to be effective at providing a representative depiction of the data.
Boosted regression can be used to model wages as a function of job attributes along with gender. This approach helps quantify the relationship between wages and gender, as well as the interaction between job attributes and gender.
Boosted regression quantifies the relative importance of each predictor based on the percent reduction in error. A predictor with a relatively high percent reduction in error is considered to have a greater impact on the accuracy of the model.
Boosted regression is indeed versatile and can be effectively used to analyze wage disparities across various demographics and job attributes. For example, the method could be used to compare wages across races and/or age brackets.
1 In this instance, the boosted regression model was constructed using the “gbm” library in R. The total number of iterations was set to 2,000, and a learning rate of 1 percent was applied. Subsequently, the cross-validation technique described in Step 8 suggested that the cumulative error was at a minimum after 790 iterations.
2 The R-squared value from a simple linear regression of predicted earnings (generated using boosted regression) against actual earnings is approximately 80%.