May 21, 2025

Using Boosted Regression to Quantify Gender Wage Gaps: Is the Difference Statistically Significant?

Author(s): Brian Kriegler

While this blog focuses on gender wage disparities between men and women, the methods described herein could be extended to any comparison of specific gender groups such as non-binary or other gender-diverse individuals.

Introduction

My last blog post was a primer on boosted regression that included a case study using actual anonymized data. The boosted regression model estimated executives’ annual earnings as a function of calendar year, productivity, office location, and gender. Among other findings, the analysis revealed (i) a gender wage gap of over $20,000, and (ii) that the inclusion of the gender variable in the model resulted in a 0.68% reduction in prediction error.

Standard linear regression output includes, among other things, probability calculations known as “p-values.” A p-value is the probability of observing results at least as extreme as the actual results if random chance alone were at work. By contrast, with a boosted regression model, additional programming is generally needed to derive these probability calculations.

In a litigation setting such as employment discrimination, the attorneys and trier of fact may well be interested in statistical significance, i.e., whether the p-value is below a pre-determined threshold.1 Is the observed wage gap statistically significant? This blog post introduces a method for answering such questions when employing boosted regression. The same case study and data from the previous blog post are used for illustrative purposes.

A Recap of Boosted Regression Model Output

Two standard types of output from a boosted regression model include (i) a graph and/or table showing the marginal impact of a given predictor variable, and (ii) “variable influence” percentages.  Each was introduced in the previous blog post and will be described briefly below.

Marginal Impact2

From a boosted regression model, the practitioner is able to examine the marginal impact of individual predictor variables, holding all other predictor variables constant.  The typical output is a graph or table of estimated outcome values, given a set of potential predictor variable values.

Referring to the case study from the previous blog post, below is a bar graph showing estimated earnings for each combination of gender and office location.  The difference between a blue bar and the corresponding red bar signifies the estimated gender wage gap in each city.  This graph reveals a noticeable gap between men’s and women’s earnings in each city.  Across the entire dataset and holding all other predictor variables constant, the boosted regression model estimates the wage gap to be nearly $27,000.

Relative Influence

Also from a boosted regression model, the practitioner can see the reduction in total error attributed to each predictor variable.  This is referred to as “relative influence.”  Each predictor variable’s relative influence is between 0% and 100%, and the sum of all relative influence calculations must equal 100%.

Referring to the case study from the previous post, the combination of productivity, geographic location, and calendar year accounts for 99.32% of all relative influence. The remaining 0.68% reduction in prediction error was attributed to the inclusion of the gender variable in the boosted regression model.
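The two outputs recapped above can be illustrated with a minimal sketch. It assumes scikit-learn's GradientBoostingRegressor as the boosted regression implementation, which may differ from the software used in the case study, and it runs on synthetic data rather than the case-study data. The variable names, the coding of gender as 0/1, and the built-in gap of $20,000 are all hypothetical. Scikit-learn's feature importances are used here as an analogue of relative influence; they are scaled to sum to 100%, but they are not guaranteed to match another package's calculation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1_000

# Synthetic stand-in data (NOT the case-study data).
year = rng.integers(2015, 2021, size=n)      # calendar year
productivity = rng.uniform(0, 100, size=n)   # hypothetical productivity score
office = rng.integers(0, 3, size=n)          # three hypothetical offices, coded 0-2
male = rng.integers(0, 2, size=n)            # 1 = male, 0 = female

TRUE_GAP = 20_000  # arbitrary gap built into the synthetic earnings
earnings = (
    150_000
    + 1_000 * (year - 2015)
    + 800 * productivity
    + 10_000 * office
    + TRUE_GAP * male
    + rng.normal(0, 10_000, size=n)
)

X = np.column_stack([year, productivity, office, male]).astype(float)
names = ["year", "productivity", "office", "gender"]
GENDER_COL = 3

model = GradientBoostingRegressor(
    n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0
)
model.fit(X, earnings)


def marginal_gap(fitted_model, data, col):
    """Average prediction with every row set to male, minus every row set to
    female, leaving all other predictor values as observed."""
    as_male, as_female = data.copy(), data.copy()
    as_male[:, col] = 1
    as_female[:, col] = 0
    return fitted_model.predict(as_male).mean() - fitted_model.predict(as_female).mean()


gap = marginal_gap(model, X, GENDER_COL)

# Importances sum to 1 in scikit-learn; rescale to percentages.
influence = 100 * model.feature_importances_

print(f"Estimated marginal wage gap: ${gap:,.0f}")
for name, pct in zip(names, influence):
    print(f"Relative influence of {name}: {pct:.2f}%")
```

Averaging predictions over the observed rows, rather than over a grid of invented rows, is one common way to hold the other predictors constant; other conventions exist.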

Remaining Statistical Questions

There are two topics that will drive the remainder of this blog post:

  • Is the observed wage gap larger than we would expect due to random chance?
  • Is the contribution of gender to reducing model error larger than we would expect due to random chance?

Methodology for Deriving P-Values and Assessing Statistical Significance

The statistical questions directly above can be answered using a combination of (i) random sampling, (ii) the definition of a p-value, and (iii) computational simulations.  To summarize:

  • Random sampling from the existing dataset allows the practitioner to gauge the amount of variability in the data.
  • By definition, a p-value measures the probability that one would observe a result at least as extreme as the actual result due to random chance alone, if one had access to a limitless number of sampled permutations from the population.
  • Computational simulations allow the practitioner to conduct a very large number of permutations.

The five steps below outline how to derive p-values for a predictor’s marginal impact and relative influence by simulating models with randomized gender values.  While focused on gender, this method can be applied to any predictor.

  • Step 1: For each observation in the original dataset, replace the original value of the gender variable with a randomly assigned value using one of the three methods described in the next sub-section.
  • Step 2: Run a new boosted regression model using the six steps described in the previous blog post, given the modified dataset created in Step 1.
  • Step 3A: Calculate the marginal impact of the variable of interest on the outcome variable from the boosted regression model in Step 2. In this instance, the variable of interest is gender, and the outcome variable is annual wages.
  • Step 3B: Additionally, calculate the relative influence for the variable of interest from the boosted regression model in Step 2.
  • Step 4: Repeat Steps 1 through 3 a very large number of times, e.g., 1,000. Each iteration yields the following:
    • A new dataset that includes randomly assigned gender values
    • A new boosted regression model
    • A new dollar amount for the marginal wage gap
    • A new percent reduction in error for the gender variable.

This creates a “baseline distribution” for both the marginal wage gap and the gender variable’s relative influence.

  • Step 5A: Calculate the proportion of simulated boosted regression models in which the marginal wage gap is higher than the actual wage gap. Assess whether this result is “statistically significant.”  Broadly speaking:
    • If the proportion is greater than 10%, then the wage gap is statistically insignificant at the 10% level.
    • A proportion between 5% and 10% implies that the wage gap meets the threshold for statistical significance at the 10% level.
    • A proportion between 1% and 5% meets the threshold for statistical significance at the 5% level.
    • A proportion below 1% meets the threshold for statistical significance at the 1% level.
  • Step 5B: Similarly, calculate the proportion of simulated boosted regression models in which the gender variable’s relative influence is higher than the actual relative influence. Subsequently, assess whether the actual relative influence meets a threshold for statistical significance.
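The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the code behind the case study: it assumes scikit-learn's GradientBoostingRegressor, uses synthetic data with a hypothetical built-in gap, codes gender as 0/1, applies the Shuffle Approach in Step 1, and compares wage gaps by magnitude. The function names and the small number of simulations (30, chosen only to keep the run short) are likewise hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def fit_and_measure(X, y, col):
    """Steps 2, 3A, 3B: fit a boosted model, then return the marginal gap
    and the relative influence (in percent) for the column of interest."""
    model = GradientBoostingRegressor(
        n_estimators=100, max_depth=3, random_state=0
    ).fit(X, y)
    high, low = X.copy(), X.copy()
    high[:, col] = 1
    low[:, col] = 0
    gap = model.predict(high).mean() - model.predict(low).mean()
    return gap, 100 * model.feature_importances_[col]


def simulate_p_values(X, y, col, n_sims=1_000, seed=0):
    """Build baseline distributions by re-fitting on randomized gender values."""
    rng = np.random.default_rng(seed)
    obs_gap, obs_infl = fit_and_measure(X, y, col)

    sim_gaps = np.empty(n_sims)
    sim_infls = np.empty(n_sims)
    for b in range(n_sims):                             # Step 4
        X_sim = X.copy()
        X_sim[:, col] = rng.permutation(X[:, col])      # Step 1 (Shuffle Approach)
        sim_gaps[b], sim_infls[b] = fit_and_measure(X_sim, y, col)

    # Step 5A: share of simulated gaps larger in magnitude than the actual gap.
    p_gap = float(np.mean(np.abs(sim_gaps) > abs(obs_gap)))
    # Step 5B: share of simulated relative influences above the actual one.
    p_infl = float(np.mean(sim_infls > obs_infl))
    return {
        "observed_gap": float(obs_gap),
        "observed_influence": float(obs_infl),
        "sim_gaps": sim_gaps,
        "sim_influences": sim_infls,
        "p_gap": p_gap,
        "p_influence": p_infl,
    }


# Synthetic stand-in data (NOT the case-study data).
data_rng = np.random.default_rng(1)
n = 300
productivity = data_rng.uniform(0, 100, size=n)
male = data_rng.integers(0, 2, size=n).astype(float)
y = 100_000 + 500 * productivity + 20_000 * male + data_rng.normal(0, 10_000, size=n)
X = np.column_stack([productivity, male])

result = simulate_p_values(X, y, col=1, n_sims=30)
print(f"Observed gap: ${result['observed_gap']:,.0f}")
print(f"Proportion of simulated gaps exceeding it: {result['p_gap']:.3f}")
print(f"Proportion of simulated influences exceeding actual: {result['p_influence']:.3f}")
```

With only 30 simulations the smallest nonzero proportion is 1 in 30, so a larger number of simulations, such as the 1,000 mentioned in Step 4, is needed before the 1% threshold can be assessed.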

Three Approaches for Assigning Random Values for a Given Predictor Variable

One can generate a new predictor variable that is dissociated from the outcome using one of the following techniques:

  • “Coin Toss Approach” – Each observation has a 50% probability of being assigned a value of “male” and a 50% probability of being assigned a value of “female.”
  • “Shuffle Approach” – Take all of the gender values in the original data and place them in a random order. In each simulated dataset, the percentage of each gender remains the same as in the original dataset.
  • “Bootstrapped Shuffle Approach” – Take all of the gender values and sample with replacement. This allows the percentage of each gender to fluctuate from one simulated dataset to the next.  Compared to the Shuffle Approach, this likely increases the amount of variability across simulations.
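The three approaches above can be written in a few lines each. The sketch below uses only Python's standard library; the function names and the 70/30 gender split are hypothetical and chosen purely for illustration.

```python
import random


def coin_toss(genders, rng):
    """Each observation is independently 'male' or 'female' with probability 50%."""
    return [rng.choice(["male", "female"]) for _ in genders]


def shuffle(genders, rng):
    """Reorder the original values; each gender's share is unchanged."""
    return rng.sample(genders, k=len(genders))


def bootstrapped_shuffle(genders, rng):
    """Sample the original values with replacement; each gender's share can vary."""
    return rng.choices(genders, k=len(genders))


rng = random.Random(42)
original = ["male"] * 70 + ["female"] * 30  # hypothetical 70/30 split

for approach in (coin_toss, shuffle, bootstrapped_shuffle):
    simulated = approach(original, rng)
    share = simulated.count("female") / len(simulated)
    print(f"{approach.__name__}: {share:.0%} female")
```

Note that the Coin Toss Approach ignores the original gender mix altogether, whereas the other two start from the observed values, so the three baseline distributions need not coincide.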

The Case Study Revisited

The graphs below are derived using (i) the five steps described above, and (ii) the three techniques for randomly assigning gender to each observation in the dataset.  There are two sets of three graphs:

  • The first set of graphs shows the “baseline distribution” of the wage gap between men and women.
  • The second set of graphs shows the “baseline distribution” of the gender variable’s relative influence.

In each graph, the dotted line signifies the measurement from the original data.

The Probability of Observing a Wage Gap of Nearly $27,000 Due to Chance is Less than 1 in 1,000

The first set of graphs shows the baseline distribution for the gender wage gap. Each dotted line signifies the original boosted regression model’s estimated wage gap of nearly $27,000. Each marginal impact graph shows that the observed wage gap is more extreme than virtually all of the simulated wage gaps.3 Given that 1,000 simulations were conducted, this observed wage gap is statistically significant at the 1% level.

In both the simulated models and in general, it is not a given that the wage gap would necessarily favor men over women.  Therefore, when assessing statistical significance, the magnitude of the actual wage gap is compared to the absolute values of the simulated wage gaps, i.e., without regard to whether they favor men or women.
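The comparison of magnitudes described above can be made concrete with a short sketch. The simulated values below are hypothetical numbers chosen for illustration, not results from the case study, and the function names are assumptions. The labels follow the thresholds listed in Step 5A; how to treat a proportion that falls exactly on a boundary is a choice made here, not something the steps specify.

```python
def empirical_p_value(observed, simulated, two_sided=True):
    """Share of simulated values more extreme than the observed value.

    With two_sided=True, magnitudes are compared, so a simulated gap that
    favors either group counts as extreme.
    """
    if two_sided:
        extreme = sum(abs(s) > abs(observed) for s in simulated)
    else:
        extreme = sum(s > observed for s in simulated)
    return extreme / len(simulated)


def significance_label(p):
    """Map a proportion to the thresholds in Step 5A (boundaries treated as
    not meeting the stricter level, which is an assumption)."""
    if p < 0.01:
        return "significant at the 1% level"
    if p < 0.05:
        return "significant at the 5% level"
    if p < 0.10:
        return "significant at the 10% level"
    return "not significant at the 10% level"


# Hypothetical simulated gaps (in thousands of dollars); one large gap favors women.
simulated = [-6.0, -1.0, 0.5, 2.0, -3.0, 4.0, 1.0, -0.5, 3.0, -2.0]
observed = 5.0

one_sided = empirical_p_value(observed, simulated, two_sided=False)
two_sided = empirical_p_value(observed, simulated, two_sided=True)
print(f"One-sided proportion: {one_sided:.2f} ({significance_label(one_sided)})")
print(f"Two-sided proportion: {two_sided:.2f} ({significance_label(two_sided)})")
```

In this toy example the one-sided comparison misses the simulated gap of -6.0, while the comparison of absolute values counts it, which is why the two proportions differ.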

The Probability of Observing a Reduction in Error of 0.68% Due to Chance is Less than 2 in 100

The second set of graphs shows the baseline distribution for the gender variable’s relative influence. Each of these graphs shows that the observed percent reduction in total error is in the tail of the baseline distribution. The “Coin Toss Approach” reveals statistical significance at the 5% level. The other two approaches show statistical significance at the 1% level.

Conclusion

P-values for boosted regression can be derived through sampling and simulation. These diagnostic measurements can help assess whether observed patterns are likely due to random chance. This approach provides a practical way to evaluate the statistical significance of boosted regression results, particularly in litigation applications such as discrimination analysis.

Frequently Asked Questions

  1. With boosted regression modeling, why are simulations needed to assess statistical significance?
    While several statistical software packages offer boosted regression (e.g., R and Stata), the “off the shelf” output does not necessarily include p-values. Simulations allow us to generate null distributions under the assumption of no effect, making it possible to evaluate statistical significance in a nonparametric, data-driven way.
  2. If a variable has a very small impact on the reduction in error, why look any further?
    Even a small reduction in error may be statistically significant, meaning it is unlikely to have occurred by chance. This distinction is potentially relevant in legal contexts, where statistical significance is of particular interest to the attorneys and/or trier of fact.
  3. What’s the difference between the Shuffle and Bootstrapped Shuffle Approaches?
    The Shuffle Approach reorders the original gender values without replacement, preserving the exact gender ratio. The Bootstrapped Shuffle Approach samples with replacement, introducing more variability and potentially offering a more robust test of statistical significance.
  4. How generalizable are these findings?
    The statistical significance applies to this specific dataset and model. While the methodology can be generalized to other settings, the specific findings are context-dependent.

References

1 Statistical significance is discussed in the Reference Manual on Scientific Evidence, and specifically, Reference Guide on Statistics by David H. Kaye and David A. Freedman (URL: https://nap.nationalacademies.org/catalog/13163/reference-manual-on-scientific-evidence-third-edition).

2 Some statistical software packages refer to this as “partial dependence.” See, e.g., https://journal.r-project.org/archive/2017/RJ-2017-016/RJ-2017-016.pdf

3 In both the simulated models and in general, it is not a given that the wage gap would necessarily favor men over women. Therefore, when assessing statistical significance, the magnitude of the actual wage gap is compared to the absolute values of the simulated wage gaps, i.e., without regard to whether they favor men or women.
