Is the use of sampling weights in regression always best or are there tradeoffs that need to be considered?

Andrew Kenny

Join Date: Sep 2017

Posts: 27
#1

29 Nov 2017, 18:00

The data I am working with oversampled certain subgroups, and it includes sampling weights so that results of analyses remain generalizable to the population.

Is the use of those weights in regression (logistic) always best or are there tradeoffs that need to be considered?

Does use of the weights result in imprecise confidence intervals and/or imprecise p-values?
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#2

29 Nov 2017, 18:39

You may find this punny JHR paper by Solon, Haider, and Wooldridge relevant.
Richard Williams

Join Date: Apr 2014

Posts: 4951
#3

29 Nov 2017, 18:42

I suspect most people use weighting, especially for getting estimates of descriptive statistics like means. But for things like logit and OLS, not everyone thinks it should be universally done. See

http://www.annualreviews.org/doi/abs...-011516-012958 [Are Survey Weights Needed? by Bollen et al, 2016]

"At a time when most surveys have unequal probabilities of selection either by design or by other practical constraints, the question of whether to weight variables during the analysis takes on added importance. If weighting data were a cost-free option, then always weighting would be a reasonable strategy. But unnecessarily weighting means lower efficiency and lower statistical power. Tests that determine whether weights are required do exist, but they are rarely applied for several reasons. One is the lack of awareness among researchers. Another is the influence of tradition in different fields—some always weight and others never do. An additional reason is that some of these tests are not readily available in software packages. Furthermore, even when these tests are easy to implement, there is little guidance on which of the many tests to choose."
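One common rule of thumb for how much precision unequal weights can cost is Kish's approximate effective sample size, (sum of w)^2 / (sum of w^2). The sketch below is only an illustration of that formula, written in Python rather than Stata; it is not the procedure Bollen et al. describe, the function name `kish_effective_n` is made up for this example, and the weights are invented.

```python
def kish_effective_n(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Hypothetical weights: 100 equally weighted cases versus a design in which
# half of the cases carry four times the weight of the other half.
equal_weights = [1.0] * 100
unequal_weights = [1.0] * 50 + [4.0] * 50

n_eff_equal = kish_effective_n(equal_weights)      # equals the actual n
n_eff_unequal = kish_effective_n(unequal_weights)  # smaller than the actual n

print(n_eff_equal, n_eff_unequal)
```

With equal weights the effective sample size equals the actual sample size; the more variable the weights, the further it falls below it. The approximation is meant for weighted means and is only a rough guide for regression coefficients.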

https://projecteuclid.org/euclid.ss/1190905511 [Struggles with Survey Weighting and Regression Modeling. by Gelman, 2007]

"Survey weighting is a mess. It is not always clear how to use weights in estimating anything more complicated than a simple mean or ratios, and standard errors are tricky even with simple weighted means. (Software packages such as Stata and SUDAAN perform analysis of weighted survey data, but it is not always clear which, if any, of the available procedures are appropriate for complex adjustment schemes. In addition, the construction of weights is itself an uncodified process.)"

https://muse.jhu.edu/article/581177 [What Are We Weighting For? Gary Solon, Steven J. Haider, Jeffrey M. Wooldridge, 2015]

"When estimating population descriptive statistics, weighting is called for if needed to make the analysis sample representative of the target population. With regard to research directed instead at estimating causal effects, we discuss three distinct weighting motives: (1) to achieve precise estimates by correcting for heteroskedasticity; (2) to achieve consistent estimates by correcting for endogenous sampling; and (3) to identify average partial effects in the presence of unmodeled heterogeneity of effects. In each case, we find that the motive sometimes does not apply in situations where practitioners often assume it does."

https://scholar.harvard.edu/cwinship...ng_weights.pdf [Sampling Weights and Regression Analysis. By Winship and Mare, 1994]

"When a researcher is going to perform a regression analysis with data that have sampling weights, what should be done? First, the analyst should estimate two models: one with unweighted data (OLS) and one using the sampling weights (WOLS). If the parameter estimates are substantively similar, then the OLS estimates are preferable because they are more efficient and the estimated standard errors will be correct... When OLS and WOLS produce different parameter estimates, the researcher needs to carefully consider the possible reasons. One possibility is that the model may be missing linear, nonlinear, or interaction terms."

Variations of that last piece of advice appear in various places. At least in an OLS regression, if weighted and unweighted analyses give very different results, that may be a sign your model is mis-specified.
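A minimal sketch of that weighted-versus-unweighted comparison, in Python rather than Stata and with made-up, noise-free data. The helper `fit_line` and all variable names are hypothetical; it fits a one-predictor line by (weighted) least squares and says nothing about standard errors, which need design-based methods.

```python
def fit_line(x, y, w=None):
    """Return (intercept, slope) from least squares; w are optional weights."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical design: cases with x >= 5 were oversampled, so they get weight 1
# while the remaining cases get weight 3.
x = [float(i) for i in range(10)]
w = [3.0 if xi < 5 else 1.0 for xi in x]

# Case 1: the linear model is correct (y = 2 + 3x).
y_linear = [2.0 + 3.0 * xi for xi in x]
_, slope_lin_unweighted = fit_line(x, y_linear)
_, slope_lin_weighted = fit_line(x, y_linear, w)

# Case 2: the true relation is curved (y = x^2) but a straight line is fitted.
y_curved = [xi ** 2 for xi in x]
_, slope_cur_unweighted = fit_line(x, y_curved)
_, slope_cur_weighted = fit_line(x, y_curved, w)

print(slope_lin_unweighted, slope_lin_weighted)
print(slope_cur_unweighted, slope_cur_weighted)
```

In the correctly specified case the two slopes agree; in the curved case they diverge because the weights change which part of the curve the straight line is fitted to. That gap is the warning sign the quoted advice refers to.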

Having said all that, I tend to just go ahead and weight, partly because I don't think I'm smart enough to figure out how and when to not weight.

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Kerstin Schmidt

Join Date: Apr 2017

Posts: 120
#4

26 Jul 2019, 02:57

With regard to "Tests that determine whether weights are required do exist, but they are rarely applied for several reasons. One is the lack of awareness among researchers. Another is the influence of tradition in different fields—some always weight and others never do. An additional reason is that some of these tests are not readily available in software packages. Furthermore, even when these tests are easy to implement, there is little guidance on which of the many tests to choose":

How can one test whether the estimates from the weighted and unweighted probit models differ significantly from each other? I am looking for a nonlinear-model counterpart to "wgttest", which is for linear models. Does anyone know of such a test?