  • Guidance on Multilevel Modeling

    Hello everyone,

    I am working towards building a statistical model that predicts basic income preferences amongst voters using data from Wave 8 of the European Social Survey (2016). To control for country-level variation, I am using a multilevel logistic regression model, where observations at the level of individual respondents (level 1 units) are nested within countries (level 2 units). I have a fair amount of experience conducting regression analysis in Stata, but I am effectively a novice at multilevel modeling and lack a conceptual background in the technique.

    The dependent variable is a binary indicator of whether a respondent supports or opposes basic income. I have three sets of independent variables that I am looking to test in a series of models: individual demographic variables (such as age and income), individual attitudinal variables (various preferences about redistribution), and country-level contextual variables (such as the level of means-tested welfare spending in the respondent's country). The first two sets of variables are measured at level 1 (respondents), while the third set is measured at level 2 (countries). To be clear, I am looking to (1) isolate the effects of individual-level predictors while controlling for country-level heterogeneity and (2) estimate the effect of country-level predictors on basic income preferences.

    1) I've read a decent amount of statistical literature suggesting that multilevel models with a small level-2 sample size lead to biased estimates of the level-2 standard errors. There are 21 countries included in my analysis. Will this pose an issue for the reliability of my country-level estimates? Is there another model or regression structure that I should consider in order to circumvent this problem? Some papers have suggested that a fixed effects model would allow me to estimate the “moderating effect” of a country-level variable through cross-level interactions. However, I am interested in measuring the direct effect of each country-level variable, particularly because I have many level 1 and level 2 variables, and it would be too time-consuming to measure moderating effects between a country-level variable and every individual-level variable.

    2) Is it necessary to specify a random slope in the regression model? In what situation would it be appropriate to do so? Currently, the syntax of my base regression structure looks like

    Code:
     melogit basicincome agea age2 gndr hinctnta eduyrs uemp5yr mbtru_curr mbtru_prev RTIscore [pw=pweight] || cntry:
    where there is no specification of a coefficient in the random effects syntax, introduced after the "||".

    3) Is there any way to calculate or determine the explained variance of each of the models?

  • #2
    With 21 countries, you are probably at some risk of small-sample bias, and I don't think there is any way around that other than expanding the data set to include more countries. Your country-level estimates will be somewhat inflated. At 21 countries, it probably isn't terrible, though.

    Random slopes, like any other variable, should be included in the model when you believe that they are an important aspect of the data generating process. If you think that the effect of any of your variables on support for basic income will differ materially across countries, then you should include a random slope for that variable (even if it is an individual level variable) in the model. For variables whose effects you expect will be, for practical purposes, similar across countries, no random slope is warranted.
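
    As a minimal sketch of what that looks like in syntax, using the command from #1: the choice of hinctnta (household income) as the variable with a country-varying effect is purely an assumption for illustration, not a recommendation. Variables listed after "cntry:" get a random slope, and covariance(unstructured) additionally lets the random intercept and slope be correlated.

    ```stata
    * Random intercept only (the model as posted in #1)
    melogit basicincome agea age2 gndr hinctnta eduyrs uemp5yr ///
        mbtru_curr mbtru_prev RTIscore [pw=pweight] || cntry:

    * Random intercept plus a random slope for hinctnta across countries
    * (hinctnta is a hypothetical choice; substitute the variable whose
    *  effect you believe differs materially between countries)
    melogit basicincome agea age2 gndr hinctnta eduyrs uemp5yr ///
        mbtru_curr mbtru_prev RTIscore [pw=pweight] ///
        || cntry: hinctnta, covariance(unstructured)
    ```

    Note that the variable still appears in the fixed part of the model; the random slope describes how the country-specific effects vary around that average effect.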

    That said, models with large numbers of random slopes can be difficult to estimate, and can even become unidentifiable in smaller data sets. If you find yourself encountering convergence problems, the first things you should try removing from the model are the random slopes that you think contribute least to outcome variation.
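
    One way that simplification might proceed is sketched below. The two random slopes (hinctnta and eduyrs) are hypothetical choices made only to illustrate the syntax, and the local macro xvars is just a convenience for the predictor list from #1. Run it from a do-file so the local macro stays defined across commands.

    ```stata
    * Predictor list from #1, stored once (xvars is an illustrative name)
    local xvars agea age2 gndr hinctnta eduyrs uemp5yr mbtru_curr mbtru_prev RTIscore

    * Most demanding: two random slopes, all random effects free to covary
    melogit basicincome `xvars' [pw=pweight] ///
        || cntry: hinctnta eduyrs, covariance(unstructured)

    * If that fails to converge: keep both slopes, but treat the
    * random effects as uncorrelated
    melogit basicincome `xvars' [pw=pweight] ///
        || cntry: hinctnta eduyrs, covariance(independent)

    * If that still fails: drop the slope judged least important
    * (eduyrs here, purely as an assumption)
    melogit basicincome `xvars' [pw=pweight] || cntry: hinctnta
    ```

    Each step estimates fewer variance and covariance parameters than the one before it, which is usually what eases estimation with only 21 countries.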

    There probably is no useful way to calculate anything analogous to the R² statistic of ordinary least squares regression. First of all, you have a logistic model, and there aren't really any good analogs of R² even in single-level logistic regression. All the less so when we go to multilevel models. If there is something useful along these lines, I'm not aware of it and would be interested to learn of it from somebody else responding here.
