
  • Multilevel model - panel data syntax and using with marginsplot

    Hello. I have a few questions about multilevel modeling using the mixed command and would very much appreciate some guidance. I am using Stata 15.0.

    I am working with cross-national survey data that was collected first in 2006 and then again in 2016. I've combined the survey data and added several national level variables. The data is slightly unbalanced since the countries surveyed are slightly different for 2006 and 2016. For the variables below, the ones at the individual level are in lower case and the ones at the national level are in uppercase.

    1) How should I structure the multilevel model? I've assumed that individuals are nested within countries, which are nested within years. Does the following look right?

    mixed taxattitude sex age education gini sclass TAX || year: || country:

    Alternatively, I'm not that interested in year effects, so would it make sense just to keep the year effect in the fixed effects portion using year dummies?

    mixed taxattitude sex age education gini sclass TAX YEARDUMMY || country:


    2) If I use a cross-level interaction, should I include the variable at both the country and year level of the random effects portion like:

    mixed taxattitude sex age education gini c.class##c.TAX || year: sclass || year: sclass

    3) When using marginsplot to graph a cross-level interaction effect, how do the graphs take into account the random effect part of the model? For the fixed portion, my understanding is that other variables are held at their mean. How would the random portion be treated using the following code?

    mixed taxattitude sex age education gini c.class##c.TAX || year: sclass || year: sclass
    margins, dydx(sclass) at(taxattitude=(7(10)49)) vsquish
    margins, dydx(sclass) at(taxattitude=(7 49.37 )) vsquish
    margins, at(taxattitude=(7 49.37 ))
    set more off
    margins, at(taxattitude=(7(10)49) topbot=(1(1)10))
    marginsplot, noci xdimension(topbot) plotdimension(taxattitude, allsimple) legend(subtitle(Tax level on low income earners) rows(2)) recast(line) scheme(s1mono)

    4) I have about 23 countries and of those about two are not in the second wave of the survey. Is there a good way to get Stata to handle this level of unbalanced data?

    Thank you in advance for any help!

  • #2
    1) How should I structure the multilevel model? I've assumed that individuals are nested within countries, which are nested within years. Does the following look right?

    mixed taxattitude sex age education gini sclass TAX || year: || country:
    You only have two years. Including a year: level in the model is like doing a study with an N of 2. You definitely should not have a year: level in your model, whether you are interested in the year effects or not. Drop || year: and include i.year in your fixed effects part of the model. Do not, by the way, create a YEARDUMMY variable. Use the i.year factor variable notation, so it looks like:
    Code:
    mixed taxattitude i.sex age i.education /*?*/ gini sclass TAX i.year || country:
    Note that I have added an i. in front of sex. I assume here that age, gini, sclass, and TAX are all continuous variables. If any of them is in fact categorical, then it needs i. in front of it as well.

    Be sure to read -help fvvarlist- so as to get fully acquainted with factor-variable notation. You must use factor-variable notation in order for -margins- to work correctly. -margins- will run if you fail to use factor-variable notation, but in many instances the results you get will be wrong.
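
    As a minimal sketch of why the notation matters, the lines below fit the model above and then ask -margins- for results by the categorical predictors. Variable names follow this thread; that sex and education are categorical is an assumption carried over from the model above.

    ```stata
    * Sketch only: variable names are taken from the thread
    mixed taxattitude i.sex age i.education gini sclass TAX i.year || country:

    * Because sex and year were entered with i., -margins- treats them as categorical
    margins sex           // adjusted prediction for each level of sex
    margins year          // adjusted prediction for each survey year
    margins, dydx(age)    // average marginal effect of a continuous predictor
    ```

    Had sex been entered without the i. prefix, -margins sex- would be refused because sex would not be recognized as a factor variable in the model.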

    Added: By the way, you do not have country nested in year (even if you had many years). You have country (almost) crossed with year. This almost crossed relationship is properly handled with a crossed-effects (cross-classified) model, not a nested one. If you were to model this data as if country were nested in year, the implication would be that the country called, say, France in 2006 is not the same country as the one called France in 2016.
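
    For illustration only, this is roughly how a crossed specification is written in -mixed-. It is a sketch that would make sense only with many survey waves; with the two waves here, year belongs in the fixed effects as i.year.

    ```stata
    * Illustrative only: year crossed with country, not country nested in year.
    * _all: treats the whole dataset as one group; R.year gives each year its own random intercept.
    mixed taxattitude i.sex age i.education gini sclass TAX || _all: R.year || country:
    ```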

    2) If I use a cross-level interaction, should I include the variable at both the country and year level of the random effects portion like:

    mixed taxattitude sex age education gini c.class##c.TAX || year: sclass || year: sclass
    First, we've already established that there should be no year: level in your model. But if there were, the question of whether to have a random slope (cross-level interaction) at that level would depend on the scientific model. If you expected the slope of sclass to vary across years, then you would include it; otherwise not. It's a question of selecting the analytic model for your data that corresponds properly to the science. If you aren't sure about the science, consult a colleague in your discipline, as that would not be a question about Stata or statistics.

    Next, the model you show here has two year: levels. That's clearly wrong. I'm guessing you meant || year: || country:. In any case, you would include the random slope at whichever level(s) you expected the slope to actually vary at based on the science.

    You have a variable c.class in the fixed effects part of your model. Is class the same as sclass, and you made a mistake, or is it a different variable? If sclass is a different variable from class, then sclass, in addition to appearing in the random effects part of the model, must also appear in the fixed effects part. So assuming sclass is different from class, and that you want to model a random slope at the country level, it would look like this:
    Code:
    mixed taxattitude i.sex age education gini c.class##c.TAX sclass || country: sclass
    If class and sclass are in fact the same variable (and assume its real name is sclass) then:
    Code:
    mixed taxattitude i.sex age education gini c.sclass##c.TAX || country: sclass
    suffices since the mention of c.sclass##c.TAX will cause Stata to automatically include sclass by itself in the fixed effects, so there is no need to mention it again there.
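
    If you also want an empirical check on whether the slope of sclass varies across countries, one common approach is a likelihood-ratio test of the random-intercept model against the random-slope model. The sketch below assumes sclass is continuous; the stored-estimate names ri and rs are arbitrary. It supplements the scientific reasoning and does not replace it.

    ```stata
    * Random intercept only
    mixed taxattitude i.sex age i.education gini c.sclass##c.TAX i.year || country:
    estimates store ri

    * Random intercept plus random slope of sclass, allowing the two to covary
    mixed taxattitude i.sex age i.education gini c.sclass##c.TAX i.year || country: sclass, covariance(unstructured)
    estimates store rs

    * Same fixed effects in both models, so the comparison isolates the random slope
    lrtest rs ri
    ```

    Because a variance is being tested on the boundary of its parameter space, the reported p-value is conservative.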

    3) When using marginsplot to graph a cross-level interaction effect, how do the graphs take into account the random effect part of the model?
    They don't. -margins- after mixed deals only with the fixed portion of the model. This is usually not a problem, however, since the random effects are assumed to have mean zero.
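
    If you want to look at the random part directly, separately from -margins-, the predicted random effects can be obtained after estimation. This is a sketch that assumes a random slope of sclass at the country level; the stub u and the variable onepercountry are made-up names.

    ```stata
    mixed taxattitude i.sex age i.education gini c.sclass##c.TAX i.year || country: sclass

    * Predicted random effects (BLUPs): one new variable per random-effects term
    predict u*, reffects
    describe u*

    * Flag one observation per country so each country is counted once
    egen byte onepercountry = tag(country)
    summarize u* if onepercountry
    ```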

    For the fixed portion, my understanding is that other variables are held at their mean.
    That is incorrect. If you want other variables held at their mean, you can get that by specifying the -atmeans- option. But the default is to leave all other variables at their actual observed values in the data. Consequently the results from -margins- are understood to be adjusted to the actual distribution of all variables (other than those with an -at()- specification) in the estimation sample.
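
    The difference can be seen by running both forms after the same model. This is a sketch: the TAX values 10, 20 and 30 are placeholders chosen for illustration, not values taken from your data.

    ```stata
    mixed taxattitude i.sex age i.education gini c.sclass##c.TAX i.year || country: sclass

    * Default: the other covariates keep their observed values and predictions are averaged
    margins, at(TAX=(10 20 30))

    * With -atmeans-: the other covariates are fixed at their estimation-sample means
    margins, at(TAX=(10 20 30)) atmeans
    ```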

    How would the random portion be treated using the following code?
    It wouldn't. And there is nothing you can do about that.

    margins, at(taxattitude=(7(10)49) topbot=(1(1)10))
    This will break your code, as there is no variable topbot in your -mixed- command.
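
    A version that refers only to variables in the model might look like the sketch below. It rests on assumptions you would need to confirm: that the -at()- values were meant for TAX rather than the outcome taxattitude, that topbot was meant to be sclass, and that sclass is continuous and runs from 1 to 10.

    ```stata
    mixed taxattitude i.sex age i.education gini c.sclass##c.TAX i.year || country: sclass

    * Slope of sclass at selected values of TAX
    margins, dydx(sclass) at(TAX=(7(10)47)) vsquish
    marginsplot, noci recast(line) scheme(s1mono)

    * Predicted taxattitude across sclass, one line per TAX value
    margins, at(TAX=(7 27 47) sclass=(1(1)10)) vsquish
    marginsplot, noci xdimension(sclass) plotdimension(TAX, allsimple) recast(line) scheme(s1mono)
    ```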

    4) I have about 23 countries and of those about two are not in the second wave of the survey. Is there a good way to get Stata to handle this level of unbalanced data?
    You don't need to do anything. -mixed- automatically handles unbalanced data correctly. In fact, nearly all Stata commands do so. (The few that don't will simply halt with an error message if you try to use them with unbalanced data.) People often seem to make a point of distinguishing balanced from unbalanced data, but with modern software this distinction is hardly ever important. It certainly is not a concern at all with -mixed-. That said, sometimes when using -margins- people prefer not to adjust the results to the actual distribution of variables in the sample but to an artificial distribution in which the categorical variables were balanced. -margins- has an -asbalanced- option that does that, if that is what you want to do. As best I can tell, however, it is little used. After all, most analyses are attempts to find answers to questions about the real world, and in the real world hardly anything is balanced!
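
    If you want to see the imbalance, or to try the -asbalanced- adjustment, a short sketch using the variable names from this thread is:

    ```stata
    * Countries by survey year: a zero cell marks a country missing from that wave
    tabulate country year

    mixed taxattitude i.sex age i.education gini sclass TAX i.year || country:

    * Default: adjusted to the observed distribution of the covariates
    margins sex

    * Treats the factor variables (here sex, education, year) as if they were balanced
    margins sex, asbalanced
    ```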

    Last edited by Clyde Schechter; 16 Aug 2018, 20:44.


    • #3
      Wow, this really answers my questions...and more. Thank you so much, Clyde. I made a few typos, but I think this is close to what I want:
      mixed taxattitude i.sex age i.education gini i.sclass##c.TAX i.year || country:sclass
      But I have one quick follow-up question. I have theoretical reasons to think that the slope will vary depending on the level of TAX, which is measured at the national level. So, should I rewrite as:
      mixed taxattitude i.sex age i.education gini c.sclass##c.TAX i.year || country: TAX


      • #4
        mixed taxattitude i.sex age i.education gini i.sclass##c.TAX i.year || country:sclass
        Well, there is a problem here. Because you have used i.sclass in the fixed effects part, I infer that sclass is a categorical variable. Unfortunately, categorical variables do not work easily with random slopes in Stata's -mixed- command. This is one of the rare situations where it is necessary to bypass factor-variable notation and create your own indicators for the categories of sclass and use them. And they must then be used in the random effects part of the model. As written, with i.sclass in the fixed effects and sclass in the random part, you just have a mangled model (mangled because in the fixed effects part sclass is treated as discrete, but in the random-effects part it is treated as continuous) that might well run, but would be uninterpretable. Unfortunately, Stata syntax does not allow you to fix this by writing -|| country: i.sclass-. So you need to do something like this:

        Code:
        // CREATE INDICATORS ("DUMMIES") FOR LEVELS OF SCLASS
        tab sclass, gen(scls)
        drop scls1 // LEAVE OUT ONE (THE REFERENCE CATEGORY)
        // MULTI-LEVEL MODEL WITH SCLASS REPRESENTED BY
        // THE scls* INDICATORS
        mixed taxattitude i.sex age i.education gini i.(scls*)##c.TAX i.year || country: scls*
        I have theoretical reasons to think that the slope will vary depending on the level of TAX, which is measured at the national level. So, should I rewrite as: mixed taxattitude i.sex age i.education gini c.sclass##c.TAX i.year || country: TAX
        I assume here that by "the slope" you mean the slope of sclass (or, as corrected above, the scls* indicators). No, you would not make this modification. If you include || country: TAX, you are positing a model in which the slope of TAX itself varies among the countries. The variation of the slope of sclass depending on TAX is already captured in the model with the i.sclass##c.TAX (or, as corrected, i.(scls*)##c.TAX) term in the fixed effects.

        An important principle to bear in mind when coding multi-level models in Stata is that which level(s) to place a variable at in a multi-level model has nothing at all to do with the level at which that variable is defined. It depends entirely and exclusively on the levels of the model which contribute variation to the slope of that variable as a predictor of the outcome. The fact that TAX is a country-level variable gives it no claim at all to appearing in a term like || country: TAX. That would be warranted if and only if you want to model the effect of TAX on taxattitude as varying from country to country.


        • #5
          This helps a lot. Thank you very much.
