  • Interaction Variable or Subgroup Comparison

    Good afternoon,

    I am working on a project that investigates the effect of media consumption on perception of corruption, and I would also like to examine the prevalence of such causal relationships among different gender and race groups. My analytic approach is to first compare the effects on men and women, and then women by race and men by race. I used two ways to do the gender comparison: an interaction term and "suest". Here are the example commands (I took out all the control variables to make it look cleaner):

    Code:
    xi: regress perception black asian latino i.male*local_print_newspaper
    Code:
    reg perception black asian latino local_print_newspaper if male==1
    est store male
    reg perception black asian latino local_print_newspaper if male==0
    est store female
    suest male female
    test [male_mean]local_print_newspaper=[female_mean]local_print_newspaper
    Unfortunately, I noticed that the results of each model look drastically different. The interaction variable is significant, while the "suest" test shows that the difference between male and female is not significant. I suppose it's because of my small sample size (1,000 men and 1,000 women), and when using suest I lost statistical power since it separates men and women into two regression models. So here are my questions: first, does my assumption make sense, that the interaction variable is perhaps a better way to analyze the subgroup differences? Second, if indeed, by separating men and women I lose statistical power, what would be a better way to do the comparison of men by race and women by race?

    I'll put my commands of men by race here too. I separate men and women into two datasets because I can't really think of another way to do the men by race and women by race analysis.
    Code:
    * Save men and women as separate datasets
    preserve
    keep if male==1
    save male, replace
    restore
    preserve
    keep if male==0
    save female, replace
    restore
    clear
    use "C:\Users\male.dta"
    Code:
    bysort race: reg perception local_print_newspaper
    And to test if the coefficients are significantly different from one race to the other:
    Code:
    xi: regress perception black asian latino local_print_newspaper i.black*local_print_newspaper
    xi: regress perception black asian latino local_print_newspaper i.asian*local_print_newspaper
    xi: regress perception black asian latino local_print_newspaper i.latino*local_print_newspaper
    xi: regress perception white asian latino local_print_newspaper i.asian*local_print_newspaper
    xi: regress perception white asian latino local_print_newspaper i.latino*local_print_newspaper
    xi: regress perception white black latino local_print_newspaper i.latino*local_print_newspaper
    Please let me know. I'm really concerned if I'm actually doing the right thing.
    Any advice is hugely appreciated.

    Best,
    Kevin

  • #2
    Unfortunately, I noticed that the results of each model look drastically different. The interaction variable is significant, while the "suest" test shows that the difference between male and female is not significant.
    Perhaps one of the most important, and most underappreciated, rules in statistics is that the difference between statistically significant and not statistically significant is, itself, not statistically significant!

    You don't show the actual results you got from these commands, so it is hard to comment in specific terms. Here are some general considerations:

    1. If your results are just barely statistically significant with one method and nearly so but not quite with another, that means nothing at all. Such findings are perfectly consistent with each other.

    2. Statistical significance is probably meaningless in this context anyway. The null hypothesis that the difference between men and women is exactly zero for a phenomenon like this is highly implausible and is just a straw man, so why waste time testing it? Instead, quantify the magnitude of the male-female difference, quantify the uncertainty around it with a confidence interval, and then decide whether it is large enough to be of practical importance.

    3. The interaction model assumes that the residual variance is the same for men and women (homoscedasticity). So, look at your sex-specific regression outputs to see whether the residual variance is substantially similar in both sexes. If it is, you can rely on the interaction method. If the residual variances are appreciably different, then conclusions based on separate regressions are more reliable.

    4. Unless the effect size you are trying to estimate is very, very small, I would not consider a sample of 1000 men and 1000 women to be small. If the effect size is so small you can't pick it up in a sample of that size, it is hard to imagine it is of any practical importance.
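
    To make points 2 and 3 concrete, here is a minimal sketch that reuses the commands and variable names from the original post (control variables omitted, as there). It displays the residual standard deviation from each sex-specific regression, then reports the male-female difference in the newspaper coefficient with a confidence interval rather than only a test. Treat it as an illustration under those assumptions, not as a finished analysis.

    Code:
    * Sex-specific regressions; e(rmse) is the residual standard deviation
    regress perception black asian latino local_print_newspaper if male==1
    display "Residual SD, men: " e(rmse)
    estimates store male
    regress perception black asian latino local_print_newspaper if male==0
    display "Residual SD, women: " e(rmse)
    estimates store female
    suest male female
    * Difference in slopes (men minus women) with a 95% confidence interval
    lincom [male_mean]local_print_newspaper - [female_mean]local_print_newspaper

    If the two residual standard deviations are close, the interaction model and the separate regressions should tell much the same story.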

    Next, you shouldn't be using -xi- for this (assuming you are using a recent version of Stata--if you are using an old version you are supposed to tell us that in your post). In fact, you should make strenuous efforts to forget you ever heard of -xi-.* -xi- has been superseded by factor-variable notation (-help fvvarlist-), which enables you to use the -margins- command after estimation. The -margins- command, and its companion, -marginsplot-, greatly facilitate the interpretation of interaction models by enabling you to effortlessly calculate the predicted values in the various combinations of the interacted variables and draw helpful graphs. The -margins- command is very powerful and does lots of things, so it is somewhat complicated to learn. The manual chapter devoted to it has extensive examples. But it may be simpler to begin with an overview of the simplest cases (which are most of what you need here) written by Richard Williams: http://www.stata-journal.com/sjpdf.h...iclenum=st0260. So, for example:

    Code:
    regress perception i.race##i.sex##c.local_print_newspaper
    margins race#sex, at(local_print_newspaper = (numlist of interesting values of this variable))
    // MAYBE ALSO AS ABOVE WITH -contrast- or -pwcompare- OPTIONS
    marginsplot
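
    As a further sketch along the same lines, -margins- with the dydx() option reports the slope of local_print_newspaper within each race-by-sex cell, and adding pwcompare() compares those slopes across races. This assumes the same race and sex variables as the example above and, hypothetically, that sex == 1 codes men; adjust it to your own coding.

    Code:
    regress perception i.race##i.sex##c.local_print_newspaper
    * Slope of local_print_newspaper within each race-by-sex cell
    margins race#sex, dydx(local_print_newspaper)
    * Pairwise differences in that slope between races, among men only
    * (sex = 1 as the code for men is an assumption)
    margins race, dydx(local_print_newspaper) at(sex = 1) pwcompare(effects)

    The pairwise output gives each race-to-race difference in slope with its own confidence interval, which speaks to magnitude and uncertainty rather than significance alone.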

    *Well, there are still a few commands that do not support factor variable notation and need -xi-. But they are mostly, themselves, old commands that are little used today because they have been superseded by more modern commands that subsume their function and do support factor variable notation. There are also occasional situations where the -xi:- prefix is needed to handle exotic situations in multi-level modeling.
    Last edited by Clyde Schechter; 25 Oct 2016, 13:49.


    • #3
      Hi Clyde,

      Thank you so much for your suggestions. I'm using Stata 13.1 indeed. And in response to your comments:
      1. I doubt any of the magnitude differences will look "large enough". I apologize for not including the coding of variables in my original post. The perception variable is a ranking scale from 1-10. That's why I thought including the significance test would help in making an argument about the gender differences. The p-value of the interaction term is 0.02, while the suest test gives 0.06. Is the difference big enough to be concerned about?
      In addition, if I decide to use the interaction variable on gender, how do I tell if the coefficients are different for men of different races?

      2. Second, thank you for the reminder because I've forgotten about the homoscedasticity assumption.

      3. I haven't heard of the margins command, and I'll go ahead and read the description. Thank you for introducing it.


      • #4
        Just a few additions to Clyde's excellent points.

        I would like to also examine the prevalence of such causal relationships among different gender and race group.
        I seriously doubt that you can get even close to testing a causal relationship with what seems to be cross-sectional observational data and a plain vanilla OLS regression. Be careful with such claims when writing your paper or report.

        My analytic approach is to first compare the effects on men and women, and then women by race and men by race.
        Note that with separate regression models you are already doing both in a way. When you specify the interaction between gender and media consumption you restrict the differences in race to be the same for both sexes. In the separate models on the other hand you are not only allowing the effect of media consumption to vary by gender but also the differences in race. This is equivalent to a model where gender is interacted with media consumption and race.
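
        This equivalence can be sketched with factor-variable notation. In a single model where male is interacted with every regressor, the point estimates match those of the two separate regressions, and the interaction term is the male-female difference in the media coefficient. The variable names are taken from the original post; vce(robust) is an added assumption here, so the standard errors need not match -suest- exactly.

        Code:
        * Fully interacted model: every coefficient may differ by gender
        regress perception i.male##i.black i.male##i.asian i.male##i.latino ///
            i.male##c.local_print_newspaper, vce(robust)
        * Male-female difference in the newspaper coefficient, with a 95% CI
        lincom 1.male#c.local_print_newspaper

        Dropping the male-by-race terms from this model gives back the restricted interaction model from the first post.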

        You might want to test a three-way interaction here, and factor variables combined with margins and possibly marginsplot make this rather simple. But I cannot really tell if this is what you want, as you do not state explicitly which hypotheses you want to test.

        Best
        Daniel


        • #5
          Hi Daniel,

          Thank you for your suggestions. I will be careful with my use of terms in the analysis. I hadn't thought about the possibility of a three-way interaction, and I will read up on it. In terms of hypotheses, I treat this more as an exploratory study, as not many past studies have examined the prevalence of media within different race/gender intersections.

          Best,
          Kevin
