
  • do year effects in xtreg capture year effects for subpopulations?

    In some ways, this is more of a model-building question than a Stata question, but it is related to Stata procedures, so I hope it is acceptable. (If not, let me know and I can take it to StackExchange).

    My research question: What is the effect of (United States) state election policies on voter registration, both overall and for specific demographic groups?

    I am using the Census Bureau’s Current Population Survey (CPS) November Supplement, which asks about voter registration in even-numbered years. The survey is commonly used in election studies due to its size (~80,000 adult citizens each November, distributed across all states and DC) and a large number of demographic variables. Also, the CPS apparently has less of a problem with inflated responses about registration and voting (social desirability bias) compared to other surveys.

    I am using the individual-level observations (some researchers collapse the data to the state-year) in a logistic or OLS regression model (registration rates are about 70 to 80%). The dependent variable is “registered to vote” (binary, 1=yes) regressed on individual-level predictors, state-level policies and context, and state and year indicators. I’m clustering standard errors on the state, not the state-year (I think that is right).

    The total number of observations is >400,000 with 45 states and seven election cycles (“years”). The data cover 2008 to 2020. (Excluding 2020 on COVID grounds does not change the results, and is perhaps not necessary given the record registration rates in 2020.)

    I usually run the model like this (Stata/MP 15.1):

    Code:
    xtreg registered i.year ($xvar)##($zvar), cluster(statefip) i(statefip) re
    
    margins i.race#i.policy1
    margins [email protected], contrast (eff)
    Where “statefip” is a state id, $xvar is a macro containing a set of categorical individual-level predictors (gender, race/ethnicity, etc.) and $zvar contains a few state-level predictors, including the policy of interest.
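    For readers who want to reproduce the setup, here is a minimal sketch of the same random-effects model written with -xtset- and -vce(cluster)-. The macro contents are hypothetical placeholders (the actual variable lists are not shown in this post); only "registered", "year", "statefip", "race" and "policy1" appear above.

    ```stata
    * Hypothetical macro contents: the real lists are not given in the thread
    global xvar i.female i.race      // individual-level predictors (assumed names)
    global zvar i.policy1            // state-level predictors, incl. the policy

    * Declare the state as the panel (grouping) variable
    xtset statefip

    * Random-effects linear probability model, SEs clustered on state
    xtreg registered i.year ($xvar)##($zvar), re vce(cluster statefip)

    * Predicted registration by race and policy status
    margins i.race#i.policy1
    ```

    -vce(cluster statefip)- is the current spelling of the older -cluster(statefip)- option, and -xtset statefip- plays the role of -i(statefip)-.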

    Unexpected result: The interaction of the policy with the indicator for black citizens results in a negative, highly statistically significant, and substantial in size (two to three times that for most other individual-level predictors) impact of the policy. This is contrary to the results for most other race/ethnicity groups and surprising on theoretical grounds.

    Additional background:
    1. Registration rates for black citizens declined after 2012 (Obama’s last election) but rebounded in 2020 to the 2012 rate. (Rates for white citizens increased by more and are now higher than for black citizens.)
    2. Of possible importance: the policy of interest was first implemented in 2016 (except for one earlier state). By 2018, 12 states implemented the policy. By 2020, 18 states had.
    Questions:
    1. To what degree should I expect the year indicators (and state effects?) to control for any year(s)-specific trends in race registration?
    2. Should I interact race with year?
    3. I assume I don't need to interact race X year X policy. Yes? Doing so produces many empty cells (see background item 2 above) which cause -margins- to not produce results ("not testable"/"not estimable").
    Thank you!

  • #2
    Doug:
    what I can't get here is why you are going with -xtreg, re- if you have a categorical regressand: -xtlogit, re- seems more appropriate.
    In addition, whenever "weird" results come alive, personally I check whether the model has all the predictors and/or interactions included in the right-hand side of my regression equation.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Originally posted by Carlo Lazzaro
      Doug:
      what I can't get here is why you are going with -xtreg, re- if you have a categorical regressand: -xtlogit, re- seems more appropriate.
      In addition, whenever "weird" results come alive, personally I check whether the model has all the predictors and/or interactions included in the right-hand side of my regression equation.
      Hi, Carlo. The results are very similar with xtlogit or xtreg (see "Mostly Harmless Econometrics" for an unorthodox take on logistic regression; they're against it). Perhaps a better reason: -xtlogit- with 400,000 observations takes forever, and -margins- with interactions afterward takes longer than forever. However, I might run xtlogit later. I've tested it on a small sample of the full data set, and the results are very close to those from -xtreg-. In any event, that is not the source of the problem.
      I'm not sure what you mean by "all the predictors." The individual-level predictors, other than i.year, are all interacted with the state-level predictors.
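      The subsample comparison mentioned above could be sketched roughly as follows. This is an illustration only: the 10% sampling fraction and the seed are assumptions, not the values actually used, and $xvar and $zvar are the macros from post #1.

      ```stata
      * Sketch: compare -xtreg- and -xtlogit- on a random subsample
      preserve
      set seed 12345                     // arbitrary seed, for reproducibility
      sample 10, by(statefip year)       // keep 10% within each state-year (assumed fraction)

      xtset statefip

      * Linear probability model with state random effects
      xtreg registered i.year ($xvar)##($zvar), re vce(cluster statefip)
      estimates store lpm

      * Random-effects logit on the same subsample
      xtlogit registered i.year ($xvar)##($zvar), re
      estimates store relogit

      * Coefficients are on different scales; compare signs and significance here,
      * and use -margins- after each model to compare effects on the probability scale
      estimates table lpm relogit, star
      restore
      ```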



      • #4
        Doug:
        a) yes, I'm aware of Pischke and Nobel winner Angrist's take on -logit- (ubi maior...). However, most of their praise in favour of OLS, even when the regressand is binary, seems to focus on the 2SLS comparison (Table 4.6.1, page 203 of Mostly Harmless Econometrics).
        In addition, the predicted values issue does not seem to be covered in that paragraph. Hence, if you're interested in marginal effects only, you can probably go -xtreg- (even though some reviewers may bark at that). Conversely, predicted values may bring about some nasty issues.
        Eventually, if you actually have rates, you may want to consider -xtpoisson- (with the burden of issues that it brings about).
        b) I meant to be sure that all the predictors/interactions needed to give a fair and true view of the data generating process you're investigating are included in the right-hand side of your regression equation (something that you probably double-checked already). This is what I usually do when "weird" results creep up.
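        The predicted-values issue raised in a) can be checked directly after the linear model. A minimal sketch, assuming the -xtreg, re- model from post #1 has just been estimated; "phat" is a hypothetical variable name:

        ```stata
        * Linear prediction (fixed portion) from the random-effects LPM
        predict double phat, xb

        * How many fitted "probabilities" fall outside the unit interval?
        count if (phat < 0 | phat > 1) & !missing(phat)

        * Range of the fitted values
        summarize phat
        ```

        With registration rates around 70 to 80%, fitted values above 1 are the more likely problem.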

        As far as your more practical issues are concerned:
        1) I would be more interested in the joint statistical significance of -i.state- and -i.year-, that you can test via -testparm-;
        2) you may give a shot to
        Code:
        i.race##i.year
        3) I usually shy away from three-way interactions, as they are difficult to get and even more cumbersome to disseminate in a meaningful way (the latter depends on your expected audience, though).
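        Points 1) and 2) could be sketched together as below. This assumes that $xvar already contains i.race, so the race main effects are simply repeated, and that the model is otherwise the one from post #1. Note that with -xtreg, re- the state effects are random rather than estimated coefficients, so only the year indicators and the race-by-year terms can be tested this way.

        ```stata
        * Add race-by-year interactions to the original specification
        xtreg registered i.race##i.year ($xvar)##($zvar), re vce(cluster statefip)

        * Joint test of the year indicators
        testparm i.year

        * Joint test of the race-by-year interactions
        testparm i.race#i.year
        ```

        If the race-by-year terms are jointly significant, the year indicators alone were not absorbing race-specific trends in registration.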
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Thank you, Carlo. (I particularly appreciate the ubi maior line.)
