  • Best way to introduce categorical control variables

    Hi, I am trying to estimate the effect of financial distress on subjective well-being. I have already built an index of financial distress from eight ordinal variables using factor analysis, and I have rescaled both life satisfaction and the financial distress index to a 0-1 scale. Now I want to introduce some categorical control variables before running a fixed-effects panel data model. The control variables are, for example: sex (male, female); job status (emp, selfemp, unemp, retired, fulltimestudent, others); marital status (unmarried, married, livingascouple, widow, divorced); and education (higherdegree, A level, O level, others). My question is: what is the best way to introduce these categorical variables in a fixed-effects panel data model? I have a 12-year unbalanced panel with 110,000 observations in total. Any suggestions are highly appreciated.

  • #2
    Note that gender and any other time-invariant variables will drop out of a fixed effects model (although you could still include interactions with gender). Beyond that, my main suggestion is to use factor variable notation, e.g.

    Code:
    xtreg y i.empstat i.marstat i.educ, fe
    If you are talking about sequencing of models, I suppose you could have one model with the control variables followed by the model with the explanatory variables, or vice versa. The main thing to be careful of is that missing data doesn't change the cases analyzed as you add more variables.
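
    One way to keep the cases constant across a sequence of models is to flag the estimation sample of the fullest model and reuse it. The following is only a minimal sketch: the variable names (pid, year, lifesat, findistress, sex, empstat, marstat, educ) are hypothetical stand-ins for the poster's data, and the choice of models is an assumption, not a recommendation.

    ```stata
    * Hypothetical names: pid = person id, year = wave, lifesat = outcome,
    * findistress = financial distress index, others = categorical controls.
    xtset pid year

    * Fullest model first, with categorical controls as factor variables
    xtreg lifesat findistress i.empstat i.marstat i.educ, fe

    * Flag the cases actually used, so smaller models analyze the same cases
    generate byte insample = e(sample)

    * Model without controls, restricted to the same estimation sample
    xtreg lifesat findistress if insample, fe

    * sex is time-invariant, so its main effect is reported as omitted,
    * but its interaction with a time-varying regressor can be estimated
    xtreg lifesat c.findistress##i.sex i.empstat i.marstat i.educ if insample, fe
    ```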
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thanks Richard. Will i.empstat be an intercept dummy or an interaction dummy? I generated intercept dummies (e1, e2, e3) using -tab, gen(e)-. Should I use interaction dummies?



      • #4
        i.empstat will give you an "intercept" dummy. In modern Stata there are many advantages to using this factor-variable approach to categorical variables, rather than using -tab, gen()-. So I would get rid of e1 e2 and e3, and use i.empstat for that.

        As for whether you should use interaction dummies, that depends on your theoretical model of the data generating process. If you expect there to be interactions between empstat and other variables, then include interactions. If you think the effect of empstat will be the same, independent of the values of other variables, then don't include interactions. That's a question of the science of your field that requires expertise in that domain, not just statistical expertise, to answer.
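
        To make the distinction concrete, here is a rough sketch of the two specifications in factor-variable notation. The variable names (pid, year, lifesat, findistress, empstat) are hypothetical, and interacting empstat with findistress is only an illustration; which interactions belong in the model, if any, remains the substantive question described above.

        ```stata
        * Hypothetical variable names throughout
        xtset pid year

        * "Intercept" dummies only: one coefficient per non-base category
        xtreg lifesat findistress i.empstat, fe

        * Interaction: the slope on findistress may differ by employment status
        * (## includes the main effects as well as the interaction terms)
        xtreg lifesat c.findistress##i.empstat, fe

        * Joint test that all interaction coefficients are zero
        testparm i.empstat#c.findistress
        ```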



        • #5
          I agree with Clyde: factor variables are almost always the way to go. For more on them, from within Stata type

          help fvvarlist

          Besides saving you the trouble of computing the dummy variables yourself, factor variables can be very useful with post-estimation commands. If interested, see

          http://www3.nd.edu/~rwilliam/stats3/Margins01.pdf

          Disclaimer: I link to my own handouts, not because they are so spectacular, but because I can find their URLs in 2 seconds. There are lots of other sources on the web and elsewhere that you may find more helpful.
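
          As a small illustration of the post-estimation point, margins can work directly with a factor variable after estimation, which is not possible with hand-made dummies. This is only a sketch with hypothetical variable names (pid, year, lifesat, findistress, empstat, marstat, educ); after xtreg, fe the default prediction used by margins is the linear prediction.

          ```stata
          * Hypothetical variable names throughout
          xtset pid year
          xtreg lifesat findistress i.empstat i.marstat i.educ, fe

          * Adjusted linear predictions at each employment status
          margins empstat

          * Discrete change for each category relative to the base category
          margins, dydx(empstat)
          ```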
