  • Categorical variable with numerous categories

    Hello again Stata friends,

    another (minor) question regarding categorical control variables with numerous categories.
    We are trying to include nationality and education as control variables.
    There are 19 different countries, which we coded 1 to 19, and 21 educational backgrounds, which we coded 1 to 21. The question is how we can include these arbitrarily ordered codes in our OLS regression so that it makes sense. Do we have to generate a new dummy variable for each category, i.e. for each country and each educational background?

    Code:
    sum $ylist $xlist gender nationality education download vt
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
        accuracy |        578    .3564014    .4793506          0          1
            ISVO |        578     .200692     .400865          0          1
            CSVO |        578    .5155709    .5001904          0          1
            ASVO |        578     .283737    .4512012          0          1
          gender |        578    .4463668    .4975457          0          1
    -------------+---------------------------------------------------------
     nationality |        578    8.769896    3.782748          1         19
       education |        578    9.249135    5.994238          1         21
        download |        578    .1608997    .3677566          0          1
              vt |        578    .5363322    .4991102          0          1
    Many thanks and kind regards,
    Konstantin

  • #2
    Konstantin:
    the usual advice is to group together all categorical levels that can safely be grouped together.
    As far as education is concerned, I do not remember ever having seen more than 6-7 levels.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Carlo :D OK, great, I already had a similar thought... let me try this for nationality and education.



      • #4
        I'm sorry, I can't figure out how to group the categorical levels... somehow with egen group!? :/



        • #5
          Konstantin:
          -egen- with the -group- function will do the trick.
          I would then -label- the -new_education- categorical variable.
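          A minimal sketch of these two steps, using the education variable from post #1; the label name edu_lbl and the label text are hypothetical. Note that -egen- with -group()- numbers the distinct values of its variable list consecutively, so on a variable already coded 1 to 21 it reproduces the coding rather than merging levels by itself.

          Code:
          * group() assigns 1, 2, ... to each distinct value of education
          egen new_education = group(education)

          * attach value labels (label text is hypothetical)
          label define edu_lbl 1 "Primary" 2 "Secondary"
          label values new_education edu_lbl
          tabulate new_education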
          Kind regards,
          Carlo
          (StataNow 18.5)



          • #6
            To build on Carlo's suggestions: if all you want is dummies to run a regression, use factor-variable notation, i.country. If you want all the interactions between country and education, use something like i.country##i.education. If you need the actual dummies, xi may help. But implementing Carlo's suggestion to reduce the number of categories is probably easiest with recode (although it can also be done with one generate and a large number of replace statements).
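
            A minimal sketch of these options, reusing the globals and variable names from post #1 (the thread's country variable is called nationality). It assumes $ylist holds a single outcome; the stub nat_, the new variable edu3, the recode cut-points and the labels are all hypothetical, since the meaning of the 21 education codes is not given in the thread.

            Code:
            * Factor-variable notation: indicators are built on the fly, one base level is omitted
            regress $ylist $xlist gender i.nationality i.education download vt

            * Actual dummy variables, if they are needed: creates nat_1, nat_2, ...
            tabulate nationality, generate(nat_)

            * Collapse education into fewer levels (hypothetical grouping of the codes)
            recode education (1/7 = 1 "Group A") (8/14 = 2 "Group B") (15/21 = 3 "Group C"), generate(edu3)
            regress $ylist $xlist gender i.nationality i.edu3 download vt

            With all levels present, i.nationality and i.education add up to 18 + 20 coefficients on 578 observations, which is one reason to collapse levels before estimating.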
