
  • What is the purpose of adding year dummy variables?

    Hi guys,

    I have found that many papers include years as dummy variables, in the form:

    xtreg y x1 x2 x3 y1997 y1998 y1999 y2000

    where y1997 = 1 if the observation is from 1997 and 0 otherwise;
    the same rule applies to 1998, 1999 and 2000.

    May I ask what the purpose of adding such year dummy variables is?

    I think they are used to control for macroeconomic events, such as a depression, a stock market crash, etc. Thank you very much.

  • #2
    It is to capture any time-related effects that are not already in the model. Your example of a stock crash is a good one.
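    The idea can be sketched with a small simulation in Python/NumPy rather than Stata. This is a hypothetical, noise-free toy panel: the firms, the regressor x, and the 2002 "crash" are assumptions for illustration only. Every firm shares a negative shock in 2002 and x trends upward over time, so leaving the shock in the error term biases the coefficient on x. Adding year dummies lets the 2002 dummy absorb it.

```python
import numpy as np

# Hypothetical, noise-free toy panel: 3 firms observed in 2000-2003.
years = np.repeat([2000, 2001, 2002, 2003], 3)   # each year appears once per firm
firm = np.tile([0, 1, 2], 4)
x = (years - 2000) + 0.5 * firm                  # regressor that trends upward over time
shock = np.where(years == 2002, -8.0, 0.0)       # assumed common "crash" in 2002 hitting every firm
y = 2.0 * x + shock                              # true coefficient on x is 2


def ols(X, y):
    """Least-squares coefficients from regressing y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


const = np.ones_like(x)
# Without year dummies, the 2002 shock ends up in the error term and biases the slope.
b_no_dummies = ols(np.column_stack([const, x]), y)[1]
# With year dummies (2000 as the base year), the 2002 dummy absorbs the shock.
dummies = np.column_stack([(years == t).astype(float) for t in (2001, 2002, 2003)])
b_dummies = ols(np.column_stack([const, x, dummies]), y)[1]

print(f"coefficient on x without year dummies: {b_no_dummies:.3f}")
print(f"coefficient on x with year dummies:    {b_dummies:.3f}")
```

    Without year dummies, the estimated coefficient on x comes out at about 1.29 instead of the true 2. With the year dummies it is recovered exactly, because the dummies remove only the common year-specific shift.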



    • #3
      DI:
      just an aside to Ben's helpful explanation: whenever you intend to work with dummies in Stata, take a look at -help fvvarlist- first.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Hi,

        I am still confused about dummy variables, especially year dummies.
        When we use a dummy, is it for a categorical variable? For example, take families with dependent children: a family with one child is coded 0, a family with two children 1, a family with three children 2, and a family with more than three children 3.

        My questions are:
        1. What about the year dummy? Is it coded the same way, i.e., 2006 is 0, 2007 is 1, 2008 is 2, and so on? If so, why do some scholars code year dummies as 0 and 1 only, even though they use more than two years of data?
        2. What about an unbalanced panel? For example:

        Code:
         
        ID YEAR X1 X2 X3
        1 2000 2 21 32
        1 2001 3 31 42
        1 2002 4 41 52
        1 2003 5 51 62
        1 2004 6 61 31
        2 2000 3 54 53
        2 2001 4 63 43
        2 2002 5 73 54
        2 2003 7 73 65
        3 2002 4 31 65
        3 2003 5 51 34
        3 2004 6 51 23
        How is the year dummy coded for each ID?

        I appreciate your help. If there is an article I could read to understand this more easily, please let me know.

        Regards,

        Annur Wijayakusuma



        • #5
          Annur:
          1) your idea about how year dummies work is correct. Probably the scholars you mentioned created one indicator variable for each year, coded 1 if the observation belongs to that year and 0 otherwise (this usually happens when the dataset is in -wide- format);
          2) 2000=0, 2001=1, and so on, no matter whether the panel is balanced or unbalanced. For instance, the levels of the categorical variable -year- for panel #3 will be 2002=2, 2003=3, 2004=4 (assuming 2000=0).
          That said, once you have told Stata that you want -year- treated as categorical (i.e., -i.year-) and your dataset is in -long- format, Stata will create the right indicators automatically.
          Kind regards,
          Carlo
          (Stata 19.0)
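          To make the coding concrete, here is a minimal sketch in Python (standard library only, not Stata). It builds one 0/1 indicator per year for the unbalanced panel in #4, together with a single level variable that assumes 2000=0 as in point 2) above. Conceptually, these are the indicators -i.year- creates on long data; the column names Y2000 to Y2004 and level are illustrative.

```python
# Unbalanced panel from #4 (only ID and YEAR are needed to build the dummies).
panel = [
    (1, 2000), (1, 2001), (1, 2002), (1, 2003), (1, 2004),
    (2, 2000), (2, 2001), (2, 2002), (2, 2003),
    (3, 2002), (3, 2003), (3, 2004),
]

all_years = sorted({year for _, year in panel})          # 2000, 2001, ..., 2004
level = {year: k for k, year in enumerate(all_years)}    # 2000 -> 0, 2001 -> 1, ...

rows = []
for pid, year in panel:
    # One 0/1 indicator per year: 1 in this row's own year, 0 in every other year.
    dummies = {f"Y{t}": int(year == t) for t in all_years}
    rows.append({"ID": pid, "YEAR": year, "level": level[year], **dummies})

header = ["ID", "YEAR", "level"] + [f"Y{t}" for t in all_years]
print(" ".join(header))
for row in rows:
    print(" ".join(str(row[col]) for col in header))
```

          Each row has exactly one indicator equal to 1. ID 3 simply has no rows for 2000 and 2001, so the unbalanced panel needs no special handling.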



          • #6
            Hi Carlo,

            Thank you for your help.
            I am not quite sure what the difference is between creating one variable per year, coded 1 if the observation belongs to that year and 0 otherwise, and using a single categorical variable with levels 0, 1, 2, and so on.

            In terms of Stata, will it process the data differently if I use 0/1 variables rather than a categorical variable with levels 0, 1, 2, and so on? If there is no difference, why do some scholars create year dummies as below (Y2000, Y2001, etc.)?

            Code:
             
            ID YEAR Y X1 X2 X3 Y2000 Y2001 Y2002 Y2003 Y2004
            1 2000 24 2 21 32 1 0 0 0 0
            1 2001 52 3 31 42 0 1 0 0 0
            1 2002 65 4 41 52 0 0 1 0 0
            1 2003 23 5 51 62 0 0 0 1 0
            1 2004 54 6 61 31 0 0 0 0 1
            2 2000 64 3 54 53 1 0 0 0 0
            2 2001 52 4 63 43 0 1 0 0 0
            2 2002 63 5 73 54 0 0 1 0 0
            2 2003 73 7 73 65 0 0 0 1 0
            3 2002 25 4 31 65 0 0 1 0 0
            3 2003 42 5 51 34 0 0 0 1 0
            3 2004 62 6 51 23 0 0 0 0 1
            Regards,

            Annur Wijayakusuma



            • #7
              Annur:
              basically, the point is that it is easier to let Stata create the categorical variables for you than to create them by hand.
              The way Stata processes them is the same.
              Probably other scholars cannot rely on an -fvvarlist- analogue in their statistical packages.
              Kind regards,
              Carlo
              (Stata 19.0)
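              The real difference is between a set of 0/1 indicators and a single numeric variable with levels 0, 1, 2, and so on entered as-is (without -i.-). The indicators give one coefficient per year. The single variable forces the year effect onto a straight line, i.e., a linear trend. Below is a minimal Python/NumPy sketch using the data posted in #6; the helper name rss is hypothetical.

```python
import numpy as np

# Data from #6: columns YEAR, Y, X1, X2, X3.
data = np.array([
    [2000, 24, 2, 21, 32], [2001, 52, 3, 31, 42], [2002, 65, 4, 41, 52],
    [2003, 23, 5, 51, 62], [2004, 54, 6, 61, 31], [2000, 64, 3, 54, 53],
    [2001, 52, 4, 63, 43], [2002, 63, 5, 73, 54], [2003, 73, 7, 73, 65],
    [2002, 25, 4, 31, 65], [2003, 42, 5, 51, 34], [2004, 62, 6, 51, 23],
], dtype=float)
year, y, X = data[:, 0], data[:, 1], data[:, 2:]
const = np.ones(len(y))


def rss(design):
    """Residual sum of squares from an OLS fit of y on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


# (a) One 0/1 dummy per year with 2000 omitted as the base, as with i.YEAR.
year_dummies = np.column_stack([(year == t).astype(float) for t in (2001, 2002, 2003, 2004)])
rss_dummies = rss(np.column_stack([const, X, year_dummies]))

# (b) A single numeric variable 0, 1, 2, ... entered as-is: a linear time trend.
level = year - 2000
rss_trend = rss(np.column_stack([const, X, level]))

print(f"RSS with year dummies (4 year coefficients): {rss_dummies:.2f}")
print(f"RSS with a linear trend (1 year coefficient): {rss_trend:.2f}")
```

              The linear trend is a special case of the year dummies (level = 1*Y2001 + 2*Y2002 + 3*Y2003 + 4*Y2004), so the dummy specification can never fit worse. It is the flexible choice.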



              • #8
                Hi Carlo,

                Thank you for your feedback.
                So whether I use year dummies coded 0 and 1 or a categorical variable with levels 0, 1, 2, and so on, Stata does not process them differently. For example, using

                Code:
                reg Y X1 X2 X3 i.YEAR
                is the same as

                Code:
                reg Y X1 X2 X3 Y2000 Y2001 Y2003 Y2004
                I hope my understanding is correct.

                Regards,

                Annur Wijayakusuma



                • #9
                  Annur:
                  the best way to clarify your thoughts about a Stata issue is to try it yourself and learn from the results.
                  From the following toy example, you can see how the results are displayed differently with and without -fvvarlist- notation (and understand why: the reference category changes):
                  Code:
                  sysuse auto, clear
                  . regress price i.foreign
                  
                        Source |       SS           df       MS      Number of obs   =        74
                  -------------+----------------------------------   F(1, 72)        =      0.17
                         Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
                      Residual |   633558013        72  8799416.85   R-squared       =    0.0024
                  -------------+----------------------------------   Adj R-squared   =   -0.0115
                         Total |   635065396        73  8699525.97   Root MSE        =    2966.4
                  
                  ------------------------------------------------------------------------------
                         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                       foreign |
                      Foreign  |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
                         _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
                  ------------------------------------------------------------------------------
                  
                  .
                  g domestic=foreign if foreign==0
                  replace domestic=1 if domestic==0
                  replace domestic=0 if domestic==.
                  . regress price domestic
                  
                        Source |       SS           df       MS      Number of obs   =        74
                  -------------+----------------------------------   F(1, 72)        =      0.17
                         Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
                      Residual |   633558013        72  8799416.85   R-squared       =    0.0024
                  -------------+----------------------------------   Adj R-squared   =   -0.0115
                         Total |   635065396        73  8699525.97   Root MSE        =    2966.4
                  
                  ------------------------------------------------------------------------------
                         price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                      domestic |  -312.2587   754.4488    -0.41   0.680    -1816.225    1191.708
                         _cons |   6384.682   632.4346    10.10   0.000     5123.947    7645.417
                  ------------------------------------------------------------------------------
                  
                  .
                  Kind regards,
                  Carlo
                  (Stata 19.0)
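                  The two outputs above show the same fit, a coefficient with the opposite sign, and a constant equal to the mean of the other group. This can be checked with a hypothetical toy example in Python/NumPy. The prices below are made up, not the auto.dta values, and the helper name fit is illustrative.

```python
import numpy as np

# Hypothetical toy prices (made up; not the auto.dta values).
price = np.array([4000.0, 5000.0, 6000.0, 5500.0, 6500.0])
foreign = np.array([0.0, 0.0, 0.0, 1.0, 1.0])    # 1 = foreign car
domestic = 1.0 - foreign                          # same information, reversed coding


def fit(dummy):
    """Return (constant, slope) from regressing price on a constant and one 0/1 dummy."""
    X = np.column_stack([np.ones_like(dummy), dummy])
    beta, *_ = np.linalg.lstsq(X, price, rcond=None)
    return beta[0], beta[1]


cons_f, slope_f = fit(foreign)    # reference group = domestic
cons_d, slope_d = fit(domestic)   # reference group = foreign
print(f"foreign dummy:  _cons = {cons_f:.1f}, slope = {slope_f:+.1f}")
print(f"domestic dummy: _cons = {cons_d:.1f}, slope = {slope_d:+.1f}")
```

                  With a single 0/1 regressor, the constant is the mean of the reference group (5000 for domestic here), and the slope is the difference between the two group means. Switching the reference group therefore flips the sign of the slope and moves the constant to the other group's mean (6000). R-squared and the fitted values stay the same.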



                  • #10
                    Hi Carlo,

                    It seems there is no difference when I apply this to my example data.

                    In the first regression, I included the year dummies explicitly (Y2000, Y2001, and so on); in the second, I used i.YEAR. The results are identical. I hope this also applies to other estimation methods, such as GMM.

                    Code:
                    reg  Y X1 X2 X3  Y2000 Y2001 Y2002 Y2003 Y2004
                    note: Y2000 omitted because of collinearity
                    
                          Source |       SS       df       MS              Number of obs =      12
                    -------------+------------------------------           F(  7,     4) =    0.98
                           Model |  2137.23279     7  305.318969           Prob > F      =  0.5406
                        Residual |  1243.68388     4   310.92097           R-squared     =  0.6321
                    -------------+------------------------------           Adj R-squared = -0.0116
                           Total |  3380.91667    11  307.356061           Root MSE      =  17.633
                    
                    ------------------------------------------------------------------------------
                               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                              X1 |   20.66079   17.72432     1.17   0.309    -28.54982     69.8714
                              X2 |   .1196712   .6973844     0.17   0.872    -1.816578    2.055921
                              X3 |  -.4526624   .6709333    -0.67   0.537    -2.315472    1.410147
                           Y2000 |          0  (omitted)
                           Y2001 |  -13.79767   21.72797    -0.64   0.560    -74.12418    46.52885
                           Y2002 |  -25.61095   28.58072    -0.90   0.421    -104.9638    53.74186
                           Y2003 |  -60.86425   44.68625    -1.36   0.245    -184.9332    63.20466
                           Y2004 |  -67.54295   59.87209    -1.13   0.322    -233.7745    98.68862
                           _cons |   7.098507   29.98312     0.24   0.824    -76.14799    90.34501
                    ------------------------------------------------------------------------------
                    
                    . reg  Y X1 X2 X3 i.YEAR
                    
                          Source |       SS       df       MS              Number of obs =      12
                    -------------+------------------------------           F(  7,     4) =    0.98
                           Model |  2137.23279     7  305.318969           Prob > F      =  0.5406
                        Residual |  1243.68388     4   310.92097           R-squared     =  0.6321
                    -------------+------------------------------           Adj R-squared = -0.0116
                           Total |  3380.91667    11  307.356061           Root MSE      =  17.633
                    
                    ------------------------------------------------------------------------------
                               Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                              X1 |   20.66079   17.72432     1.17   0.309    -28.54982     69.8714
                              X2 |   .1196712   .6973844     0.17   0.872    -1.816578    2.055921
                              X3 |  -.4526624   .6709333    -0.67   0.537    -2.315472    1.410147
                                 |
                            YEAR |
                           2001  |  -13.79767   21.72797    -0.64   0.560    -74.12418    46.52885
                           2002  |  -25.61095   28.58072    -0.90   0.421    -104.9638    53.74186
                           2003  |  -60.86425   44.68625    -1.36   0.245    -184.9332    63.20466
                           2004  |  -67.54295   59.87209    -1.13   0.322    -233.7745    98.68862
                                 |
                           _cons |   7.098507   29.98312     0.24   0.824    -76.14799    90.34501
                    ------------------------------------------------------------------------------
                    Regards,

                    Annur Wijayakusuma



                    • #11
                      Annur:
                      whether the results match simply depends on how you created the categorical variables by hand (in particular, on which year ends up as the omitted reference category).
                      That said, I would still exploit -fvvarlist- if the layout of your dataset supports it.
                      Last edited by Carlo Lazzaro; 08 Jun 2021, 08:57.
                      Kind regards,
                      Carlo
                      (Stata 19.0)



                      • #12
                        Annur: unless T is large relative to N (and you haven't explained much about your situation), very few will believe your results if you don't allow for a full set of time-period dummies. For policy interventions there is usually a before-after period. If, say, the aggregate economy changes in the later years and you don't account for that using year dummies, you could easily conclude that an ineffective policy was effective. The opposite could happen, too: you could miss a policy effect by not controlling for aggregate fluctuations. This is what the time dummies do without having to think more about it.

                        One can be both completely flexible and lazy by using i.year or putting in d2001, d2002, d2003. These situations are rare in what we do, so take advantage of them when they arise.
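                        The point about policy interventions can be illustrated with a hypothetical, noise-free simulation in Python/NumPy. It has one control and one treated unit, a policy assumed to start in 2003 with a true effect of zero, and an assumed aggregate boom in the later years. This is a sketch of the logic, not anyone's actual estimator; the helper name policy_coef is illustrative.

```python
import numpy as np

# Hypothetical, noise-free panel: one control unit and one treated unit, 2000-2005.
years = np.tile(np.arange(2000, 2006), 2)
treated = np.repeat([0.0, 1.0], 6)
post = (years >= 2003).astype(float)              # assumed policy start: 2003
policy = treated * post                           # policy is on for the treated unit only
# Assumed aggregate fluctuations shared by both units (a boom in the later years).
shock = np.array([0.0, 1.0, 0.0, 4.0, 6.0, 5.0])  # indexed by year - 2000
y = 3.0 * treated + shock[years - 2000]           # true policy effect is 0


def policy_coef(*extra_columns):
    """OLS coefficient on `policy`, given a constant, `treated`, and optional extra columns."""
    X = np.column_stack([np.ones_like(y), treated, policy, *extra_columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[2]


no_year_dummies = policy_coef()
year_dummies = [(years == t).astype(float) for t in range(2001, 2006)]  # 2000 is the base
with_year_dummies = policy_coef(*year_dummies)

print(f"policy effect without year dummies: {no_year_dummies:.3f}")
print(f"policy effect with year dummies:    {with_year_dummies:.3f}")
```

                        Without year dummies, the regression attributes the boom to the policy and reports an effect of about 4.67 (14/3). With a full set of year dummies, the common shock is absorbed and the estimated policy effect is zero, which matches the true effect.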



                        • #13
                          Dear Carlo and Jeff,

                          Thank you for your explanation.
                          The data I used is just a sample I created myself, so there is no specific situation behind it. I created it because of my question in #4 about whether year dummies should be coded 0 and 1, or 0, 1, 2, and so on.

                          Regarding your explanation of policy interventions:

                          "For policy interventions there is usually a before-after period. If, say, the aggregate economy changes in the later years and you don't account for that using year dummies, you could easily conclude that an ineffective policy was effective. And the other way could happen, too"
                          This means that the year dummy could be 0 before the policy change and 1 after it. For example, suppose the policy changed in 2005 and we have data from 2000 to 2010: for 2000-2005 the year dummy is 0, and for the remaining years it is 1. That is how I understand the dummy variable. Please correct me if I am wrong.

                          However, some scholars code the year dummy as 1 when the observation is from 2000 and 0 otherwise, with the same rule for 2001 and every other year. They do it this way because they want to control for macroeconomic fluctuations. I am still confused about this and about how to interpret the results of the analysis.

                          Could you please explain this?
                          I apologize for my confusion; I want to learn and understand year dummies better. I appreciate your help.

                          Regards,

                          Annur Wijayakusuma
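                          The before/after dummy described in #13 and a full set of year dummies are different tools. A quick check in Python/NumPy shows how they relate. It assumes data for 2000 to 2010 and, following #13, post = 0 for 2000-2005 and 1 afterwards. The before/after dummy is the sum of the year dummies for 2006-2010. Once all year dummies are in the model, it therefore carries no extra information, and Stata would omit it because of collinearity.

```python
import numpy as np

years = np.arange(2000, 2011)                     # 2000-2010, one observation per year
post = (years > 2005).astype(float)               # assumed coding from #13: 2000-2005 -> 0, 2006-2010 -> 1
const = np.ones(len(years))
# Year dummies for 2001-2010 (2000 is the base); column 5 corresponds to 2006.
year_dummies = np.column_stack([(years == t).astype(float) for t in range(2001, 2011)])

# The before/after dummy is exactly the sum of the 2006-2010 year dummies ...
print(np.array_equal(post, year_dummies[:, 5:].sum(axis=1)))
# ... so adding it to a constant plus a full set of year dummies leaves the rank unchanged.
print(np.linalg.matrix_rank(np.column_stack([const, year_dummies])),
      np.linalg.matrix_rank(np.column_stack([const, year_dummies, post])))
```

                          This is why, in before/after policy designs, the policy variable is usually the interaction of a treatment-group indicator with the post-period dummy. The year dummies then take care of the aggregate fluctuations common to all units.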

