
  • Choosing the right strategy to introduce time and geographical dummies in a regression

    Dear All,

    I am running a regression in which I have reason to believe that the dependent variable may be affected by the end of the Cold War, i.e., 1990. I have panel data on 70 countries across different regions from 1980 to 2015.

    Given this background, I have two questions:

    Q1) Which of the following strategies is the right one for introducing the Cold War dummy variable?

    Strategy I
    Code:
    reg DV IV 1990.year
    Strategy II
    Code:
    generate CWDummy = 0
    replace CWDummy =1 if (year>1990)
    reg DV IV CWDummy
    Strategy III
    Code:
    generate CWDummy = 0
    replace CWDummy =1 if (year<1990)
    reg DV IV CWDummy
    DV stands for the dependent variable and IV represents a set of independent variables.
    I am struggling to tell apart the logic of the three regressions above, and each of them gives very different results. In short, which one is most appropriate for evaluating the influence of the end of the Cold War (1990) on the dependent variable?
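A minimal Stata sketch of what the three codings actually measure, on a hypothetical balanced panel (70 countries, 1980-2015) with simulated variables `y` and `x`. The seed, the variable names and the size of the post-1990 shift are assumptions for illustration, not the poster's data. `1990.year` flags the single year 1990 (a one-off shock), Strategy II flags every year after 1990 and Strategy III every year before 1990. Because both II and III code 1990 as 0, they are not mirror images of each other, which is one reason their results differ.

```stata
* Hypothetical balanced panel: 70 countries observed 1980-2015
clear
set seed 12345
set obs 70
generate country = _n
expand 36
bysort country: generate year = 1979 + _n

* Simulated variables (assumption): y shifts up by 0.5 after 1990
generate x = rnormal()
generate y = 1 + 0.8*x + 0.5*(year > 1990) + rnormal()

* Strategy I: 1990.year is 1 only in the single year 1990
generate d1990 = (year == 1990)
* Strategy II: 1 for 1991-2015, 0 for 1980-1990
generate post_cw = (year > 1990)
* Strategy III: 1 for 1980-1989, 0 for 1990-2015
generate pre_cw = (year < 1990)

* Both II and III put 1990 in the 0 group, so only 1990 is listed here
tabulate year if post_cw == 0 & pre_cw == 0

* Exact complement of Strategy II: 1 for 1980-1990
generate cw_period = 1 - post_cw

* Complementary dummies give the same fit and the same slope on x;
* only the sign of the dummy coefficient (and the constant) changes
regress y x post_cw
regress y x cw_period
```

Once 1990 is assigned to one side consistently, a "post" dummy and a "pre" dummy carry the same information and their coefficients differ only in sign. The substantive choice is therefore between a permanent shift after the Cold War (a post-1990 dummy) and a one-year effect in 1990 (`1990.year`), plus the decision of which period 1990 itself belongs to.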


    Q2) I have 70 countries in my data across 4 geographical regions, and I want to control for heterogeneity across countries. The idea behind introducing geographical dummies is threefold: first, to examine whether the key IV affects the DV differently in each region; second, whether a country's geographical location gets some favourable treatment in terms of the DV; third, whether it gets some favourable treatment in terms of the IV.

    Strategy I
    Code:
    generate d1= 0
    generate d2= 0
    generate d3= 0
    generate d4= 0
    replace d1 =1 if (region ==1)
    replace d2 =1 if (region ==2)
    replace d3 =1 if (region ==3)
    replace d4 =1 if (region ==4)
    reg DV IV d1 d2 d3 
    reg DV IV d2 d3 d4
    Strategy II
    Code:
    reg DV IV i.region
    Strategy III
    Code:
    reg DV IV if region==1
    reg DV IV if region==2
    reg DV IV if region==3
    reg DV IV if region==4
    In Strategies I and II, I dropped one dummy to avoid the dummy variable trap. Now my confusion, again, is how to choose the right strategy and how to interpret it. The three approaches above produce very different results. How should I choose between them?
    Moreover, what do the sign and the size of the coefficient on a dummy tell us?
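A sketch of the region setups, again on hypothetical simulated data (the region assignment, seed and coefficients are assumptions). It builds 0/1 dummies with `tabulate region, generate(d)`, shows that Strategies I and II are the same model, comments on how a dummy coefficient reads, and uses the interaction `i.region##c.x` to let the effect of `x` differ by region, which is the first aim listed in Q2.

```stata
* Hypothetical data (assumption): 70 countries spread over 4 regions, 1980-2015
clear
set seed 2024
set obs 70
generate country = _n
generate region = mod(country - 1, 4) + 1
expand 36
bysort country: generate year = 1979 + _n
generate x = rnormal()
* Simulated region-specific intercepts and slopes (illustration only)
generate y = region + (0.5 + 0.2*region)*x + rnormal()

* 0/1 dummies d1-d4, one per region; each takes the value 1, never 2, 3 or 4
tabulate region, generate(d)

* Strategies I and II are the same model with region 1 as the base
regress y x d2 d3 d4
regress y x i.region
* _b[3.region]: mean gap in y between region 3 and region 1 at the same x;
* a positive sign means region 3 sits higher than region 1
* Joint test that the region intercepts are all equal
testparm i.region

* Does the effect of x differ by region? Interact region with x
regress y i.region##c.x
* Slope of x in region 3 = base-region slope + interaction term
lincom _b[x] + _b[3.region#c.x]
* Joint test that the slope of x is the same in all regions
testparm i.region#c.x

* Strategy III (one regression per region) lets every coefficient vary by
* region: its point estimates match the fully interacted model above, while
* standard errors differ because each subsample has its own residual variance
regress y x if region == 3
```

In short, a region dummy shifts the intercept relative to the base region, holding the regressors constant; a region-specific effect of the IV needs interaction terms, not dummies alone.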

    Looking forward to your response.

    Best regards,
    Imran Khan


  • #2
    Imran:
    life is complicated enough as it is; there is no need to create other nuisances yourself.
    Hence:
    - why insist on -regress- when the -xt- commands were conceived to deal with panel datasets?
    - why create categorical variables/interactions yourself when factor variables (-fvvarlist-) can do it for you?
    Kind regards,
    Carlo
    (Stata 19.0)
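Carlo's two suggestions can be sketched as follows on a hypothetical panel (all variable names, the seed and the simulated coefficients are assumptions): declare the panel once with `xtset`, then let factor-variable notation (`i.`, `ib#.`, `##`) create the dummies and interactions that were built by hand in the first message. The `, re` option is only a placeholder here; whether random or fixed effects is appropriate is a separate question.

```stata
* Hypothetical panel (assumption): 70 countries, 4 regions, 1980-2015
clear
set seed 99
set obs 70
generate country = _n
generate region = mod(country - 1, 4) + 1
* Country effect, constant over time
generate u = rnormal(0, 0.5)
expand 36
bysort country: generate year = 1979 + _n
generate x = rnormal()
generate post_cw = (year > 1990)
generate y = region + 0.7*x + 0.4*post_cw + u + rnormal()

* Declare the panel once; the -xt- commands use this structure
xtset country year

* Factor-variable notation instead of hand-made dummies:
*   i.post_cw      post-Cold War indicator (0 is the base)
*   i.region##c.x  region intercepts plus region-specific slopes of x
xtreg y i.post_cw i.region##c.x, re

* A different base region via the ib. prefix (here region 4)
xtreg y i.post_cw ib4.region##c.x, re
```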



    • #3
      Dear Carlo,

      Many thanks for your reply.

      I am actually using -xtreg-; I forgot to write it here.

      So, if I have understood you correctly, do you suggest sticking to the following two commands for the Cold War and geographical dummies?

      Code:
      xtreg DV IV 1990.year
      xtreg DV IV i.region
      Best regards,
      Imran Khan



      • #4
        Imran:
        assuming that -re- is the right specification for your regression model (see -hausman- in this respect), I would go with:
        Code:
        xtreg DV IV i.region i.CWDummy
        Kind regards,
        Carlo
        (Stata 19.0)
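A sketch of the -hausman- check Carlo mentions, on hypothetical simulated data in which `x` is deliberately correlated with the country effect `u` (an assumption chosen so that the random-effects assumption fails by construction). It also shows a practical consequence for Q2: under `fe`, time-invariant regressors such as `region` are absorbed by the country effects and omitted, so region intercept shifts are not estimable there. The test itself is run on models without `i.region`, since those terms drop out under `fe` anyway.

```stata
* Hypothetical panel (assumption): x is correlated with the country effect u
clear
set seed 7
set obs 70
generate country = _n
generate region = mod(country - 1, 4) + 1
generate u = rnormal(0, 0.5)
expand 36
bysort country: generate year = 1979 + _n
generate x = rnormal() + u
generate post_cw = (year > 1990)
generate y = region + 0.7*x + 0.4*post_cw + u + rnormal()
xtset country year

* Under -fe-, region never changes within a country, so i.region is omitted
xtreg y x i.post_cw i.region, fe

* Hausman test on the time-varying regressors:
* fe (consistent under both hypotheses) vs re (efficient if re holds)
xtreg y x i.post_cw, fe
estimates store fe
xtreg y x i.post_cw, re
estimates store re
hausman fe re
```

A small p-value is evidence against the random-effects assumption and points towards `fe`; with real data the test can also return a non-positive-definite warning, which deserves a closer look rather than a mechanical reading.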



        • #5
          Dear Carlo,

          Many thanks for your reply.

          Can’t I use the following command directly instead of creating CWDummy?

          Code:
          xtreg DV IV i.region 1990.year
          If not, which strategy should I follow to generate CWDummy (see Strategies II and III in my first message)?

          Best regards,
          Imran Khan




          • #6
            Imran:
            try your last code and see whether Stata gives back what you're after.
            I'm used to coding factor variables as -i.<categoricalvariable>- (personal taste, you know...).
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear Carlo,

              This is where the confusion arises. The following commands give different results:
              Code:
              xtreg DV IV i.region 1990.year
              xtreg DV IV i.region i.CWDummy
              So, how to choose between them?

              Best regards,
              Imran Khan



              • #8
                Imran:
                results (which you should have posted, as you did with your code) do not differ when I apply your approach to a toy example:
                Code:
                use "http://www.stata-press.com/data/r15/nlswork.dta"
                . g flag=1 if year==70
                (26,848 missing values generated)
                
                . replace flag=0 if year!=70
                (26,848 real changes made)
                
                . xtreg ln_wage i.flag
                
                Random-effects GLS regression                   Number of obs     =     28,534
                Group variable: idcode                          Number of groups  =      4,711
                
                R-sq:                                           Obs per group:
                     within  = 0.0086                                         min =          1
                     between = 0.0180                                         avg =        6.1
                     overall = 0.0077                                         max =         15
                
                                                                Wald chi2(1)      =     244.04
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                
                ------------------------------------------------------------------------------
                     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      1.flag |  -.1313552   .0084085   -15.62   0.000    -.1478355   -.1148749
                       _cons |   1.664364   .0060783   273.82   0.000      1.65245    1.676277
                -------------+----------------------------------------------------------------
                     sigma_u |  .38266832
                     sigma_e |  .31892099
                         rho |  .59011733   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                . xtreg ln_wage 70.year
                
                Random-effects GLS regression                   Number of obs     =     28,534
                Group variable: idcode                          Number of groups  =      4,711
                
                R-sq:                                           Obs per group:
                     within  = 0.0086                                         min =          1
                     between = 0.0180                                         avg =        6.1
                     overall = 0.0077                                         max =         15
                
                                                                Wald chi2(1)      =     244.04
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                
                ------------------------------------------------------------------------------
                     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     70.year |  -.1313552   .0084085   -15.62   0.000    -.1478355   -.1148749
                       _cons |   1.664364   .0060783   273.82   0.000      1.65245    1.676277
                -------------+----------------------------------------------------------------
                     sigma_u |  .38266832
                     sigma_e |  .31892099
                         rho |  .59011733   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                Kind regards,
                Carlo
                (Stata 19.0)
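Why the two commands in #7 can differ: Carlo's `flag` is 1 only in 1970, so it is the same variable as `70.year` and the two regressions coincide. A `CWDummy` defined as `year > 1990`, as in Strategy II, is a different regressor from `1990.year`, so different estimates are expected. The sketch below uses a hypothetical simulated panel (names, seed and coefficients are assumptions) and also shows a one-line way to build such an indicator instead of `generate` followed by `replace`.

```stata
* Hypothetical panel (assumption), laid out as in the question
clear
set seed 31
set obs 70
generate country = _n
generate u = rnormal(0, 0.5)
expand 36
bysort country: generate year = 1979 + _n
generate x = rnormal()
generate y = 0.7*x + 0.4*(year > 1990) + u + rnormal()
xtset country year

* One-line analogue of Carlo's -flag-: 1 in 1990, 0 otherwise
generate flag1990 = (year == 1990)
* These two commands fit the same model and report the same coefficient
xtreg y x i.flag1990, re
xtreg y x 1990.year, re

* A post-Cold War dummy (1 from 1991 onwards) is a different regressor,
* so its estimates will generally differ from those with 1990.year
generate CWDummy = (year > 1990)
xtreg y x i.CWDummy, re
```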



                • #9
                  Dear Carlo,

                  Many thanks for your reply.

                  I was making a mistake when generating the dummy variable. It works fine now.

                  Best regards,
                  Imran Khan.

