
  • Difference in Difference Model estimation

    I am estimating a difference-in-differences model for panel data with the following equation:
    Y = a + b*Group + c*time + d*(Group*time) + error, where d is the treatment effect. My hypothesis is that the treatment effect is negative.
    I use:
    xtreg y Group time Group*time, fe
    I found that the coefficient for Group*time is positive.
    As a cross-check, when I just use
    reg y Group time Group*time
    I found that the treatment effect is negative. I don’t know which command is right.
    Likewise, when I use xi: reg y Group time Group*time
    I found that the treatment effect is negative. To be honest, I don’t know the exact meaning of xi: reg command.
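For reference, in the simplest case (Group and time both 0/1 indicators, no other regressors), the pooled OLS estimate of d in the equation above is exactly a difference of differences in the four cell means, which is where the model gets its name:

```latex
\hat{d} = \left(\bar{Y}_{G=1,\,t=1} - \bar{Y}_{G=1,\,t=0}\right) - \left(\bar{Y}_{G=0,\,t=1} - \bar{Y}_{G=0,\,t=0}\right)
```

Here \bar{Y}_{G=g,\,t=s} is the sample mean of Y among observations with Group = g and time = s. A negative d therefore means that, between the two periods, the outcome in the Group = 1 observations rose less (or fell more) than in the Group = 0 observations.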


  • #2
    To be honest, I don’t know the exact meaning of xi: reg command.
    Just as well. In the particular instance you show, the xi: actually does nothing at all, because there are no i.-prefixed terms for it to expand, and it is just the same as "reg y Group time Group*time." (Or, more accurately, it would be if that were legal syntax: Stata does not allow expressions like Group*time in the regressor list of an estimation command.) In any case, if you are using the current or a recent version of Stata, the -xi:- prefix is essentially obsolete. It has been superseded by factor-variable notation; see -help fvvarlist- and the associated manual section for an explanation.

    Factor-variable notation is far better than the old -xi- (especially for a difference-in-differences model) because it lets you use -margins- afterward to calculate adjusted means and marginal effects. Nearly all estimation commands in Stata (and many non-estimation commands as well) support factor-variable notation, and most of those that don't are ancient commands whose functions are better carried out by more modern commands that do support it. Yes, there remain a few oddball situations where you really need -xi:-, but they are very unusual. So basically, it's best to nearly forget you ever heard of -xi-.
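To make the -xi:- versus factor-variable point concrete, here is a minimal sketch on Stata's shipped auto dataset. The variables price, foreign and mpg are stand-ins chosen for illustration, not the variables from this thread. Both regressions fit the same model, so their R-squared values should agree; the difference is that after the factor-variable version, -margins- knows which terms are indicators and interactions.

Code:
```stata
// Sketch on Stata's shipped auto dataset; price, foreign and mpg
// are stand-ins, not the variables from this thread.
sysuse auto, clear

// Old style: -xi- builds _I* dummy and interaction variables in the data
xi: regress price i.foreign*mpg
scalar r2_xi = e(r2)

// Factor-variable style: the same model, no new variables created
regress price i.foreign##c.mpg
scalar r2_fv = e(r2)

// Because Stata knows these terms are a factor variable and its interaction,
// -margins- can report the mpg slope separately for each level of foreign
margins foreign, dydx(mpg)
```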

    So what you want to do is:
    Code:
    xtreg y i.Group##i.time, fe
    (I'm assuming here that time is a dichotomous variable, or at least a discrete one.)
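As a hedged illustration of that command, here is a sketch on simulated data (the panel identifier id, the 200 panels, and the true treatment effect of -2 are all hypothetical). Because Group does not vary within a panel, -xtreg, fe- will report 1.Group as omitted because of collinearity with the panel effects; the treatment effect is the coefficient on the interaction term 1.Group#1.time.

Code:
```stata
// Hypothetical simulated data: 200 panels observed at time 0 and time 1,
// half of them in Group 1, true treatment effect d = -2
clear
set seed 2024
set obs 200
gen id = _n
gen byte Group = (id <= 100)
gen u = rnormal(0, 3)              // panel-level effect, constant within id
expand 2
bysort id: gen byte time = _n - 1
gen y = 1 + 0.5*Group + time - 2*Group*time + u + rnormal(0, 1)

xtset id time

// 1.Group is time-invariant, so the fixed effects absorb it
xtreg y i.Group##i.time, fe

// The difference-in-differences estimate is the interaction coefficient
display _b[1.Group#1.time]
lincom 1.Group#1.time
```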

    Now, you are perplexed that you get results with different signs for the treatment effect when you use -regress- instead of -xtreg-. There are two possible causes of this.

    1. -xtreg, fe- estimates within-panel effects only, whereas -regress- estimates a blend of within- and between-panel effects. Here is an illustration with data where the within- and between-panel effects have opposite signs. It's not a DID model, but the principle is the same:

    Code:
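    clear
    set obs 5
    gen panel_id = _n
    // duplicate each panel so it has two observations
    expand 2
    
    set seed 1234
    // within a panel, y falls by about 1 as _n goes from 1 to 2;
    // across panels, y rises by about 4 per unit of panel_id
    by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
    // x rises with both panel_id and _n
    by panel_id: gen x = panel_id + _n
    
    xtset panel_id
    
    // fixed effects: uses only within-panel variation (negative slope)
    xtreg y x, fe
    // pooled OLS: blends within- and between-panel variation (positive slope)
    regress y x
    
    //    GRAPH THE DATA TO SHOW WHAT'S HAPPENING
    // -separate- creates y1-y5, one copy of y per panel
    separate y, by(panel_id)
    
    graph twoway connected y? x || lfit y x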
    This produces the same kind of contradiction in sign that you found with your data, and the graph makes it clear what's going on: within each panel, y decreases as x increases, but the overall trend across panels is one of increasing y with increasing x.



    2. In -xtreg, fe-, any panel that has only a single observation contributes nothing to the estimation. In a complete difference-in-differences design you shouldn't have any such panels, but if your design is incomplete or there is a lot of missing data, you might end up with some. In that case, your -xtreg- analysis may effectively be carried out on a smaller sample than the -regress- analysis. How do the N's look for both of those? Are they the same? If not, then what I showed above may not be your situation, and it may just be a matter of a biased sample in your -xtreg- analysis.
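A minimal sketch of how to check point 2, on hypothetical simulated data (the variable names id, Group, time and y, and the pattern of missing values, are made up for illustration). It compares the N reported by each command and also counts observations that are alone in their panel. As far as I know, -xtreg, fe- still includes such observations in its reported N even though they carry no within-panel information, so counting them directly is a useful complement to comparing the N's.

Code:
```stata
// Hypothetical data: 50 panels, two periods, true effect -1; panels 1-5
// lose their post-period y value, so each keeps a single usable observation
clear
set seed 99
set obs 50
gen id = _n
gen byte Group = (id <= 25)
expand 2
bysort id: gen byte time = _n - 1
gen y = Group + time - Group*time + rnormal()
replace y = . if id <= 5 & time == 1

xtset id time

// Compare the reported sample sizes of the two commands
quietly xtreg y i.Group##i.time, fe
scalar N_fe = e(N)
quietly regress y i.Group##i.time
scalar N_ols = e(N)
display "xtreg N = " scalar(N_fe) "    regress N = " scalar(N_ols)

// Flag the observations -regress- used, then count those sitting in
// panels with only one usable observation (these add nothing under -fe-)
gen byte used = e(sample)
bysort id: egen n_used = total(used)
count if used & n_used == 1
```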



    • #3
      Thank you so much for your detailed explanation. It really makes things clear.

      Regarding N (if you mean the number of observations), it is the same, 74445, in both cases (xtreg and reg). There are 449 missing values, which are excluded from both regressions.

      I really appreciate your valuable time.

      Thank you.
