
  • Difference in difference method for panel data

    Hi,
    I am writing my first thesis paper. I have panel data on 140 developing countries for the years 1997-2015. My main explanatory variable is the implementation of a paternity leave law, which is a dummy variable that varies over time. My purpose is to see the impact of the law on the gender employment gap.
    My problem is that I want to use the difference-in-differences method and graph the data to see whether there is a parallel trend between the countries that introduced paternity leave in 2004-2012 and those that introduced it in 2013-2015. I would also appreciate suggestions on how to set up the model and work with the data.
    First I ran OLS to see the impact and got highly significant results. Then I ran fixed effects and got an insignificant result.

    xtreg emp_gap1 pat_law (control variables), fe cluster(id)
    [Attached image: Capture.PNG]

    gen treatment = 1 if id==1 | id==2| id==4 | id==5 | id==6 | id==7 | id==8 | id==9 | id==14| id==16 | id==19 | id==20 | id==22 | id==24| id==25 | id==26 | id==28 | id==29 | id==31 | id==32 | id==33 | id==35 | id==38 | id==41 | id==43 | id==44| id==47 | id==49 | id==54 | id==55 | id==56 | id==57 | id==60
    gen time = (t>=2004) & !missing(t)
    gen did = time*treatment
    reg emp_gap1 time treatment did, r
    [Attached image: Capture.PNG]

    1) Should I group the countries by specific years? If so, how?
    2) Planning to set the countries who implemented the law from 2013 to 2015 as controls and the others as a treatment group. Can I? If so, how?

    Pardon any mistakes, as I am very new here.
    Best
    Nuzaba

  • #2
    Something is wrong here: your did variable should not end up being omitted. It is the key variable for your analysis. One clue is in your definition of the time variable: your data begins in 2004 and you define time as (year >= 2004), so your time variable is actually a constant: it will be 1 in every observation. That being the case, did and treatment will be equal (because did = treatment*time = treatment*1 = treatment), so one of those will drop. But it is not clear to me why the other is also being omitted.

    Be that as it may, you need to redefine your time variable so that it distinguishes before the implementation from after. Then you say
    2) Planning to set the countries who implemented the law from 2013 to 2015 as controls and the others as a treatment group.
    But that seems to contradict what you say earlier, where you have already defined the treatment and controls by:
    Code:
    gen treatment = 1 if id==1 | id==2| id==4 | id==5 | id==6 | id==7 | id==8 | id==9 | id==14| id==16 | id==19 | id==20 | id==22 | id==24| id==25 | id==26 | id==28 | id==29 | id==31 | id==32 | id==33 | id==35 | id==38 | id==41 | id==43 | id==44| id==47 | id==49 | id==54 | id==55 | id==56 | id==57 | id==60
    So then the controls are just all of the ids that do not appear in that command. And in any case, why would you include a country that implemented the law as a control? By definition, it is in the treatment group.
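
    One detail worth checking in that command: -gen treatment = 1 if ...- sets treatment to missing, not 0, for every id that is not listed, and -regress- drops observations with missing values. That would leave treatment constant in the estimation sample. A minimal sketch of a 0/1 indicator follows, run on a hypothetical toy data set of 60 ids rather than the real panel. The id list is copied from the command above; the /// line continuation assumes the code is run from a do-file.

    ```stata
    * Hypothetical toy data: one row per id, ids 1 to 60
    clear
    set obs 60
    gen int id = _n

    * inlist() returns 1 for listed ids and 0 otherwise, so no value is missing
    gen byte treatment = inlist(id, 1, 2, 4, 5, 6, 7, 8, 9, 14, 16, 19, 20, 22, 24, 25, 26, 28, ///
        29, 31, 32, 33, 35, 38, 41, 43, 44, 47, 49, 54, 55, 56, 57, 60)

    * Check that only the values 0 and 1 appear
    tabulate treatment, missing
    ```

    With this coding the unlisted ids are kept in the regression as controls instead of being dropped.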

    So when was the law actually implemented? Was it implemented in the same year in all countries that implemented it? If so, your time variable should be defined as year >= that particular year. If not, do you have a variable identifying which year it was implemented in each country? And then we have the problem of how to define the time variable for the countries that are not in the treatment group.
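
    If it turns out that the implementation year differs by country, one possible setup (a sketch under assumptions, not necessarily the right model for this study) is a country-specific "after" indicator in a fixed-effects regression with year indicators. The variable names implementation_year and post, and the simulated data, are hypothetical; never-implementing countries are assumed to have implementation_year missing, which makes post equal to 0 for them in every year.

    ```stata
    * Hypothetical toy panel: 6 countries observed 1997-2015
    clear
    set seed 12345
    set obs 6
    gen id = _n
    * ids 1-2 implement in 2006, ids 3-4 in 2013, ids 5-6 never (missing)
    gen implementation_year = cond(id <= 2, 2006, cond(id <= 4, 2013, .))
    expand 19
    bysort id: gen year = 1996 + _n

    * 1 from the implementation year onward, 0 before and for never-implementers
    gen byte post = (year >= implementation_year) & !missing(implementation_year)

    * Simulated outcome, for illustration only
    gen emp_gap1 = rnormal() + 0.5*post

    xtset id year
    xtreg emp_gap1 i.post i.year, fe vce(cluster id)
    ```

    Here the country fixed effects play the role of the treatment-group indicator and the year indicators play the role of the before/after indicator, so neither needs to be entered separately.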



    • #3
      Hi Clyde
      Thanks for your answers. I am aware that I defined my treatment the wrong way. I do have the first implementation year for each country and wanted to set two groups of years, for example 2004-2012 and 2013-2015. Here wanted to set the later group as a control for the first group and to see whether they have parallel trend or not. There will be no threshold but only the starting year 2004 and ending 2015.
      But my confusion is how to set that up with commands.
      And another problem is if I am having high significant result in OLS and also same after FGLS then why it is insignificant in FE. Am I giving wrong command or something else I need to check?
      Best
      Nuzaba



      • #4
        And another problem is if I am having high significant result in OLS and also same after FGLS then why it is insignificant in FE.
        Well, there are several possibilities to consider here:

        First and foremost, you are talking about three different models and there is no particular reason to think that they will give the same, or even generally similar results. In particular, the FE model adjusts for any time-invariant attributes of whatever you declared as the panel variable in -xtset-. It may well be that once those are taken into account, your originally "significant" result goes away because it was really just serving as a proxy for something else that you have now taken into account. So, this would mean that you have corrected the original analyses' omitted variable bias.

        Second, the sooner you abandon thinking about effects as being "significant" or not "significant," the sooner you will be able to function well in the world of data. How different are the coefficients themselves? "Significant" is an arbitrary cut-point imposed upon what is in fact something continuous: the effect size as measured by the coefficient. You may find that the coefficients have not varied much, but because of artificial things like changes in degrees of freedom, the same (or nearly the same) coefficient changes from "significant" to "not significant" or vice versa. Such changes are really not meaningful. By focusing on statistical "significance" you are falsely led to think of them as important when they are not. Focus on your coefficients and the confidence intervals. Have those actually changed in any consequential amount? If not, then the models are, as it happens, really in agreement.

        Third, the FE model is a purely within-panel estimator. It is possible for the effects of a variable over time within an entity to be different from the effects of that same variable between different entities. This came up in another thread just yesterday. See http://www.statalist.org/forums/foru...y-the-opposite for a fuller explanation.
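
        To make the first two points concrete, here is a minimal sketch on simulated data (all values are hypothetical; only the variable names emp_gap1, pat_law and id follow the thread). A time-invariant country effect u drives both the law and the outcome, so pooled OLS picks it up while FE removes it. The two sets of estimates are then shown side by side so the coefficients and standard errors can be compared directly.

        ```stata
        * Simulated panel: 20 countries, 2006-2015
        clear
        set seed 2017
        set obs 20
        gen id = _n
        gen u = rnormal()                  // time-invariant country attribute
        expand 10
        bysort id: gen year = 2005 + _n

        * Only countries with u > 0 adopt the law, from 2010 onward
        gen byte pat_law = (u > 0) & (year >= 2010)
        * The outcome depends on u but, by construction, not on pat_law
        gen emp_gap1 = 2*u + rnormal()

        xtset id year
        regress emp_gap1 pat_law, vce(cluster id)
        estimates store ols
        xtreg emp_gap1 pat_law, fe vce(cluster id)
        estimates store fe

        * Coefficients and standard errors of both models in one table
        estimates table ols fe, b(%9.3f) se(%9.3f)
        ```

        Looking at the two columns together shows how far the estimates actually moved, which is more informative than whether each one crosses a significance threshold.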

        Here wanted to set the later group as a control for the first group and to see whether they have parallel trend or not. There will be no threshold but only the starting year 2004 and ending 2015.
        I'm still a bit confused what you are trying to do here. Let me guess that it goes something like this:

        Code:
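        gen byte group = .
        replace group = 1 if inrange(implementation_year, 2004, 2012)
        replace group = 2 if inrange(implementation_year, 2013, 2015) // "CONTROLS"
        collapse (mean) outcome, by(group year) // one row per group and year
        separate outcome, by(group) // creates outcome1 and outcome2
        // name the new variables explicitly: outcome* would also match outcome itself
        graph twoway line outcome1 outcome2 year, sort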
        This will create a graph with two lines (curves) on it, each representing the trajectory of mean outcome over the years in each group.




        • #5
          Hi Clyde
          Thanks a lot. Your information about choosing the model is very helpful for me.
          I successfully created the groups as you suggested, but when I separate the outcome and graph it, there are many lines. I am having this problem with all my graphs. How can I solve that?
          graph.PNG

          This happens after creating lots of ids.

          graph 2.PNG


          Best
          Nuzaba



          • #6
            Nuzaba, you did something very different from what I suggested.

            You don't show your code, but looking at the results, it is clear that you did -separate emp_gap, by(id)-. My suggestion was -separate emp_gap, by(group)-, where group is a variable that takes on only two values: 1 and 2, as generated by the code in #4. That variable, group, would have distinguished those who implemented between 2004 and 2012 from those who implemented later. You have instead plotted a separate curve for each id in the data set, and, unsurprisingly, the result is a mess.

            In order to make good use of help on a forum like this you need to a) be more forthcoming about your data by posting examples, b) be more forthcoming about the code you are using by showing that as well, and c) when given a suggested solution, implement it directly. If you make a major change to it, as you did here, you should not be surprised if the results are unsatisfactory. Even making minor changes can be risky: details in coding are extremely important. When you do make changes, you need to understand how the suggested code works and make sure that the changes you make don't break it.



            • #7
              Hi again
              Yes, I understand what went wrong with my commands. Pardon the misunderstanding. I will keep your suggestions in mind.
              Thanks again
              Best
              Nuzaba
