  • how to use lag in stata

    I have an unbalanced panel dataset in which all variables are defined for the years 2004 to 2012.
    It is a biennial survey dataset, which I declared using xtset kidid Year, delta(2)

    I want to evaluate whether the health of the parents in 2004, 2006 and 2008 affected the help received from the children in 2008, 2010 and 2012.
    That is to say, the independent variables and the dependent variable belong to different years.
    The idea is something like this:
    regress (dependent in 2008, 2010, 2012) on (independent in 2004, 2006, 2008)

    The independent control variables are defined in a local macro using factor-variable notation, as below:

    local Var_Robustness i.Year ib(first).ParentAge i.ParentAgesquared ib(first).Race ib(first).Education

    The health indicator of the parents is HStay (hospital stay).


    The dependent variable is FinTrFromChild

    I tried the code below:
    gen FinTrFromChildlead=L2.FinTrFromChild
    gen HStayl=F2.HStay
    foreach Var_Robustness in Year ParentAge ParentAgesquared Race Education {
    gen l_`Var_Robustness' = L2.`Var_Robustness'
    }
    xtreg FinTrFromChildlead ib(first).HStayl `l_Var_Robustness' if Gender==1 ,fe cluster(hhid)

    Stata reports: insufficient observations

    Parts of the summary in the Stata output:

    . tab HStay Year

                          |                  Year
                    HStay |   2004    2006    2008    2010    2012 |  Total
    ----------------------+----------------------------------------+-------
         No Hospital Stay | 11,096  10,870  10,543  12,378  12,771 | 57,658
     One or More Hospital |  4,663   4,731   4,800   6,098   5,206 | 25,498
    ----------------------+----------------------------------------+-------
                    Total | 15,759  15,601  15,343  18,476  17,977 | 83,156


    . tab HStayl Year

               |          Year
        HStayl |   2004    2006    2008 |  Total
    -----------+------------------------+-------
             0 |  6,847   6,337   6,641 | 19,825
             1 |  3,275   3,402   2,891 |  9,568
    -----------+------------------------+-------
         Total | 10,122   9,739   9,532 | 29,393




    . tab FinTrFromChildlead Year

    FinTrFromC |          Year
      hildlead |   2008    2010    2012 |  Total
    -----------+------------------------+-------
             0 |  9,635   8,783   9,080 | 27,498
             1 |    393     284     347 |  1,024
    -----------+------------------------+-------
         Total | 10,028   9,067   9,427 | 28,522


    . tab FinTrFromChild Year

    k7fcany:w7 |
           any |
      transfer |                  Year
      from kid |   2004    2006    2008    2010    2012 |  Total
    -----------+----------------------------------------+-------
          0.no | 14,980  13,581  14,534  17,575  17,046 | 77,716
         1.yes |    569     452     527     557     508 |  2,613
    -----------+----------------------------------------+-------
         Total | 15,549  14,033  15,061  18,132  17,554 | 80,329




    I would appreciate your help on how to correct this.

    Last edited by ichiro nakamoto; 30 May 2018, 10:46.

  • #2
    Since you specified -delta(2)- in your -xtset- command, when you specify the L2. operator, Stata looks four years back (4 = 2*2). And that means that your variables calculated using L2 are going to have missing values whenever the year is 2006 or earlier. Similarly, F2 will return missing values in years 2010 and 2012 because data four years ahead is not available. Remember that any observation that has a missing value in any variable mentioned in the regression is automatically excluded from the regression. So a lot of your observations are being wiped out by this. That combined with other missing values presumably is leaving you with too few observations to go on.

    To get the result from 2 years earlier, use L1. For two years later use F1.
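
    As a minimal sketch of this point, the do-file fragment below builds a toy biennial panel and shows which survey wave each operator picks up under -delta(2)-. The single child (kidid 1) and the variable x are made up for illustration and are not from the original data.

    ```stata
    * Toy biennial panel: one hypothetical child, waves 2004 to 2012
    clear
    set obs 5
    generate kidid = 1
    * Year takes the values 2004, 2006, 2008, 2010, 2012
    generate Year = 2002 + 2*_n
    * x simply numbers the waves 1 to 5 so that shifts are easy to see
    generate x = _n
    xtset kidid Year, delta(2)

    * With delta(2), one lag or lead is one wave, i.e. 2 calendar years
    generate x_L1 = L1.x
    generate x_F1 = F1.x
    * L2 reaches back two waves, i.e. 4 calendar years
    generate x_L2 = L2.x

    list Year x x_L1 x_F1 x_L2, noobs
    ```

    In the listing, x_L1 is missing only in 2004 and x_F1 only in 2012, whereas x_L2 is missing in both 2004 and 2006.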

    Also, your use of L and F in #1 seems to be reversed from what you say you want. As you wrote the code, you are trying to predict past values of FinTrFromChild from future values of HStay. I think what you really want is to predict present values from past values, no?

    Also, there is no need to calculate these lagged variables. You can use the lag operator in the regression itself and Stata will calculate the appropriate values on the fly.

    So I think what you want is something more like this:

    Code:
    xtreg FinTrFromChild ib(first).L1.HStay L1.Var_Robustness if Gender==1 ,fe cluster(hhid)
    This code will model FinTrFromChild in terms of the two-year lags of HStay and Var_Robustness.



    • #3
      Hi Dr. Clyde, thank you very much for your comment. You hit the points that I want to know.

      And you are right: I do want to predict present values of the dependent variable from past values of the independent variables, because I hope to see
      whether there is a long-term impact: whether the past health of the parents affects current help from the children or,
      equivalently, whether the current health status of the parents influences future help received from the children.

      I redid the regression based on your advice, and Stata returned no error this time.

      One further puzzle that I still cannot understand: when I use the code below, for example,

      gen HStayl=F2.HStay

      and then tabulate the variable against Year, Stata returns the following output, which looks like values lagged by four years (is 2004 the lagged year of 2008?) rather than leads.

      . tab HStayl Year

                 |          Year
          HStayl |   2004    2006    2008 |  Total
      -----------+------------------------+-------
               0 |  6,847   6,337   6,641 | 19,825
               1 |  3,275   3,402   2,891 |  9,568
      -----------+------------------------+-------
           Total | 10,122   9,739   9,532 | 29,393

      My understanding was that, when using F2. with delta(2), the output would show [2008, 2010, 2012] if
      the time periods in the sample are [2004, 2006, 2008, 2010, 2012].

      Does this mean that I did something wrong here?



      • #4
        So with -delta(2)-, F2.X means the value of X four years in the future (4 = 2*2). So in an observation with year 2008, F2.X is the year 2012 observation of X. No problem. And observations from years 2004 and 2006 are similarly easy. But in an observation with year = 2010 or 2012, F2.X means the value of X in 2014 or 2016, respectively, and you don't have any observations in those years, so F2.X will be missing values. That's why 2010 and 2012 don't appear in the header of that -tab- output, but 2004, 2006, and 2008 do.
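
        A small sketch on made-up data reproduces the pattern. The single-child panel and the 0/1 values of X below are assumptions for illustration only.

        ```stata
        * Toy biennial panel: one hypothetical child, waves 2004 to 2012
        clear
        set obs 5
        generate kidid = 1
        generate Year = 2002 + 2*_n
        * Arbitrary 0/1 indicator standing in for a variable like HStay
        generate X = mod(_n, 2)
        xtset kidid Year, delta(2)

        * F2 looks two waves ahead, i.e. 4 calendar years under delta(2)
        generate X_F2 = F2.X

        * Only the Year columns with nonmissing X_F2 appear: 2004, 2006, 2008
        tabulate X_F2 Year

        * Observations in 2010 and 2012 would need data from 2014 and 2016
        assert missing(X_F2) if inlist(Year, 2010, 2012)
        assert !missing(X_F2) if inlist(Year, 2004, 2006, 2008)
        ```

        The Year columns in the -tabulate- header are therefore the years in which the lead could be computed, not the years the lead values come from.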



        • #5
          Thanks, Dr. Clyde, for your detailed explanation. It is clear to me now.
