  • Syntax to calculate growth rates in panel data

    Hi Statalists,

    I want to calculate the growth rate of a variable over 5 periods for 27 states, but I have not managed to write syntax that does this.
    The dataset I am using is attached. In it, "ano" is the time variable, "uf" identifies each state, and "rendomedpc" is the variable whose growth rates I want.
    The growth rates I want for each state cover the following periods: 95-98; 99-2002; 2003-2006; 2007-2009; 2011-2014.

    I hope I can count on your help once more in solving this problem.

    Kind regards,

    Girlan Oliveira

  • #2
    Please don't post .dta files.

    This is explained in the advice you were asked to read before posting.

    By your 73rd post, you should (please) have read the FAQ Advice!



    Finally, we ask that in general you don't post .dta or .zip files either. This is because
    • as above, it obliges members to fire up Stata (and/or some other program) with your file to see the problem, which could be difficult or time-consuming if you have a large or complicated dataset
    • members may have versions of Stata earlier than yours such that they can not read your .dta files anyway
    • threads become more difficult to understand if they depend on people reading in a dataset: short code and data examples are much easier to work with, as explained above.
    For the "above", see http://www.statalist.org/forums/help#stata

    Here is a sample of your data as you are asked to show them.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(ano uf rendomedpc)
      95 11  975.6389
      98 11  968.2878
      99 11  896.3077
    2002 11  718.9622
    2003 11  663.1624
    2006 11  774.7658
    2007 11  635.8459
    2009 11  797.6917
    2011 11    780.16
    2014 11  832.6358
      95 12 1050.6359
      98 12  826.6296
      99 12  885.5529
    2002 12  804.8505
    2003 12  656.3138
    2006 12  660.4998
    2007 12  740.1223
    2009 12  840.8483
    2011 12  684.5925
    2014 12  753.7031
      95 13  870.5806
      98 13  648.1306
      99 13  617.0092
    2002 13  638.8595
    2003 13 550.37836
    2006 13  605.0857
    2007 13  631.5233
    2009 13  698.4061
    2011 13  673.6045
    2014 13  796.6893
      95 14 1187.5206
      98 14  766.7264
      99 14  901.2971
    2002 14 553.67804
    2003 14  627.1276
    2006 14  680.9172
    2007 14 592.47943
    2009 14  767.9472
    2011 14  926.4311
    2014 14  839.9559
      95 15  797.8191
      98 15  710.6692
      99 15  694.0292
    2002 15  610.5915
    2003 15  493.7732
    2006 15  550.4216
    2007 15  592.3801
    2009 15  573.3862
    2011 15  659.4438
    2014 15  669.9393
      95 16  936.5588
      98 16  655.9751
      99 16  703.6441
    2002 16  719.6787
    2003 16  692.7316
    2006 16  683.3405
    2007 16  748.8529
    2009 16  840.7599
    2011 16  692.6713
    2014 16  955.5656
      95 17  515.8236
      98 17  523.9055
      99 17  475.4421
    2002 17  502.4276
    2003 17  491.9682
    2006 17   606.397
    2007 17  677.1903
    2009 17  757.0548
    2011 17  688.0908
    2014 17  837.4318
      95 21  393.7482
      98 21  405.5603
      99 21  388.2256
    2002 21  417.8217
    2003 21  393.8078
    2006 21  455.4535
    2007 21  491.6816
    2009 21  533.8673
    2011 21  484.4335
    2014 21 593.55304
      95 22  395.6797
      98 22 348.31555
      99 22  345.7753
    2002 22  453.2411
    2003 22  387.5361
    2006 22  493.4386
    2007 22 506.75305
    2009 22  536.2724
    2011 22  572.2326
    2014 22   662.446
      95 23  560.4462
      98 23 518.66595
      99 23  487.1562
    2002 23 434.26755
    2003 23  382.6499
    2006 23  469.0469
    2007 23  472.5932
    2009 23  554.7625
    2011 23  541.7267
    2014 23  600.7077
    end
    We see immediately two problems. First, the year is stored as 95, 98, 99 when before 2000. You understand this; we understand this easily; Stata has not got a chance of seeing this for itself. Thus this needs fixing:


    Code:
    replace ano = ano + 1900 if ano < 100
    Second, the data are irregularly spaced. This is soluble, but needs care.
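
    To see the irregular spacing directly, here is a minimal sketch that tabulates the gap between successive years within a state. It uses only the rows for uf 11 from the listing above and applies the year fix first; the variable name gap is made up for this illustration.

    ```stata
    clear
    input float(ano uf rendomedpc)
      95 11 975.6389
      98 11 968.2878
      99 11 896.3077
    2002 11 718.9622
    2003 11 663.1624
    2006 11 774.7658
    2007 11 635.8459
    2009 11 797.6917
    2011 11   780.16
    2014 11 832.6358
    end
    * put all years on a four-digit scale before differencing
    replace ano = ano + 1900 if ano < 100
    * gap = years elapsed since the previous observation in the same state
    bysort uf (ano): gen gap = ano - ano[_n-1]
    tabulate gap
    ```

    The gaps are 1, 2 or 3 years, so a per-year rate has to divide by the gap rather than assume a fixed interval.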

    I define growth rate as change / (original * duration). You might want other definitions, e.g. multiplying by 100.

    Code:
     
    bysort uf (ano): gen growth = (rendo - rendo[_n-1])/(rendo * (ano - ano[_n-1]))



    • #3
      You didn't specify whether you want the growth rate between each pair of dates or the average annual growth rate (your periods are not all of the same length).
      Moreover, do you want it in a single variable or not?

      Anyway, you could start with:
      Code:
      by uf (ano) : gen growth= (rendomedpc / rendomedpc[_n-1])-1 if ano!=ano[_n-1]+1 & ano!=2011
      If you want the average annual growth rate, raise the ratio to the power of the inverse of the number of years:
      Code:
      by uf (ano) : gen aa_growth= (rendomedpc / rendomedpc[_n-1])^(1/(ano-ano[_n-1]))-1 if ano!=ano[_n-1]+1 & ano!=2011
      Each command creates one variable whose observations give the (average) growth rate up to the current year, starting from the period boundaries you wanted.
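
      As a check on the two formulas, here is a small worked example using the first two observations for uf 11 from the listing in #2 (975.6389 in 95 and 968.2878 in 98, three years apart). The scalar names y0, y1 and years are invented for this sketch.

      ```stata
      scalar y0 = 975.6389
      scalar y1 = 968.2878
      scalar years = 1998 - 1995
      * total growth over the whole period
      display scalar(y1)/scalar(y0) - 1
      * average annual (compound) growth rate over the same period
      display (scalar(y1)/scalar(y0))^(1/scalar(years)) - 1
      ```

      The first figure is about -0.0075 and the second about -0.0025: the total fall of roughly 0.75% corresponds to a fall of roughly 0.25% per year.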

      Best,
      Charlie



      • #4
        Nick, I really did not think the way the years were stored would cause problems, so I ignored the difference. However, following your guidance, I have now put the variable on a single standard.
        As for the different lengths of the periods, that was indeed the problem I saw as harder to resolve.
        I calculated the growth rates manually to check the results of the syntax you suggested, but the results were different. I then tried the syntax suggested by Charlie, which gave results similar to those from my manual calculation.

        Once again thank you for your contributions and guidance.

        Girlan



        • #5
          If you think that 93 and 1993 will give the same answers, I can only express surprise. But sorry:

          Code:
           bysort uf (ano): gen growth = (rendo - rendo[_n-1])/(rendo[_n-1] * (ano - ano[_n-1]))
          is likely to be closer to what you want.
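
          To illustrate why the divisor matters, here is a minimal sketch on the first two observations for uf 14 from the listing in #2, with the year already recoded to four digits. The variable names growth_prev and growth_curr are invented for this comparison.

          ```stata
          clear
          input float(ano uf rendomedpc)
          1995 14 1187.5206
          1998 14  766.7264
          end
          * divisor uses the earlier value: change relative to where the period started
          bysort uf (ano): gen growth_prev = (rendomedpc - rendomedpc[_n-1])/(rendomedpc[_n-1] * (ano - ano[_n-1]))
          * divisor uses the later value, as in the command in #2
          bysort uf (ano): gen growth_curr = (rendomedpc - rendomedpc[_n-1])/(rendomedpc * (ano - ano[_n-1]))
          list ano growth_prev growth_curr
          ```

          For 1998 the first version gives about -0.118 per year and the second about -0.183. Multiplying growth_prev by the three-year gap gives about -0.354, which is the total growth that the first command in #3 computes.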
          Last edited by Nick Cox; 24 Oct 2016, 13:01.



          • #6
            Charlie, my goal was to get the growth rate between each pair of dates and not the average annual growth rate, so I used the first syntax you suggested, changing "by" to "bysort". It worked very well. To understand it better, however, I wonder why you included the condition:

            year != year[_n-1]+1 & year != 2011
            Thank you very much.

            Girlan



            • #7
              Girlan, I should warn you that I second Nick on the year issue. I didn't notice it at first, but he is right.

              These conditions only suit your data (so they are not a general formula):
              I noticed that where two years are consecutive (e.g. 2002 and 2003), you didn't want the growth rate between them: 2002 is the end of the 99-2002 period and 2003 is the beginning of the 2003-2006 period.
              So I ask Stata not to compute the growth rate if the two years are consecutive (this is the first part).
              That covers all the unwanted years except 2011, for which it would compute the 2009-2011 growth rate, which you didn't say you wanted, so I added the & ano!=2011 condition.

              Some remarks: I wrote "ano", not "year", since that is what your variable is called, so keep the name in line with your dataset.
              Then, I don't understand what you get by doing a "manual" growth rate computation; there is no such thing as an approximate formula for this. The average annual growth rate is the formula I gave you, not any other.
              Finally, I still wonder about comparing total growth rates between periods that do not have the same length. I don't know much about your topic, but I find it likely to cause trouble.

              I hope this is clearer now,
              Best,
              Charlie



              • #8
                Thanks, Charlie, for the clarification.

                I simply slipped when writing my message and wrote "year", not "ano" as it appears in my dataset and as you correctly wrote in your syntax.
                By "manual" I mean that I did the calculations in Excel to compare with the results from Stata.
                As for comparing the growth rates, this will not be done, so I think there will be no major problems in my work. Ideally I would have data for 2010 instead of 2009; however, the survey I am using was not carried out in 2010.
                Finally, I thought that two consecutive years would be "year" and "year[_n-1]", not "year" and "year[_n-1]+1". I do not understand this "+1".
                If I had data for the year 2010, would the syntax you wrote still be the one I should use?

                Again, thank you.

                Girlan



                • #9
                  Girlan,
                  The syntax
                  Code:
                  if ano!=ano[_n-1]+1
                  specifies that the years in two successive rows (_n-1 and _n) must not be consecutive; that is, ano in row _n must not equal ano in row _n-1 plus 1.

                  This is because, in your example, two consecutive years are the boundaries of two different periods you wanted to study, and growth over that single year doesn't interest you.
                  You could remove this condition and see what happens: you'll get the growth rate between 2002 and 2003, which you didn't ask for.

                  Without the "+1", the code would simply check whether two successive observations of ano have the same value, but this doesn't occur in your data, so I didn't check for that.
                  To cover this case as well, you could try:
                  Code:
                  if ano-ano[_n-1]>1
                  This requires the difference between two successive observations of ano to be greater than one (and thus greater than zero).
                  However, here you'll have to recode the pre-2000 years as 19xx to avoid mistakes.
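
                  An alternative that does not depend on which years happen to be consecutive, offered only as a sketch and not as something tested on your full dataset, is to assign each observation to its period explicitly and compute growth within the period. It assumes the years are already recoded to four digits and uses the rows for uf 11 from #2; the variable name period is hypothetical.

                  ```stata
                  clear
                  input float(ano uf rendomedpc)
                  1995 11 975.6389
                  1998 11 968.2878
                  1999 11 896.3077
                  2002 11 718.9622
                  2003 11 663.1624
                  2006 11 774.7658
                  2007 11 635.8459
                  2009 11 797.6917
                  2011 11   780.16
                  2014 11 832.6358
                  end
                  * label each observation with the period it belongs to
                  gen byte period = .
                  replace period = 1 if inrange(ano, 1995, 1998)
                  replace period = 2 if inrange(ano, 1999, 2002)
                  replace period = 3 if inrange(ano, 2003, 2006)
                  replace period = 4 if inrange(ano, 2007, 2009)
                  replace period = 5 if inrange(ano, 2011, 2014)
                  * growth from the first to the last year of each period, stored on the last year
                  bysort uf period (ano): gen growth = rendomedpc[_N]/rendomedpc[1] - 1 if _n == _N & _N > 1 & !missing(period)
                  list ano period growth, sepby(period)
                  ```

                  With this layout, a change in the available years (for example 2010 instead of 2009) would only mean editing the inrange() bounds, not the growth command.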

                  I hope this is clearer now,
                  Best,
                  Charlie

