Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating New Variable in Panel Data that takes the value of certain (observation variant) years

    Im working with a set of panel data which includes data on children, their parents, education, etc, as well as parents income over years, parents education over years, etc... I just recently noticed that instead of using data on average income over time, I need to find average parental income over the first 3 years of a childs life. Because children are born in random years throughout the data set I'm having a huge problem figuring out how to get this info into another variable. I have all the information needed in my set, but given that i have thousands of observations there is no way that I can go through and average over each childs first few years individually. Is there a way to loop in which i call info from a certain year? Thanks so much for the help!

  • #2
    Please post a dataset example: http://www.statalist.org/forums/help#stata 12.1

    Comment


    • #3
      Sorry about that! My real data is under a confidentiality agreement so i just made a simplified example. Basically what I need to do is generate a new variable that equals the average income from the year the child is born and the 2 years following. My actual data set has tons of observations and tons of years and Im having a lot of difficulty figuring out what my next step would be. I think the real issue is that i dont now how to link a variables outcome to another (observation variant) piece of data. Any hlep is greatly appreciated
      Attached Files

      Comment


      • #4
        As with most things in Stata, this is much easier if you convert the data to long layout.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(personid childid income2000 income2001 income2002 income2003 income2004 income2005 income2006 income2007 income2008 income2009 income2010 child_yob)
         1  11 500  400 500 600  700  540 324 479 500 400 800 2006
         2  21 345  355  56 746   34 4665   7 667 667 366 900 2002
         3  31 765 3578 667 657  225  488 656  45 457 455 678 2004
         4  41 567  464 747  46 5656  324 437 743 436 467   5 2004
         5  51   7  748  87 546   47  345   9   6 347 647 564 2009
         6  61 758  367 799 863   67  754 346 647  86 543 784 2000
         7  71 345  535 757 567 4567   76 466 756 435 678 784 2001
         8  81 566  224 856  34  546  746 778 765 676 346 345 2005
         9  91 663  535 453 564  754   45 890  56 675 321 325 2008
        10 101 568  354 456 745  463 6457  57 346 324 234 768 2008
        end
        
        reshape long income, i(personid childid child_yob) j(year)
        egen three_years_income = ///
            total(income*inrange(year, child_yob, child_yob+2)), by(personid)
        
        reshape wide
        I have included a -reshape wide- at the end to bring you back to the original wide layout. But, honestly, whatever you're going to do after this is probably also easier in long layout, so if I were you I'd omit the -reshape wide- and just stick with long layout. It really works much better for almost everything in Stata.

        Note also that I have posted your data above using the -dataex- command, which you can get from -ssc install dataex-. It is the preferred way of showing example data. While attaching a Stata file is an acceptable alternative, some people are reluctant to download files from strangers. -dataex- is completely safe and provides a 100% faithful replica of your example data that someone who wants to experiment with it can use almost effortlessly.

        By the way, I renamed all of your variables to lower case for my convenience. The use of capitalization in the original was very inconsistent and I kept making mistakes every time I tried to type variable names. You can do the same, if you like, with -rename *, lower-, or you can stick with your names if you're comfortable with them and just make the corresponding changes in the code.

        Comment


        • #5
          Thanks for the quick response, this should be really helpful and sorry about the tricky variable names!

          Comment


          • #6
            I got the first part to work on my data which is exciting, but now i need to do basically the same thing for a variable called "hours worked." I tried to use the same command, with variable names changed obviously changed, but on the second iteration of the command it returns an error that says "variable year contains all missing values." Is this due to my return to wide formatting? Thanks!

            Comment


            • #7
              Michelle as Clyde note without reproducible example we can't say much. Clyde suggestion to reshape to long form is your best way to achieve your goals with minimum mistakes and time. However, you can keep working with the wide form as follows.
              Few notes, you asked for average income over three years and the results show the sum of income over that period, so you should divide it by three. You should also decide how to calculate the average for those who born after 2008 (less then 3 years window).

              Code:
               clear
              input float(personid childid income2000 income2001 income2002 income2003 income2004 income2005 income2006 income2007 income2008 income2009 income2010 child_yob)
               1  11 500  400 500 600  700  540 324 479 500 400 800 2006
               2  21 345  355  56 746   34 4665   7 667 667 366 900 2002
               3  31 765 3578 667 657  225  488 656  45 457 455 678 2004
               4  41 567  464 747  46 5656  324 437 743 436 467   5 2004
               5  51   7  748  87 546   47  345   9   6 347 647 564 2009
               6  61 758  367 799 863   67  754 346 647  86 543 784 2000
               7  71 345  535 757 567 4567   76 466 756 435 678 784 2001
               8  81 566  224 856  34  546  746 778 765 676 346 345 2005
               9  91 663  535 453 564  754   45 890  56 675 321 325 2008
              10 101 568  354 456 745  463 6457  57 346 324 234 768 2008
              end
              
              * The preferred way to address this issue
              
              reshape long income, i(personid childid child_yob) j(year)
              egen three_years_mean_income = ///
                  total(income*inrange(year, child_yob, child_yob+2)/3), by(personid)
              
              reshape wide
              If you are reluctant to reshape the data, then this is your way with long form

              Code:
               g three_years_mean_income2 =.
               forval i=2000/2010 {
                  cap egen child_yob_mean_`i'=rowmean(income`i' income`=`i'+1' income`=`i'+2')
                  cap replace three_years_mean_income2=child_yob_mean_`i' if child_yob==`i'
                  cap drop child_yob_mean_`i'
              }
              Last edited by Oded Mcdossi; 09 Jun 2016, 04:52.

              Comment

              Working...
              X