
  • Creating Variables in Loops

    Hello,

    This is my first time posting and I'm a Stata novice, so please bear with me.

    I'm trying to construct a synthetic panel using the Survey of Consumer Finances from the Federal Reserve. They've put out a survey every three years since 1989 (so there are 8 surveys in all).
    My goal is to create 15 cohorts of people based on their age, grouping three ages together in each cohort. For example, this is my original code for my first cohort in 1989.

    gen cohort=age
    replace cohort=1 if age==27 | age==28 | age==29
    replace cohort=2 if age==30 | age==31 | age==32
    replace cohort=3 if age==33 | age==34 | age==35
    replace cohort=4 if age==36 | age==37 | age==38
    replace cohort=5 if age==39 | age==40 | age==41
    replace cohort=6 if age==42 | age==43 | age==44
    replace cohort=7 if age==45 | age==46 | age==47
    replace cohort=8 if age==48 | age==49 | age==50
    replace cohort=9 if age==51 | age==52 | age==53
    replace cohort=10 if age==54 | age==55 | age==56
    replace cohort=11 if age==57 | age==58 | age==59
    replace cohort=12 if age==60 | age==61 | age==62
    replace cohort=13 if age==63 | age==64 | age==65
    replace cohort=14 if age==66 | age==67 | age==68
    replace cohort=15 if age==69 | age==70 | age==71
    replace cohort=. if age>71 | age<27
    label define cohort 1 "27-29" 2 "30-32" 3 "33-35" 4 "36-38" 5 "39-41" 6 "42-44" 7 "45-47" 8 "48-50" 9 "51-53" 10 "54-56" 11 "57-59" 12 "60-62" 13 "63-65" 14 "66-68" 15 "69-71" 16 ">71"

    In 1992, each age group in each cohort would be three years older, and so on.

    I've managed to create a loop to do commands that are uniform for every version of the survey, but I can't figure out how to automate the creation of these cohorts (and any calculations I want to do on them).

    Here is my code for the original loop:

    local filelist: dir "H:\stata" files "scf*.dta"
    foreach y in 1989 1992 2001 2004 2007 2010 2013 2016 {
        display "`y'"
        use "H:\stata\scf`y'.dta", clear
        // commands common to every survey go here
    }


    Is there a way to create these groups of ages in each survey, increasing by 3 years each time using a loop? It's a huge amount of data so I am trying to avoid merging all 8 datasets.


    Thank you in advance for all of your help!

  • #2
    Your code for creating the variable cohort is far more complicated than it needs to be, and it is also self-contradictory: your label includes a value 16, but the code will never create a value 16 for cohort. 15 is the highest it goes, and anything older than that is replaced by a missing value, not 16.
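
    The mismatch between the label and the code can be checked directly. The following is only a sketch on a toy data set (ages 1 to 100, an assumption for illustration). It uses -inrange()- as a compact stand-in for the chained age== conditions, and it writes the final -replace- as "age > 71 | age < 27", which is what "anything older than that is replaced by a missing value" implies:

    ```stata
    // Toy data: one observation per age from 1 to 100 (illustrative only)
    clear
    set obs 100
    gen age = _n

    // Same assignment as the long list of -replace- commands,
    // written as a loop over the 15 three-year age bands
    gen cohort = age
    forvalues k = 1/15 {
        local lo = 27 + 3*(`k' - 1)
        replace cohort = `k' if inrange(age, `lo', `lo' + 2)
    }
    replace cohort = . if age > 71 | age < 27

    // No observation ever receives the value 16, so the ">71" label is unused
    count if cohort == 16
    assert r(N) == 0
    ```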

    The code to create a cohort for a single wave of this survey can actually be reduced to just two lines. (Actually, the two lines can be combined into a single line, but the resulting code is a bit opaque, so I won't do that.) So you only need to wrap that in a loop over 1989 through 2016. The code below does not deal with a separate data file for each of those waves but rather pretends that you wanted to create a series of variables cohort1989 cohort1992 ... cohort2016 in a single data set. You can adapt the code by changing the -forvalues y = ...- loop to a loop over the separate files:

    Code:
    //    CREATE A TOY DATA SET TO ILLUSTRATE THE CODE
    clear*
    set obs 100
    gen age = _n
    
    //  ASSIGN COHORT AS A FUNCTION OF AGE
    //  IN EACH YEAR FROM 1989 THROUGH 2016, BY 3
    forvalues y = 1989(3)2016 {
        local base_age = 27 + `y' - 1989
        gen cohort`y' = floor((age-`base_age')/3) + 1
        replace cohort`y' = . if !inrange(cohort`y', 1, 15)
    }

    The above code does not generate value labels for these variables. That would be a bit more complicated than the cohort variables themselves. But I also think that in the end you will just find it confusing to label the cohorts by their age ranges, when those age ranges change from one survey to the next. The age variable itself will also be in your data set, right?
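
    For anyone who does want value labels, one way they could be built is sketched below. This is an illustration under assumptions rather than a tested recipe for the SCF data: the label names cohort`y'_lbl are made up here, and each wave gets its own label because the age ranges shift by three years from one wave to the next.

    ```stata
    // Toy data: one observation per age from 1 to 100 (illustrative only)
    clear*
    set obs 100
    gen age = _n

    forvalues y = 1989(3)2016 {
        local base_age = 27 + `y' - 1989
        gen cohort`y' = floor((age - `base_age')/3) + 1
        replace cohort`y' = . if !inrange(cohort`y', 1, 15)

        // Cohort k in wave y covers ages lo through lo + 2
        // (cohort`y'_lbl is a hypothetical label name)
        forvalues k = 1/15 {
            local lo = `base_age' + 3*(`k' - 1)
            local hi = `lo' + 2
            label define cohort`y'_lbl `k' "`lo'-`hi'", modify
        }
        label values cohort`y' cohort`y'_lbl
    }

    // Inspect one wave: the first category should be labeled 30-32
    tabulate cohort1992
    ```

    The -modify- option lets the same -label define- line both create the label on the first pass and extend it on later passes.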
    Last edited by Clyde Schechter; 16 Aug 2018, 11:47.


    • #3
      Yes, the age variable exists in the public dataset already.

      Sorry, but would you mind expanding on how to change the loop into a loop over the separate files? I can't seem to get it to work, so all of my cohort variables reflect the 2016 survey.
      For example, if I tab age and cohort1989, I see that the right ages have been grouped into the cohort categories, but the frequencies are those of the 2016 survey.

      Thanks for all of your help so far, this has helped me so much!


      • #4
        I didn't intend for you to use that code to create a cohort1989 variable in the 2016 wave of the survey. I set up a loop creating such variables in a single data set because I don't have a series of data sets indexed by years from 1989 through 2016 to illustrate the code on. But let's create one, so you can see what I intended for you to do.

        Code:
        //    CREATE A SERIES OF TOY DATA SETS TO ILLUSTRATE THE CODE
        clear*
        set seed 1234
        forvalues y = 1989 (3) 2016 {
            clear
            set obs 100
            gen age = runiformint(20, 90)
            tempfile wave_`y'
            save `wave_`y''
        }
            
        
        //  ASSIGN COHORT AS A FUNCTION OF AGE
        //  IN EACH YEAR FROM 1989 THROUGH 2016, BY 3
        forvalues y = 1989(3)2016 {
            display _newline(2) as result "Calculating cohort variable in `y' wave data"
            use `wave_`y'', clear
            local base_age = 27 + `y' - 1989
            gen cohort = floor((age-`base_age')/3) + 1
            replace cohort = . if !inrange(cohort, 1, 15)
            tabstat age, by(cohort) statistics(N min max)
            save `wave_`y'', replace
        }

        Now, in your situation, you will not create demonstration files: you already have them. So you will start your code from where it says // ASSIGN COHORT AS A FUNCTION OF AGE. Moreover, instead of -use `wave_`y'', clear- you will have -use "H:\stata\scf`y'.dta", clear-, and, similarly, the save command at the end of the loop will say -save "H:\stata\scf`y'.dta", replace-.
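
        Put together, the adapted loop might look roughly like the sketch below. It assumes the file paths and the list of years given in post #1, and it assumes each file contains a variable named age. The -capture confirm file- check and the -capture drop cohort- line are additions for safety, not part of the code above. Note that -save ..., replace- overwrites the original files, so working on copies may be wise.

        ```stata
        // Sketch only: paths and year list are those quoted in post #1
        foreach y in 1989 1992 2001 2004 2007 2010 2013 2016 {
            // Skip any wave whose file is not present
            capture confirm file "H:\stata\scf`y'.dta"
            if _rc {
                display as error "scf`y'.dta not found, skipping"
                continue
            }

            display _newline(2) as result "Calculating cohort variable in `y' wave data"
            use "H:\stata\scf`y'.dta", clear

            // Allow the loop to be rerun without -gen- failing
            capture drop cohort

            local base_age = 27 + `y' - 1989
            gen cohort = floor((age - `base_age')/3) + 1
            replace cohort = . if !inrange(cohort, 1, 15)
            tabstat age, by(cohort) statistics(N min max)

            save "H:\stata\scf`y'.dta", replace
        }
        ```

        Because base_age is computed from `y' itself, the formula does not depend on the years being evenly spaced in the list.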


        • #5
          Got it! Thank you, really appreciate your help.
