Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merging 20 years panel data -selecting variables of interest from each year file


    Dear statalisters!

    I want to ask if I can use foreach loop to keep variables of interst from each year and create one big file that contains income and expenses variables only.

    1. File names include year , eg. data1990svy.dta; data1991svy.dta ; data1992svy.dta etc


    2. in each file, income and expenses variables have also year suffix (e.g totinc1990 netinc1990 medexp1990 pocexp1990


    I tried to use foreach loop. but I am lost.

    I appreciate any help, thanks in advance
    Rob

    foreach file in C:\project\data1990svy.dta `

    do I have to list all the data file names after foreach

    foreach file in `C:\project\data*svy.dta' {
    keep totinc* netinc* med* exp*
    sort id
    merge 1:1 using `C:\project\data*svy.dta'
    save "`C:\project\ALL_SVY.dta'
    }

    I beleive this loop will save me time, instead of creating a file for each year, then merge them. Thanks


  • #2
    You don't say how many years you have, but let me assume they run 1990 through 2000. Then you could do the following:

    Code:
    cd C:\project
    use data1990svy, clear
    keep id totinc* netinc* med* exp*
    isid id  // VERIFY id IS UNIQUE IDENTIFIER OF RECORDS
    forvalues y = 1991/2000 {
         merge 1:1 id using data`y'svy, keepusing(totinc* netinc* med* exp*)
         drop _merge
    }
    save ALL_SVY, replace
    A couple of thoughts: 1. whatever you're going to be doing with this data, it is likely that it will be easier to do with the data in long format. So consider doing a -reshape long- before the final save. 2. Does each data set contain exactly the same values of "id" or do the panels differ somewhat from year to year? If the former, you might verify this by adding -assert(match)- to the -merge- command. If the panels grow or shrink monotonically over time, then you might similarly verify that with -assert(match using)- or -assert(match master)- respectively. And in any case, if there is not an exact match of all values of id at each phase, you need to decide which unmatched records (if any) you want to -keep-.

    Comment


    • #3
      Thanks Clyde. very helpful.

      Comment

      Working...
      X