
  • Building a panel dataset from individual level data using loops

    Hi guys,

    I am quite new to Stata and I can't find a solution to the following problem:

    I am working with individual-level survey data. I have generated many new variables in this dataset (mean wage for low-skill people, mean wage for high-skill people, mean hours worked for low-skill people, etc.) for each year and region.

    For further analysis I would like to create a new frame where for each variable (e.g. low-skill wage) I pick the mean value for each region and year.

    So, actually I want to create a Panel Dataset with mean values from the individual dataset by region and year.

    My code so far looks as follows:

    frame create resultsregion region year ls1_L ls1_ln ls1_lf ls1_M ls1_mn ls1_mf ls1_H ls1_hn ls1_hf ls2_L ls2_ln ls2_lf ls2_M ls2_mn ls2_mf ls2_H ls2_hn ls2_hf rrhwage_L rrhwage_ln rrhwage_lf rrhwage_M rrhwage_mn rrhwage_mf rrhwage_H rrhwage_hn rrhwage_hf

    frame post (mean(region)) (mean(year)) (mean(ls1_L)) (mean(ls1_ln)) (mean(ls1_lf)) (mean(ls1_M)) (mean(ls1_mn)) (mean(ls1_mf) (mean(ls1_H)) (mean(ls1_hn)) (mean(ls1_hf)) (mean(ls2_L)) (mean(ls2_ln)) (mean(ls2_lf)) (mean(ls2_M)) (mean(ls2_mn)) (mean(ls2_mf)) (mean(ls2_H)) (mean(ls2_hn)) (mean(ls2_hf)) (mean(rrhwage_L)) (mean(rrhwage_ln)) (mean(rrhwage_lf)) (mean(rrhwage_M)) (mean(rrhwage_mn)) (mean(rrhwage_mf)) (mean(rrhwage_H)) (mean(rrhwage_hn)) (mean(rrhwage_hf))


    I think I should use a foreach loop with the frame post command but I have no idea how to specify it.

    Thank you for your help.

    Kind regards,

    Leandro

  • #2
    Apart from needing to wrap this in a loop, your -frame post- syntax is incorrect. You might approach this as follows:

    Code:
    frame create resultsregion region year ///
        ls1_L ls1_ln ls1_lf ls1_M ls1_mn ls1_mf ls1_H ls1_hn ///
        ls1_hf ls2_L ls2_ln ls2_lf ls2_M ls2_mn ls2_mf ls2_H ls2_hn ls2_hf rrhwage_L rrhwage_ln ///
        rrhwage_lf rrhwage_M rrhwage_mn rrhwage_mf rrhwage_H rrhwage_hn rrhwage_hf

    * Assumes region and year are numeric variables
    levelsof region, local(regions)
    foreach r of local regions {
        levelsof year if region == `r', local(years)
        foreach y of local years {
            * Start each row with the region and year identifiers
            local topost (`r') (`y')
            foreach v of varlist ls1_L ls1_ln ls1_lf ls1_M ls1_mn ls1_mf ls1_H ls1_hn ls1_hf ls2_L ///
                ls2_ln ls2_lf ls2_M ls2_mn ls2_mf ls2_H ls2_hn ls2_hf rrhwage_L rrhwage_ln rrhwage_lf ///
                rrhwage_M rrhwage_mn rrhwage_mf rrhwage_H rrhwage_hn rrhwage_hf {
                * Restrict the mean to the current region-year cell
                summ `v' if region == `r' & year == `y', meanonly
                local topost `topost' (`r(mean)')
            }
            * Post one complete row per region-year, after all means are collected
            frame post resultsregion `topost'
        }
    }


    In other words, you cannot embed the -egen- function mean() in the -frame post- syntax. You must calculate each mean separately and build up the list of expressions to be posted to the new frame.
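
    To see the pattern on something small, here is a minimal sketch using the auto dataset that ships with Stata rather than the survey data from the question. The frame name toymeans and the variable names mean_price and mean_mpg are made up for this illustration, and foreign stands in for the region and year identifiers.

    ```stata
    * Toy illustration only: auto data, not the survey data from the question
    sysuse auto, clear
    capture frame drop toymeans            // hypothetical frame name
    frame create toymeans foreign mean_price mean_mpg

    levelsof foreign, local(groups)
    foreach g of local groups {
        * start the row with the group identifier
        local topost (`g')
        foreach v of varlist price mpg {
            * mean of `v' within the current group only
            summarize `v' if foreign == `g', meanonly
            local topost `topost' (`r(mean)')
        }
        * one -frame post- per group, after every mean has been appended
        frame post toymeans `topost'
    }

    frame toymeans: list
    ```

    Each call to -frame post- adds exactly one observation, so the number of parenthesized expressions has to match the number of variables named in -frame create-.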

    All of that said, there is a much simpler way to go about this:

    Code:
    frame copy default resultsregion
    frame change resultsregion
    collapse (mean) ls1_L ls1_ln ls1_lf ls1_M ls1_mn ls1_mf ls1_H ls1_hn ls1_hf ls2_L ///
        ls2_ln ls2_lf ls2_M ls2_mn ls2_mf ls2_H ls2_hn ls2_hf rrhwage_L rrhwage_ln  ///
        rrhwage_lf rrhwage_M rrhwage_mn rrhwage_mf rrhwage_H rrhwage_hn rrhwage_hf, ///
        by(region year)
    Note: I assume that your working frame at the start of this code is default. If it is something else, change the -frame copy- command accordingly.

    Notice that there are no explicit loops in this code, nor any resort to using macros. It's clean and transparent.
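
    If you want to convince yourself that -collapse- returns the same cell means, a small check along these lines may help. It is only a sketch on the auto dataset shipped with Stata; the frame name toycollapse is made up, and it assumes the working frame is default, as in the note above.

    ```stata
    * Toy check only: auto data, not the survey data from the question
    sysuse auto, clear
    pwf                                    // shows the current frame; assumed to be default
    capture frame drop toycollapse         // hypothetical frame name
    frame copy default toycollapse
    frame toycollapse: collapse (mean) price mpg, by(foreign)
    frame toycollapse: list

    * spot-check one cell against -summarize- in the original frame
    summarize price if foreign == 1, meanonly
    local check = r(mean)
    frame toycollapse: assert reldif(price, `check') < 1e-6 if foreign == 1
    ```

    Because the collapse runs inside the copied frame, the individual-level data in the original frame are left untouched.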



    • #3
      Dear Clyde,

      Thank you very much for your answer! This was exactly what I was looking for!

      Best regards,

      Leandro
