
  • Generating multiple variables from loops within loops

    Sorry in advance for the long and convoluted question.

    I have a series of variables that represent various family members (the 'b' variables in the data example below) and their occupation (the 'e' variables). So in the data example, fl1b represents Family Member 1's specific relation to the subject (coded 1-9 for mother, sibling, cousin, etc.) and fl1e represents Family Member 1's occupation (coded 1-5 for various occupational categories).

    What I have been trying to do is generate new variables that pull out only the parents' occupations.

    So, the new variable would be something like parent_occupation and its values would be the 1-5 values from the 'e' variables. My first attempt was something like this (in the 'b' variables, parents are represented with values 1 - 4):

    Code:
    gen parent_occupation = .
    foreach v of var fl*b {
                   replace parent_occupation = fl*e if inrange(`v', 1, 4)
    }
    I was basically trying to tell Stata to pull out the 'e' variable value only if its corresponding 'b' variable value is between 1 and 4 (i.e., if it's a parent), but I think I may need a loop within that loop in order to do that?

    One additional challenge is that there are subjects with multiple parents represented in the data (the first subject in the data below, for example, has 3 different parents represented because step-parents are included), and therefore I'd need some way to generate multiple parent_occupation variables, which I have not figured out as of yet.

    Here is a bit of my data. If anyone has any advice I would be very grateful!




    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id byte(fl1b fl1e fl2b fl2e fl3b fl3e)
    777851 1 3 1 1 1 4
    777834 2 2 1 5 5 5
    777852 3 1 1 1 7 4
    777835 2 5 7 4 8 1
    777853 2 4 8 1 9 4
    777836 1 3 9 5 1 4
    777854 2 3 1 2 1 4
    777837 3 5 7 5 8 5
    777855 4 4 8 2 3 2
    777838 7 2 3 5 4 1
    777856 6 1 4 4 3 4
    777839 7 2 3 5 4 1
    777857 8 3 2 5 2 2
    777840 9 3 2 5 3 2
    777858 1 3 6 3 2 2
    777841 8 2 7 2 1 5
    777859 8 5 8 2 7 1
    777842 3 2 9 5 8 1
    777860 4 4 7 3 9 2
    777843 3 5 2 3 1 3
    777861 2 4 8 1 3 3
    777844 2 5 3 4 2 3
    777862 1 1 1 5 3 3
    777845 3 2 3 2 4 2
    777863 7 5 2 1 3 1
    end


  • #2
    Code:
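    isid id, sort                       // confirm id uniquely identifies observations
    frame copy default parents          // work on a copy so the original data stay intact

    frame change parents
    reshape long fl@b fl@e, i(id)       // one row per id and family member; j defaults to _j
    keep if inrange(flb, 1, 4)          // KEEP ONLY PARENTS
    drop flb*
    by id (_j), sort: replace _j = _n   // renumber the remaining parents 1, 2, ... within id
    rename fle parent__occupation       // reshape puts _j between the two underscores
    reshape wide parent_@_occupation, i(id) j(_j)

    frame change default
    frlink 1:1 id, frame(parents)       // ids with no parents are simply left unmatched
    frget parent_*_occupation, from(parents)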
    This code will create as many parent occupation variables as are needed to accommodate the maximum number of parents in a household.

    Notice that the real work in this code is done after -reshape-ing the data to long layout. In the end, we return to the wide layout you started with. But most Stata data management and analysis commands work best, or only, with long layout. So you might consider going to long layout and staying there, depending on what you will be doing with this data going forward.
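
    As a rough sketch of what staying in long layout could look like (the variable names member, relation, occupation, is_parent, and n_parents are illustrative assumptions, not names from the original data):

    Code:
    * Sketch only: run on the original wide data, before any other reshape
    reshape long fl@b fl@e, i(id) j(member)
    rename (flb fle) (relation occupation)
    gen byte is_parent = inrange(relation, 1, 4)   // 1-4 = parent codes, per the question

    tabulate occupation if is_parent               // distribution of parent occupations
    bysort id: egen n_parents = total(is_parent)   // number of parents recorded per subject
    With one row per subject and family member, a parent-only analysis is just an -if- condition, and no extra variables are needed for second or third parents.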

    Added: Notice also that there are no loops within loops required. In fact, there are no loops at all!
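
    For comparison, a loop-based approach in wide layout is possible, but it does need a loop within a loop plus a running counter. The following is only a sketch: it assumes exactly three family-member slots (fl1 to fl3), as in the example data, and n_parents is a helper variable introduced here for illustration. It is meant to be run instead of the -reshape- code, not after it.

    Code:
    * Sketch only: assumes slots fl1-fl3 and parent codes 1-4
    gen byte n_parents = 0                      // running count of parents found so far
    forvalues k = 1/3 {
        gen byte parent_`k'_occupation = .
    }
    forvalues i = 1/3 {
        * count this slot if it holds a parent
        replace n_parents = n_parents + 1 if inrange(fl`i'b, 1, 4)
        forvalues k = 1/3 {
            * the k-th parent found goes into the k-th occupation variable
            replace parent_`k'_occupation = fl`i'e ///
                if inrange(fl`i'b, 1, 4) & n_parents == `k'
        }
    }
    The bounds of both loops have to be edited by hand if the number of slots changes, which is one reason the -reshape- route is easier to maintain.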



    • #3
      Thank you Clyde Schechter. I had not thought about doing it this way, but it makes perfect sense now that I see it.

      We have been debating wide vs. long layout and were planning on waiting until a future juncture for a number of reasons: this dataset is quite large (in terms of both observations and number of variables) and is going to be combined with other datasets in the future. But perhaps this gives me more ammunition to advocate for reshaping to long sooner rather than later, because we'll need to do some similar data wrangling on related variables as well.

      Thanks again for your help!
