
  • Creating a weighted asset index to determine wealth (Stata 17)

    Hi everyone! I'm trying to create a wealth index for a number of households from the information I have on the assets that they possess. A portion of my dataset is:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str14 household_id byte(hh_s10q011 hh_s10q012 hh_s10q013 hh_s10q014)
    "01010101601002" 2 0 0 0
    "01010101601017" 0 0 0 0
    "01010101601034" 1 0 0 0
    "01010101601049" 1 0 0 0
    "01010101601064" 0 0 0 2
    "01010101601080" 0 0 0 0
    "01010101601087" 0 1 0 1
    "01010101601101" 1 0 0 0
    "01010101601116" 0 0 0 1
    "01010101601131" 1 0 0 1
    "01010101601146" 2 2 2 2
    "01010101601162" 2 2 2 1
    "01010201403001" 0 0 0 1
    "01010201403011" 0 0 0 2
    "01010201403016" 0 0 0 1
    "01010201403026" 0 0 0 0
    "01010201403031" 0 0 0 4
    "01010201403036" 0 0 0 1
    "01010201403046" 0 0 0 2
    "01010201403061" 0 0 0 7
    "01010201403091" 0 0 0 0
    "01010201403106" 0 0 0 1
    "01010201403121" 0 0 0 2
    "01010201403136" 0 0 0 1
    "01010300106004" 0 0 0 0
    "01010300106016" 0 0 0 1
    "01010300106028" 1 0 0 7
    "01010300106040" 0 0 0 1
    "01010300106052" 0 0 0 7
    "01010300106064" 0 0 0 1
    "01010300106076" 0 0 0 3
    "01010300106088" 0 0 0 3
    "01010300106100" 1 0 0 1
    "01010300106112" 1 0 1 7
    "01010300106124" 0 0 0 1
    "01010300106136" 0 0 0 5
    "01010301804004" 0 0 0 0
    "01010301804019" 0 0 0 0
    "01010301804034" 0 0 0 0
    "01010301804049" 0 0 0 0
    "01010301804051" 0 0 0 1
    "01010301804054" 0 0 0 1
    "01010301804069" 0 0 0 0
    "01010301804085" 0 0 0 0
    "01010301804100" 0 0 0 0
    "01010301804115" 0 0 0 0
    "01010301804130" 0 0 0 0
    "01010301804145" 0 0 0 0
    "01010500204001" 0 0 0 1
    "01010500204025" 0 0 0 1
    "01010500204049" 0 0 0 1
    "01010500204073" 0 0 0 1
    "01010500204097" 0 0 0 0
    "01010500204121" 0 0 0 0
    "01010500204142" 0 0 0 0
    "01010500204145" 0 0 0 0
    "01010500204169" 0 0 0 2
    "01010500204182" 0 0 0 1
    "01010500204193" 2 0 0 1
    "01010500204217" 1 0 0 0
    "01010600402002" 0 0 0 2
    "01010600402014" 0 0 0 1
    "01010600402021" 0 0 0 3
    "01010600402039" 0 0 0 2
    "01010600402057" 0 0 0 5
    "01010600402075" 0 0 0 1
    "01010600402094" 0 0 0 1
    "01010600402112" 0 0 0 0
    "01010600402130" 0 0 0 5
    "01010600402148" 0 0 0 2
    "01010600402160" 2 2 2 1
    "01010600402167" 0 0 0 0
    "01020100207002" 0 0 0 2
    "01020100207012" 0 0 0 1
    "01020100207017" 0 0 0 1
    "01020100207027" 0 0 0 2
    "01020100207032" 0 0 0 1
    "01020100207047" 0 0 0 1
    "01020100207062" 0 0 0 2
    "01020100207077" 0 0 0 3
    "01020100207092" 0 0 0 2
    "01020100207107" 0 0 0 2
    "01020100207122" 0 0 0 2
    "01020100207137" 0 0 0 2
    "01020200802002" 0 0 0 1
    "01020200802017" 1 0 0 2
    "01020200802028" 0 0 0 2
    "01020200802033" 0 0 0 3
    "01020200802048" 0 0 0 2
    "01020200802063" 0 0 0 5
    "01020200802078" 0 0 0 5
    "01020200802108" 0 0 0 3
    "01020200802120" 0 0 0 1
    "01020200802124" 0 0 0 3
    "01020200802139" 0 0 0 1
    "01020200802180" 0 0 0 2
    "01020301001002" 0 0 0 3
    "01020301001017" 0 0 0 3
    "01020301001032" 0 0 0 2
    "01020301001047" 0 0 0 4
    end
    I would like to create a weighted sum of the asset types, with weights based on how common each asset is. I know how to do this mathematically, but I am unsure what code to use in Stata. Specifically, for each asset type I want to multiply the number of units a household owns by the inverse of the proportion of households that own that asset, and then add up the weighted assets to create a wealth index.
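
    Read literally, the description above can be written as the formula below. The symbols are notation introduced here for illustration and do not come from the data set: a_{ij} is the number of units of asset j owned by household i, N is the number of households, and p_j is the proportion of households owning at least one unit of asset j.

    ```latex
    p_j = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[ a_{ij} \ge 1 \right],
    \qquad
    w_j = \frac{1}{p_j},
    \qquad
    W_i = \sum_{j} w_j \, a_{ij}
    ```

    For example, if one household in four owns a given asset, then p_j = 0.25 and w_j = 4, so each unit of that asset adds 4 to the household's index W_i.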

    I tried to explain this as best I can, so sorry if it is confusing or hard to follow! I would appreciate any help with how to code this. Many thanks in advance!

  • #2
    This code assumes, and verifies, that each household is represented by only a single observation in the data set. I also assume that the variables in the full data set that identify these assets are all and only the hh_s10q* variables. If not, the code will need modifications, possibly extensive.

    Like many things in Stata, this calculation is much easier to do with the data in long layout than wide. The final line of code restores the original wide layout. But as it is likely that the subsequent work you do with this data will also be easier to do in long, you should consider skipping that final step, and only use it if you know that you will really need the wide layout for next steps.

    Code:
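    //    VERIFY ONE OBSERVATION PER HOUSEHOLD
    isid household_id
    
    //    ONE OBSERVATION PER HOUSEHOLD-ASSET PAIR; string KEEPS SUFFIXES LIKE "011" INTACT
    reshape long hh_s10q, i(household_id) j(asset) string
    //    SHARE OF HOUSEHOLDS OWNING AT LEAST ONE UNIT OF EACH ASSET, THEN ITS INVERSE
    by asset (household_id), sort: egen weight = mean(inrange(hh_s10q, 1, .))
    replace weight = 1/weight
    //    WEIGHTED SUM OF ASSETS, DIVIDED BY THE SUM OF WEIGHTS OVER NON-MISSING ASSETS
    by household_id (asset), sort: egen numerator = total(weight*hh_s10q)
    by household_id (asset): egen denominator = total(cond(!missing(hh_s10q), weight, .))
    gen index = numerator/denominator
    
    //    OPTIONAL IF YOU NEED TO GO BACK TO WIDE LAYOUT
    reshape wide hh_s10q weight, i(household_id) j(asset) string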
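
    As a quick sanity check, the same steps can be run on a tiny data set where the weights are easy to work out by hand. The four households and two asset variables below are made up for illustration and are not from the original data. The sketch stops before the optional reshape wide, and the assert lines encode the hand calculation.

    ```stata
    * Hypothetical toy data: 4 households, 2 asset types (not from the original post)
    clear
    input str1 household_id byte(hh_s10q011 hh_s10q012)
    "A" 2 0
    "B" 0 1
    "C" 1 0
    "D" 0 0
    end

    * Same steps as the code above, without the final reshape wide
    isid household_id
    reshape long hh_s10q, i(household_id) j(asset) string
    by asset (household_id), sort: egen weight = mean(inrange(hh_s10q, 1, .))
    replace weight = 1/weight
    by household_id (asset), sort: egen numerator = total(weight*hh_s10q)
    by household_id (asset): egen denominator = total(cond(!missing(hh_s10q), weight, .))
    gen index = numerator/denominator

    * Asset 011 is owned by 2 of 4 households (weight 2); asset 012 by 1 of 4 (weight 4)
    assert abs(weight - 2) < 1e-6 if asset == "011"
    assert abs(weight - 4) < 1e-6 if asset == "012"

    * Household A: numerator = 2*2 + 4*0 = 4, denominator = 2 + 4 = 6
    assert abs(numerator - 4) < 1e-6 if household_id == "A"
    assert abs(index - 4/6) < 1e-6 if household_id == "A"

    * Household D owns nothing, so its index is 0
    assert index == 0 if household_id == "D"
    ```

    Note that numerator is the plain weighted sum of assets, while index divides that sum by the total of the weights, so it is a weighted average. If the unnormalized sum is what you want, use numerator directly.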



    • #3
      This is perfect Clyde thank you for all the help!
