  • Building a Wealth Index Based on Asset Possession (Survey Data)

    Dear Statalist members,

    I would be glad if you could help me out with this one.

    I am using (pooled) cross-country survey data on roughly 40k individuals across 18 Latin American countries (LAPOP surveys, Vanderbilt Univ.). I seek to build a variable measuring the wealth of a given respondent based on his or her possession of 10 different household items (a television set, a fridge, a landline, a cell phone, a vehicle, a washing machine, a microwave, an indoor bathroom, and a computer). Whether or not an interviewee possesses any one of these items is indicated through a dummy variable (w1-w10).

    I am trying to follow http://www.vanderbilt.edu/lapop/insights/I0806en.pdf in building the index. However, I do not know which steps are needed to perform the corresponding principal component analysis (PCA). The rather brief instructions are as follows:

    "As suggested in the literature, all variables were first dichotomized (1=Yes, 0=No) to indicate the ownership of each household asset (Vyass and Kumaranayake 2006). Weights (effectively defined by factor scores) for each asset were computed separately for urban and rural areas for each country. Then, a “relative wealth” variable was created in the pooled dataset. Thus, the wealth index takes into account the distribution of assets in urban and rural areas within a given country in order to reflect each country’s economic conditions across urban and rural areas."

    Could someone tell me what the corresponding Stata syntax would look like? country is the variable indicating the country of a respondent (1-18), while urban is a binary variable indicating whether a given respondent lives in an urban or rural area.

    Thanks so much and kind regards,

    Walter

  • #2
    Well, I think you need to decide first what you mean by "follow" previous authors. If you want to follow their method of developing the index, you will perform your own principal components analysis (-help pca-) and then generate an index after that using -predict-. But your PCA results are likely going to be different from what they found (unless you are working with the same data they used). So your resulting index will be different from theirs.

    Alternatively, you may want to define your index using the same weights that they did. In that case, you need to find out what the weights they got were and then generate your own index using them. If they don't show those weights in their article, you will probably have to contact the authors to get them.
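
    A minimal sketch of the two routes, run on simulated data. The simulated dummies, the names own_index and borrowed_index, and the uniform weight of 0.1 are placeholders made up for illustration. They are not the weights from the cited paper, and standardizing each dummy before weighting is an assumption about how published weights would be applied.

    ```stata
    * Simulated stand-in data: ten 0/1 asset dummies named w1-w10 (hypothetical values)
    clear
    set seed 12345
    set obs 500
    gen double latent = rnormal()
    forvalues i = 1/10 {
        gen byte w`i' = (latent + rnormal() > 0)
    }

    * Route 1: run your own PCA and score the first component
    pca w1-w10, components(1)
    predict own_index, score

    * Route 2: apply weights obtained elsewhere
    * (0.1 is a placeholder weight, NOT a value from the cited paper)
    gen double borrowed_index = 0
    forvalues i = 1/10 {
        quietly summarize w`i'
        replace borrowed_index = borrowed_index + 0.1 * (w`i' - r(mean)) / r(sd)
    }
    ```

    With real published weights, the 0.1 would be replaced by the weight reported for each asset.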

    • #3
      Clyde,

      thanks so much for your response. I guess what I meant was that I simply want to follow their "logic". So I do not expect my results to be the same as those of the referenced authors. However, I would like to get an understanding of the technical steps behind what is said in the text I cited.

      So do I understand correctly that defining the weights for each asset separately for urban and rural areas for each country would look like this:

      factor (w1-w10) if urban==0 & country==1, pcf factors(1)
      factor (w1-w10) if urban==1 & country==1, pcf factors(1)
      factor (w1-w10) if urban==0 & country==2, pcf factors(1)
      factor (w1-w10) if urban==1 & country==2, pcf factors(1)
      .
      .
      .
      , where w1-w10 = assets

      Is this correct from your viewpoint?

      "Then, a “relative wealth” variable was created in the pooled dataset." - So does that mean that I simply predict values for all observations in my dataset? As in:

      predict wealth_index

      Or do I predict by country and urban:

      predict wealth_index if urban==0 & country==1
      .
      .
      .

      Thanks so much for your input.

      Best,

      Walter
      Last edited by Walter Klausing; 12 Mar 2015, 05:20.

      • #4
        -predict- only knows about the results of the most recent estimation command. So, to do this separately in each of four groups, you will need to -predict- for each group immediately after you do the analysis for each group, and then move on to do those steps for the next group until done.

        Also, the scoring of the indices can be done in two ways in Stata: regression scoring and Bartlett's method. Again, you will need to see what the original source did if your goal is to follow their approach.

        Overall it will look something like this

        Code:
        gen index = .
        forvalues c = 1/2 { // LOOP OVER COUNTRIES
            forvalues u = 0/1 { // LOOP OVER URBAN/RURAL
                factor w1-w10 if urban == `u' & country == `c', pcf factors(1)
                predict temp if urban == `u' & country == `c' // SPECIFY REGRESSION OR BARTLETT OPTION
                replace index = temp if urban == `u' & country == `c'
                drop temp
            }
        }
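
        To see how much the scoring choice matters, a short sketch on simulated data can compute both sets of scores from the same one-factor solution. The simulated dummies and the names f_reg and f_bart are hypothetical; only -factor- and the regression and bartlett options of -predict- are taken from the discussion above.

        ```stata
        * Simulated stand-in data: ten 0/1 asset dummies named w1-w10 (hypothetical values)
        clear
        set seed 2015
        set obs 400
        gen double latent = rnormal()
        forvalues i = 1/10 {
            gen byte w`i' = (latent + rnormal() > 0)
        }

        * One principal-component factor, as in the loop above
        factor w1-w10, pcf factors(1)

        * Score the same factor both ways
        predict f_reg, regression
        predict f_bart, bartlett

        * Compare the two sets of scores
        correlate f_reg f_bart
        summarize f_reg f_bart
        ```

        In the loop above, the chosen option would be added to the -predict- line, after a comma.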

        • #5
          Clyde,

          that's amazing. Thank you very much for taking the time to post this. I contacted the authors of the work regarding the method of scoring the indices. In the meantime, I proceeded with implementing your code, which yielded an index ranging from -6.651358 to 6.142672. In a second step, I would like to assign respondents to "wealth quintiles". I propose the following code to do so:

          Code:
          gen w_quintile=.
          forvalues country = 1/18 { // LOOP OVER COUNTRIES
              forvalues urban = 0/1 { // LOOP OVER URBAN/RURAL
                xtile temp=index if urban == `urban' & country == `country', nq(5)
                replace w_quintile = temp if urban==`urban' & country == `country'
                drop temp
              }
          }
          Would you see any issues with this sort of "binning" of observations? Sorry if this is a trivial step/question. Thanks again very much for your help.

          Regards,

          Walter
          Last edited by Walter Klausing; 13 Mar 2015, 09:44.

          • #6
            Well, if you are trying to follow a previously used methodology and that is what they did, then what can I say?

            If you are now going off in your own direction, I can only opine that it is rarely a good idea to turn a continuous variable into a categorical variable. It just throws away information. Suppose two of the quintile boundaries are -3 and -2. Do you really mean that a person with index = -2.01 is radically different from a person with index = -1.99, but is to be regarded as similar to somebody with index = -2.99? The world rarely works like that. It is usually better to keep continuous variables continuous.
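
            A small simulated sketch of that point: it compares the gap between the two observations on either side of the first quintile boundary with the spread of values inside the first quintile. The index here is random noise standing in for a real wealth index, and the scalar names are hypothetical.

            ```stata
            * Hypothetical continuous index: standard normal draws
            clear
            set seed 42
            set obs 1000
            gen double index = rnormal()

            * Cut into quintiles
            xtile q = index, nq(5)

            * Highest value in quintile 1 and the spread within quintile 1
            quietly summarize index if q == 1
            scalar top1   = r(max)
            scalar range1 = r(max) - r(min)

            * Lowest value in quintile 2
            quietly summarize index if q == 2
            scalar bottom2 = r(min)

            display "gap across the quintile 1/2 boundary: " scalar(bottom2) - scalar(top1)
            display "range of values inside quintile 1:    " scalar(range1)
            ```

            The two observations on either side of the boundary are put in different categories, while observations much farther apart inside quintile 1 are treated as the same.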

            • #7
              It seems you are factor analyzing binary variables. Does conventional factor analysis work in this circumstance? I am in a similar position. My variables are all categorical, with 5 categories each. I was told to use factor analysis based on polychoric correlations. Everything goes well except the predicted factor scores, which obviously violate the zero-mean assumption. Can anyone help? Thanks!
