
  • how to construct a composite index from binary variables

    Hello everybody! I am using a public database, the Fragile Families and Child Wellbeing Study, and I have to construct several index variables. For example, I am constructing a variable named Maternal Hardship that is an index/average of the mother's responses to 14 different questions with yes/no (binary) answers. The questions are all different and relate to different kinds of hardship, for example: "In past 12 months, did you receive free food/meals?", "In past 12 months, did you not pay full gas/oil/electric bill?", "In past 12 months, did you stay in a place not meant for regular housing?", and so on, all dummies. Now, my question is: what can I do to construct an index? Do I have to standardize each item, sum them into a composite index, and then standardize the result? Or should I perform a PCA in order to assign different weights to the different questions and then create the index? Or maybe there is a simpler way to do it. Any suggestion or advice is more than welcome, because I have no idea how to solve this.

  • #2
    There is no single best way to proceed. It depends on what you understand the relationships among these variables and the construct representing the index to be. If you are trying to extract a concept of maternal hardship that you think is, in a sense, the common essence of all 14 questions, and your expectation is that the fourteen behaviors covered by the questions are correlated by virtue of being reflections of that common essence, then you probably should do a factor analysis. On the other hand, if these are various, loosely related behaviors that one might see among different mothers experiencing hardship but not necessarily various results of a common cause, then a summative index (just count how many are endorsed) might be best. There is a good discussion of this distinction in Alan Acock's Discovering Structural Equation Modeling using Stata.
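
    As a minimal sketch of the summative option, assuming the 14 items are stored in consecutive variables Q1 through Q14 (placeholder names, not the actual Fragile Families variable names) and are already coded 0 = no, 1 = yes:

    ```stata
    * Q1-Q14 are placeholder names for the 14 yes/no items (0 = no, 1 = yes)
    * Number of items each mother actually answered
    egen n_answered = rownonmiss(Q1-Q14)

    * Count of hardships endorsed; missing if all 14 items are missing
    egen hardship_count = rowtotal(Q1-Q14), missing

    * Proportion of answered items that were endorsed
    egen hardship_mean = rowmean(Q1-Q14)
    ```

    The count treats missing items as zero, while the mean divides by the number of items answered, so the two can rank mothers differently when some responses are missing. Which one is appropriate is a judgment call that depends on how much item nonresponse there is.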

    And although it is commonly used, what I recommend least is doing a principal components analysis. PCA is great when you need to decompose a set of variables into orthogonal variables that cover the same variance, or when you want to use the first (or first few) principal components to reduce the dimensionality of your predictor set (at the cost of discarding information), but it's really not a great way to create a construct.

    Actually, the first thing I recommend is that you look in the literature to see what others have done. The Fragile Families study has been around for a while, and there are many publications based on it. There's a good chance somebody has already solved this problem, and there is no virtue in reinventing the wheel.



    • #3
      Thank you so much Clyde for your help and tips! I found them really useful to proceed in the analysis!!



      • #4
        Since your variables are dichotomous, I would recommend multiple correspondence analysis (MCA) instead of PCA.
        To obtain the index you want, you need just two commands:
        Code:
        mca Q1-Q14
        predict F1
        If you want to carry out other analyses with the 14 dichotomous variables, I recommend my program coin; you can obtain the beta version as follows:
        Code:
        net install coin, from(http://casus.usal.es/stata/) replace
        coin Q1-Q14, frequencies xy(pca) //For example
        Once it is installed, a help file (help coin) and a dialog box (db coin) are available.
        Have a happy 2015!

        N.B. The dichotomous responses must be coded as 0 (No) / 1 (Yes) in coin.
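
        If the items are not yet coded 0/1, something like the following might help. This is only a sketch: it assumes, purely for illustration, that the raw items use 1 = yes and 2 = no, so check the actual codes in your data before recoding. Q1-Q14 are the same placeholder variable names used above.

        ```stata
        * Inspect the current coding of each item, including missing values
        tab1 Q1-Q14, missing

        * Hypothetical raw coding: 1 = yes, 2 = no. Map "no" to 0 and leave 1 as is.
        recode Q1-Q14 (2 = 0)

        * Confirm that only 0, 1, and missing values remain
        tab1 Q1-Q14, missing
        ```

        Note that recode changes the variables in place, so you may prefer to work on a copy of the dataset.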

