
  • Creating index with a mix of ordered categorical variables and continuous variables for an explanatory variable, or using polychoricpca

    Dear Statalist users,
    I am trying to assess the probability that a household receives the benefits of a government scheme, using household-level characteristics. My outcome variable is binary: the household either gets benefits or does not. Besides controls, I have around 10 explanatory variables. Four of them are ordered categorical. For example, for "Does the household own a particular asset?" the answers are 0 "no", 1 "one such asset", 2 "two such assets", and 3 "more than two assets". Five of the variables are binary, and one is continuous, recording household consumption expenditure.

    I wish to summarize all these characteristics in one variable. I can see two possible options for doing that:
    A) I create an index by converting the continuous variable into a categorical one. For example, households with expenditures below $1000 are coded as 1, from $1000 to $2500 as 2, and more than $2500 as 3. I then simply add up all these variables to get one number for each household. I am not convinced by this method: it makes problematic assumptions about substitutability (equal weights) between the categories of different variables, and the cutoffs used to generate the categorical variable are arbitrary.
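    A minimal sketch of option A, written in Python purely for illustration (the thread itself contains no code). The expenditure cutoffs are the ones from the example above; the function names and the two households are hypothetical.

```python
def expenditure_category(expenditure):
    """Bin expenditure with the example cutoffs: <1000 -> 1, 1000-2500 -> 2, >2500 -> 3."""
    if expenditure < 1000:
        return 1
    if expenditure <= 2500:
        return 2
    return 3


def additive_index(ordered, binary, expenditure):
    """Equal-weight index: every category step of every variable counts as one point."""
    return sum(ordered) + sum(binary) + expenditure_category(expenditure)


# Two hypothetical households with very different profiles:
# four ordered items (0-3), five binary items, one expenditure figure.
asset_rich = additive_index([3, 3, 0, 0], [0, 0, 0, 0, 0], 900)
high_spender = additive_index([0, 0, 0, 0], [1, 1, 1, 1, 0], 2600)

print(asset_rich, high_spender)  # both households score 7
```

    The two households end up with the same score, which is exactly the substitutability assumption in question: one step up on any item is treated as interchangeable with one step up on any other.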

    B) I can use polychoricpca to summarise these variables into one or a few components. I have doubts about using polychoricpca with such a mix of categorical and continuous variables. Also, I have recoded all the categorical variables so that higher values correspond to improved conditions or better provision. Can I use the generated component in place of the index, and interpret a negative coefficient as meaning that the probability of receiving the government benefit falls as household characteristics improve?


    I am open to all kinds of suggestions and comments on these two approaches, or on any other approach that might work better in such a situation. If there are any implicit assumptions or caveats that I should keep in mind, please let me know. Please ask if anything needs clarification.
    Thanks in advance.

  • #2
    polychoric will do (it handles dichotomous, ordered categorical, and continuous variables), and factor analysis is a sensible choice.

    Just make sure every variable runs in the same direction (good-to-bad, or whichever you prefer), which you seem to be doing. But do check that the predicted factor also runs in the right direction, e.g. by getting the correlation between income and the factor.
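    The direction check suggested above can be sketched as follows. This is an illustration in plain Python, not the poster's procedure: `pearson` and `orient` are hypothetical helper names, and the scores and expenditure figures are made up. The idea is only that a component's sign is arbitrary, so it is correlated with a variable whose direction is known and flipped if the correlation is negative.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def orient(score, reference):
    """Flip the score if it is negatively correlated with the reference variable."""
    r = pearson(score, reference)
    if r < 0:
        return [-s for s in score], -r
    return list(score), r


# Made-up component scores and household expenditure for three households.
score = [0.5, -0.2, -1.1]
expenditure = [800, 1500, 3000]

oriented, r = orient(score, expenditure)
print(oriented)  # [-0.5, 0.2, 1.1]
```

    After this step, higher scores go with higher expenditure, so a negative coefficient on the score can be read in the direction the original question asks about.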

    https://stats.oarc.ucla.edu/stata/faq/how-can-i-perform-a-factor-analysis-with-categorical-or-categorical-and-continuous-variables/



    • #3
      Thank you George Ford for your response. I will check the correlations.



      • #4
        Can somebody help me understand the intuition behind how polychoric PCA assigns weights to the different household characteristics so that it yields one measure per household? How is this method superior to alternatives such as giving equal weights to the categories? Do I need to worry about calling its components indices? I am using them as my main explanatory variables. George Ford Clyde Schechter Nick Cox William Lisowski skolenik



        • #5
          I’ve never used polychoric PCA; sorry, but I don’t have a useful opinion on it. Of the other people you mention, William has retired from Statalist.



          • #6
            Polychoric is merely a means of computing a correlation matrix from mixed-type variables.

            PCA creates a linear combination of the variables, with weights chosen so that the combination captures as much of the variables' total variance as possible. That is, it turns multiple variables into one variable that captures (ideally) a large part of the variance of the multiple variables.
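            To make the weighting concrete, here is a minimal Python sketch. It assumes a correlation matrix is already in hand (in this thread it would come from polychoric); the 3x3 matrix below is made up, and `first_component_weights` is a hypothetical helper. The weights of the first component are the leading eigenvector of the correlation matrix, found here by power iteration; this is not how any particular Stata command is implemented.

```python
import math


def first_component_weights(corr, iterations=500):
    """Leading eigenvector and eigenvalue of a correlation matrix by power iteration.

    Starts from equal weights; assumes that start is not orthogonal to the
    leading eigenvector, which holds for the example below.
    """
    n = len(corr)
    w = [1.0] * n
    for _ in range(iterations):
        w_new = [sum(corr[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    # Rayleigh quotient w'Rw gives the variance captured by the component.
    eigenvalue = sum(
        w[i] * sum(corr[i][j] * w[j] for j in range(n)) for i in range(n)
    )
    return w, eigenvalue


# Made-up correlations: items 1 and 2 move together, item 3 less so.
corr = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.3],
    [0.3, 0.3, 1.0],
]

weights, eigenvalue = first_component_weights(corr)
share = eigenvalue / len(corr)  # share of total variance captured

print([round(v, 3) for v in weights], round(share, 3))
```

            In this example the two strongly correlated items receive larger weights than the third, and the first component captures roughly 61% of the total variance. That is the contrast with an equal-weight index: the weights come from how the variables co-vary rather than from a fixed rule.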

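            The KMO (Kaiser-Meyer-Olkin) statistic is a summary measure of sampling adequacy: it tells you whether your data are up to the task. It ranges from 0 to 1, and higher values mean the variables share enough common variation for this kind of analysis.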
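            For intuition about what the overall KMO measures, here is a small Python sketch of the standard formula: the sum of squared off-diagonal correlations divided by that sum plus the sum of squared partial correlations, with the partial correlations taken from the inverse of the correlation matrix. The helper names `invert` and `kmo` and the example matrix are assumptions for illustration only.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    sum_r2 = sum_partial2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                sum_r2 += corr[i][j] ** 2
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


# Made-up example: three items, all pairwise correlations 0.5.
equal_corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print(round(kmo(equal_corr), 4))  # 0.6923
```

            The measure is high when the raw correlations are large relative to the partial correlations, that is, when the association between any two variables is largely shared with the other variables.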

            I'd definitely recommend you read up on factor analysis before proceeding. There's plenty online and this is a decent, cheap book (Factor Analysis: Statistical Methods and Practical Issues, Sage).



            • #7
              Thank you George Ford, for the recommendation. Much appreciated.

