Dear Statalist users,
I am trying to assess the probability that a household receives the benefits of a government scheme, using household-level characteristics. My outcome variable is binary: the household either gets benefits or does not. I have around 10 explanatory variables in addition to controls. Four of these are ordered categorical variables. For example, "Does the household own a particular asset?" has the answers 0 "no", 1 "one such asset", 2 "two such assets", and 3 "more than two assets". Five of the variables are binary, and one is continuous, recording household consumption expenditure.
I wish to summarize all these characteristics in one variable. I can see two possible options for doing that:
A) I create an index by first converting the continuous variable into a categorical one. For example, households with expenditure below $1000 are coded as 1, from $1000 to $2500 as 2, and above $2500 as 3. I then simply sum all these variables to get one number for each household. I am not convinced by this method: it makes problematic assumptions about substitutability (equal weights) between the variables and between their categories, and the cut-offs used to categorise expenditure have no real justification.
B) I can use polychoricpca to summarise these variables into one or a few components. I have doubts about using polychoricpca with such a mix of categorical and continuous variables. Also, I have recoded all the categorical variables so that higher values correspond to improved conditions or better provision. Can I use the resulting component in place of the index, and interpret a negative coefficient as meaning that, as household characteristics improve, the probability of receiving the government benefit falls?
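To make option A concrete, here is a minimal sketch of the arithmetic I have in mind. It is written in Python rather than Stata purely for illustration; the function names, the three households, and their values are all made up, and the only things taken from my description above are the expenditure cut-offs and the idea of summing everything with equal weights.

```python
import numpy as np

def expenditure_category(expenditure):
    """Code expenditure as 1 (below 1000), 2 (1000 to 2500), 3 (above 2500)."""
    x = np.asarray(expenditure, dtype=float)
    # Each threshold crossed adds one to the base category of 1.
    return 1 + (x >= 1000).astype(int) + (x > 2500).astype(int)

def additive_index(ordered, binary, expenditure):
    """Equal-weight index: sum of all ordered, binary and binned variables."""
    ordered = np.asarray(ordered)
    binary = np.asarray(binary)
    return ordered.sum(axis=1) + binary.sum(axis=1) + expenditure_category(expenditure)

# Hypothetical data: 3 households, 4 ordered variables (0-3), 5 binary variables.
ordered = np.array([[0, 1, 2, 3],
                    [3, 3, 3, 3],
                    [0, 0, 0, 0]])
binary = np.array([[1, 0, 1, 0, 1],
                   [1, 1, 1, 1, 1],
                   [0, 0, 0, 0, 0]])
expenditure = np.array([800.0, 3000.0, 1000.0])

index = additive_index(ordered, binary, expenditure)
print(index.tolist())  # [10, 20, 2]
```

In this sketch a one-step change in any variable moves the index by exactly one, whichever variable it is. That is the equal-weights assumption I am uneasy about.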
I am open to all kinds of suggestions and comments on these two approaches, or on any other approach that might work better in such a situation. If there are any implicit assumptions or caveats that I should keep in mind while doing this, please let me know. Please ask if any clarification is required.
Thanks in advance.