Generating continuous variable from categorical responses

Hasan Bari

Join Date: May 2021

Posts: 6
#1


08 May 2021, 06:05

Dear altruists,
I am working on a dataset in which, to assess depression, we asked respondents 9 questions (adapted from the brief PHQ-9). I now have the responses to those 9 questions in my dataset. I want to generate a continuous variable from these 9 responses so that I can calculate a mean score. Any help? I have attached a picture for better understanding.

*Pardon me if I have used any term or word wrongly. I am a new Stata learner. Thanks in advance.
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#2

08 May 2021, 08:02

You would need to explain better what you want to do.

If you are asking how to take an average of your 9 variables, see -egen, rowtotal()-. This will give you the sum of the variables, and you can then divide by 9 to get the average.
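A minimal sketch of that suggestion, run on simulated data because the original poster's dataset is not shown. The item names phq1-phq9, the 0-3 coding, and the new variable names are assumptions made for illustration only.

```stata
* Toy data: nine hypothetical items phq1-phq9, each coded 0-3
clear
set seed 12345
set obs 100
forvalues i = 1/9 {
    generate byte phq`i' = floor(4*runiform())
}

* Row sum across the nine items (rowtotal() treats missing values as 0)
egen phq_sum = rowtotal(phq1-phq9)

* Average obtained by dividing the sum by the number of items
generate phq_avg = phq_sum/9

* rowmean() averages over the nonmissing items only, so the two
* versions differ whenever a respondent skipped an item
egen phq_mean = rowmean(phq1-phq9)

summarize phq_sum phq_avg phq_mean
```

With no missing values, as in this toy data, phq_avg and phq_mean agree. With missing responses they will not, so it is worth checking which behaviour you want.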
Christopher Bratt

Join Date: May 2019

Posts: 144
#3

08 May 2021, 14:22

I would recommend against simply taking the average of the 9 items. You would first need to test whether you can defend using these items to assess a single latent variable.

It's not clear to me what your theory is when you want to use these items as a measurement. Could this be considered a test that counts scores? (Compare this to counting alcohol units from beer, wine, etc, etc as a measure of alcohol consumption - no correlation between the items is needed.)

Alternatively, you assume that an unobserved variable (a latent variable) is a direct cause of responses to each of these variables/items. In that case, you need to test that assumption. I would use confirmatory factor analysis to test the model (I'm confident you would find that the model would need further modifications). With such factor analysis, you can let the software generate factor scores if you need them. My guess is you will want to avoid confirmatory factor analysis (it's complex/difficult to learn).
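As a rough sketch of what a one-factor confirmatory factor analysis could look like in Stata, the following uses the official -sem- command on simulated data. The item names phq1-phq9, the latent variable name Dep, and the data-generating step are assumptions for illustration. Note that -sem- treats the items as continuous, which is a simplification for ordinal items.

```stata
* Toy data: nine hypothetical 0-3 items driven by one common factor
clear
set seed 2021
set obs 500
generate double f = rnormal()
forvalues i = 1/9 {
    generate double e = f + rnormal()
    generate byte phq`i' = (e > 0) + (e > 1) + (e > 2)
    drop e
}
drop f

* One-factor CFA; latent variable names start with a capital letter
sem (Dep -> phq1-phq9)

* Overall model fit statistics
estat gof, stats(all)

* Factor scores for the latent variable
predict dep_score, latent(Dep)
summarize dep_score
```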

At the very least test your items with some test of reliability. Cronbach's alpha is not so good but commonly used. A pragmatic choice could be to test your items with Cronbach's alpha and then choose an option that lies somewhere between confirmatory factor analysis (good) and simply estimating a mean score (not so good). I'm thinking of principal component analysis. This method will allow you to estimate component scores, which can be compared to the factor scores in factor analysis. (The caveat is that principal component analysis doesn't really test your model and doesn't consider measurement errors associated with every single item). But it would to some degree help you consider which items are better indicators of the latent variable you are interested in.
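The pragmatic route described above might look something like the following sketch, again on simulated data. The variable names phq1-phq9 and pc1 are hypothetical, and keeping a single component is an assumption that mirrors the single-construct idea rather than a recommendation.

```stata
* Toy data: nine hypothetical 0-3 items driven by one common factor
clear
set seed 2021
set obs 500
generate double f = rnormal()
forvalues i = 1/9 {
    generate double e = f + rnormal()
    generate byte phq`i' = (e > 0) + (e > 1) + (e > 2)
    drop e
}
drop f

* Cronbach's alpha, with item-rest correlations and alpha-if-item-deleted
alpha phq1-phq9, item

* Principal component analysis, keeping the first component only
pca phq1-phq9, components(1)

* Component scores for the first component
predict pc1, score
summarize pc1
```

The loadings reported by -pca- give a rough indication of which items contribute most to the component.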

But you have ordinal data... Simply pretending the ordinal items are continuous would be problematic. See here for a post on the issue:
https://stats.stackexchange.com/ques...or-binary-data
Others might suggest alternative techniques given your ordinal scales.
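One such alternative, offered here only as an illustration and not taken from the linked post, is to model each item with an ordinal logit link using the official -gsem- command. The item names, the latent variable name Dep, and the simulated data are assumptions.

```stata
* Toy data: nine hypothetical 0-3 items driven by one common factor
clear
set seed 2021
set obs 500
generate double f = rnormal()
forvalues i = 1/9 {
    generate double e = f + rnormal()
    generate byte phq`i' = (e > 0) + (e > 1) + (e > 2)
    drop e
}
drop f

* One-factor model that treats every item as ordinal
gsem (Dep -> phq1-phq9, ologit)

* Empirical Bayes predictions of the latent variable
predict dep_score, latent(Dep)
summarize dep_score
```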

But again, if your theory does not assume correlations between items, you would probably be better off simply counting scores. The problem here might be that some respondents did not answer all questions, which would give you missing data and make counting scores problematic.
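To see how much of a problem missing responses are before counting scores, something along these lines could help. The data are simulated, the names phq1-phq9, n_miss, and phq_total are hypothetical, and restricting the total to complete responders is just one possible choice.

```stata
* Toy data: nine hypothetical 0-3 items with some responses set to missing
clear
set seed 99
set obs 200
forvalues i = 1/9 {
    generate byte phq`i' = floor(4*runiform())
    replace phq`i' = . if runiform() < 0.05
}

* Number of unanswered items per respondent
egen n_miss = rowmiss(phq1-phq9)
tabulate n_miss

* Total score only for respondents who answered all nine items;
* everyone else is left missing instead of being silently undercounted
egen phq_total = rowtotal(phq1-phq9) if n_miss == 0
summarize phq_total
```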

Last edited by Christopher Bratt; 08 May 2021, 14:32.
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#4

08 May 2021, 14:35

As far as I can see this is the standard PHQ9 instrument. It has a standard scoring protocol. You code 0 for Not at all, 1 for Several days, 2 for More than half the days, and 3 for Nearly every day. Then you add up the 9 numeric scores. You can just google (or whatever is your favorite search engine) PHQ9 scoring for more information about interpreting the results.
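A sketch of that scoring protocol, under the assumption that the responses are stored as string variables holding the answer text. The names q1-q9, phq1-phq9, and phq9_total are hypothetical, and the data are simulated. If the real items are numeric with value labels, the recoding step would look different.

```stata
* Toy data: nine hypothetical string items q1-q9 holding the answer text
clear
set seed 2021
set obs 50
forvalues i = 1/9 {
    generate byte r = 1 + floor(4*runiform())
    generate str25 q`i' = cond(r == 1, "Not at all", ///
        cond(r == 2, "Several days", ///
        cond(r == 3, "More than half the days", "Nearly every day")))
    drop r
}

* Map each answer to its numeric score, 0 through 3
forvalues i = 1/9 {
    generate byte phq`i' = .
    replace phq`i' = 0 if q`i' == "Not at all"
    replace phq`i' = 1 if q`i' == "Several days"
    replace phq`i' = 2 if q`i' == "More than half the days"
    replace phq`i' = 3 if q`i' == "Nearly every day"
}

* Total score: the sum of the nine items, which ranges from 0 to 27
egen phq9_total = rowtotal(phq1-phq9), missing
summarize phq9_total
```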

If you need help with Stata code for implementing this standard scoring protocol, you need to post back with example data from your data set, using the -dataex- command. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
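For illustration, a -dataex- call on the depression items might look like this. The variable names phq1-phq9 are placeholders for whatever the items are called in the actual dataset, and the toy data exist only so the example runs on its own.

```stata
* Only needed on older installations that lack dataex:
* ssc install dataex

* Toy data standing in for the real dataset (hypothetical names phq1-phq9)
clear
set seed 1
set obs 20
forvalues i = 1/9 {
    generate byte phq`i' = floor(4*runiform())
}

* Show the first 10 observations of the nine items in a form
* that can be pasted straight into a forum post
dataex phq1-phq9 in 1/10
```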
