Calculating global score for an indicator

Sonnen Blume

Join Date: Aug 2018

Posts: 342
#1


16 Dec 2018, 16:56

I'd like to ask for an explanation of how to calculate a GLOBAL SCORE in Stata. I am not sure whether this is the standard term for it, but I see many authors report using this method in Stata.

I am quoting the text from a study where the authors use 12 different types of disability to calculate the score:

The results of the 12 items are summed up to obtain a global score expressed on a continuous scale from 0 (no disability) to 100 (full disability)

The questions ask whether or not the individual has difficulty in: 1) eating, 2) cooking, 3) bathing, 4) walking, 5) standing for long, 6) going out, 7) using public transport, 8) meeting friends, 9) social events, 10) learning a new task, 11) remembering an address, and 12) getting dressed. All questions are coded 1 (yes) or 0 (no).

I use Stata 14; the study I am quoting was done using Stata 15.1.

Thanks in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

16 Dec 2018, 18:46

Assuming you have one observation per patient and variables called cog_comm, self_care, mobility, interpersonal, life_act, and participation, they probably did this:

Code:

egen global_score = rowtotal(cog_comm self_care mobility interpersonal life_act participation)
replace global_score = global_score * 100/12

That said, they do not say what happens if there are missing values for some of these variables in some observations. The code I have shown treats missing values as if they were 0--which may or may not be how they handled it.
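
The effect of missing values on -rowtotal()- can be seen in a small sketch on made-up data. The item names q1-q3 and the values are hypothetical, and only three items are used to keep it short. By default a missing item contributes 0 to the total; with the -missing- option (available in recent versions of -egen-, as far as I know) the total is missing only when every item is missing.

```stata
clear
input byte(q1 q2 q3)
1 1 0
1 . 0
. . .
end

* Default behaviour: a missing item contributes 0 to the total
egen total_default = rowtotal(q1 q2 q3)

* With the missing option, a row with all items missing gets a missing total
egen total_strict = rowtotal(q1 q2 q3), missing

* Rescale to 0-100 (3 hypothetical items here; the instrument in #1 has 12)
generate score = total_default * 100/3

list
```

Note that the second row still gets a total of 1 under both approaches, so neither variant protects against partially missing responses; that needs a separate rule.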

This code will work in all versions of Stata since at least version 4 (I don't know when -egen- was brought in, but it's been around since at least 4.0)!

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

16 Dec 2018, 20:36

Google is potentially your friend here:

https://www.who.int/classifications/icf/more_whodas/en/

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

16 Dec 2018, 23:22

While it does look very similar, there is one crucial difference between what is reported in #1 and what is found at the site linked in #3. The items are scored 0 through 4 according to the WHODAS website, but in #1 it is said that the variables are 0/1 variables. Also, I did not find anything in the WHODAS page about how non-response is handled.

The WHODAS page describes an alternative scoring based on item response theory--and it is this one that is said to be scaled 0 to 100, whereas the text quoted in #1 refers to summing the item scores and scaling 0 to 100. By contrast, the WHODAS page refers to using summation of the item scores, but does not say anything about rescaling the result.

So all in all, I'd say it's a very confusing picture here. It sounds like the authors of the article being cited in #1 did something that is not consistent with anything recommended by WHO, and did not describe it in complete enough terms to allow others to replicate what they did. I'm thinking that Sonnen Blume needs to contact the authors of that article to find out what they did. Clarification is also needed why Blume's data is coded 0/1 and not 0 through 4.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

17 Dec 2018, 08:35

A good point. I missed that Sonnen said that the individual items were coded as binary items.

As to missing items, the webpage indeed doesn't give detail, which is a bit disappointing. A much longer instruction manual does go into some detail: for the 12-item version (which I think Sonnen has), if only one item (i.e. a question) has a missing value, you're allowed to calculate the mean of the other 11 items and assign that score to the missing item. If there is more than one missing item, you can't do that, and the manual doesn't say exactly what to do - presumably you would recalculate that respondent's global score as missing. The manual also says that the computer-administered survey won't allow the interview to progress if a question isn't answered.
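
A minimal sketch of that rule, as described in the paragraph above and not checked against the manual itself: fill a single missing item with the mean of the other 11, and set the score to missing when more than one item is missing. The item names i1-i12, the 0/1 values, and the helper variables (n_miss, item_mean, the _f copies) are all hypothetical.

```stata
clear
input byte(i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12)
1 1 1 1 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 .
1 1 . . 0 0 0 0 0 0 0 0
end

* Count missing items and take the mean of the answered items per respondent
egen n_miss = rowmiss(i1-i12)
egen double item_mean = rowmean(i1-i12)

* Copy each item, filling it with the row mean only when it is the single missing item
foreach v of varlist i1-i12 {
    generate double `v'_f = `v'
    replace `v'_f = item_mean if missing(`v') & n_miss == 1
}

* Sum the filled items, rescale to 0-100, and blank out rows with 2+ missing items
egen double score = rowtotal(i1_f-i12_f)
replace score = score * 100/12
replace score = . if n_miss > 1

list n_miss score
```

With one missing item this is the same as pro-rating: the second row scores (4 + 4/11) * 100/12, which equals the mean of the 11 answered items times 100.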

Sonnen, just in case you aren't aware of this egen option:

Code:

egen global_score_miss = rowmiss(cog_comm self_care mobility interpersonal life_act participation)
replace global_score = . if global_score_miss > 1

Furthermore, there is a suite of commands to help you get a sense of how many observations have missing data.

Code:

help missing
misstable summarize cog_comm self_care mobility interpersonal life_act participation

In general, you will want to skim the WHODAS manual and the articles you reference. If you have specific questions on coding, we can help.


Sonnen Blume

Join Date: Aug 2018
Posts: 342
#6

20 Dec 2018, 10:15

Thanks so much, Clyde, for the detailed response, as always! I don't have the same data, and that's why I didn't use dataex this time. I have calculated the score using the formula you mentioned. Below is what my data look like:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float move_around int concentrate float(learning standing bathing dressing carrying eating lying toiletting transportation going_out global_score)
0 0 0 1 0 0 1 0 1 0 1 0 33.333332
0 0 0 . . . . . . . . .         0
0 0 0 0 0 0 1 0 0 0 1 0 16.666666
1 0 0 1 0 0 1 0 0 1 1 0  41.66667
1 1 1 1 1 0 1 0 1 1 1 1  83.33334
0 0 1 1 0 0 1 0 1 0 1 1        50
1 1 1 1 1 1 1 1 1 1 1 1       100
1 1 1 1 1 1 1 0 1 1 1 1  91.66666
1 1 1 1 0 0 1 0 1 1 1 1        75
1 1 1 1 1 1 1 1 1 1 1 1       100
1 0 0 1 0 0 1 0 0 0 0 1 33.333332
1 0 0 1 0 0 1 0 1 0 0 1  41.66667
1 1 1 1 1 1 1 0 1 1 1 1  91.66666
0 1 1 0 0 0 1 0 1 1 1 0        50
0 1 0 0 0 0 1 0 0 0 0 0 16.666666
0 0 0 1 0 0 1 0 1 1 1 0  41.66667
1 1 1 1 0 0 1 0 1 1 1 1        75
0 0 0 0 0 0 0 0 0 0 0 0         0
1 0 0 1 1 0 1 0 1 0 1 1  58.33333
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
0 0 0 1 0 0 1 0 1 1 1 1        50
1 1 1 1 1 1 1 1 1 1 1 1       100
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 0 0 1 0 0 1 0 1 0 1 0  41.66667
1 0 0 0 1 0 1 0 1 1 0 0  41.66667
0 0 0 0 0 0 1 0 1 0 1 1 33.333332
0 1 1 0 0 0 1 0 1 0 1 0  41.66667
0 0 0 0 0 0 1 0 0 0 0 0  8.333333
1 0 0 1 1 0 1 0 1 1 1 0  58.33333
0 1 1 1 0 0 1 0 0 0 0 0 33.333332
1 1 1 1 0 0 1 0 1 0 1 1 66.666664
0 0 0 1 0 0 1 0 0 0 1 0        25
1 0 0 1 0 0 1 0 0 0 0 0        25
0 0 0 0 0 0 0 0 0 0 0 0         0
1 1 1 1 1 1 1 0 1 1 1 1  91.66666
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 0 1 1 0 0 1 0 1 0 1 1  58.33333
1 1 1 1 0 0 1 0 0 0 0 1        50
1 1 0 1 1 0 1 0 1 0 0 1  58.33333
1 1 1 1 1 1 1 0 1 1 1 1  91.66666
0 1 0 1 1 0 1 0 1 0 0 1        50
0 0 0 1 0 0 1 0 0 0 0 1        25
1 0 0 1 0 0 1 0 0 0 0 0        25
1 0 0 1 0 0 1 0 0 0 0 1 33.333332
0 1 1 1 0 0 1 0 1 0 0 0  41.66667
0 0 0 0 0 0 1 0 0 0 0 0  8.333333
0 0 0 0 0 0 0 0 0 0 0 0         0
1 1 1 1 1 1 1 1 1 1 1 1       100
1 1 1 0 0 0 1 0 1 1 1 1 66.666664
0 0 1 0 0 0 1 0 0 0 0 0 16.666666
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 0 0 1 1 1 1 0 1 1 1 1        75
0 1 1 1 1 0 1 0 0 1 0 0        50
0 0 0 1 0 0 1 0 0 0 0 1        25
0 0 0 0 0 0 0 0 0 0 0 0         0
1 1 1 1 1 1 1 0 1 1 1 1  91.66666
1 0 1 1 0 0 1 0 0 0 0 0 33.333332
1 0 0 1 0 0 0 0 1 0 0 0        25
1 1 1 0 0 0 1 0 1 0 0 0  41.66667
1 1 1 1 0 0 1 0 1 0 1 1 66.666664
0 0 0 0 0 0 1 0 0 0 0 0  8.333333
0 1 1 0 1 0 1 0 1 1 1 1 66.666664
1 0 1 1 0 0 1 0 1 0 1 0        50
1 1 1 1 1 1 1 0 1 1 1 1  91.66666
0 0 0 0 0 0 0 0 0 0 0 0         0
0 0 0 1 0 0 0 0 0 0 0 0  8.333333
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
0 0 1 1 0 0 1 0 1 0 0 1  41.66667
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 0 1 1 1 0 1 0 0 0 1 1  58.33333
1 0 0 1 0 0 1 0 0 0 0 0        25
0 0 0 0 0 0 0 0 0 0 0 1  8.333333
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 1 1 1 1 0 1 0 1 1 1 1  83.33334
1 1 1 1 1 1 1 0 1 1 1 1  91.66666
0 0 0 1 0 0 1 0 1 0 0 0        25
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 0 1 1 1 0 1 0 1 1 1 1        75
1 1 1 1 0 0 1 0 1 1 0 1 66.666664
1 1 1 1 1 1 1 0 1 0 1 1  83.33334
1 1 1 1 1 0 1 0 1 0 0 1 66.666664
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
0 0 0 0 0 0 1 0 0 0 0 0  8.333333
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 1 1 1 0 0 1 0 1 0 0 1  58.33333
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 0 0 1 0 0 1 0 1 1 1 1  58.33333
0 1 0 0 0 0 1 0 0 0 0 0 16.666666
0 1 1 1 0 0 1 0 0 0 1 1        50
1 1 1 1 0 0 1 0 1 1 1 1        75
1 0 1 1 1 0 1 0 0 0 1 1  58.33333
1 0 0 1 0 0 0 0 0 0 0 0 16.666666
1 1 1 1 1 0 1 0 1 1 1 1  83.33334
0 1 1 1 0 0 1 0 1 0 0 1        50
0 0 1 1 0 0 1 0 0 0 0 0        25
1 1 0 1 1 1 1 0 1 1 1 1  83.33334
0 0 0 1 0 0 1 0 1 1 1 0  41.66667
0 0 0 0 0 0 1 0 0 0 0 0  8.333333
0 0 0 1 0 0 1 0 0 0 0 0 16.666666
1 0 0 1 0 0 1 0 1 1 1 0        50
end

I recoded each item (None "1" / Mild "2" / Moderate "3" / Severe "4" / Extreme "5") to [No (Mild, Moderate, Severe, Extreme) "0" and Yes (None) "1"], and recoded missing values (98) to zero.
So I have one more question, please: what would the formula be if I hadn't recoded the variables?

Thanks so much for the help!


Sonnen Blume

Join Date: Aug 2018

Posts: 342
#7

20 Dec 2018, 10:18

I've never used "item response theory" but see it regularly in many papers. Is it the same method as the one you suggested for calculating the global score?
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#8

20 Dec 2018, 10:21

Thanks so much for the insights Weiwen! I wasn't aware of the rowmiss function, it looks very handy...
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

20 Dec 2018, 11:10

Had you not recoded the variables as you did, the scoring would be:

Code:

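* Number of non-missing item responses for each subject
egen rcount = rownonmiss(move_around-going_out)
* Mean item response (each item 1-5), only for subjects with at least 11 responses
egen global_score = rowmean(move_around-going_out) if rcount >= 11
* Scale the mean up to a 12-item sum (pro-rated when only 11 items were answered)
replace global_score = global_score*12
* Map the 12-60 range linearly onto 0-100
replace global_score = (global_score-12) * 100/48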

The first line of code identifies the number of non-missing responses for this subject. The second line averages the item responses, which individually range between 1 and 5, for those subjects reporting at least 11 responses. The third line scales that up to a sum (exactly if there were 12 responses, and pro-rated upward if there were only 11 responses.) The global scores at this point range from a minimum of 12 (1 on every item) to 60 (5 on every item). The final line linearly transforms it to range from 0 to 100.
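
As a check on the arithmetic just described, here is a small sketch on made-up responses. The item names i1-i12 and the values are hypothetical. The mean of the answered items times 12 gives the (pro-rated) sum, which is then mapped from the 12-60 range onto 0-100.

```stata
clear
input byte(i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12)
1 1 1 1 1 1 1 1 1 1 1 1
5 5 5 5 5 5 5 5 5 5 5 5
3 3 3 3 3 3 3 3 3 3 3 .
3 3 3 3 3 3 3 3 3 3 . .
end

* Number of answered items per respondent
egen rcount = rownonmiss(i1-i12)

* Mean item response, computed only when at least 11 items were answered
egen double item_mean = rowmean(i1-i12) if rcount >= 11

* Pro-rated sum (mean * 12), shifted and rescaled from 12-60 to 0-100
generate double global_score = (item_mean*12 - 12) * 100/48

list rcount item_mean global_score
```

The four rows should come out as 0, 100, 50, and missing: all 1s is the floor, all 5s is the ceiling, 11 answers of 3 pro-rate to the midpoint, and a respondent with only 10 answers gets no score.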

Item Response Theory is not the same as what I recommended. Item Response Theory is a sophisticated way of looking at multi-item instruments. In particular, unlike the kind of global scoring being used here, in Item Response Theory each item is not assumed to be equally "important," and different weights are assigned to different items. The details of this are too complicated to explain here. The Wikipedia page on Item Response Theory is pretty good and will give you a sense of how it works.
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#10

20 Dec 2018, 12:25

This thread has been very informative. Thanks a lot Clyde!
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#11

20 Dec 2018, 13:17

Sonnen, in addition to this, most people would recommend that you do not dichotomize the underlying variables. You discard some information that way, and it is not consistent with the scoring manual.

The Excel scoring worksheet for the 12-item WHODAS actually shows raw scores from 0 (no difficulty) through 4 (extreme difficulty/can't do). Going through the Excel worksheet for the simple sum score method, it looks like each raw point on any item is worth 100/48 percentage points (a maximum of 4 points per item times 12 items gives 48 raw points, which are spread over 100 percentage points). Hence, if you restore the original scores, I believe Clyde's code can be reduced to:

Code:

egen rcount = rownonmiss(move_around-going_out)
egen global_score = rowmean(move_around-going_out) if rcount >= 11
replace global_score = . if rcount < 11
* Mean of the answered items times 12 gives the (pro-rated) sum, 0-48
replace global_score = global_score*12
replace global_score = global_score * 100 / 48

Note: My code will set the global score to missing if someone is missing more than one item. You may or may not wish to do so, but I think this approach is consistent with the section of the manual that I skimmed (and you should check my reading of the manual).

It was obvious in retrospect, but I actually wasn't aware of the rownonmiss function.

Item response theory is what Clyde said it is. If you are using Stata 14 or 15, there is actually a set of IRT commands, and you can inspect the manual if you want to learn more.

Code:

help irt
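
For a sense of what those commands look like in use, here is a rough sketch on simulated data, assuming Stata 14 or later. Everything in it is made up for illustration: the variable names (trait, lat1-lat4, item1-item4, theta), the sample size, and the cutpoints. It fits a graded response model, one of several models in the -irt- suite, to four ordinal items and predicts the latent trait. It is not the WHODAS scoring algorithm.

```stata
clear
set seed 2018
set obs 1000

* Simulated underlying trait (hypothetical data, not WHODAS responses)
generate double trait = rnormal()

* Four ordinal items scored 0-3, each a noisy, coarsened version of the trait
forvalues j = 1/4 {
    generate double lat`j' = trait + rnormal()
    generate byte item`j' = (lat`j' > -1) + (lat`j' > 0) + (lat`j' > 1)
}

* Graded response model for ordered categorical items
irt grm item1 item2 item3 item4

* Empirical Bayes prediction of each respondent's latent trait
predict theta, latent

* Compare the predicted trait with the simulated one
correlate theta trait
```

Unlike the sum score, the fitted model estimates a separate discrimination parameter for each item, which is the sense in which items are not treated as equally important.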

Last edited by Weiwen Ng; 20 Dec 2018, 13:24.

Sonnen Blume

Join Date: Aug 2018

Posts: 342
#12

20 Dec 2018, 15:30

Thanks for the valuable suggestions, Weiwen. I will keep that in mind for future analyses.