Any way to standardize ordinal/categorical variables

Ujjwal

Join Date: Jul 2014

Posts: 56
#1

23 Jul 2014, 02:20

Hi, I have several ordinal/categorical variables measured on different scales: some are 1-7, others 1-3, 1-10, etc. How can I standardize them so that they are all measured on the same scale, say 0-1, in Stata 12?
Stefan

Join Date: Jul 2014

Posts: 4
#2

23 Jul 2014, 02:35

If you want to take them one by one, you can apply the following formula:
-for the one that is 1-7: (initial value -1)/6
-for the one that is 1-3: (initial value -1)/2
...and so on

Example (1-7 variable):

Initial  Rescaled
1        0.00
2        0.17
3        0.33
4        0.50
5        0.67
6        0.83
7        1.00
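
As a quick check, the table above can be reproduced in Stata. This is only a minimal sketch: the variable names `initial` and `rescaled` and the locals `lo` and `hi` are made up for illustration, and the endpoints are assumed to be the known ends of a 1-7 item.

```stata
clear
input byte initial
1
2
3
4
5
6
7
end

* Known scale endpoints (assumed here: a 1-7 item)
local lo = 1
local hi = 7

* Min-max rescaling onto 0-1: (value - lowest) / (highest - lowest)
generate double rescaled = (initial - `lo') / (`hi' - `lo')
format rescaled %4.2f
list, noobs clean
```

For a 1-3 item the same lines apply with `hi` set to 3, which gives the divisor of 2 mentioned above.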
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#3

23 Jul 2014, 02:40

You could use sheaf coefficients for that, see: http://www.maartenbuis.nl/software/sheafcoef.html

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Ujjwal

Join Date: Jul 2014

Posts: 56
#4

23 Jul 2014, 03:21

Hi Stefan, thanks. Could you spell out the command, please?
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#5

23 Jul 2014, 03:57

He just made the range of all variables go from 0 to 1. If you enter your ordinal variables linearly, then the unit of each of these variables represents going from its minimum to its maximum, which you can sometimes consider to be comparable.

However, I would be hesitant to do so, for two reasons:

1. You would need to enter your ordinal variables linearly, which is often problematic for ordinal variables.

2. Whether the minimum in a 3-category variable is really comparable to the minimum in a 7-category variable (and similarly for the maxima) is often doubtful for substantive reasons. In a 3-category variable the minimum is a generic negative response to the question, while the minimum in a 7-category variable is a severe negative response to the question. So the unit for the recoded 3-category variable is "generic negative to generic positive", while the unit for the 7-category variable is "severe negative to severe positive". I would not call that comparable.

Stefan

Join Date: Jul 2014

Posts: 4
#6

23 Jul 2014, 05:21

Suppose your variable is named var1 and runs from 1 to 7; then you would use:
generate var1_rescaled = (var1 - 1)/6

But keep in mind what Maarten wrote. Whether rescaling as in my example is a good idea depends on what these numbers are and what you want to do with them.
Ujjwal

Join Date: Jul 2014

Posts: 56
#7

24 Jul 2014, 01:00

Thanks. @Maarten Buis, I know there are problems with simple arithmetic standardization. I have tried to go through your paper. Sorry if this is a naive thing to say, but it talks about post-estimation standardization, and I need a pre-estimation standardization. Could you please tell me how I can use sheaf coefficients to rescale the variables to 0-1?
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#8

24 Jul 2014, 01:52

Why would you need pre-estimation standardization?

Laurence Lester

Join Date: Apr 2014

Posts: 95
#9

24 Jul 2014, 21:54

Ignore me if I'm off track, but this discussion seems to be predicated on the assumption that the original scaling (i.e. 1-7, 1-3, 1-10, etc.) has some underlying meaning. My guess is that these scalings are arbitrary, so any rescaling seems to me to be neither good nor bad, just convenient.
Daniel Bela

Join Date: Apr 2014

Posts: 246
#10

25 Jul 2014, 01:57

Originally posted by Stefan

generate var1_rescaled= (var1-1)/6

I think this could be stated more abstractly as "(var1 - min(var1)) / (max(var1) - min(var1))". For three variables "var1", "var2" and "var3" it would be:

Code:

foreach var in var1 var2 var3 {
    summarize `var', meanonly
    generate `var'_std = (`var' - `r(min)') / (`r(max)' - `r(min)')
}
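
One thing to watch with the loop above: r(min) and r(max) are the observed extremes, so if nobody chose the top (or bottom) category of an item, the result differs from Stefan's fixed-range formula. A possible variant, offered only as a sketch, takes the endpoints from the questionnaire instead. The locals min_var1, max_var1 and so on are hypothetical names, and the ranges 1-7, 1-3 and 1-10 are simply the ones mentioned in the first post.

```stata
clear
input byte(var1 var2 var3)
1 1 2
4 2 5
6 3 9
end

* Assumed theoretical endpoints of each item (taken from the questionnaire,
* not from the data)
local min_var1 = 1
local max_var1 = 7
local min_var2 = 1
local max_var2 = 3
local min_var3 = 1
local max_var3 = 10

foreach var in var1 var2 var3 {
    * Nested macros: `min_`var'' expands to the stored endpoint for this item
    generate double `var'_std = (`var' - `min_`var'') / (`max_`var'' - `min_`var'')
}
list, noobs clean
```

In this toy data nobody answered 7 on var1, so its largest rescaled value stays below 1, whereas the observed-range loop would map the 6 to exactly 1.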

Regards
Bela
Mike Lacy

Join Date: Apr 2014

Posts: 2416
#11

25 Jul 2014, 05:42

Another (simple) possibility here would be to replace each ordinal value with its approximate fractional rank, giving at least some crude capacity to compare scores from the two variables. One common approximation is so-called "ridit" scoring, which is conveniently available in the -egenmore- package from SSC.
(e.g. egen newvar1 = ridit(var1) )
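
For readers without -egenmore-, a ridit score can be sketched by hand. The usual definition gives each category the proportion of observations below it plus half the proportion in it. The code below follows that definition on a toy variable; the names var1 and ridit1 are illustrative, and the egenmore function may differ in details such as weights or by-groups, so treat this as an approximation of the idea rather than a copy of that command.

```stata
clear
input byte var1
1
1
2
3
3
3
end

* Number of nonmissing observations
quietly count if !missing(var1)
local N = r(N)

generate double ridit1 = .
quietly levelsof var1, local(levels)
foreach v of local levels {
    * Observations strictly below this category (missings sort high, so excluded)
    quietly count if var1 < `v'
    local below = r(N)
    * Observations in this category
    quietly count if var1 == `v'
    local equal = r(N)
    quietly replace ridit1 = (`below' + 0.5 * `equal') / `N' if var1 == `v'
}
list, noobs clean
```

Because the score depends on the observed distribution, the same category can receive different scores in different samples or survey waves.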

Regards, Mike
Ujjwal

Join Date: Jul 2014

Posts: 56
#12

25 Jul 2014, 06:33

@Maarten I want to build an index after standardizing the variables. Building the index is another story. I am not adroit enough to interpret MCA (multiple correspondence analysis), so I am trying factor analysis for this. Any help on interpreting MCA would be appreciated.

But as said, for some variables it is necessary to assume an underlying value behind the rank of the categorical variables.
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#13

25 Jul 2014, 08:19

I am still not convinced that you need to standardize: None of the things you want to do require standardization. Could you tell us more why you think you need to standardize your ordinal variables?

My intuition is that you don't need to standardize. This could be a very good thing, as pre-estimation standardization of ordinal variables is very tricky. The concept of standardization fits much better with continuous variables, so any "standardization" of ordinal variables is going to be somewhat ad hoc, in a way that may work in some special situations, but certainly not in all or even the majority of situations. So if you can avoid the entire issue, so much the better.

Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#14

25 Jul 2014, 19:12

I'd side with Maarten on this. If you're trying factor analysis, then there won't be any need to standardize your ordered-categorical indicator variables. Stata's factor analysis command gsem handles indicator variables with different numbers of categories without any need to standardize (see do-file and associated log file and graph below). It seems to me that attempting to standardize beforehand will unnecessarily complicate interpretation of the factor loadings.

Attached Files

Ujjwal.do (1.8 KB, 1 view)

Ujjwal.smcl (11.3 KB, 1 view)
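
The attached do-file is not reproduced here. As a rough idea of what such a model might look like, here is a sketch on simulated data, assuming Stata 13 or later (where gsem is available). Everything in it is invented for illustration: the items q1-q4, the latent variable name Distress, the cut points and the seed. It is not the contents of Ujjwal.do.

```stata
clear
set seed 12345
set obs 1000

* Simulated latent trait and noisy continuous responses (illustration only)
generate double f = rnormal()
forvalues j = 1/4 {
    generate double y`j' = f + rnormal()
}

* Cut the responses into items with different numbers of categories
generate byte q1 = 1 + (y1 > -1) + (y1 > -0.3) + (y1 > 0.3) + (y1 > 1)
generate byte q2 = 1 + (y2 > -0.5) + (y2 > 0.5)
generate byte q3 = (y3 > 0)
generate byte q4 = 1 + (y4 > -0.5) + (y4 > 0.5)

* One latent factor measured by ordinal (q1 q2 q4) and binary (q3) items,
* entered on their original codings with no prior rescaling
gsem (Distress -> q1 q2 q4, ologit) (Distress -> q3, logit)
```

The items have five, three, two and three categories, and each gets its own set of cut points, which is why no common 0-1 scale is needed before fitting.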
Ujjwal

Join Date: Jul 2014

Posts: 56
#15

26 Jul 2014, 00:59

Thanks for all the responses. Let me elaborate on what exactly I am trying to do. From a large panel survey, I have picked the responses to a number of questions that express people's state of financial distress. The responses are on different scales, for example:

1. How do you think you manage your finances nowadays? - a) b) c) d) d) e)
2. Has your situation changed since this time last year? - a) b) c)
3. Do you save from your current income? - a) b)
4. Are you having problems with housing payments? - a) b) c)

There are seven questions. I am trying to build an index of 'financial distress'. This index will be my main independent variable. My dependent variable will be subjective well-being, which is also measured on a seven-point scale, e.g., a) Not happy at all ... b) c) d) e) f) ... g) completely happy.

I would like to standardize all eight of these variables, then build an index from the seven variables that express the state of financial distress, and then use panel data models to regress subjective well-being on that index.

I have a 12-year unbalanced panel with more than 114,000 observations (over all these years) and more than 8,000 (varying) persons interviewed each year.

@Joseph, thanks for your valuable help. I am using Stata 12, where the gsem command probably would not work. How can I do this in Stata 12?