Creating dummy variable using a categorical variable which is named by a letter/word

Jill Brown

Join Date: Apr 2022

Posts: 5
#1

18 Apr 2022, 08:01

Hi all,

I am writing my thesis and struggling with creating dummy variables. I have already made dummy variables from variables whose values are numbers, years, etc.

But now, I want to create a dummy variable based on energy certificates which are labeled with a letter from a to g. This variable is named 'energieklasse'.

I tried several options but mainly this:

gen energycertificateA = (energieklasse == 'a') -> 'a' invalid name
gen energycertificateA = (energieklasse == "a") -> type mismatch
gen energycertificateA = (energieklasse == a) -> a ambiguous abbreviation

Looking forward to hearing from someone who can help me out with this problem!

Thanks in advance.
Ali Atia

Join Date: May 2020

Posts: 737
#2

18 Apr 2022, 08:06

It seems like your variable energieklasse is a numeric variable with a value label attached. If that is the case, below is the code you are looking for:

Code:

gen energycertificateA = "a":<label_name>

Replace <label_name> with the name of the label attached to energieklasse, which you can find by typing:

Code:

d energieklasse

And looking at the Value label column.
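
As a minimal sketch of that lookup, assuming Stata's shipped auto data set as a stand-in for the thesis data (there, the variable foreign carries the value label origin; neither name comes from the question):

```stata
* Illustration on Stata's shipped auto data, not the poster's data set
sysuse auto, clear

* The "value label" column of the output names the attached label (here: origin)
describe foreign

* Show the number-to-text mapping stored in that label
label list origin

* "text":label_name evaluates to the number behind that text; this displays 1
display "Foreign":origin
```

The same three steps should carry over to energieklasse once its label name is known.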
Jill Brown

Join Date: Apr 2022

Posts: 5
#3

18 Apr 2022, 08:26

Thank you for your response. The value label of the variable is ENERGIEK and I used this to fill in the code.

gen energycertificateA = "a":ENERGIEK

The variable is made, no errors about that. But now, all the observations are labeled as 1, also the observations with label b, c, d, e, f and g.

Do you maybe also have an answer to this?

Thanks in advance.
Jill Brown

Join Date: Apr 2022

Posts: 5
#4

18 Apr 2022, 08:29

It may be good to explain: the values are labels, where label A is the best and label G is the worst, so it is an ordinal variable.
Ali Atia

Join Date: May 2020

Posts: 737
#5

18 Apr 2022, 08:29

Code:

gen energycertificateA = energieklasse == "a":<label_name>
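
A likely reason the version without the comparison gave 1 everywhere: "a":<label_name> on its own evaluates to a single number, so assigning it directly gives every observation that same value. A minimal sketch of the difference, again assuming the shipped auto data set (variable foreign, value label origin) as a stand-in:

```stata
* Illustration on Stata's shipped auto data, not the poster's data set
sysuse auto, clear

* Assigns the number behind "Foreign" to every observation: a constant
generate byte constant = "Foreign":origin

* Compares each observation with that number: 1 if Foreign, 0 otherwise
generate byte is_foreign = foreign == "Foreign":origin

* Cross-check the indicator against the original categories
tabulate foreign is_foreign
```

Note that the comparison codes observations with a missing category as 0; adding if !missing(energieklasse) to the generate command would leave them missing instead.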
Jill Brown

Join Date: Apr 2022

Posts: 5
#6

18 Apr 2022, 09:46

Thank you very much, this worked out! I have one other question in the same topic area:

I am trying to create a dummy variable from a numeric variable with 10 possible outcomes (locations, so not ordinal). I want the dummy to equal 1 if the observation is in one of four of the 10 possible locations.

The locations are "Zuid-Holland" "Noord-Holland" "Utrecht" "Flevoland".

I tried to use the same code as before (and some others):

gen randstad = prov == "Zuid-Holland" & "Noord-Holland" & "Utrecht" & "Flevoland":PROV -> type mismatch

Where prov is the variable name and PROV is the value label. Randstad is the new variable I want to create.

Could you also help me with this?

Last edited by Jill Brown; 18 Apr 2022, 09:48.
Ali Atia

Join Date: May 2020

Posts: 737
#7

18 Apr 2022, 09:55

Code:

gen randstad = inlist(prov,"Zuid-Holland":PROV,"Noord-Holland":PROV,"Utrecht":PROV,"Flevoland":PROV)

For four values, it's not too laborious to type out the values by hand as above. However, if you want to do this for a larger set of match values, other solutions will be more appropriate.
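
One possible alternative for a longer list (not necessarily the one meant above) is to loop over the label texts. The sketch below builds a small hypothetical data set; the numeric codes 1 to 5 and the fifth province are assumptions made only so the example runs on its own:

```stata
* Hypothetical toy data: the numeric codes are assumed, not taken from the thread
clear
input byte prov
1
2
3
4
5
end
label define PROV 1 "Zuid-Holland" 2 "Noord-Holland" 3 "Utrecht" 4 "Flevoland" 5 "Groningen"
label values prov PROV

* Start at 0, then switch to 1 for each label text in the list
generate byte randstad = 0
foreach p in "Zuid-Holland" "Noord-Holland" "Utrecht" "Flevoland" {
    replace randstad = 1 if prov == "`p'":PROV
}

* Keep missing locations missing rather than coding them as 0
replace randstad = . if missing(prov)

list, noobs
```

Extending the list then only means adding another quoted name inside the foreach line.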
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

18 Apr 2022, 10:23

Ali Atia has given you excellent advice.

I would just like to go back to the original question raised in #1 and wonder why you want to create those separate variables for each energy class. The commonest use for indicator variables ("dummies") is to represent a categorical variable like energieklasse in a regression. But unless you are using a very old version of Stata, you don't actually need those separate variables for this purpose. You can accomplish it using factor variable notation (see -help fvvarlist- for details) and do things like:

Code:

regression_command outcome_variable other_explanatory_variables i.energieklasse

Then Stata will create "virtual" indicators to represent the different levels. Using this approach instead of hand-coded indicators you get several advantages: your data set does not get cluttered up with redundant junk variables, the regression output from Stata will be better labeled and formatted, and you can follow your regression command with the -margins- command to get other interesting results.
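
A minimal sketch of that workflow, assuming the shipped auto data set in place of the thesis data (price, mpg and foreign are that data set's variables, used here only as placeholders for the outcome, another explanatory variable, and energieklasse):

```stata
* Illustration on Stata's shipped auto data, not the poster's data set
sysuse auto, clear

* i.foreign makes Stata build the indicators internally; no new variables are created
regress price mpg i.foreign

* Predicted price at each level of the categorical variable
margins foreign
```

With the thesis data, i.energieklasse would take the place of i.foreign, and the output rows would be labeled with the certificate letters from the value label.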
Jill Brown

Join Date: Apr 2022

Posts: 5
#9

19 Apr 2022, 00:50

Thank you for the advice, everything worked out for me!