  • Creating dummy variable using a categorical variable which is named by a letter/word

    Hi all,

    I am writing my thesis and struggling with creating dummy variables. I have already made dummy variables from variables whose values are numbers (years, etc.).

    But now, I want to create a dummy variable based on energy certificates which are labeled with a letter from a to g. This variable is named 'energieklasse'.

    I tried several options, mainly these (each followed by the error Stata returned):

    gen energycertificateA = (energieklasse == 'a')   -> 'a' invalid name
    gen energycertificateA = (energieklasse == "a")   -> type mismatch
    gen energycertificateA = (energieklasse == a)     -> a ambiguous abbreviation

    Looking forward to hearing from someone who can help me with this problem!

    Thanks in advance.


  • #2
    It seems like your variable energieklasse is a numeric variable with a value label attached. If that is the case, below is the code you are looking for:

    Code:
    gen energycertificateA = "a":<label_name>
    Replace <label_name> with the name of the label attached to energieklasse, which you can find by typing:

    Code:
    d energieklasse
    and looking at the "Value label" column of the output.
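    As a minimal sketch of what such a value label looks like, the toy data below assume, hypothetically, that the label is named ENERGIEK (the name reported in #3) and that it stores a to g as the codes 1 to 7; the real coding may differ. Besides reading the name off the -describe- output, it can be captured in a local macro with the ": value label" extended macro function, and -label list- shows which number each letter is stored as:

```stata
* Toy data standing in for the real data set (hypothetical coding 1 = "a" ... 7 = "g")
clear
set obs 7
generate byte energieklasse = _n
label define ENERGIEK 1 "a" 2 "b" 3 "c" 4 "d" 5 "e" 6 "f" 7 "g"
label values energieklasse ENERGIEK

* The "Value label" column of this output names the attached label
describe energieklasse

* Show which number each letter is stored as
label list ENERGIEK

* Capture the name of the attached value label in a local macro
local lblname : value label energieklasse
display "`lblname'"

* "a":`lblname' evaluates to the numeric code behind the label "a"
count if energieklasse == "a":`lblname'
```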



    • #3
      Thank you for your response. The value label of the variable is ENERGIEK, and I used it to fill in the code:

      gen energycertificateA = "a":ENERGIEK

      The variable is created without any errors. But now all observations have the value 1, including the observations labeled b, c, d, e, f and g.

      Do you perhaps have an answer to this as well?

      Thanks in advance.



      • #4
        It may help to explain that the variable holds energy labels, where label A is the best and label G is the worst, so it is an ordinal variable.



        • #5
          Code:
           
           gen energycertificateA = energieklasse == "a":<label_name>
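          On why #3 produced all 1s: on its own, "a":ENERGIEK is just the number stored for the label "a" (presumably 1 in the poster's data), so -generate- copied that constant into every observation, whereas comparing energieklasse to it yields a 0/1 indicator. The sketch below, on toy data where a to g are hypothetically coded 1 to 7 under the label ENERGIEK, builds all seven indicators in a loop and adds an "if !missing()" qualifier, because a comparison such as energieklasse == 1 evaluates to 0 (not missing) when energieklasse is missing:

```stata
* Toy data: one observation per energy class plus one missing value (hypothetical coding)
clear
set obs 8
generate byte energieklasse = _n if _n <= 7
label define ENERGIEK 1 "a" 2 "b" 3 "c" 4 "d" 5 "e" 6 "f" 7 "g"
label values energieklasse ENERGIEK

* One 0/1 indicator per class, named energycertificateA ... energycertificateG;
* observations with a missing class get a missing indicator instead of 0
foreach k in a b c d e f g {
    generate byte energycertificate`=strupper("`k'")' = ///
        (energieklasse == "`k'":ENERGIEK) if !missing(energieklasse)
}

list energieklasse energycertificateA energycertificateG
```

          Stata's -tabulate- command with the generate() option is a built-in alternative that creates one indicator per observed level in a single step (tabulate energieklasse, generate(ec) gives ec1, ec2, ...), at the cost of numeric rather than letter suffixes.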



          • #6
            Thank you very much, this worked! I have one more question on the same topic:

            I am trying to create a dummy variable from a numeric variable with 10 possible values (locations, so not ordinal). I want the dummy to equal 1 if the location is one of four of those 10 locations.

            The locations are "Zuid-Holland" "Noord-Holland" "Utrecht" "Flevoland".

            I tried the same approach as before (and some other variants):

            gen randstad = prov == "Zuid-Holland" & "Noord-Holland" & "Utrecht" & "Flevoland":PROV -> type mismatch

            Here prov is the variable name and PROV is its value label; randstad is the new variable I want to create.

            Could you also help me with this?
            Last edited by Jill Brown; 18 Apr 2022, 10:48.



            • #7
              Code:
              gen randstad = inlist(prov,"Zuid-Holland":PROV,"Noord-Holland":PROV,"Utrecht":PROV,"Flevoland":PROV)
              For four values, it's not too laborious to type out the values by hand as above. However, if you want to do this for a larger set of match values, other solutions will be more appropriate.
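              For a longer list of match values, one possible approach (a sketch, not necessarily the solution Ali Atia had in mind) is to loop over the label texts and flag each match with -replace-. The toy data use a hypothetical coding of six provinces in the value label PROV; with this approach only the label texts, not the numeric codes, need to be known:

```stata
* Toy data: hypothetical coding of six provinces in the value label PROV
clear
set obs 6
generate byte prov = _n
label define PROV 1 "Groningen" 2 "Noord-Holland" 3 "Zuid-Holland" ///
    4 "Utrecht" 5 "Flevoland" 6 "Limburg"
label values prov PROV

* Start at 0 (missing where prov is missing), then flag each listed province
generate byte randstad = 0 if !missing(prov)
foreach p in "Zuid-Holland" "Noord-Holland" "Utrecht" "Flevoland" {
    replace randstad = 1 if prov == "`p'":PROV
}

list prov randstad
```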



              • #8
                Ali Atia has given you excellent advice.

                I would just like to go back to the original question raised in #1 and wonder why you want to create those separate variables for each energy class. The most common use for indicator variables ("dummies") is to represent a categorical variable like energieklasse in a regression. But unless you are using a very old version of Stata, you don't actually need those separate variables for this purpose. You can accomplish it using factor-variable notation (see -help fvvarlist- for details) and do things like:

                Code:
                regression_command  outcome_variable other_explanatory_variables i.energieklasse
                Then Stata will create "virtual" indicators to represent the different levels. Using this approach instead of hand-coded indicators you get several advantages: your data set does not get cluttered up with redundant junk variables, the regression output from Stata will be better labeled and formatted, and you can follow your regression command with the -margins- command to get other interesting results.
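                A minimal sketch of the factor-variable approach on simulated data; the outcome price, the effect sizes, and the coding of a to g as 1 to 7 are all hypothetical, and runiformint() requires Stata 14 or newer. By default i. uses the lowest level (here "a") as the base category, the ib7. operator makes "g" the base instead, and -margins- then reports the predicted mean outcome for each class:

```stata
* Simulated data for illustration only; names and effect sizes are hypothetical
clear
set seed 2022
set obs 500
generate byte energieklasse = runiformint(1, 7)
label define ENERGIEK 1 "a" 2 "b" 3 "c" 4 "d" 5 "e" 6 "f" 7 "g"
label values energieklasse ENERGIEK
generate double price = 300000 - 8000 * energieklasse + rnormal(0, 20000)

* i. creates the class indicators on the fly; level 1 ("a") is the default base
regress price i.energieklasse

* ib7. makes level 7 ("g", the worst class) the base category instead
regress price ib7.energieklasse

* Predicted mean price for each energy class
margins energieklasse
```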



                • #9
                  Thank you for the advice; everything worked out for me!
