  • Preserving value labels after encoding a string variable

    Dear Statalisters,

    I realise parts of this question have been addressed across several threads, but I think this request will help many people with the same problem.

    I have a string variable treat where "1"=exposed and "0"=unexposed

    When I run encode treat, gen(treat_cde), the value labels change to "2"=exposed and "1"=unexposed.

    Some commands won't work with a treatment variable that is not coded as "1" and "0" (e.g. pbalchk).

    How can one encode the string variable whilst preserving the original value labels?

    I have attempted to use the command recode with little success.

    When I applied recode treat_cde (2=1), all the data were changed to "1"s.
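A single rule such as (2=1) only moves the 2s onto the existing 1s, which would explain why every observation ended up as 1. A minimal sketch on made-up data, assuming the encoded variable is coded 1/2; the names all_ones and treat01 are hypothetical:

```stata
* Toy data: an encoded variable coded 1/2, as -encode- produces by default
clear
input byte treat_cde
1
2
2
1
end

* One rule: 2 becomes 1 and the existing 1s are left alone,
* so every observation ends up as 1.
recode treat_cde (2=1), generate(all_ones)

* Two rules, each matched against the original value, give a 0/1 variable.
* The quoted text becomes the value label of the new variable.
recode treat_cde (1=0 "unexposed") (2=1 "exposed"), generate(treat01)
tabulate treat_cde treat01
```

Using generate() leaves treat_cde untouched, so the mapping can be checked before anything is overwritten.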

    Many thanks,
    Alexander






    Last edited by Alexander Rodriguez; 03 Sep 2019, 16:35. Reason: Corrected spelling error in the title
    (Stata v14.2 IC for Mac)

  • #2
    How can one encode the string variable whilst preserving the original value labels?
    I don't understand the question; I can't even imagine the scenario in which this question could arise and make sense.

    In order for -encode- to apply to the variable treat, treat must be a string variable. If it is a string variable, then it doesn't have any value labels.

    I have a string variable treat where "1"=exposed and "0"=unexposed
    Again, I'm confused. If your variable takes on values 0 and 1, why would you create it as a string variable "0" and "1"? Or was this imported from, say, a spreadsheet, where there are also non-numeric values like "N/A" or something like that?

    Here's my best guess as to what has gone on. You got some data, probably from a spreadsheet, in which you have a string variable whose values include "0" and "1" and perhaps some other values. Your understanding of this data is that "0" corresponds to unexposed status and "1" corresponds to exposed status, and you would like to convert this to a numeric 0/1 variable with the same interpretation, and also with value labels attached.

    First, you need to verify that any values of treat other than "0" or "1" are truly synonyms for missing value, and not, say, mistyped numbers:
    Code:
    tab treat if missing(real(treat))
    Assuming that you don't find anything problematic in that output, do this:
    Code:
    replace treat = "" if missing(real(treat))
    destring treat, replace
    label define treat 0 "Unexposed" 1 "Exposed"
    label values treat treat
    If this is not what you want, please post back and use -dataex- to give an example of your data, so that I can work in the reality of what you have, and not with imaginary data. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
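The steps in #2 can be tried end to end on made-up data. This is only a sketch: the "N/A" entry is an assumed example of a nonnumeric value, in line with the guess above about spreadsheet imports.

```stata
* Toy data: a string variable with one nonnumeric entry
clear
input str3 treat
"1"
"0"
"N/A"
"1"
end

* real() returns missing for strings that are not numbers,
* so this lists the entries that would be lost on conversion.
tabulate treat if missing(real(treat))

* Blank out the nonnumeric entries, then convert and label.
replace treat = "" if missing(real(treat))
destring treat, replace
label define treat 0 "Unexposed" 1 "Exposed"
label values treat treat
tabulate treat, missing
```

After -destring-, the former "N/A" row holds a numeric missing value and the remaining rows hold 0 or 1.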



    • #3
      In reply to #2 (Clyde Schechter):
      Dear Clyde,
      Thank you for your helpful response. You are correct. I am importing these data from a spreadsheet. My apologies for misquoting the variable types.

      As suggested, here is a sample of the data, which I hope demonstrates the issue. (The data are themselves made up; I was practising some other code with them when this frustrating issue arose.)

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int pnr byte sex str1 sex_str long sex_cde byte treat str1 treat_str long treat_cde byte af str1 af_str long af_cde
      1045 1 "1" 2 1 "1" 3 0 "0" 2
      1167 2 "2" 3 1 "1" 3 0 "0" 2
      1183 1 "1" 2 1 "1" 3 1 "1" 3
      1215 1 "1" 2 1 "1" 3 0 "0" 2
      1277 1 "1" 2 1 "1" 3 1 "1" 3
      1259 1 "1" 2 1 "1" 3 0 "0" 2
      1041 1 "1" 2 0 "0" 2 0 "0" 2
      1228 1 "1" 2 0 "0" 2 1 "1" 3
      1270 2 "2" 3 0 "0" 2 0 "0" 2
      1138 2 "2" 3 1 "1" 3 1 "1" 3
      end
      label values sex_cde sex_cde
      label def sex_cde 2 "1", modify
      label def sex_cde 3 "2", modify
      label values treat_cde treat_cde
      label def treat_cde 2 "0", modify
      label def treat_cde 3 "1", modify
      label values af_cde af_cde
      label def af_cde 2 "0", modify
      label def af_cde 3 "1", modify

      In the original numeric form of the variable, the data are expressed as "1" and "0" representing "yes" and "no" except for sex where male is "1" and female is "2".
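Because the original numeric variables already hold the 0/1 (and 1/2) codes, one option is to attach value labels to them directly instead of going through -tostring- and -encode-. A minimal sketch on made-up data; the label names yesno and sexlbl are hypothetical:

```stata
* Toy data in the original numeric coding described above
clear
input byte(sex treat af)
1 1 0
2 1 0
1 0 1
end

* Define the labels once and attach them to the numeric variables.
label define yesno 0 "no" 1 "yes"
label define sexlbl 1 "male" 2 "female"
label values treat af yesno
label values sex sexlbl

* The stored values are unchanged; only the display changes.
tabulate treat af
tabulate sex, nolabel
```

Commands that need a 0/1 variable see the same numbers as before, since value labels affect display only.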

      To get here, I computed:

      Code:
      tostring treat, gen(treat_str)
      tostring sex, gen(sex_str)
      tostring af, gen(af_str)
      and then

      Code:
      encode treat_str, gen(treat_cde)
      encode sex_str, gen(sex_cde)
      encode af_str, gen(af_cde)

      As you can hopefully appreciate, in the encoded form (for example, treat_cde) the value that was "1" ("yes") in the numeric and string forms is now coded "2". The encoded variable therefore cannot be used with commands that require "0" or "1", such as the pbalchk I mentioned in post #1.

      I hope this better clarifies my issue. Any assistance is greatly appreciated.






      Many thanks,
      Alexander
      (Stata v14.2 IC for Mac)



      • #4
        Yes, the default behavior of -encode- is to create a value label with the levels of the string variable in alphabetical order and number them starting from 1. As you correctly observe, such variables are not suitable for use with commands that require a 0/1 or 0/!0 variable.
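The default numbering is easy to see on a toy example. A minimal sketch with made-up data, using the variable names from this thread:

```stata
* Toy data: string "0"/"1" values
clear
input str1 treat_str
"1"
"0"
"1"
end

* Default -encode-: sorted string values are numbered from 1,
* so "0" is stored as 1 and "1" is stored as 2.
encode treat_str, generate(treat_cde)
label list treat_cde
tabulate treat_cde, nolabel
```

The labels still read "0" and "1", which is why the shift is easy to miss until the underlying codes are inspected with the nolabel option.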

        The workaround is to create the value label before you -encode- the variable and use the -label()- option of -encode- telling it to use the coding you specified rather than the default.

        So, rather than -encode treat_str, gen(treat_cde)- you need to do:

        Code:
        label define treat_cde 0 "unexposed" 1 "exposed"
        encode treat_str, gen(treat_cde) label(treat_cde)



        • #5
          Dear Clyde,
          Thank you indeed. However, the issue still remains. I am attaching a couple of screenshots as I believe this best illustrates the problem.

          Please also find here some data after applying your code

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte treat str1 treat_str long treat_cde
          1 "1" 3
          1 "1" 3
          1 "1" 3
          1 "1" 3
          1 "1" 3
          1 "1" 3
          0 "0" 2
          0 "0" 2
          0 "0" 2
          1 "1" 3
          end
          label values treat_cde treat_cde
          label def treat_cde 2 "0", modify
          label def treat_cde 3 "1", modify
          I feel like I am missing something obvious here which means my problem is persisting.
          Many thanks,
          Alexander
          (Stata v14.2 IC for Mac)



          • #6
            I cannot read the screenshots--that's one of the reasons they are discouraged, as they are often unreadable.

            But I just have to look at your code to see the problem. My code in #4 was mistaken. First thing is to get rid of the treat_cde variable, which is incorrect as it stands. Then we create a new one:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte treat str1 treat_str long treat_cde
            1 "1" 3
            1 "1" 3
            1 "1" 3
            1 "1" 3
            1 "1" 3
            1 "1" 3
            0 "0" 2
            0 "0" 2
            0 "0" 2
            1 "1" 3
            end
            label values treat_cde treat_cde
            label def treat_cde 2 "0", modify
            label def treat_cde 3 "1", modify
            
            label drop treat_cde
            drop treat_cde
            
            label define treat_cde 0 "unexposed" 1 "exposed"
            destring treat_str, gen(treat_cde)
            label values treat_cde treat_cde
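One way to read the difference between the two attempts: -encode- looks up each string value in the text of the named value label, and "0" and "1" do not appear in a label whose text is "unexposed" and "exposed", so they are added as new entries. In the -dataex- listing in #5 those entries were 2 "0" and 3 "1". -destring- instead keeps the numeric content of the string. A minimal sketch on made-up data; demo_lbl, treat01, via_encode and via_destring are hypothetical names:

```stata
* Toy data: string "0"/"1" values
clear
input str1 treat_str
"1"
"0"
"1"
end

* -encode- with a predefined label: "0" and "1" are not found in the
* label text, so new entries are appended to demo_lbl.
label define demo_lbl 0 "unexposed" 1 "exposed"
encode treat_str, generate(via_encode) label(demo_lbl)
label list demo_lbl

* -destring- converts "0"/"1" to numeric 0/1; the label is attached afterwards.
destring treat_str, generate(via_destring)
label define treat01 0 "unexposed" 1 "exposed"
label values via_destring treat01
tabulate via_encode via_destring
```

The cross-tabulation puts the two results side by side, so the shifted codes from -encode- can be compared with the 0/1 values from -destring-.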



            • #7
              In reply to #6 (Clyde Schechter):
              Dear Clyde,
              My apologies for the screenshots. Thank you for the correction. I have employed this with success. I am very appreciative.

              Thank you again.






              Many thanks,
              Alexander
              (Stata v14.2 IC for Mac)
