
  • assign value using label

    Say, we have:

    label define boolean 0 "no" 1 "yes"

    To have more readable code, I would like to be able to write:

    gen boolean x = "no"
    replace x = "yes" if y > 50

    x would still be a numeric variable (holding 0s and 1s in this case), but Stata would detect that strings are being used and translate them through the defined value label.

    If that is not possible, it would also be very helpful to have an option to display both value and label in the same command, like:

    tab x, valueAndLabel

    Right now, when you need to do data manipulations, you first need to find out where things are going wrong and then look up the matching code, e.g.:

    tab x
    tab x, nolabel

    That is cumbersome. If we could use labels directly in code, the underlying values would not be needed at all, which would remove the need to know the codes behind value labels.
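
    A minimal sketch of one existing way to see values and labels together, assuming the official numlabel command is an acceptable workaround: it prefixes each label with its numeric value, so a single tab shows both. The three-observation dataset is made up for illustration.

    ```stata
    clear
    input byte x
    0
    1
    1
    end
    label define boolean 0 "no" 1 "yes"
    label values x boolean

    numlabel boolean, add      // prefix each label with its numeric value
    tabulate x                 // one table now shows value and label together
    numlabel boolean, remove   // restore the original label text
    ```

    label list boolean prints the same value-to-label mapping without touching the labels.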

  • #2
    You can do:
    Code:
    label define boolean 0 "no" 1 "yes"
    gen byte x = "no":boolean
    replace x = "yes":boolean if y > 50


    • #3
      Hm, that requires typing the label name in every command and doesn't use the information already available when a variable has a value label attached. It's possible, but unfortunately not that elegant, especially from the perspective of maintainability: you can never change the name of the value label "boolean" afterwards.
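
      A rough sketch of one way around hard-coding the label name, assuming x already has its value label attached: the extended macro function value label returns the name of whatever label is attached, and that name can then be used in the "text":labelname syntax from #2. The two-observation dataset and the local macro name lbl are made up for illustration.

      ```stata
      clear
      input y
      10
      60
      end
      label define boolean 0 "no" 1 "yes"
      generate byte x = "no":boolean
      label values x boolean

      * look up whichever label is attached to x instead of typing its name
      local lbl : value label x
      replace x = "yes":`lbl' if y > 50 & y < .
      ```

      Renaming the value label later would then only affect the lines that define and attach it.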


      • #4
        I would post this in the Wishlist thread, so long as you link to replies here.

        I've seen requests for a bit variable or storage type, which seem interesting to me. But it would have to allow missings as well, I think, which would defeat the point.

        My reaction is that StataCorp is highly unlikely to change the rules on something so fundamental, but I am not an employee.

        I don't find your suggested code more readable at all! Experienced users as well as new users would see string arguments and it would take a lot of explaining, "No, this is a new thing now allowed".

        Stata, like any other language, is good at working out what you say and not so good at working out what you mean or should find useful.

        Given that the label boolean has been defined

        Code:
        generate byte x = y > 50
        label val x boolean
        is already possible, and no more typing than you want to be able to do.

        In each case

        Code:
        y > 50 if y < .
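        is a better idea for most purposes, as otherwise missings would get mapped to 1 "yes": Stata treats numeric missing values as larger than any number, so y > 50 is true when y is missing.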
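
        A small illustration of the difference, using a made-up three-observation dataset in which the last value of y is missing; the variable names x_naive and x_safe are chosen only for this sketch.

        ```stata
        clear
        input y
        10
        60
        .
        end
        label define boolean 0 "no" 1 "yes"

        generate byte x_naive = y > 50            // missing y counts as > 50
        generate byte x_safe  = y > 50 if y < .   // missing y stays missing
        label values x_naive x_safe boolean
        list
        ```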

        Commands to define new variables and assign labels at the same time are perfectly programmable. For all I know you can already do this with one of daniel klein 's commands.
        Last edited by Nick Cox; 16 Sep 2022, 02:03.
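
        As a sketch of how such a command might be programmed: the name genlabeled and its label() option are hypothetical, invented for this illustration, and not an existing official or community-contributed command.

        ```stata
        capture program drop genlabeled
        program define genlabeled
            version 14
            * hypothetical usage: genlabeled [type] newvar = exp [if] [in], label(lblname)
            syntax newvarname =/exp [if] [in], LABel(name)
            quietly label list `label'      // stops with an error if the label is undefined
            generate `typlist' `varlist' = `exp' `if' `in'
            label values `varlist' `label'
        end

        * usage on a made-up dataset
        clear
        input y
        10
        60
        .
        end
        label define boolean 0 "no" 1 "yes"
        genlabeled byte x = y > 50 if y < ., label(boolean)
        ```

        The if y < . qualifier is passed through to generate, so missings in y remain missing in x.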


        • #5
          As someone who often has to do a lot of recoding, I can tell you that those numbers do not make any sense and are not readable at all, and hence are extremely error prone. I always flag them with a lot of comments to tell readers what I'm actually trying to do. Code should be readable by itself and not require so many comments. Referring to values using labels/variables, like any real programming language does, would sort that out.
          Last edited by Hendri Adriaens; 16 Sep 2022, 02:05.


          • #6
            I am at a loss to follow #5. I can't fathom what you mean by "those numbers do not make any sense and are not readable at all". Which numbers?

            It's often hard to discuss what is confusing to anyone or claims based on experience.

            Even experienced people often have a habit of mistaking what is familiar as what is natural or intuitive. Any language or environment you use a great deal becomes familiar and what you thought was bizarre comes to seem standard. If you are used to X, Y can look really odd.

            I have a lot of experience with Stata too but I didn't find the value label idea or syntax hard to learn at the outset. (I struggled more with other constructs, such as by:.) Then again, I don't like the syntax used in #2 -- which naturally is StataCorp's and may or may not be liked by Hemanshu Kumar -- but that boils down to taste, on which people can disagree.

            Backing up: As I understand it, you want -- once a value label is defined -- to be able to type two particular command lines, whereas both Hemanshu Kumar and I have shown that there already are two ways to do the same thing with two commands. So the number of commands is exactly the same.

            There can be no objection to adding comments to code if you doubt that your code is clear or can't assume that your readers will understand it. I never comment on value label assignments, regarding them as basic syntax, but then I don't expect Stata learners to read my program code.

            Please tell me which "real programming languages" allow anything like

            Code:
            gen boolean x = "no"
            where the result of supplying a string in an assignment is a numeric variable! You want this to happen because a value label pre-exists. The more I think about it, the more it seems that would be terrible syntax (a matter largely of taste) -- and (this is the important bit) it would be utterly at odds with how Stata parses syntax. Even if it were possible, it would be likely to cause as much confusion and as many bugs as anything it complements.


            • #7
              I never said that a real programming language can do 'gen boolean x = "no"'. That syntax suggestion was made taking into account all the limitations Stata has.

              Furthermore, some examples of stuff where you could use far more logical and readable data structures in a real programming language:

              replace amrPlusG5 = 11 if gemeentecode == 363 // Amsterdam
              replace amrPlusG5 = 13 if gemeentecode == 34 // Almere
              replace amrPlusG5 = 18 if gemeentecode == 344 // Utrecht
              replace amrPlusG5 = 23 if gemeentecode == 518 // 's-Gravenhage
              replace amrPlusG5 = 27 if gemeentecode == 599 // Rotterdam


              • #8
                "anything like" in #6 means what it says. I am curious to know which languages support mappings of the form

                numeric_result := string_argument

                with nothing else said.

                I agree with your example: a mapping from 518 to 23 and so forth -- done in the way you do it -- really needs to be commented, or if you have value labels, you can decode to a string variable and work with that. I would do things like that with a merge between different datasets. It is not well suited to numerous replace statements. See https://www.stata.com/support/faqs/d...s-for-subsets/

                What does this example have to do with the request in #1?

                I can't regard the complications of what you're doing with specific data as Stata's fault or as indicating any deficiency of Stata. What would it look like in your favourite language, "real" or otherwise?
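
                A rough sketch of the merge approach for the mapping in #7, treating the code pairs as a small lookup dataset. The five pairs are taken from #7; the three-row master dataset, the variable name amr_new and the tempfile name lookup are made up for illustration.

                ```stata
                * made-up master data: two codes that need a new value, one that does not
                clear
                input int gemeentecode byte amrPlusG5
                363  .
                 34  .
                100  5
                end

                * lookup table holding the pairs from #7 as data rather than as code
                preserve
                clear
                input int gemeentecode byte amr_new
                363 11
                 34 13
                344 18
                518 23
                599 27
                end
                tempfile lookup
                save `lookup'
                restore

                * bring in the new values and overwrite only where a match was found
                merge m:1 gemeentecode using `lookup', keep(master match) nogenerate
                replace amrPlusG5 = amr_new if amr_new < .
                drop amr_new
                ```

                The lookup rows could equally be kept in a separate file, so that corrections are edited as data and the do-file itself stays short.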


                • #9
                  As for the connection to the request: the variables obviously have value labels. A merge requires another dataset with the correct values. I have many corrections, on many variables, with many different conditions. Good luck creating datasets for each possible combination. And again, the numbers still don't mean anything and have zero readability. Besides that, all of this is extremely error prone and maintainability is zero.


                  • #10
                    This thread has to be read backwards as well as forwards in that (for example) the complaint in #5 only makes sense to me given the example in #7.

                    Also, what you're asking for and what you're asserting change post by post. The bottom line, perhaps, is that you don't like (this part of) Stata syntax at all, and I won't push further at that: Stata is not a religion and dissent is not heresy.

                    In any case I agree strongly that code like yours in #7 is appalling, being hard to read, tedious and error-prone. I suggested an alternative of treating data as data, and not as code, which you don't like, so I will leave it there.
