
  • Variable label in generate?

    Dear All,

    I'd like to confirm whether there is any syntax that would allow me to specify a variable label within generate itself, something like this hypothetical example:
    Code:
    generate balance=income-spent, varlabel("Balance at the end of the month")
    The documentation for generate suggests that one can attach a value label in the same command, but it is silent about variable labels.

    If it doesn't exist yet, it would be good to have it some time in the future. This would help quite a bit in making programs shorter and better documented.

    Currently one can do that in two statements, but that requires retyping the name of the variable, and the two statements may end up far apart in the code:
    Code:
    generate balance=income-spent 
    variable label balance "Balance at the end of the month"
    Thank you, Sergiy Radyakin

  • #2
    Sergiy Radyakin surely knows this as a very experienced Stata user, but just so no one gets confused: the second command is label variable.

    If there is such an option, it is certainly not documented. I don't miss it myself.



    • #3
      Here is the secret undocumented syntax we have all been looking for:

      Code:
      . clear
      
      . set obs 3
      number of observations (_N) was 0, now 3
      
      . gen income=3
      
      . gen spent = 1
      
      . for newlist balance: gen X = income - spent \ label var X "Balance at the end of the month"
      
      ->  gen balance = income - spent
      
      ->  label var balance `"Balance at the end of the month"'
      
      . des
      
      Contains data
        obs:             3                          
       vars:             3                          
       size:            36                          
      ----------------------------------------------------------------------------------------------------
                    storage   display    value
      variable name   type    format     label      variable label
      ----------------------------------------------------------------------------------------------------
      income          float   %9.0g                 
      spent           float   %9.0g                 
      balance         float   %9.0g                 Balance at the end of the month
      ----------------------------------------------------------------------------------------------------
      Sorted by: 
           Note: Dataset has changed since last saved.



      • #4
        Congratulations, Joro Kolev for remembering and rediscovering what went non-documented in Stata 7 (2001). The price of being able to write it all on one line is here fourfold: (a) more typing in total (b) less efficient code (c) code that is as said not documented (d) code that is harder to debug if you get it wrong.



        • #5
          I proudly accept your congratulations, Nick. In this context it was a joke. In the original version of the joke I thought of saying that this undocumented new feature is forthcoming in Stata 17. But then I thought, it is not the 1st of April, so I probably should not say things like that.

          More to the point, I agree with you that what Sergiy wants might or might not be useful, simply because the two commands split nicely across two lines.



          Originally posted by Nick Cox
          Congratulations, Joro Kolev for remembering and rediscovering what went non-documented in Stata 7 (2001). The price of being able to write it all on one line is here fourfold: (a) more typing in total (b) less efficient code (c) code that is as said not documented (d) code that is harder to debug if you get it wrong.



          • #6
            Good. We are agreed. My post too was tongue in cheek.



            • #7
              Dear Nick and Joro,

              thank you very much for your thoughts on this.
              Clearly "variable label" should have been "label variable"; that was just me typing without checking it in Stata.

              As for the trick shown by Joro, I am indeed too young to know it; thank you for showing it.
              But in this case it is a rather inconvenient solution, even though it seems to fit my original requirements of being on one line and mentioning the variable only once. And given the disadvantages (a), (b), (c), (d) mentioned above by Nick (with which I fully concur), it might be more practical to define one's own
              Code:
              mygenerate anything, [label(string)]
              wrapping the generate and label commands (below).

              Thank you, Sergiy


              Code:
              clear all
              
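              program define lgen
                  version 13.0
                  syntax anything=exp, [label(string) format(string) *]
                  // remember which variables exist before generate runs
                  quietly ds
                  local old `"`r(varlist)'"'
                  generate `anything' `exp', `options'
                  // whatever is new afterwards is the variable just created
                  quietly ds
                  local new `"`r(varlist)'"'
                  local newvar : list new - old
                  // apply the label and the format only when they were specified
                  if !missing(`"`label'"') label variable `newvar' `"`label'"'
                  if !missing(`"`format'"') format `newvar' `format'
              end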
              
              sysuse auto
                lgen kgweight=weight/0.453, label("Weight (in kg)") after(weight)
                lgen double kgprice=price/kgweight, label("Price per kg") format(%8.4f) after(price)
                label define pcat 0 "Cheap" 1 "Expensive"
                lgen byte pricecat:pcat=(kgprice>1.00), label("Price category") before(price)
              
              describe
              tabulate pricecat
              
              // END OF FILE



              • #8
                I would not use ds in a program for your purposes. You just need confirm to check whether a new or an existing variable name is being specified.
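
                A minimal sketch of that kind of check, assuming the auto dataset that ships with Stata; kgweight is only an illustrative name. confirm new variable exits with an error when the name is already in use, so wrapping it in capture lets the caller inspect _rc:

                ```stata
                sysuse auto, clear

                * kgweight does not exist yet, so the check passes and _rc is 0
                capture confirm new variable kgweight
                display "kgweight: _rc = " _rc

                * price already exists, so confirm fails and _rc is nonzero
                capture confirm new variable price
                display "price: _rc = " _rc
                ```

                Inside a wrapper program, the name to be checked would still have to be isolated from the rest of the command line first.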



                • #9
                  Sergiy's program lgen is just a proof of concept, I guess. If you were to write this up, you would
                  • support creating string variables; currently (well at least up to Stata 14.1), a literal string breaks the syntax's =exp [this is a bug in my view]
                  • support if and in qualifiers
                  • preserve contents in r() until you create the new variable; currently, ds destroys r()
                  • get rid of some of the overhead, e.g., calling ds twice
                  • perhaps make sure that a break would not leave you with a new variable that lacks the label and/or the proper format.
                  Here is a try (not much tested)

                  Code:
                  program lgen
                      version 11.2
                      
                      local zero : copy local 0
                      gettoken 0 : 0 , parse("=")
                      syntax newvarlist(max=1)
                      local newvar : copy local varlist
                      local 0 : copy local zero
                      
                      syntax anything(everything equalok) ///
                      [ ,                                 ///
                          LABEL(string asis)              ///
                          FORMAT(string asis)             ///
                          *                               ///
                      ]
                      
                      if (`"`options'"' != "") local comma ,
                      
                      nobreak {
                          generate `anything' `comma' `options'
                      
                          if ( mi(`"`label'`format'"') ) exit
                              /* NotReached */
                          
                          local RC 0
                          if (`"`label'"' != "") {
                              capture noisily label variable `newvar' `label'
                              local RC = _rc
                          }
                          if ("`format'" != "") & ( !`RC' ) {
                              capture noisily format `newvar' `format'
                              local RC = _rc
                          }
                          
                          if ( `RC' ) drop `newvar'
                              /*
                                  we call -drop- under version 11.2
                                  so -drop- will not destroy r()
                              */
                          exit `RC'
                      }
                  end
                  Last edited by daniel klein; 31 Jul 2020, 03:51. Reason: changes to the program; formatting



                  • #10
                    Unsurprisingly, others have passed this way before. I think Patrick Royston wrote something similar a way back.

                    Code:
                    . search defv, historical
                    
                    Search of official help files, FAQs, Examples, and Stata Journals
                    
                    STB-51  dm50.1  . . . . . . . . . . . . . . . . . . . . . . . . Update to defv
                            (help defv if installed)  . . . . . . . . . . . . . . .  J. R. Gleason
                            9/99    p.2; STB Reprints Vol 9, pp.14--15
                            updated to Stata 6 and improved
                    
                    STB-40  dm50  . . . . . . . Defining variables and recording their definitions
                            (help defv if installed)  . . . . . . . . . . . . . . .  J. R. Gleason
                            11/97   pp.9--10; STB Reprints Vol 7, pp.48--49
                            command to define a variable and document the operation it performs



                    • #11
                      Thank you Nick,

                      I used ds as a "quick-and-dirty" way to determine which variable name is specified, as I don't want to parse anything myself. The last lgen call in the example shows that it can be more than just "take everything before =". I don't confirm that the specified variable is new because I trust generate to do that for me.

                      Of course, such use of ds inside my code ruins the results saved in r() by earlier commands. And of course there is more than one way to get to the same result. Here is a modified version that does not use ds and does not ruin the results saved in r(). And since I saved a couple of lines by switching from ds to unab, I've added the if/in qualifiers, which I had left out earlier.

                      Best, Sergiy

                      Code:
                      clear all
                      
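                      program define lgen
                          version 13.0
                          syntax anything=exp [if] [in], [label(string) format(string) *]
                          // touse marks the observations selected by if/in
                          marksample touse
                          
                          // expand * before and after generate; the difference is the new variable
                          unab old : *
                          generate `anything' `exp' if `touse', `options'
                          unab new : *
                          local newvar : list new - old
                          // apply the label and the format only when they were specified
                          if !missing(`"`label'"') label variable `newvar' `"`label'"'
                          if !missing(`"`format'"') format `newvar' `format'
                      end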
                      
                      sysuse auto
                        lgen kgweight=weight/0.453, label("Weight (in kg)") after(weight)
                        lgen double kgprice=price/kgweight, label("Price per kg") format(%8.4f) after(price)
                        label define pcat 0 "Cheap" 1 "Expensive"
                        lgen byte pricecat:pcat=(kgprice>1.00), label("Price category") before(price)
                        lgen int thisyear=2020 if foreign | (price > 4200)
                      
                      describe
                      tabulate pricecat
                      
                      // END OF FILE
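
                      To illustrate the point about ds and r(), here is a minimal sketch, again assuming the auto dataset. Because ds is an r-class command, whatever an earlier command left in r() is gone once ds has run:

                      ```stata
                      sysuse auto, clear

                      * summarize leaves r(N), r(mean), and so on behind
                      quietly summarize price
                      display "r(N) after summarize: " r(N)

                      * ds replaces the contents of r() with its own results
                      quietly ds
                      display "r(N) after ds: " r(N)
                      display "r(varlist) after ds: `r(varlist)'"
                      ```

                      A wrapper that calls ds internally therefore hides the results of the user's previous command, which is the side effect the unab-based version above is meant to avoid.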



                      • #12
                        =exp will still choke on strings

                        Code:
                        . lgen mystringvar = strupper(make)
                        type mismatch
                        r(109);
                        and invalid labels and/or formats will leave the (unfinished) variable in memory.

                        The first problem is, as noted, a bug in Stata's syntax, in my view; the second "problem" might be desirable but does not follow Stata's usual rule of doing it all or doing nothing at all.
                        Last edited by daniel klein; 31 Jul 2020, 04:00.



                        • #13
                          Sorry Daniel, the page wasn't refreshed, so I didn't see your and Nick's comments in #9 and #10 above.
                          They are very helpful.



                          • #14
                            Originally posted by daniel klein
                            The first problem is, as noted, a bug in Stata's syntax, in my view;
                            I agree that this is a bug in Stata: I have re-read the definition of =exp in the documentation for syntax, and there is no mention there that the expression must be numeric.

                            I have further checked the version of lgen as shown in your post in #9 and I see
                            - that you've already taken the pains to parse the tokens in the command, so your code is superior to my version.
                            - that it passes all the tests I could throw at it.

                            Well done! Thank you!

