  • Scalars and results r(something), and using such in expressions, and properly dereferencing such scalars and results

    I just realised that I have no idea what logic governs how scalars and results such as r(something) are dereferenced and used. I have been using them with some success for two decades, but if you asked me to explain why I use one form rather than another, I would have nothing to say. I have simply memorised which form works in which context.

    Can somebody put some order and logic into all of this, or point to references that I can read to clarify it for myself?

    Examples follow:


    Example 1: if I refer to r(mean) directly, it works; if I dereference it as `r(mean)', it works too; and if I pass it through a scalar, it also works.

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . clonevar price2 = price
    
    . clonevar price3 = price
    
    . qui summ price
    
    . replace price = . if price<r(mean)
    (52 real changes made, 52 to missing)
    
    . qui summ price2
    
    . replace price2 = . if price2<`r(mean)'
    (52 real changes made, 52 to missing)
    
    . qui summ price3
    
    . sca Mean = r(mean)
    
    . replace price3 = . if price3<Mean
    (52 real changes made, 52 to missing)
    
    
    . summ price*
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         22    9814.364    3022.929       6229      15906
          price2 |         22    9814.364    3022.929       6229      15906
          price3 |         22    9814.364    3022.929       6229      15906
    
    .
    Then I hit the first problem: when I try to dereference the scalar Mean as `Mean', it does not work:

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . summ price
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           price |         74    6165.257    2949.496       3291      15906
    
    . return list
    
    scalars:
                      r(N) =  74
                  r(sum_w) =  74
                   r(mean) =  6165.256756756757
                    r(Var) =  8699525.974268788
                     r(sd) =  2949.495884768919
                    r(min) =  3291
                    r(max) =  15906
                    r(sum) =  456229
    
    . sca Mean = r(mean)
    
    . replace price = . if price < `Mean'
    invalid syntax
    r(198);

    So when I type -return list-, Stata reports r(mean) as a scalar, and I can dereference it as `r(mean)'. But when I create the scalar Mean myself, I cannot dereference it as `Mean'.

    The mystery gets deeper when I try to use those in loops:


    Example 2:

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . keep in 1/3
    (71 observations deleted)
    
    . qui summ price
    
    . forvalues i = 1/r(N) {
      2. dis `i'
      3. }
    invalid syntax
    r(198);
    
    . forvalues i = 1/`r(N)' {
      2. dis `i'
      3. }
    1
    2
    3
    
    . sca Nobs = r(N)
    
    . forvalues i = 1/Nobs {
      2. dis `i'
      3. }
    invalid syntax
    r(198);
    
    . forvalues i = 1/`Nobs' {
      2. dis `i'
      3. }
    invalid syntax
    r(198);
    
    . dis r(N)
    3
    
    . dis Nobs
    3
    
    . dis `Nobs'
    
    
    . dis `r(N)'
    3
    
    .
    So in the forvalues range Stata accepted only the second form, `r(N)', and rejected everything else as invalid syntax.

    Does anybody see any logic to all of this?

  • #2
    You use ` ' to get the content of a local macro, not the content of a scalar. So there are three ways to fix the failing -replace- at the end of your example 1:

    Code:
    // solution 1: refer to the scalar Mean as Mean
    sysuse auto, clear
    sum price
    scalar Mean = r(mean)
    replace price = . if price < Mean
    
    // solution 2: create a new name, store it in the local Mean, and use that
    // local to refer to that scalar
    sysuse auto, clear
    sum price
    tempname Mean
    scalar `Mean' = r(mean)
    replace price = . if price < `Mean'
    
    // solution 3 : store the mean in a local macro
    sysuse auto, clear
    sum price
    local Mean = r(mean)
    replace price = . if price < `Mean'
    Solution 3 will usually work fine, but a local macro holds a text representation of the number, so the mean is not quite stored in full double precision, and sometimes it does not work. Solution 1 can cause problems because scalars share the same namespace as variables. Solution 2 is best (though in many cases overkill, so I more often use solution 3): the tempname command ensures that we do not accidentally confuse variables and scalars, and we get the full double precision from scalars.
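
    The precision point about solution 3 can be checked with a small sketch. The names sMean and lMean are made up for this illustration, and whether the two values actually differ depends on the number involved, so no particular output is claimed here.

    ```stata
    sysuse auto, clear
    quietly summarize price

    * scalar (hypothetical name sMean): keeps the double-precision value of r(mean)
    scalar sMean = r(mean)

    * local macro (hypothetical name lMean): keeps the text that results
    * from evaluating and formatting r(mean)
    local lMean = r(mean)

    * show the scalar with a wide numeric format
    display %21.0g scalar(sMean)

    * show the text that the local macro holds
    display "`lMean'"

    * displays 1 if the two agree exactly, 0 if the local lost some digits
    display scalar(sMean) == `lMean'
    ```

    If the last line displays 0, that is the case in which a comparison written with the local macro can behave differently from one written with the scalar.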

    Your example 2 can be solved as:

    Code:
    forvalues i = 1/`=Nobs' {
       dis `i'
    }
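    The reason this works, as far as I can tell from the behaviour shown in the question: forvalues wants literal numbers in its range, and `=exp' evaluates the expression and pastes the result into the command line before forvalues ever sees it. A minimal sketch of the same idea in two equivalent forms follows; the local name n is only an illustration.

    ```stata
    sysuse auto, clear
    keep in 1/3
    quietly summarize price
    scalar Nobs = r(N)

    * form 1: evaluate the scalar inline, so forvalues sees 1/3
    forvalues i = 1/`=Nobs' {
        display `i'
    }

    * form 2: copy the scalar into a local macro first (hypothetical name n),
    * then dereference the local in the range
    local n = Nobs
    forvalues i = 1/`n' {
        display `i'
    }
    ```

    Form 2 is handy when the same bound is needed in several loops; form 1 avoids creating an extra macro.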
    I have given a workshop on programming in Stata. I have a section on local macros and scalars in there. The slides are here: http://maartenbuis.nl/workshops/stata_l2/stata_l2.html
    Last edited by Maarten Buis; 18 Dec 2018, 08:42.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
