
  • Calculate the median and store it as a scalar.

    Good morning,

    I would like to calculate the median of a variable, call it X, and then store that median as a scalar. I need it because I want to create another variable that takes the value 1 if X is greater than the median, and 0 otherwise.

    Does anyone have an idea how to store the median in a scalar so that I can use it afterwards?

    Best,

    Diego.

  • #2
    Here are two ways to do it.


    Code:
    sysuse auto, clear
    
    summarize mpg, detail 
    
    scalar median = r(p50) 
    
    centile mpg 
    
    scalar median2 = r(c_1)



    • #3
      And a third way, better than the other two (because it is much faster and offers two options for how the median is calculated), is

      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . _pctile mpg, percentiles(50)
      
      . scalar Median = r(r1)
      
      . dis Median
      20



      • #4
        Thank you, Nick Cox. I imagine that my median is stored in the variable median (in the first case) and median2 (in the second), right?

        Can I do the following:

        gen X_2 = 1 if X > median (or median2)
        gen X_2 = 0 if X < median (or median2) ?

        Best,

        Diego.



        • #5
          Yes and no: as you requested, and as is explicit in the syntax, the results are stored in scalars, which to Stata are not variables. You can refer to them in much the same way, but you are best advised not to use the same name for a scalar and a variable. Using the scalar() syntax is a good idea.



          • #6
            What Nick is referring to is explained in detail in Kolev, Gueorgui I. "Stata tip 31: Scalar or variable? The problem of ambiguous names." The Stata Journal 6, no. 2 (2006): 279-280.

            In your case your syntax will result in a mess if you have a variable whose name is, or begins with, "median", because 1) Stata allows variable names to be abbreviated, so "median" can be read as an abbreviation of such a variable, and 2) Stata always takes the variable interpretation over the scalar interpretation.

            Two ways around this are to refer to the scalar as scalar(median), or to adopt a personal convention where, say, variable names start with lower case letters, like m, and scalars and matrices start with upper case letters, like Median.
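
            As a minimal sketch of both suggestions together (the names Median and above are only illustrative, and the auto data stand in for your own):

            Code:
            sysuse auto, clear

            * store the median in a scalar with a capitalised name
            summarize mpg, detail
            scalar Median = r(p50)

            * scalar() forces the scalar interpretation even if a variable name clashes
            generate byte above = mpg > scalar(Median)
            tabulate above

            With scalar() spelled out, the result does not depend on which variables happen to be in the dataset.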

            In your particular case you seem to want a classification by the median, so the easiest thing to do is

            Code:
            . sysuse auto
            (1978 Automobile Data)
            
            . _pctile mpg, percentiles(50)
            
            . gen x = mpg>r(r1)






            • #7
              Thank you, Joro Kolev, for your answer.

              I am not sure your code gives me what I want. I want a variable that takes the value 1 if variable X is greater than the median and 0 otherwise.
              Does gen x = mpg>r(r1) do that? Best, and thank you again for your help!



              • #8
                Yes Diego, this is what the code does.

                Try it on a sample and you will see.



                • #9
                  Yes, it works! Thank you! I do not know why, but it works, haha.

                  Best,

                  Diego!



                  • #10
                    It works because this command

                    Code:
                    . _pctile mpg, percentiles(50)
                    leaves behind a stored result named r(r1), which holds the median.

                    Then this command
                    Code:
                    . gen x = mpg>r(r1)
                    is read from right to left. First, the logical expression mpg>r(r1) is evaluated for each observation: where it is true it evaluates to 1, and where it is false it evaluates to 0.

                    Finally, the equals sign is an assignment operator, which assigns the result of that logical evaluation to x.
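
                    To convince yourself, here is a small sketch (the names Median, x and x2 are just for illustration) that compares the one-line version with the longer generate/replace route:

                    Code:
                    sysuse auto, clear
                    _pctile mpg, percentiles(50)
                    scalar Median = r(r1)

                    * one line: the logical expression is 1 where true, 0 where false
                    generate byte x = mpg > scalar(Median)

                    * long form: start at 0, then switch to 1 where the condition holds
                    generate byte x2 = 0
                    replace x2 = 1 if mpg > scalar(Median)

                    * assert stops with an error if the two ever differ
                    assert x == x2

                    If assert returns silently, the two variables are identical in every observation.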




                    • #11
                      Indicator or dummy variables here are given by the evaluation of


                      variable > median

                      except that missing values also count as true or 1 for such an evaluation. If you prefer an indicator that is 0, 1 or missing, then the expression to feed to generate will be more like

                      variable > median if variable < .

                      For more, see https://www.stata.com/support/faqs/d...les/index.html

                      https://www.stata.com/support/faqs/d...rue-and-false/

                      For much more, see https://www.stata-journal.com/articl...article=dm0099
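
                      A minimal sketch of the second form, using rep78 from the auto data because it has some missing values (the names Median and high are only illustrative):

                      Code:
                      sysuse auto, clear
                      _pctile rep78, percentiles(50)
                      scalar Median = r(r1)

                      * the if qualifier leaves high missing wherever rep78 is missing
                      generate byte high = rep78 > scalar(Median) if rep78 < .

                      * the missing option shows the missing category alongside 0 and 1
                      tabulate high, missing

                      Without the if qualifier, the observations with missing rep78 would be coded 1.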



                      • #12
                        Thank you both, Nick Cox and Joro Kolev!
