  • How to calculate a dummy variable for firms at different quantiles?

    Hello,

    What Stata code would create a dummy variable that takes the value 1 if a firm belongs to the 75th percentile of the X variable and 0 otherwise, and another dummy variable that takes the value 1 if a firm belongs to the 25th percentile of the X variable and 0 otherwise?


  • #2
    Sally:
    you can do much better than that in a very easy way:
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    .  xtile quart = price , nq(4)
    
    . tab quart
    
    4 quantiles |
      of price  |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         19       25.68       25.68
              2 |         18       24.32       50.00
              3 |         19       25.68       75.68
              4 |         18       24.32      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)
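
    A minimal sketch of how the quartile variable above could be turned into the two dummies asked about in #1, assuming that "belongs to the 75th percentile" means being in the top quarter and "belongs to the 25th percentile" means being in the bottom quarter. The names top25 and bottom25 are hypothetical, and price from the auto data stands in for X:

    ```stata
    sysuse auto, clear

    * Quartile groups of price: 1 = lowest quarter, 4 = highest quarter
    xtile quart = price, nq(4)

    * Hypothetical dummy names; observations with missing price stay missing
    generate byte top25    = (quart == 4) if !missing(quart)
    generate byte bottom25 = (quart == 1) if !missing(quart)

    * Check the dummies against the quartile groups
    tabulate quart top25
    tabulate quart bottom25
    ```

    As far as I know, xtile puts a value exactly equal to a cutpoint into the lower group, so the boundary cases may differ slightly from a "greater than or equal to the 75th percentile" rule.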



    • #3
      I can't quite follow what this means. The 25% and 75% percentiles are points -- summary values calculated according to whatever percentile rule you're following (and there are several) -- and otherwise by extension intervals that each correspond to just about 1% of the data.

      Do you mean something more like being in

      1. the first quarter (less than the 25% percentile or lower (first) quartile)

      and being in

      2. the last quarter (greater than or equal to the 75% percentile or upper (third) quartile)?



      • #4
        Hi Carlo, can you please clarify the use of xtile?

        Hi Nick, yes, that is what I mean. I am following this method of measurement from an article. The authors also mention that, as a robustness check, they used dummy variables based on the 90th and 10th percentiles and the results remained the same. If that is the case, what is the best code to create these dummy variables?



        • #5
          I am not clear that you answered my question.

          You added more in mentioning other percentiles.

          Possibly what you need is to use summarize, if you are referring to those percentiles across the whole dataset, and then generate variables for being greater or less. But if you have, say, panel data, you will want to use egen.
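
          A rough sketch of both routes, under stated assumptions: price from the auto data stands in for X, foreign stands in for whatever grouping applies in a panel (for example year), and all new variable and scalar names are hypothetical. The inequalities follow the first-quarter and last-quarter definitions in #3.

          ```stata
          sysuse auto, clear

          * Route 1: percentiles over the whole dataset.
          * summarize, detail leaves them in r(p25) and r(p75)
          * (r(p10) and r(p90) are returned as well).
          summarize price, detail
          scalar cut25 = r(p25)
          scalar cut75 = r(p75)
          generate byte low_all  = (price <  cut25) if !missing(price)
          generate byte high_all = (price >= cut75) if !missing(price)

          * Route 2: percentiles within groups, using egen.
          * foreign is only a stand-in for the grouping variable.
          bysort foreign: egen p25_grp = pctile(price), p(25)
          bysort foreign: egen p75_grp = pctile(price), p(75)
          generate byte low_grp  = (price <  p25_grp) if !missing(price)
          generate byte high_grp = (price >= p75_grp) if !missing(price)
          ```

          For the 10th and 90th percentiles mentioned in #4, the same pattern should work with r(p10) and r(p90), or with p(10) and p(90) in egen.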



          • #6
            I am using panel data, and yes to your previous question. X is the difference between two variables, and I want to create these dummy variables to account for the heterogeneity of firms based on their Y and Z practices. So my question is: if I would like to create two variables, one for the firms with the greatest gap (the 75% group) and one for the firms with the lowest gap (the 25% group), how do I do that?



            • #7
              Sally:
              I'm still not clear on what you're after.
              That said, I do hope the following toy-example (based on a cross-sectional dataset, though) can be of some help:
              Code:
              . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
              (1978 automobile data)
              
              . xtile quart = price , nq(4)
              
              . ttest price if quart==1 | quart==3, unequal by( quart )
              
              Two-sample t test with unequal variances
              ------------------------------------------------------------------------------
                 Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
              ---------+--------------------------------------------------------------------
                     1 |      19    3907.684    61.31822    267.2799    3778.859    4036.509
                     3 |      19    5708.947    100.1486    436.5376    5498.543    5919.352
              ---------+--------------------------------------------------------------------
              Combined |      38    4808.316     158.987    980.0618    4486.177    5130.454
              ---------+--------------------------------------------------------------------
                  diff |           -1801.263    117.4294               -2041.142   -1561.384
              ------------------------------------------------------------------------------
                  diff = mean(1) - mean(3)                                      t = -15.3391
              H0: diff = 0                     Satterthwaite's degrees of freedom =  29.8327
              
                  Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
               Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000
              
              .
              Kind regards,
              Carlo
              (StataNow 18.5)
