  • Why do we ignore median or middle observations in the distribution?

    Dear Stata community
    Glancing through some economics and finance journals gives me the impression that most studies ignore the median value of a distribution and consider only the top and bottom observations. I understand that observations near the median are very vulnerable to misclassification, but is there a stronger reason why we consider only the top and bottom of the distribution of a variable (say, income) and drop the observations in the middle?

  • #2
    Could you elaborate on what the problem might be if we "ignore the median value of a distribution"?



    • #3
      Dear Fei Wang
      Thanks for taking the time to understand my issue. I recently read an article that has the following lines:
      "We divide our sample into three terciles (top 33%, middle 33%, and bottom 33%) based on firms’ average prereform (2009 to 2011) measure of asset tangibility (as measured by the ratio of fixed assets to total assets), with firms in the highest tercile forming our treated group, whereas firms in the lowest tercile are our control group (cutting off firms that fall in the middle 33% of the asset tangibility distribution)"

      Another one says:
      "In each year we rank firms according to their total book assets at the beginning of the year and treat the top 30% as financially unconstrained and the bottom 30% as financially constrained"



      Many articles do this, and my question is: why are those median values or middle observations dropped? Earlier studies set the threshold at the median; for instance, a household is labelled as rich (1) if its income is greater than the median income of all households in that year, and 0 otherwise. But now rich and poor are defined by the top and bottom parts of the income distribution. Why are the observations around the median ignored, and what is the logic of using only the bottom and top parts of the distribution? Can you help me with this?
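
      To make the two classification rules concrete, here is a minimal sketch that builds both a median split and a top 30% / bottom 30% split within each year. It assumes variables named year and tass_w, as in the sample data posted in #5. It also ranks on same-year assets, whereas the second quote ranks on assets at the beginning of the year, so treat it as an illustration rather than a replication.

      ```stata
      * Assumes year and tass_w are in memory (see the sample data in #5).
      * Median split: every firm-year is classified as large (1) or not (0).
      egen double p50 = median(tass_w), by(year)
      generate byte large = tass_w > p50

      * Top/bottom 30% split: the middle 40% is left missing, i.e. dropped.
      egen double p30 = pctile(tass_w), p(30) by(year)
      egen double p70 = pctile(tass_w), p(70) by(year)
      generate byte constrained = .
      replace constrained = 1 if tass_w <= p30    // bottom 30%: smallest firms
      replace constrained = 0 if tass_w >  p70    // top 30%: largest firms

      tabulate large constrained, missing
      ```

      The cross-tabulation shows which firm-years the second rule leaves unclassified compared with the median split.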




      • #4
        Thanks for the clarification. I guess there are at least two reasons. First, the definitions may be vague for the middle group. For example, we may be quite sure that the top third is financially unconstrained and the bottom third is constrained, but the middle third may be a mixture and hard to define. We usually have to tailor our sample to specific requirements at the cost of losing observations. Second, practically, my bold guess is that directly comparing top with bottom generates more salient effects, which the authors want. I would do it differently and treat the middle group as a group of its own: top third as the strong treatment group, middle third as the weak treatment group, and bottom third as the control group. This setting would reveal richer heterogeneous effects.
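
        A minimal sketch of this three-group setup, run on simulated data. The names x, y and post are hypothetical, and the interaction regression is only one possible way to compare the groups; it is not taken from either of the quoted papers. Note that the first command clears the data in memory.

        ```stata
        * Simulated data only: x, y and post are hypothetical names.
        clear
        set seed 2803
        set obs 300
        generate double x = runiform()           // sorting variable, e.g. tangibility
        generate byte post = runiform() > 0.5    // 1 = after the reform
        generate double y = rnormal()            // outcome

        * Thirds of x: 1 = bottom (control), 2 = middle (weak), 3 = top (strong).
        xtile group = x, nquantiles(3)

        * Group 1 is the base level; the group-by-post interactions give the
        * effects for the weak and strong treatment groups relative to control.
        regress y ib1.group##i.post
        ```

        Keeping group 2 in the sample means no observations are lost, and the two interaction coefficients can be compared with each other.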



        • #5
          Thanks, dear Fei Wang, for the wonderful explanation. I like your version, which doesn't lose observations and can show heterogeneity. However, after posting the question, I have some doubts. First, how come the bottom group comprises unconstrained firms, given that by construction the bottom group has the highest assets, right? Second, in the first case it is written that the highest tercile forms the treated group; what is highest and what is lowest here? I am attaching a sample dataset. Can you help me use it
          1) to classify firms into terciles
          2) to see how the bottom or top (highest or lowest) tercile is constructed


          Code:
          clear
          input long co_code int year double tass_w
              11 2010   1614.1
              11 2011   2191.1
              11 2012   2784.9
              11 2013   3057.6
              11 2014     3425
              11 2015   3498.9
              11 2016   3532.9
              11 2017   3900.3
              11 2018     4068
              11 2019   4383.7
           96387 2005   4447.3
           96387 2006   8283.5
           96387 2007  13605.1
           96387 2008  24209.7
           96387 2009  39661.8
           96387 2010  39844.7
           96387 2011  42517.2
           96387 2012    40650
           96387 2013  37755.2
           96387 2014  38411.7
           96387 2017  17829.2
           96387 2018  16980.7
           96387 2019  17751.7
           36277 2019  21933.2
           73119 2007  13543.5
           73119 2008  21380.2
           73119 2009  27159.7
           73119 2010    25695
           73119 2011  32031.6
           73119 2012  36842.9
           73119 2013  47097.6
           73119 2014  36100.3
           73119 2015  39883.2
           73119 2016  40004.4
           73119 2017  40747.7
           73119 2018  36963.6
           73119 2019  35017.6
          389178 2011  21237.6
          389178 2012  28538.6
          389178 2013  28974.1
          389178 2014  28881.8
          389178 2015  28925.9
          389178 2016  34620.5
          389178 2017  33960.9
          389178 2018  31446.6
          389178 2019  22740.9
           21420 2013  69646.5
             289 2012   1746.5
             289 2013     1522
             289 2014   1349.4
             414 2017    389.4
             414 2018    322.7
             414 2019    259.3
             415 2019   1874.7
           23354 2001  34572.4
           23354 2002  36817.1
           23354 2003  37684.2
           23354 2004  42575.3
           23354 2005  47945.5
           23354 2006  50304.4
           23354 2007  60777.2
           23354 2008  71758.6
           23354 2009  86590.7
           23354 2010 102280.7
           23354 2011 121849.7
           23354 2012 121080.4
           23354 2013 121446.6
           23354 2014 123591.9
           23354 2015 129307.7
           23354 2016 130183.4
           23354 2017 138059.2
           23354 2018   152513
           23354 2019 163650.6
             783 2009   1049.4
             783 2010   1201.9
             783 2011   1723.5
             783 2012   1816.7
             783 2013   2208.3
             783 2014   2182.6
             783 2015   2183.8
             783 2016     2179
             783 2017   2148.1
             783 2018   2149.7
             783 2019   2042.7
          455291 2019   2970.2
            1120 2007   6970.6
            1120 2008     8434
            1120 2009  11336.6
            1120 2010  12975.6
            1120 2011  12945.2
            1120 2012  14937.2
            1120 2013  18524.6
            1120 2014  22357.9
            1120 2015  25511.6
            1120 2016    27984
            1120 2017    32081
            1120 2018  35691.3
            1120 2019  40503.9
           15510 2005    558.5
           15510 2007     2604
           15510 2008   2725.5
           15510 2009   2837.9
           15510 2010   2811.9
           15510 2011   2985.7
           15510 2012   2912.3
           15510 2013   2909.6
           15510 2014     3059
           15510 2015   3174.9
           15510 2016   2866.4
           15510 2017   2659.7
           15510 2018   2936.8
           15510 2019   3129.3
           78253 2007  10341.6
           78253 2008  16905.1
           78253 2009  21358.9
           78253 2010  28227.4
           78253 2011  37371.4
           78253 2012  48085.4
           78253 2013  52839.1
           78253 2014  55360.6
           78253 2015  47864.1
           78253 2016  47864.1
           78253 2017  49736.2
           78253 2018  57083.1
           78253 2019  48886.8
          373258 2008   1871.5
          373258 2009   1962.2
          373258 2010   2463.8
          373258 2011   2811.7
          373258 2012   2634.8
          373258 2013   2768.6
          373258 2014   2778.1
          528025 2016      918
          528025 2017   1090.8
          528025 2018   1555.7
          528025 2019   2151.4
          183399 2003   2126.1
          183399 2004   2060.1
          183399 2005   1954.9
          183399 2006   2324.3
          183399 2007   2335.6
          183399 2008   2651.9
          183399 2009   2270.6
          183399 2010   2599.6
          183399 2011   2694.8
          183399 2012   3545.4
          183399 2013   3379.1
          183399 2014   3953.5
          183399 2015   4077.6
          183399 2016   1269.9
          183399 2017   1353.1
          183399 2018   1282.4
          183399 2019   1174.6
           35548 2010   3950.2
           35548 2011   5544.1
           35548 2012     7058
           35548 2013     9974
           35548 2014  11502.7
           35548 2015  13647.1
           35548 2016  17385.4
           35548 2017  20018.1
           35548 2018  23289.6
           35548 2019  29776.1
            2248 2018   3008.9
           15646 2009    282.9
           15646 2010    406.8
           15646 2011    416.6
          370768 2012  19156.2
          370768 2017    19283
          370768 2018    18786
          370768 2019  18933.6
           23482 2015   3894.4
           23482 2016   4281.1
           23482 2017   4678.8
           23482 2018   4736.4
           23482 2019   5425.2
            2842 2004    119.7
            2842 2005    124.1
            2842 2006    176.1
            2842 2007    220.5
            2842 2008    291.5
            2842 2009    356.1
            2842 2010    350.4
            2842 2011    659.7
            2842 2012    871.9
            2842 2013   1079.5
            2842 2014   1172.1
            2842 2015     1178
            2842 2016    901.6
            2842 2017    617.9
            2842 2018    693.2
            2842 2019    864.7
          463750 2019   1855.4
          546359 2018    510.7
            3335 2008      956
            3335 2009   1032.1
            3335 2010   1198.6
            3335 2011   1399.2
            3335 2012   1962.2
            3335 2013     2284
          end
          tass_w is assets, winsorized, and co_code is the company code number.



          • #6
            First, how come the bottom group comprises unconstrained firms as by construction bottom group has highest assets, right?
            This is related to specific knowledge of finance, and is beyond my field.

            Even in the first case, it is written highest tercile forming our treated group, what is highest and lowest here?
            According to your quote, "highest tercile" means the top third, and the "lowest tercile" indicates the bottom third.

            How bottom or top /highest or lowest tercile is constructed
            Is your data for case 1 or 2 in #3? I roughly understand case 1, but case 2 seems vague to me. Either way, you may split your sample into three groups according to the value of x by using -xtile-.

            Code:
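            * x is a placeholder for the sorting variable; group runs from 1 (lowest third) to 3 (highest third)
            xtile group = x, n(3)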
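
            For case 2, where firms are ranked within each year, a single -xtile- call is not enough because -xtile- cannot be prefixed with by:. A minimal sketch that loops over years, assuming the sample data from #5 (co_code, year, tass_w) is in memory; the names tercile and tmp_ are made up for the example:

            ```stata
            * Yearly terciles of tass_w; assumes the sample data from #5 is loaded.
            generate byte tercile = .
            levelsof year, local(years)
            foreach y of local years {
                capture drop tmp_
                * -xtile- has no by: prefix, so restrict it to one year at a time.
                capture xtile tmp_ = tass_w if year == `y', nquantiles(3)
                if _rc == 0 {
                    replace tercile = tmp_ if year == `y'
                }
            }
            capture drop tmp_

            label define tercile 1 "bottom" 2 "middle" 3 "top"
            label values tercile tercile
            tabulate year tercile
            ```

            With only a handful of firms per year, as in this sample, the groups will be very uneven, so the table is worth checking before using the classification.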



            • #7
              Thanks Fei Wang. A similar point was discussed in thread https://www.statalist.org/forums/for...35#post1471335

              I read further, and regarding quote 1 the paper states elsewhere:
              "We divide our sample into three terciles (top 33%, middle 33%, and bottom 33%) based on firms’ average pre-reform (2009 to 2011) measure of asset tangibility (as measured by the ratio of fixed assets to total assets), with firms in the highest tercile are defined as high asset tangibility firms (our treatment group), whereas firms in the lowest tercile are defined as low asset tangibility firms"

              I think the authors mean that the treatment group (high-tangibility firms) is the last group produced by xtile group = x, n(3), that is, group 3; the control firms correspond to group 1, and group 2 is dropped, right?



              • #8
                In short, we often don't do that; but reasons for concentrating on best and worst performing firms are surely simple enough: describing, explaining and predicting their attributes and performance is an interesting and important problem, just as in medicine people who are very sick or very healthy are both instructive in different ways.



                • #9
                  Originally posted by lal mohan kumar
                  I think the authors mean that the treatment group (high-tangibility firms) is the last group produced by xtile group = x, n(3), that is, group 3; the control firms correspond to group 1, and group 2 is dropped, right?

                  To my understanding, the treatment group includes the firms with the highest ratios of fixed assets to total assets, and is therefore group number 3. The control group is group number 1.
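
                  A minimal sketch of that construction for quote 1, assuming the sample data from #5 is in memory. The posted sample has no tangibility ratio, so tass_w stands in for it here purely as an assumption; pre_mean, firm_tag, group1 and treated are names made up for the example.

                  ```stata
                  * tass_w is only a stand-in for the tangibility ratio (an assumption).
                  * Firm-level average over the pre-reform years 2009 to 2011.
                  bysort co_code: egen double pre_mean = ///
                      mean(cond(inrange(year, 2009, 2011), tass_w, .))

                  * One observation per firm, so each firm counts once in the terciles.
                  egen byte firm_tag = tag(co_code) if !missing(pre_mean)
                  xtile group1 = pre_mean if firm_tag, nquantiles(3)

                  * Copy the firm's tercile to all of its years (missing sorts last).
                  bysort co_code (group1): replace group1 = group1[1]

                  * 1 = top tercile (treated), 0 = bottom tercile (control),
                  * missing = middle tercile or no pre-reform data, i.e. dropped.
                  generate byte treated = (group1 == 3) if inlist(group1, 1, 3)
                  tabulate group1 treated, missing
                  ```

                  Any estimation command that uses treated then leaves out the middle tercile automatically, because observations with missing values are excluded.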



                  • #10
                    Thanks Nick Cox & Fei Wang for your time and support

