Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Divide panel data into subgroups

    Dear Statalist,

    I am working on a panel dataset (xtset firm year).

    I want to divide the firms into two subgroups. One for small firms and the second for bigger firms.
    The category a firm belongs to should be determined by the mean of its assets over time.
    This is also an unbalanced panel so that for some firms, the mean must be based on two years, while it can be based on eight years for other firms.
    Then, firms with mean(ASSETS) above median should be categorized as "big_firms" (dummy=1), while small firms get dummy=0.

    I hope you can help me with this.

  • #2
    I believe this does it:
    Code:
    by firm, sort: egen mean_assets = mean(assets)
    egen flag = tag(firm)
    xtile big_firm = mean_assets if flag, nq(2)
    by firm (big_firm), sort: replace big_firm = big_firm[1]
    replace big_firm = big_firm - 1
    In the future, when asking for code, please show example data to work with, and use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Hi Shayan, and welcome to Statalist!

      So, it will be a lot easier for people to help you if you can share a sample of your data using Stata's dataex command (help dataex). If you're not familiar with dataex (and most Stata users aren't) there is a Youtube tutorial here.

      I created some toy data to help get you started. I'll start with the data in wide format (it's easier to see), then reshape it to long.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte firm_id int(y2005 y2006 y2007) float(y2008 y2009 avg)
      1 1200 1500 1875 2343.75 2929.69 1969.69
      2  500  525    .       .       .   512.5
      3  100  200  300     400     500     300
      4 1800 1980 2178    2400    2600  2191.6
      5 2061 2056 2107    2154    2141  2103.8
      end
      
      . list, noobs
      
        +---------------------------------------------------------------+
        | firm_id   y2005   y2006   y2007     y2008     y2009       avg |
        |---------------------------------------------------------------|
        |       1    1200    1500    1875   2343.75   2929.69   1969.69 |
        |       2     500     525       .         .         .     512.5 |
        |       3     100     200     300       400       500       300 |
        |       4    1800    1980    2178      2400      2600    2191.6 |
        |       5    2061    2056    2107      2154      2141    2103.8 |
        +---------------------------------------------------------------+
      ------------------ copy up to and including the previous line ------------------

      The median of 1969.69, 512.5, 300, 2191.6, and 2103.8 = 1,969.69 So, by that definition, firms 4 & 5 will be above median, and firms 1-3 will be less than or equal to the median.

      Doing that in Stata:
      Code:
      reshape long y, i(firm_id) j(year)
      rename y assets
      
      . tabstat assets avg, stats(mean median sd min max) c(stats)
      
          variable |      mean       p50        sd       min       max
      -------------+--------------------------------------------------
            assets |  1538.656    1927.5  895.3661       100   2929.69
               avg |  1415.518   1969.69  846.9263       300    2191.6
      ----------------------------------------------------------------
      
      tabstat assets, by(firm_id) stats(mean median min max)
      
      Summary for variables: assets
           by categories of: firm_id 
      
       firm_id |      mean       p50       min       max
      ---------+----------------------------------------
             1 |  1969.688      1875      1200   2929.69
             2 |     512.5     512.5       500       525
             3 |       300       300       100       500
             4 |    2191.6      2178      1800      2600
             5 |    2103.8      2107      2056      2154
      ---------+----------------------------------------
         Total |  1538.656    1927.5       100   2929.69
      --------------------------------------------------
      
      bysort firm_id (year): egen firm_avg = mean(assets)  // creates avg assets for each firm
      gen above_median = 0
      replace above_median = 1 if firm_avg >= 1969.7 & firm_avg!=.   // marks firm as above median if its avg assets is above the median
      
      . list, sepby(firm_id) abbrev(12) noobs
      
        +--------------------------------------------------------------+
        | firm_id   year    assets       avg   firm_avg   above_median |
        |--------------------------------------------------------------|
        |       1   2005   1,200.0   1,969.7    1,969.7              0 |
        |       1   2006   1,500.0   1,969.7    1,969.7              0 |
        |       1   2007   1,875.0   1,969.7    1,969.7              0 |
        |       1   2008   2,343.8   1,969.7    1,969.7              0 |
        |       1   2009   2,929.7   1,969.7    1,969.7              0 |
        |--------------------------------------------------------------|
        |       2   2005     500.0     512.5      512.5              0 |
        |       2   2006     525.0     512.5      512.5              0 |
        |       2   2007         .     512.5      512.5              0 |
        |       2   2008         .     512.5      512.5              0 |
        |       2   2009         .     512.5      512.5              0 |
        |--------------------------------------------------------------|
        |       3   2005     100.0     300.0      300.0              0 |
        |       3   2006     200.0     300.0      300.0              0 |
        |       3   2007     300.0     300.0      300.0              0 |
        |       3   2008     400.0     300.0      300.0              0 |
        |       3   2009     500.0     300.0      300.0              0 |
        |--------------------------------------------------------------|
        |       4   2005   1,800.0   2,191.6    2,191.6              1 |
        |       4   2006   1,980.0   2,191.6    2,191.6              1 |
        |       4   2007   2,178.0   2,191.6    2,191.6              1 |
        |       4   2008   2,400.0   2,191.6    2,191.6              1 |
        |       4   2009   2,600.0   2,191.6    2,191.6              1 |
        |--------------------------------------------------------------|
        |       5   2005   2,061.0   2,103.8    2,103.8              1 |
        |       5   2006   2,056.0   2,103.8    2,103.8              1 |
        |       5   2007   2,107.0   2,103.8    2,103.8              1 |
        |       5   2008   2,154.0   2,103.8    2,103.8              1 |
        |       5   2009   2,141.0   2,103.8    2,103.8              1 |
        +--------------------------------------------------------------+


      Comment

      Working...
      X