  • Collapse command and how to generate various percentiles for different variables and have them in one dataset

    Dear Stata Forum,

    I have individual-level data and would like to compute the 10th percentile of wages by region (250 regions) and year (8 years), separately for men (wage_m) and women (wage_f).
    For that purpose I run the following Stata command:
    Code:
    collapse (p10) wage_f wage_m, by(region year)
    I obtain the following:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(region year wage_f wage_m)
     1 2009  1.6777365 1.8589314
     1 2010  1.8355788 1.9151434
     1 2011   1.780765 2.2847788
     1 2012  1.6297935 2.2326164
     1 2013  1.7919272 2.3745558
     1 2014  1.8978895   2.42048
     1 2015   1.504847 2.5967376
     1 2016  1.8664685 1.9286542
     1 2017  1.8952124 2.3284633
     1 2018   1.751607 -3.470383
     1 2019  1.8743157 -3.555578
     1 2020  2.0351083 2.7034874
     1 2021   2.321434 2.5938554
     1 2022  1.9615375         .
    29 2009   .6825254  1.886309
    29 2010  1.9003967  .4382414
    29 2011   1.780765 1.0106492
    29 2012   1.826612  1.873501
    29 2013  1.8366302  2.015282
    29 2014   1.699241 1.8739363
    29 2015  1.7533083  1.976452
    29 2016  1.9004835 1.4490812
    29 2017  1.8376565  1.748631
    29 2018 -.25243014 1.8405644
    29 2019  1.7582496 1.9309986
    29 2020  1.8930303 2.0102127
    29 2021  1.9537123  1.879946
    29 2022  1.9623846 2.0046847
    30 2009   1.714847 2.3671966
    30 2010  1.7900953 1.8864112
    30 2011  1.0888505 2.0614786
    30 2012  1.6107576 2.0914638
    30 2013   1.206303 1.8549865
    30 2014  1.8371898 1.5391915
    30 2015  1.9012284  1.843309
    30 2016  1.8680296 1.9661306
    30 2017   1.902195 2.0467763
    30 2018  1.9486855 1.9803348
    30 2019   1.965883 2.1637087
    30 2020    1.79505  1.932251
    30 2021  1.6934086  1.866261
    30 2022  2.2022803 2.1573446
    30 2023          .         .
    31 2009  1.7540778 1.9948173
    31 2010   1.750428 2.0014927
    31 2011    1.34226 1.9948587
    31 2012  1.6107676 2.0719323
    31 2013   1.768422 1.8737785
    31 2014  1.6101997 2.0065582
    31 2015  1.8056804 1.9609476
    end
    My problem now is that I also need the overall median wage (female and male wages combined) by region and year in the same dataset as above.
    I am not sure how to proceed from this point on.
    I would highly appreciate your help.

    Thanks,
    Best,
    Nico








  • #2
    You need to ask for what you want with e.g.

    Code:
    collapse (p10) p10_m=wage_m p10_f=wage_f (median) med_m=wage_m med_f=wage_f, by(region year)
    There are relevant examples in the help for collapse; see the Quick start section.
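The original question asked for a single median over female and male wages combined. A minimal sketch of one way to get that is below. It assumes each individual has exactly one of wage_f and wage_m nonmissing. The toy data, the indicator female, and the pooled variable wage are all hypothetical and created only for illustration.

```stata
* Toy individual-level data (hypothetical: 4 regions, 2 years)
clear
set seed 12345
set obs 2000
generate region = ceil(_n/500)
generate year   = 2009 + mod(_n, 2)
generate byte female = runiform() < 0.5
generate wage_f = exp(rnormal(2, 0.3))   if female
generate wage_m = exp(rnormal(2.2, 0.3)) if !female

* Pooled wage: whichever of the two sex-specific wages is observed
generate wage = cond(female, wage_f, wage_m)

* Several statistics in one call, each with its own target name
collapse (p10) p10_f=wage_f p10_m=wage_m (median) med_wage=wage, by(region year)
list, sepby(region)
```

The target=source form is what lets the same source variable feed more than one statistic without name clashes.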



    • #3
      Hi Nick,
      Thanks so much for your help!
      I very much appreciate your input.
      It works now with the information you provided.
      My apologies for not having looked at Quick Start.
      Have a great day!
      Best,
      Nico



      • #4
        Dear Stata Forum,
        I have a follow-up question.
        Code:
        collapse (p10) p10_m=wage_m p10_f=wage_f (median) med_wage = wage, by(region year)
        If you look at this, I obtain p10 and median by region and year.
        However, I need p10 by region and year, but median by region only.
        Is there a way to do this?
        I would appreciate your help.
        Have a good day.
        Best,
        Nico



        • #5
          Code:
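          * First pass: 10th percentiles by region and year, saved to a temporary file
          preserve
          collapse (p10) p10_wage_f = wage_f p10_wage_m = wage_m, by(region year)
          tempfile holding
          save `holding'
          
          * Second pass: back to the individual-level data, medians by region only
          restore
          collapse (median) med_wage_f = wage_f med_wage_m = wage_m, by(region)
          
          * One region row matches many region-year rows
          merge 1:m region using `holding'
          isid region year, sort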



          • #6
            If you generate first

            Code:
            egen median_wage = median(wage), by(region)
            you can then add it to the arguments of collapse. It doesn't matter whether you ask for its median or its mean because, by construction, the new variable is constant within each region.
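A minimal sketch of this route, end to end, is below. The toy data are hypothetical and created only so the example runs on its own. It assumes a pooled individual-level variable wage exists alongside wage_f and wage_m, as in #4.

```stata
* Toy individual-level data (hypothetical: 3 regions, 4 years)
clear
set seed 2024
set obs 1200
generate region = ceil(_n/400)
generate year   = 2009 + mod(_n, 4)
generate byte female = runiform() < 0.5
generate wage   = exp(rnormal(2, 0.3))
generate wage_f = wage if female
generate wage_m = wage if !female

* Region-level median, repeated on every observation in the region
egen median_wage = median(wage), by(region)

* Carry it through collapse; any statistic of a within-region constant returns that constant
collapse (p10) p10_f=wage_f p10_m=wage_m (median) median_wage, by(region year)
list, sepby(region)
```

The result has one row per region and year, with median_wage repeated across the years of each region. Unlike the approach in #5, it needs no temporary file and no merge.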



            • #7
              Dear Clyde and Nick,
              Thanks for your feedback!
              Both suggestions help me a great deal.
              Have a good day,
              Best,
              Nico
