Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Aggregating panel data to a weighted average mean by year and region

    Hello,

    I hope you are all doing fine!

    I have a panel dataset with over a thousand firm IDs and 397 variables. For each ID I have data on time periods ranging from 2003-2010, although each firm ID underlies different time periods (2003-2006, 2004-2008, etc.).

    Because I would like to run a linear regression using as the dependent variable the returns (_roe) of two subgroups (given by the variable profit_status: non-profit or profit), I need to create a for-profit and a non-profit return portfolio first. Therefore, I need to find the weighted average return of each of the two portfolios per year and then obtain the mean value for each subgroup.

    Finally, I would like to implement the same process by region.

    Here is an extract of my dataset:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long mfiid int year str10 profit_status str31 region double loanp_gr float _roe
    100000 2007 "Non-profit" "South Asia"                        4.447335    .2913083
    100000 2008 "Non-profit" "South Asia"                        5.815762   .17713334
    100000 2009 "Non-profit" "South Asia"                        6.989898   .14123334
    100001 2004 "Non-profit" "Eastern Europe and Central Asia"   3.823308   .04955833
    100001 2005 "Non-profit" "Eastern Europe and Central Asia"   5.243011 -.008699998
    100001 2006 "Non-profit" "Eastern Europe and Central Asia"    9.54903   .02328333
    100001 2007 "Non-profit" "Eastern Europe and Central Asia"  32.509157   .21060835
    100001 2009 "Non-profit" "Eastern Europe and Central Asia"  43.950894  .015933335
    100001 2010 "Non-profit" "Eastern Europe and Central Asia"  34.686921  .017058332
    100004 2008 "Profit"     "Africa"                          318.429249   .20683333
    100008 2006 "Non-profit" "South Asia"                         .925905   -.4828166
    100008 2007 "Non-profit" "South Asia"                        2.514016   -.4197917
    100008 2008 "Non-profit" "South Asia"                        3.528543  -2.9300666
    100008 2009 "Non-profit" "South Asia"                        3.126599    9.721633
    100012 2005 "Non-profit" "South Asia"                        1.188292       .3659
    100012 2006 "Non-profit" "South Asia"                        2.859306   -.4431167
    100012 2007 "Non-profit" "South Asia"                        6.380352   1.6969082
    100012 2008 "Non-profit" "South Asia"                        7.094487   1.2119334
    100012 2009 "Non-profit" "South Asia"                       12.460604    .9820334
    100012 2010 "Non-profit" "South Asia"                       17.238102   .16185834
    100013 2009 "Non-profit" "South Asia"                        1.247345  -.10906667
    100013 2010 "Non-profit" "South Asia"                        1.343236  -.13484167
    100016 2005 "Profit"     "South Asia"                        4.186943      -.0101
    100016 2006 "Profit"     "South Asia"                        6.661947 .0035833344
    100016 2007 "Profit"     "South Asia"                       12.494852  .016408335
    100016 2008 "Profit"     "South Asia"                       12.535291   .05173333
    100016 2009 "Profit"     "South Asia"                       17.369746   .07153333
    100016 2010 "Profit"     "South Asia"                       19.892395   .08925833
    100017 2003 "Profit"     "South Asia"                       18.902664      .17895
    100017 2004 "Profit"     "South Asia"                       40.199809   .16935833
    end
    format %ty year

    I have tried to do it by creating a variable for the weighted average returns and then extract the mean values of the summary statistics to an Excel file, but I wonder whether there is a better option just using Stata (as I am really new to the software).

    I would appreciate your help.

    Kind regards
    Last edited by Rafael Maschke; 20 Jan 2020, 10:31.

  • #2

    I had a similar challenge and got the following code from a user in the forum! you may wish to consider trying it


    by region year, sort: egen numerator = total(weight*bits)
    by region year: egen denominator = total(weight)
    gen weighted_bits_mean = numerator/denominator
    Last edited by Beri Parfait; 21 Mar 2020, 05:40.

    Comment

    Working...
    X