Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Analyzing subsample of dataset

    Dear Stata users,

    I think I have a quite easy question, but since i am new to Stata I haven't been able to figure it out.

    Below you can find a sample of my data set which consist out of 260 companies with monthly data over 13 years.

    ame gwa return marketcap companyname year month date x portfolios
    NOVOZYMES A/S - GOODWILL - GROSS .029167 .091569 .004189 155 2006 1 2006m1 1 2
    MERCK KGAA - GOODWILL - GROSS .10902 .062225 .002097 143 2006 1 2006m1 1 3
    NATURGY ENERGY GROUP - GOODWILL - GROSS .033331 .027543 .001873 148 2006 1 2006m1 1 2
    DIAGEO PLC - GOODWILL - GROSS .012174 .04012 .00355 71 2006 1 2006m1 1 1
    LANXESS AG - GOODWILL - GROSS .004368 .059602 .000501 135 2006 1 2006m1 1 1
    SVENSKA CELLULOSA - GOODWILL - GROSS .127964 .122239 .011656 213 2006 1 2006m1 1 3
    CAPGEMINI SE - GOODWILL - GROSS .23913 .017238 .000955 42 2006 1 2006m1 1 4
    HEINEKEN HOLDING - GOODWILL - GROSS .176639 .060422 .001051 97 2006 1 2006m1 1 4
    ALFA LAVAL AB - GOODWILL - GROSS .235412 .153414 .004812 11 2006 1 2006m1 1 4
    ROYAL UNIBREW A/S - GOODWILL - GROSS .094737 .135891 .0006 182 2006 1 2006m1 1 3
    GIVAUDAN SA - GOODWILL - GROSS .228314 .0049 .00111 93 2006 1 2006m1 1 4
    TELEFONAKTIEBOLAGET - GOODWILL - GROSS .033887 .018313 .061239 221 2006 1 2006m1 1 2
    SPECTRIS PLC - GOODWILL - GROSS .401544 .053161 .000137 207 2006 1 2006m1 1 5
    NOKIA OYJ - GOODWILL - GROSS .024395 .021263 .008561 151 2006 1 2006m1 1 2
    KBC GROUP NV - GOODWILL - GROSS .005253 .02719 .004501 125 2006 1 2006m1 1 1
    NORDEA BANK ABP - GOODWILL - GROSS .006011 .005975 .038167 153 2006 1 2006m1 1 1
    FIAT CHRYSLER - GOODWILL - GROSS .062612 .013937 .001863 86 2006 1 2006m1 1 3
    RTL GROUP SA - GOODWILL - GROSS .628545 .059643 .001789 184 2006 1 2006m1 1 5

    Now I want to calculate the average return of the 50 smallest companies (by marketcap) and the average return of the 50 biggest companies (by market cap)
    Can somebody help me with the code?

    Kind regards!

  • #2
    Your example data is presented in a form that makes it almost unusable. It would take far more time and effort to import it into Stata to work with than to actually solve your problem. Please post back presenting your example using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When you do that, please also explain your question more clearly. How will you decide which companies are the biggest (or smallest) by marketcap? The same company's marketcap will vary from one month to the next, and some companies will drift in and out of the top-50 depending on the time period you look at. So which month's ascertainment of marketcap is to be used? Or do you mean that you want to calculate separate top 50 and bottom 50 mean returns for each month based on the top 50 and bottom 50 for that month?

    Comment


    • #3
      Thank you for your quick response!
      Hereby the better table:
      However if I copy them after using -dataex- I still get the same problem...


      input str39 name float(gwa return marketcap date)
      "AGEAS SA - GOODWILL - GROSS" .009033503 .10659746 .000883185 642
      "TOTAL SA - GOODWILL - GROSS" .011726501 -.022543075 .015024375 622
      "OLD MUTUAL LTD - GOODWILL - GROSS" .017107189 -.027507124 .001009161 605
      "BANCO SABADELL - GOODWILL - GROSS" .005408415 -.015229913 .001010345 661
      "DEUTSCHE BANK AG - GOODWILL - GROSS" .007018294 -.12948601 .004094488 649
      "ORION OYJ - GOODWILL - GROSS" .0161696 .003437212 .000465866 627
      "TOTAL SA - GOODWILL - GROSS" .011726501 .007351099 .015043226 617
      "SWISS LIFE HOLDING - GOODWILL - GROSS" .008787595 .03123988 .001136881 692
      "INVESTEC PLC - GOODWILL - GROSS" .011835885 -.016918883 .000522675 646
      "AIR FRANCE - KLM - GOODWILL - GROSS" .009573377 .24282925 .000308376 629
      "UNICREDIT SPA - GOODWILL - GROSS" .0135791 .023891166 .009665903 552
      "SIMCORP AS - GOODWILL - GROSS" .007284423 -.03847129 .000801675 627
      "DAIMLER AG - GOODWILL - GROSS" .005482155 .05413484 .008785705 657
      "ALLIANZ SE - GOODWILL - GROSS" .018019779 -.012089327 .007449592 636
      "CREDIT AGRICOLE SA - GOODWILL - GROSS" .012173735 -.12113318 .004091058 581
      "NORDEA BANK ABP - GOODWILL - GROSS" .003430818 .11363717 .03894224 687
      "BALOISE HOLDING AG - GOODWILL - GROSS" .002559522 .10909116 .000906835 554
      "MEDIOBANCA SPA - GOODWILL - GROSS" .004694974 .012831854 .000438577 624
      "DAIMLER AG - GOODWILL - GROSS" .007011117 .10263158 .007904968 607
      "SWEDBANK AB - GOODWILL - GROSS" .006916718 .002633294 .02565441 658

      Indeed I want to calculate seperate top 50 and bottom top 50 for each month based on the top 50 for that month.
      If I do that, I should be able to take the mean of those results afterwards, shouldnt I?

      Kind regards




      Comment


      • #4
        You have omitted part of the dataex output that you were not supposed to omit, which is why your sample data is not much easier to read. Note that at least dataex surrounded your string variable in quotation marks, as required to read it into Stata.

        Here is an example of dataex output in which I have colored red the lines that indicate the beginning and end of what you are to copy and paste into your post - the lines from [CODE] to [/CODE]. If you felt that 100 observations were too many to post, you can limit the number with the count() option, as I demontrate. Among the lines you did not copy in your dataex output was the format command that describes your date variable.
        Code:
        . dataex, count(5)
        
        ----------------------- copy starting from the next line -----------------------
        [CODE]
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str18 make int price
        "AMC Concord"   4099
        "AMC Pacer"     4749
        "AMC Spirit"    3799
        "Buick Century" 4816
        "Buick Electra" 7827
        end
        [/CODE]
        ------------------ copy up to and including the previous line ------------------
        
        Listed 5 out of 74 observations
        
        .

        Comment


        • #5
          Code:
          by date (marketcap), sort: gen byte top_50 = (_N-_n < 50)
          by date (marketcap): gen byte bottom_50 = (_n <= 50)
          by date: egen avg_ret_top_50 = mean(cond(top_50, ret, .))
          by date: egen avg_ret_bottom_50 = mean(cond(bottom_50, ret, .))
          Now, this code puts the average returns of the top and bottom 50 into every observation of the month. So if you want to calculate the means, you need to do so only counting each month once:

          Code:
          egen flag = tag(date)
          sum top_50 bottom_50 if flag

          Comment

          Working...
          X