Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • descriptive statistics on funds

    Hello, I would appreciate any help on the following problem:

    fundid fundnr isin cusip calmt quarter month year esb
    fs00008l08 1 US74316J7303 74316J730 622 207 11 2011 Large Blend
    fs00008l0w 2 US00766Y4897 00766Y489 620 206 9 2011 Large Growth
    fs00008l33 3 US34984T4360 34984T436 623 207 12 2011 Large Blend
    fs00008l6e 4 US75281Y2063 75281Y206 622 207 11 2011 Small Growth
    fs00008l7x 5 US56064V1061 56064V106 623 207 12 2011 Small Growth

    I have to summarize the numbers on small caps which i tried by creating a new variable with: 'gen smallcap = 1 if substr(esb,1, 5) == "Small"' and then 'sum smallcap' to get the total numbers of small caps of funds.
    But now i have a problem in finding out the average number of smallcaps since 'summarize smallcap, detail' gives out mean = 1
    Does anyone know how to avoid this problem?


    Last edited by Samantha Will; 22 Mar 2022, 12:59.

  • #2
    Welcome to Statalist.

    Using the 5 cases in your question as example, by "total numbers of small caps of funds", did you mean:
    1. 1245 (from 622 + 623), or
    2. 414 (from 207 + 207), or
    3. 2 (from two cases of "Small" in the data)?

    Comment


    • #3
      drop if qualifier so that smallcap equals 0 if substr(esb,1,5) != "Small",
      Code:
      *Example generated by -dataex-. To install: ssc install dataex
      clear
      input str10 fundid byte ufndnr str12 isin str9 cusip int(calmt quarter) byte month int year str12 esb
      "fs00008l08" 1 "US74316J7303" "74316J730" 622 207 11 2011 "Large Blend"
      "fs00008l0w" 2 "US00766Y4897" "00766Y489" 620 206  9 2011 "Large Growth"
      "fs00008l33" 3 "US34984T4360" "34984T436" 623 207 12 2011 "Large Blend"
      "fs00008l6e" 4 "US75281Y2063" "75281Y206" 622 207 11 2011 "Small Growth"
      "fs00008l7x" 5 "US56064V1061" "56064V106" 623 207 12 2011 "Small Growth"
      end
      
      gen smallcap = substr(esb,1,5)=="Small"
      
      tab smallcap

      Comment


      • #4
        Originally posted by Ken Chui View Post
        Welcome to Statalist.

        Using the 5 cases in your question as example, by "total numbers of small caps of funds", did you mean:
        1. 1245 (from 622 + 623), or
        2. 414 (from 207 + 207), or
        3. 2 (from two cases of "Small" in the data)?
        case 3, sorry for any confusion!

        Comment


        • #5
          Originally posted by Samantha Will View Post

          case 3, sorry for any confusion!
          In that case, you can use the method in #3 to compute the mean of the 1/0, with that fraction, multiply with the total case number to get the total count.

          If you want to use your own variable, then you need to dig a bit deeper, the actual sum is not routinely reported, even with ", detail. Try this:

          Code:
          summarize smallcap, detail
          display r(sum)

          Comment


          • #6
            Thank you both for your answers! I am trying to recreate Carharts ''Time-Series Averages of Cross-Sectional Average Annual Attribute'' but on current data.
            By trying to get the 'avg number' on the fund category ''Small'' my mean value (which i thought would deliver the average number) is now pretty small with 0.2009569
            'summarize fundnr, detail' did work to get the avg number on all funds, so i do not understand the problem here?

            Attached Files

            Comment


            • #7
              Then I think you may need to study table in more detail and make sure your data match their structure. From The Journal of Finance, Vol. 52, No. 1 (Mar., 1997), pp. 57-82, it says on page 59:

              My sample includes a total of 1892 diversified equity funds and 16109 fund years.
              and then:

              In an average year, the sample includes 509 funds with average total net assets (TNA) of ...
              So, in Cahart's data, there are two levels: years and funds. In this table, they probably compute the total number of unique funds by each year, and then took an average of them.

              Comment


              • #8
                Is there a function which delivers not 0/1 but 0( if not smallcap) and 1,2,3....(upwards) for all others in the right order?

                Comment


                • #9
                  perhaps,
                  Code:
                  gen wanted = cond(esb=="Small Growth",sum(esb=="Small Growth"),0)
                  Last edited by Øyvind Snilsberg; 23 Mar 2022, 03:30.

                  Comment


                  • #10
                    Originally posted by Øyvind Snilsberg View Post
                    perhaps,
                    Code:
                    gen wanted = cond(esb=="Small Growth",sum(esb=="Small Growth"),0)
                    Thank you so much!!

                    Comment


                    • #11
                      Originally posted by Ken Chui View Post
                      Then I think you may need to study table in more detail and make sure your data match their structure. From The Journal of Finance, Vol. 52, No. 1 (Mar., 1997), pp. 57-82, it says on page 59:



                      and then:



                      So, in Cahart's data, there are two levels: years and funds. In this table, they probably compute the total number of unique funds by each year, and then took an average of them.
                      Thank you!! So my interpretation of calculating the averages based on function 'mean' in stata is probably not correct? Even if I calculate the fund years i wouldnt know how to create the 'average year'

                      Comment


                      • #12
                        Originally posted by Samantha Will View Post

                        Thank you!! So my interpretation of calculating the averages based on function 'mean' in stata is probably not correct? Even if I calculate the fund years i wouldnt know how to create the 'average year'
                        Of course it is possible. If this author you quoted has a repeated time series data, where multiple companies are listed across different years, it'd be like:

                        Code:
                            clear
                            input year fundid smallcap
                            1997 1 0
                            1997 2 1
                            1997 3 1
                            1997 2 1
                            1997 3 0
                            1998 2 0
                            1998 3 0
                            1999 1 1
                            1999 2 0
                            1999 4 0
                            2000 2 1
                            2000 3 1
                            2000 4 0
                            2000 5 0
                            end
                            
                            * Number of funds, yearly average:
                            preserve
                            collapse (count) fundid, by(year)
                            sum fundid
                            restore
                            
                            * Number of small cap funds, yearly average:
                            preserve
                            collapse (sum) smallcap, by(year)
                            sum smallcap
                            restore
                        I believe by "average year", the author might mean averages of annual sums/counts.
                        Last edited by Ken Chui; 24 Mar 2022, 19:06.

                        Comment


                        • #13
                          Originally posted by Ken Chui View Post

                          Of course it is possible. If this author you quoted has a repeated time series data, where multiple companies are listed across different years, it'd be like:

                          Code:
                           clear
                          input year fundid smallcap
                          1997 1 0
                          1997 2 1
                          1997 3 1
                          1997 2 1
                          1997 3 0
                          1998 2 0
                          1998 3 0
                          1999 1 1
                          1999 2 0
                          1999 4 0
                          2000 2 1
                          2000 3 1
                          2000 4 0
                          2000 5 0
                          end
                          
                          * Number of funds, yearly average:
                          preserve
                          collapse (count) fundid, by(year)
                          sum fundid
                          restore
                          
                          * Number of small cap funds, yearly average:
                          preserve
                          collapse (sum) smallcap, by(year)
                          sum smallcap
                          restore
                          I believe by "average year", the author might mean averages of annual sums/counts.
                          Thank you. I tried using this on my database, I have total number of 3344 different funds, when I try using this function I will get an average of more than 20 000, which cant be possible. Is it because there are same funds within the same year (but different months) thus creating duplicate data? I somehow have to identify the yearly average but the fund should be counted only once
                          (for example:
                          fundid fundnr isin cusip calmt quarter month year
                          fs00008l08 1 US74316J7303 74316J730 618 206 7 2011
                          fs00008l08 1 US74316J7303 74316J730 619 206 8 2011
                          fs00008l08 1 US74316J7303 74316J730 620 206 9 2011
                          fs00008l08 1 US74316J7303 74316J730 621 207 10 2011
                          fs00008l08 1 US74316J7303 74316J730 622 207 11 2011
                          fs00008l08 1 US74316J7303 74316J730 623 207 12 2011
                          fs00008l08 1 US74316J7303 74316J730 624 208 1 2012
                          While I have multiple observations of fundnr 1 in the year 2011, I should only count it as one fund within the year of 2011)

                          Comment


                          • #14
                            See the -tag()- function of egen

                            Code:
                            help egen
                            So something like:

                            Code:
                            egen tag= tag(fundid year)
                            preserve
                            collapse (count) fundid if tag, by(year)
                            sum fundid
                            restore
                            Last edited by Andrew Musau; 30 Mar 2022, 13:34.

                            Comment


                            • #15
                              Originally posted by Andrew Musau View Post
                              See the -tag()- function of egen

                              Code:
                              help egen
                              So something like:

                              Code:
                              egen tag= tag(fundid year)
                              preserve
                              collapse (count) fundid if tag, by(year)
                              sum fundid
                              restore
                              Thank you! This seems to be working. So by that logic, if I am trying to get the smallcaps of each year ,multiple fundnr should also only be counted once in each year, then its:
                              egen tag = tag(smallcap fundnr)
                              preserve
                              collapse(sum) smallcap if tag, by(year)
                              sum smallcap
                              restore
                              (Example: fundnr month year smallcap
                              568 11 1990 1
                              568 2 1990 1
                              863 7 1990 0
                              926 12 1990 0
                              1027 4 1990 1
                              555 10 1990 0
                              699 5 1990 1
                              1169 10 1990 1
                              I sorted by year, so if the fundnr 568 observes multiple months in the year 1990 and happens to be a smallcap (number 1) it should only be counted as one smallcap firm)

                              Comment

                              Working...
                              X