
  • Clustered standard errors for a single variable in panel data

    Dear Stata users,

    I am working with panel data on funds and am looking for a way to calculate the standard error (SE) of a single variable (return) on a given day t. These SEs should be computed separately for each value of cluster_variable (which in this case identifies different investment styles). That is, on day t I want the SE calculated only over the observations that share the same cluster_variable, not over the whole sample for that day. As you can see, cluster_variable is constant over time for each fund.

    Here is a short example.



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(fund t) double return byte cluster_variable
    1 1  .1 1
    2 1  .2 1
    3 1 .08 2
    4 1  .9 2
    5 1  .7 2
    1 2  .4 1
    2 2  .5 1
    3 2 .03 2
    4 2  .2 2
    5 2  .4 2
    end




    I have considered computing the SDs first and then counting the number of observations (obs) for each value of cluster_variable, so that SE = SD/sqrt(obs). So I started with egen SD = sd(return), by(cluster_variable t), which generates the following:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(fund t) double return byte cluster_variable float SD
    1 1  .1 1 .07071068
    2 1  .2 1 .07071068
    3 1 .08 2  .4275512
    4 1  .9 2  .4275512
    5 1  .7 2  .4275512
    1 2  .4 1 .07071068
    2 2  .5 1 .07071068
    3 2 .03 2  .1852026
    4 2  .2 2  .1852026
    5 2  .4 2  .1852026
    end
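    For reference, the quantities in this plan can be written out for a style g on day t with n_gt nonmissing returns. egen's sd() uses the n - 1 denominator, which is what the SD column above reflects; the symbols g, r_it and n_gt are introduced here only for illustration.

    ```latex
    s_{gt} = \sqrt{\frac{1}{n_{gt}-1}\sum_{i \in g}\left(r_{it}-\bar{r}_{gt}\right)^{2}},
    \qquad
    \mathrm{SE}_{gt} = \frac{s_{gt}}{\sqrt{n_{gt}}}

    % Worked check, style 1 on day 1: returns 0.1 and 0.2, mean 0.15, n = 2
    s = \sqrt{\frac{(-0.05)^{2} + (0.05)^{2}}{1}} = \sqrt{0.005} \approx 0.0707107,
    \qquad
    \mathrm{SE} = \frac{0.0707107}{\sqrt{2}} = 0.05
    ```

    By the same formula, style 2 on day 1 (n = 3, SD = .4275512) gives an SE of about .2468.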


    Can anyone suggest a more elegant way to derive the desired SEs, or help me count the number of observations with the same cluster_variable on a given day t?
    The count (obs) should look like this in a new variable:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(fund t) double return byte cluster_variable float(SD obs)
    1 1  .1 1 .07071068 2
    2 1  .2 1 .07071068 2
    3 1 .08 2  .4275512 3
    4 1  .9 2  .4275512 3
    5 1  .7 2  .4275512 3
    1 2  .4 1 .07071068 2
    2 2  .5 1 .07071068 2
    3 2 .03 2  .1852026 3
    4 2  .2 2  .1852026 3
    5 2  .4 2  .1852026 3
    end
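    A minimal sketch of the SD/sqrt(obs) route using only built-in -egen- functions (sd() and count()) might look like this. It re-creates the example data so it runs on its own; the variable name SE is a hypothetical choice.

    ```stata
    * Sketch: SE of mean return per style (cluster_variable) and day (t)
    * using only built-in egen functions; SE is a hypothetical variable name.
    clear
    input byte(fund t) double return byte cluster_variable
    1 1  .1 1
    2 1  .2 1
    3 1 .08 2
    4 1  .9 2
    5 1  .7 2
    1 2  .4 1
    2 2  .5 1
    3 2 .03 2
    4 2  .2 2
    5 2  .4 2
    end

    * SD of return within each style-day group (n - 1 denominator)
    egen SD  = sd(return),    by(cluster_variable t)
    * Number of nonmissing returns in the same group
    egen obs = count(return), by(cluster_variable t)
    * Standard error of the mean: SD / sqrt(obs)
    generate SE = SD / sqrt(obs)

    list fund t cluster_variable SD obs SE, sepby(t)
    ```

    egen's count() counts only nonmissing values of return. By contrast, bysort cluster_variable t: generate obs_all = _N would count every observation in the group, which matches here only because the example has no missing returns.
    
    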



    The data above are a simplified example. The real dataset has more than 1,000 funds and around 12 values of cluster_variable (i.e., investment styles).

    Best,
    Daniel

  • #2
    To count the number of nonmissing observations in a group, you can do

    Code:
    egen n = count(return), by(cluster_variable t)

    The user-written package -egenmore- includes the function semean(exp), by(group), which seems to be what you are asking for. To find and install it, type

    Code:
    findit egenmore

    and follow the instructions.
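    Assuming -egenmore- is installed (it is distributed through SSC, so ssc install egenmore should also work) and the example data from the first -dataex- listing are in memory, an assert is a quick way to confirm that semean() reproduces the SD/sqrt(n) formula from the question. The variable names are hypothetical, and the check assumes semean() is the usual SD divided by the square root of the number of nonmissing values.

    ```stata
    * Sketch: cross-check egenmore's semean() against SD/sqrt(n) by style-day.
    * Assumes: egenmore installed (e.g. -ssc install egenmore-) and the example
    * data (fund t return cluster_variable) in memory; names are hypothetical.
    egen se_egenmore = semean(return), by(cluster_variable t)

    egen sd_manual = sd(return),    by(cluster_variable t)
    egen n_manual  = count(return), by(cluster_variable t)
    generate se_manual = sd_manual / sqrt(n_manual)

    * Stops with an error if the two versions differ beyond rounding error
    assert reldif(se_egenmore, se_manual) < 1e-6
    ```
    
    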



    • #3
      Thanks a lot, Joro, very much appreciated. I am aware of the count command but couldn't figure out how to make it count only the observations with a specific value of cluster_variable. I will try your recommendation and report back.



      • #4
        Hi Joro, after installing -egenmore- I was able to create the required variable with egen obs = semean(return), by(cluster_variable t). I verified the results by hand, and it yields the correct values. Thanks very much again for your help. The matter is resolved.
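        As an alternative that needs no user-written package, Stata's built-in -collapse- offers a (semean) statistic and produces one row per style and day. This is a swapped-in technique, not the one used in the thread; the sketch assumes the example data are in memory, and the output names n and se_return are hypothetical. preserve/restore keeps the fund-level data intact.

        ```stata
        * Sketch: one row per investment style and day with built-in -collapse-.
        * Assumes the example data (fund t return cluster_variable) are in memory;
        * n and se_return are hypothetical names.
        preserve
        * (count) counts nonmissing returns; (semean) is sd/sqrt(n) within each group
        collapse (count) n=return (semean) se_return=return, by(cluster_variable t)
        sort t cluster_variable
        list, sepby(t)
        restore
        ```

        With more than 1,000 funds and about 12 styles, this reduces the data to roughly 12 rows per day, which can be merged back on cluster_variable and t if needed.
        
        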
