Clustered standard errors for a single variable in panel data

Daniel Gilcher

Join Date: Apr 2022

Posts: 9
#1

25 Apr 2022, 14:55

Dear Stata users,

I am working with panel data on funds and am looking for a way to calculate standard errors (SEs) of a single variable (return) on a given day t. These SEs need to be calculated separately for each value of cluster_variable (which denotes the investment style in this case). That is, I want each SE to be calculated only over the observations that share the same cluster_variable on day t, not over the whole sample on that day. As you can see, cluster_variable is static over time for each fund.

Here is a short example.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(fund t) double return byte cluster_variable
1 1  .1 1
2 1  .2 1
3 1 .08 2
4 1  .9 2
5 1  .7 2
1 2  .4 1
2 2  .5 1
3 2 .03 2
4 2  .2 2
5 2  .4 2
end

I have considered computing the SDs and then counting the observations (obs) for each value of cluster_variable to obtain the SEs, following SE = SD/sqrt(obs). So I started with: egen SD = sd(return), by(cluster_variable t) to generate the following.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(fund t) double return byte cluster_variable float SD
1 1  .1 1 .07071068
2 1  .2 1 .07071068
3 1 .08 2  .4275512
4 1  .9 2  .4275512
5 1  .7 2  .4275512
1 2  .4 1 .07071068
2 2  .5 1 .07071068
3 2 .03 2  .1852026
4 2  .2 2  .1852026
5 2  .4 2  .1852026
end

Can anyone suggest a more elegant way to derive the desired SEs, or help me count the number of observations with the same cluster_variable on a given day t?
The count (obs) should look like this in a new variable:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(fund t) double return byte cluster_variable float(SD obs)
1 1  .1 1 .07071068 2
2 1  .2 1 .07071068 2
3 1 .08 2  .4275512 3
4 1  .9 2  .4275512 3
5 1  .7 2  .4275512 3
1 2  .4 1 .07071068 2
2 2  .5 1 .07071068 2
3 2 .03 2  .1852026 3
4 2  .2 2  .1852026 3
5 2  .4 2  .1852026 3
end

The data above are a simplified example. The real dataset has more than 1,000 funds and around 12 cluster variables.

Best,
Daniel
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#2

26 Apr 2022, 03:52

To count the number of nonmissing observations in a group you can do

Code:

egen n = count(return), by(cluster_variable t)

The user-written package -egenmore- includes the function -semean(exp), by(group)-, which seems to be what you are asking for. To install it, type

Code:

findit egenmore

and follow the instructions.
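If installing a user-written package is not an option, the same quantity can also be obtained with built-in egen functions alone, using SE = SD/sqrt(n) within each style-day group. The following is only a minimal sketch on the example data from #1; the variable names sd_ret, n_ret and se_ret are made up for illustration and are not part of the original question.

```stata
* Sketch: group-wise standard error of the mean with built-in egen functions.
* sd_ret, n_ret and se_ret are illustrative names (assumptions).
clear
input byte(fund t) double return byte cluster_variable
1 1  .1 1
2 1  .2 1
3 1 .08 2
4 1  .9 2
5 1  .7 2
1 2  .4 1
2 2  .5 1
3 2 .03 2
4 2  .2 2
5 2  .4 2
end

* SD and number of nonmissing returns within each (cluster_variable, t) group
egen double sd_ret = sd(return), by(cluster_variable t)
egen n_ret = count(return), by(cluster_variable t)

* SE of the mean = SD / sqrt(n); this is missing for groups with a
* single observation, because sd() is missing there
generate double se_ret = sd_ret / sqrt(n_ret)

list fund t return cluster_variable sd_ret n_ret se_ret, sepby(t cluster_variable)
```

As a quick check by hand: for cluster_variable 1 on day 1 the SD is .07071068 with n = 2, so the SE is .05; for cluster_variable 2 on day 1 the SD is .4275512 with n = 3, so the SE is roughly .2468.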
Daniel Gilcher

Join Date: Apr 2022

Posts: 9
#3

26 Apr 2022, 10:36

Thanks a lot Joro, very much appreciated. I am aware of the count command but couldn't figure out how to make it work for a specific non-missing value. I will try your recommendation and report.
Daniel Gilcher

Join Date: Apr 2022

Posts: 9
#4

26 Apr 2022, 11:20

Hi Joro, after installing -egenmore- I was able to create the required variable with egen obs = semean(return), by(cluster_variable t). I verified the results by hand, and it yields the correct values. Thanks very much again for your help. The matter is resolved.