Estimating p-value from t-value in regression with cluster-robust standard errors

Sia Lee

#1

10 Oct 2020, 04:43

Hi,

I have run a regression model with cluster-robust standard errors, using the following command:

regress comf1 i.BB i.UB i.gender age edu1 income1 i.hc, cluster (hc)

Below are the results I get:

As far as I know, when the absolute t-value is larger than 1.96, the coefficient should be significant at p < .05.
But the results above suggest that this is not the case here. Looking at them, it seems the t-value needs to be about 2.72 to be significant at .05.
I guess Stata is doing some sort of adjustment here, but I have no idea what this adjustment is or why Stata is doing it.
The standard errors are already adjusted for 5 clusters in hc, as per my command. But is Stata also adjusting the p-value cutoff for t-values? I know confidence intervals are consistent with p-values, but I want to know what is happening with the p-value cutoff here.
I would appreciate it if someone could clarify what is happening.

Thank you.

Last edited by Sia Lee; 10 Oct 2020, 05:11.
Tags: p-value, regression, t-value
Joro Kolev

#2

10 Oct 2020, 04:49

Stata is not doing any adjustment. Your algebra of p-values is incorrect.

The p-value in your table is calculated like this:

Code:

. dis 2*ttail(4,4.9)
.00804399

because with 5 clusters you have 4 degrees of freedom, and your t-statistic is 4.9. There is no adjustment; it is just the tail probability in the upper tail plus the lower tail (this is why I multiplied by 2).
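For readers who want to check this outside Stata, here is a minimal Python sketch (standard library only) that reproduces `2*ttail(4,4.9)`. It is an illustration, not Stata's implementation: the hypothetical helper `t_two_sided_p` uses the standard closed-form finite sums for the t distribution with an integer number of degrees of freedom (as tabulated, e.g., in Abramowitz and Stegun, Section 26.7).

```python
import math


def t_two_sided_p(t, df):
    """Two-sided p-value P(|T| >= |t|) for Student's t with integer df >= 1.

    Plays the role of Stata's 2*ttail(df, abs(t)), using the closed-form
    finite sums for integer degrees of freedom.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df == 1:
        # Cauchy case: P(|T| < |t|) = 2*theta/pi
        a = 2 * theta / math.pi
    elif df % 2 == 0:
        # Even df: sin(theta) * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...)
        term = total = 1.0
        for k in range(2, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = s * total
    else:
        # Odd df > 1: (2/pi) * (theta + sin(theta)*(cos + (2/3)cos^3 + ...))
        term = total = c
        for k in range(3, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = 2 / math.pi * (theta + s * total)
    # a = P(|T| < |t|); the two tails together are 1 - a
    return 1.0 - a


# 5 clusters -> 4 degrees of freedom; t-statistic of 4.9 as in the example
print(round(t_two_sided_p(4.9, 4), 5))  # prints 0.00804
```

Using abs(t) means a negative t-statistic gives the same two-sided p-value as its positive counterpart.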
Sia Lee

#3

10 Oct 2020, 07:02

Hi Joro,

Thank you so much for the clarification.

Can you please refer me to some relevant sources where I can learn more about how p-values are calculated with different numbers of clusters in cluster-robust standard errors?

I am just learning this and your guidance would be very much appreciated.

Thank you.
Jeff Wooldridge

#4

10 Oct 2020, 07:27

Sia: You should never cluster with five clusters. Those standard errors have no justification. What is hc?
Joro Kolev

#5

10 Oct 2020, 07:28

Hi Sia, any book on statistics or econometrics explains test statistics and p-values, e.g.,
Wooldridge, J. M. (any year you can find). Introductory Econometrics: A Modern Approach. Nelson Education.

Another source is Gueorgui I. Kolev, Statalist: https://www.statalist.org/forums/for...tandard-errors :-)

This latter source goes more or less like this:

1. The t-distribution (t-statistics) and the standard normal distribution (z-statistics) are pretty much indistinguishable for more than 30 degrees of freedom.

2. When you calculate robust and/or clustered variances, you can just assume that you have more than 30 degrees of freedom, and use the normal distribution.

3. Or you can assume that your clusters become your observations, so that the degrees of freedom are the number of clusters minus one. There is not much rigorous theory behind this, but it is what Stata does, as you saw in the calculation I showed you for your example.
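Point 1, and the cutoff Sia noticed, can be checked numerically. The Python sketch below (standard library only) finds the two-sided 5% critical value of the t distribution for a few degrees of freedom by bisection and compares it with the normal cutoff. `t_two_sided_p` and `t_critical` are hypothetical helpers, and computing the t tail probability by Simpson's rule on the t density is an illustrative choice, not how Stata does it (in Stata you would simply use `invttail(df, .025)` and `invnormal(.975)`).

```python
import math
from statistics import NormalDist


def t_two_sided_p(t, df, n=2000):
    """Two-sided p-value 2*P(T > |t|) for Student's t (n must be even).

    Sketch: integrates the t density over [0, |t|] with Simpson's rule.
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def dens(x):
        return k * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / n
    s = dens(0) + dens(b) + sum((4 if i % 2 else 2) * dens(i * h) for i in range(1, n))
    return 1 - 2 * s * h / 3


def t_critical(df, alpha=0.05, lo=0.0, hi=50.0):
    """|t| at which the two-sided p-value equals alpha, found by bisection."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > alpha:
            lo = mid  # still not significant: the cutoff is higher
        else:
            hi = mid
    return (lo + hi) / 2


for df in (4, 10, 30, 100):
    print(df, round(t_critical(df), 3))
print("normal", round(NormalDist().inv_cdf(1 - 0.05 / 2), 3))
# Standard t tables give 2.776, 2.228, 2.042 and 1.984; the normal cutoff is 1.96.
```

With 4 degrees of freedom (5 clusters) the 5% cutoff is about 2.78 rather than 1.96, and by 30 degrees of freedom it is already close to the normal value.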

Whether you use t or z statistics, the statistic always has the form t/z = (parameter estimate)/s.e.(parameter estimate).

You read the p-value from the relevant distribution of this t/z statistic you have assumed.

If you assume a t distribution, you set DF = (number of clusters) - 1. Then you calculate the p-value as 2*ttail(DF,abs(t)), where DF is the degrees of freedom you calculated and t is the t-statistic you calculated (the abs() matters when t is negative, because ttail() is the upper-tail probability).

If you assume a normal distribution, there are no degrees of freedom, and you calculate the p-value as 2*normal(-abs(z)), where z is the t/z statistic you calculated.
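The recipe in the last three paragraphs can be sketched in Python as follows. This is an illustration under stated assumptions, not Stata code: `cluster_p_values` and `t_two_sided_p` are hypothetical helper names, the t tail probability is obtained by Simpson's rule on the t density instead of Stata's `ttail()`, and the coefficient (0.49) and standard error (0.10) are made-up numbers chosen so that t = 4.9, as in the example earlier in the thread.

```python
import math


def t_two_sided_p(t, df, n=2000):
    """Two-sided p-value 2*P(T > |t|) for Student's t (sketch, n even).

    Integrates the t density over [0, |t|] with Simpson's rule.
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def dens(x):
        return k * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / n
    s = dens(0) + dens(b) + sum((4 if i % 2 else 2) * dens(i * h) for i in range(1, n))
    return 1 - 2 * s * h / 3


def cluster_p_values(estimate, se, n_clusters):
    """Return (t, df, p from t with G-1 df, p from the normal)."""
    t = estimate / se                        # t/z = estimate / s.e.
    df = n_clusters - 1                      # clusters treated as the observations
    p_t = t_two_sided_p(t, df)               # like Stata's 2*ttail(DF, abs(t))
    p_z = math.erfc(abs(t) / math.sqrt(2))   # equals 2*normal(-abs(z))
    return t, df, p_t, p_z


# Hypothetical numbers: coefficient 0.49 with s.e. 0.10, 5 clusters
t, df, p_t, p_z = cluster_p_values(0.49, 0.10, 5)
print(f"t = {t:.2f}, df = {df}, p (t) = {p_t:.5f}, p (normal) = {p_z:.2e}")
```

With only 4 degrees of freedom the t-based p-value is around 0.008, while the normal version is several orders of magnitude smaller, so the choice of reference distribution matters a great deal when there are few clusters.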

Sia Lee

#6

10 Oct 2020, 07:49

Hi Jeff,

hc is household composition type. I have 5 categories of household composition: single person family, adult couple, family with children, family with adults, shared house.

I have observed a significant difference in error variance across those household composition types.

Please advise me if you have a suggestion for a better model.

Thank you.
Joro Kolev

#7

10 Oct 2020, 08:09

Sia, your clustering is nonsensical. You do not cluster for heteroskedasticity ("significant difference in error variance across those household composition type"; the option -robust- will take care of this); you cluster for correlation among observations within clusters. E.g., clustering at the level of the household makes sense because outcomes of household members are correlated. Clustering at the level of a classroom makes sense because students in a classroom share teachers and have an impact on each other, etc.
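To make the distinction concrete, here is a minimal Python sketch for a one-regressor model with made-up data. The robust variance squares each observation's score separately, so it only corrects for heteroskedasticity; the cluster variance first adds up the scores within each cluster, which is what allows for within-cluster correlation. `slope_se` is a hypothetical helper, and the small-sample factors N/(N-k) for -robust- and G/(G-1)*(N-1)/(N-k) for -cluster- are an assumption about what Stata's regress uses (check the Methods and formulas section of the manual).

```python
import math
from collections import defaultdict


def slope_se(x, y, groups=None):
    """OLS slope of y on x (with intercept) and its robust or cluster-robust s.e.

    groups=None -> heteroskedasticity-robust, factor N/(N-k) (assumed);
    otherwise   -> cluster-robust, factor G/(G-1) * (N-1)/(N-k) (assumed).
    """
    n, k = len(x), 2                      # k = intercept + slope
    xbar, ybar = sum(x) / n, sum(y) / n
    xt = [xi - xbar for xi in x]          # centred regressor
    sxx = sum(v * v for v in xt)
    b1 = sum(v * (yi - ybar) for v, yi in zip(xt, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    scores = [v * e for v, e in zip(xt, resid)]   # centred x times residual
    if groups is None:
        meat = sum(s * s for s in scores)         # each observation on its own
        factor = n / (n - k)
    else:
        sums = defaultdict(float)
        for g, s in zip(groups, scores):
            sums[g] += s                          # add up scores within a cluster
        meat = sum(v * v for v in sums.values())
        g_count = len(sums)
        factor = g_count / (g_count - 1) * (n - 1) / (n - k)
    return b1, math.sqrt(factor * meat) / sxx


# Made-up data: 12 observations in 3 hypothetical clusters
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.8, 8.4, 8.7, 10.5, 10.9, 12.2, 13.1]
grp = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
print(slope_se(x, y))        # robust
print(slope_se(x, y, grp))   # clustered (only 3 clusters, so not reliable)
```

If every observation is put in its own cluster, the two formulas give the same number, which is another way of seeing that clustering only adds something when observations within a cluster are correlated. With few clusters, the cluster "meat" is a sum of only G squared terms, so it is estimated very imprecisely.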

Sia Lee

#8

10 Oct 2020, 08:36

Hi Joro,

Just to make sure of one thing: my DV is the amount of food discarded, and this can be correlated with the different household composition types. For example, households with children would discard more food than single-person households.

Can this be used as a justification for clustering at the household composition level?

Thank you again for helping me through this process.
Eric de Souza

#9

10 Oct 2020, 09:13

Whatever the economic justification, the basic problem remains, as Prof. Wooldridge has pointed out, that with only five groups you cannot estimate cluster-robust standard errors.
Joro Kolev

#10

11 Oct 2020, 01:09

No Sia, the fact that "household with children would discard more food than single person household" means that you should include household composition in your regression (probably as dummy variables for the different household types). It does not mean that you should cluster at the household composition level.

So I guess a fine model in this case would be

regress comf1 i.BB i.UB i.gender age edu1 income1 i.hc, robust
