  • Estimating p-value from t-value in regression with cluster-robust standard errors

    Hi,

    I have run a regression model with cluster-robust standard errors, using the following command:

    regress comf1 i.BB i.UB i.gender age edu1 income1 i.hc, cluster (hc)

    Below are the results that I get:
    [Image: clustered robust standard errors regression.png]


    As far as I know, a t-value larger than 1.96 in absolute value should be significant at p < .05.
    But that is not the case here. Judging from the table, the t-value needs to be about 2.72 before the p-value falls below .05.
    I guess Stata is making some sort of adjustment, but I have no idea what the adjustment is or why it is being made.
    The standard errors are already adjusted for the 5 clusters in hc, as per my command. Is Stata also adjusting the t-value cutoff for the p-value? I know the confidence intervals are consistent with the p-values, but I want to understand what is happening with the cutoff.
    I would appreciate it if someone could clarify what is happening.

    Thank you.
    Last edited by Sia Lee; 10 Oct 2020, 05:11.

  • #2
    Stata is not doing any adjustments. You are doing some incorrect algebra of p-values.

    The p-value in your table is calculated like this:

    Code:
    . dis 2*ttail(4,4.9)
    .00804399
    because with 5 clusters you have 4 degrees of freedom, and your t-statistic is 4.9. There are no adjustments: the p-value is just the tail probability in the upper tail plus that in the lower tail (this is why I multiplied by 2).
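
The 1.96 rule of thumb from the original question comes from the normal distribution, so it may help to compare critical values directly. The following is a minimal sketch, meant to be run from a do-file (the `//` comments are not accepted at the interactive prompt); the approximate values in the comments are standard table values, not output copied from the thread.

```stata
* Two-sided 5% critical values: standard normal versus t with 4 degrees of freedom
display invnormal(0.975)      // about 1.96
display invttail(4, 0.025)    // about 2.776

* Two-sided p-value for t = 1.96 when only 4 degrees of freedom are available
display 2*ttail(4, 1.96)      // above .05
```

With 4 degrees of freedom the t distribution has much heavier tails than the normal, which is why a t-value just above 1.96 is not significant at the 5% level here.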



    • #3
      Hi Joro,

      Thank you so much for the clarification.

      Can you please refer me to some relevant sources where I can learn more about how the p-value is calculated for different numbers of clusters when using cluster-robust standard errors?

      I am just learning this and your guidance would be very much appreciated.

      Thank you.



      • #4
        Sia: You should never cluster with five clusters. Those standard errors have no justification. What is hc?



        • #5
          Hi Sia, any book on statistics or econometrics,
          e.g., Wooldridge, J. M. (any year you can find). Introductory econometrics: A modern approach. Nelson Education.
          explains test statistics and p-values.

          Another source is Gueorgui I. Kolev, Statalist: https://www.statalist.org/forums/for...tandard-errors :-)

          This latter source goes more or less like this:

          1. The t-distribution (t-statistics) and the standard normal distribution (z-statistics) are pretty much indistinguishable for more than 30 degrees of freedom.

          2. When you calculate robust and/or clustered variances, you can just assume that you have more than 30 degrees of freedom, and use the normal distribution.

          3. Or you can assume that your clusters become your observations, so that the degrees of freedom become the number of clusters minus one (there is not much rigorous theory behind this, but it is what Stata does, as I showed above in your example).

          Then, whether you use t or z statistics, the statistic always has the form t/z = (parameter estimate)/s.e.(parameter estimate).

          You read the p-value from the relevant distribution of this t/z statistic you have assumed.

          If you assume a t distribution, the degrees of freedom are DF = (number of clusters) - 1, and the p-value is 2*ttail(DF, abs(t)), where t is the t-statistic you calculated (taking the absolute value keeps the formula correct for negative t).

          If you assume a normal distribution, there are no degrees of freedom, and the p-value is 2*normal(-abs(z)), where z is the t/z statistic you have calculated.
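
The recipe above can be checked against Stata's own stored results. This is a sketch only: the dataset (`auto.dta`, shipped with Stata) and the cluster variable `rep78` are stand-ins chosen because `rep78` has five categories, and the scalar name `tstat` is made up for illustration. It is not the poster's data or model.

```stata
* Assumption: auto.dta and rep78 as the cluster variable are illustrative only
sysuse auto, clear
regress price mpg weight, vce(cluster rep78)

* Number of clusters and the degrees of freedom Stata uses for the t-tests
display e(N_clust)
display e(df_r)

* Rebuild the t-statistic and both candidate p-values for the mpg coefficient
scalar tstat = _b[mpg] / _se[mpg]
display "t statistic:      " tstat
display "p-value, t(G-1):  " 2*ttail(e(df_r), abs(tstat))
display "p-value, normal:  " 2*normal(-abs(tstat))
```

The p-value computed from the t distribution should agree with the one in the regression table, while the normal-based one will be smaller.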







          • #6
            Originally posted by Jeff Wooldridge:
            Sia: You should never cluster with five clusters. Those standard errors have no justification. What is hc?
            Hi Jeff,

            hc is household composition type. I have 5 categories of household composition: single person family, adult couple, family with children, family with adults, shared house.

            I have observed significant differences in error variance across those household composition types.

            Please advise me if you have a suggestion for a better model.

            Thank you.



            • #7
              Sia, your clustering is nonsensical. You do not cluster to deal with heteroskedasticity ("significant difference in error variance across those household composition type"; the option -robust- takes care of this); you cluster to deal with correlation among observations within clusters. E.g., clustering at the level of the household makes sense because the outcomes of household members are correlated. Clustering at the level of a classroom makes sense because students in a classroom share teachers and influence each other, etc.
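
To illustrate the distinction, here is a hedged sketch using a different dataset: `nlswork` (downloaded with `webuse`, so it needs an internet connection) contains repeated observations on the same women, identified by `idcode`. The variables and the model are placeholders for illustration and have nothing to do with the food-discard data in this thread.

```stata
* Assumption: nlswork is a stand-in panel; idcode identifies the person
webuse nlswork, clear

* Heteroskedasticity-robust standard errors only
regress ln_wage age tenure, vce(robust)

* Cluster-robust standard errors: observations on the same person may be correlated
regress ln_wage age tenure, vce(cluster idcode)

* Number of clusters used (here there are thousands, not five)
display e(N_clust)
```

The coefficients are identical in both runs; only the standard errors change, and the clustered version is defensible here because there are many clusters and a clear reason for within-cluster correlation.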






              • #8
                Originally posted by Joro Kolev:
                Sia, your clustering is nonsensical. You do not cluster for heteroskedasticity ("significant difference in error variance across those household composition type", the option -robust- will take care of this), you cluster for correlation among observations within clusters. E.g., clustering at the level of the household makes sense because outcomes of household members are correlated. Clustering at the level of a class room makes sense, because students in a classroom share teachers and have impact on each other, etc.




                Hi Joro,

                Just to make sure of one thing: my DV is the amount of food discarded, and this can be correlated with household composition. For example, a household with children would discard more food than a single-person household.

                Can this be used as a justification for clustering at the household composition level?

                Thank you again for helping me through this process.



                • #9
                  Whatever the economic justification, the basic problem remains, as Prof. Wooldridge has pointed out, that with only five groups you cannot estimate cluster robust standard errors.



                  • #10
                    No, Sia. The fact that "household with children would discard more food than single person household" means that you should include household composition in your regression (probably as dummy variables for the different household types). It does not mean that you should cluster at the household composition level.

                    So I guess a fine model in this case would be

                    regress comf1 i.BB i.UB i.gender age edu1 income1 i.hc, robust
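
The same pattern can be tried on a dataset that ships with Stata. In this sketch `auto.dta` and its variables are stand-ins only: `i.rep78` plays the role of `i.hc`, entering the five-category variable as a set of dummies, while the standard errors are made robust to heteroskedasticity without any clustering.

```stata
* Assumption: auto.dta is illustrative; rep78 stands in for hc
sysuse auto, clear

* Categorical variable enters as dummies; heteroskedasticity-robust standard errors
regress price mpg weight i.rep78, vce(robust)

* The shorter option spelling used in the post requests the same estimator
regress price mpg weight i.rep78, robust
```

Group-level differences in the mean of the outcome are absorbed by the dummies, so the regression no longer relies on a five-cluster variance estimate.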


