  • Extracting p-val from F-test with non-standard specification

    I have two regressions using common experimental panel data collected at the subject-period level for three periods. The first regression specification is a limited version of the second. The data are organized into two levels of random assignment. The first level is to one of four "thresholds", call them 100 200 300 400. The second level is to one of three "discounts", call them 0 50 75. The outcome of interest is calculated in gallons.

    The first regression includes a dummy variable for each threshold and each discount. The second includes a dummy for each threshold-discount combination (the interactions). Both are shown in the code below.

    The number of subjects is in the hundreds of thousands and clustering occurs at the subject level.

    Code:
    g absorb_constant = 1
    
    *Model 1: simple model
    reghdfe gallons thresh_100_D thresh_200_D thresh_300_D thresh_400_D disc_0_D disc_50_D disc_75_D, absorb(absorb_constant) vce(cluster subject)
    
    *Model 2: interaction model
    reghdfe gallons h_100_0D h_100_50D h_100_75D h_200_0D h_200_50D h_200_75D h_300_0D h_300_50D h_300_75D h_400_0D h_400_50D h_400_75D, ///
    absorb(absorb_constant) vce(cluster subject)
    In an effort to show that the simpler model is "equivalent" to the interaction model, I want to calculate p-values from an F-test with the following definition:

    SS_simple = sum of squared residuals for simple model
    SS_full = sum of squared residuals for interaction model
    df_simple = degrees of freedom from the simple model
    df_full = degrees of freedom from the full model
    s = df_simple - df_full


    F = ((SS_simple - SS_full) / s) / (SS_full / df_full)

    I would expect to conclude that the two models differ if F exceeds the (1 - a) percentile of the F(s, df_full) distribution, where a is the level of significance.
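
The test described above can be written compactly as follows. This restates the poster's own definitions and adds only the p-value as the upper-tail probability. It holds under the classical assumptions (independent, homoskedastic errors), which is an assumption here and not something the clustered specification guarantees.

```latex
\[
F = \frac{(SS_{\text{simple}} - SS_{\text{full}})/s}{SS_{\text{full}}/df_{\text{full}}},
\qquad s = df_{\text{simple}} - df_{\text{full}}
\]
\[
\text{reject at level } a \text{ if } F > F_{1-a}(s,\, df_{\text{full}}),
\qquad
p = \Pr\bigl(F_{s,\, df_{\text{full}}} > F\bigr)
\]
```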

    --

    I would like to save these SS, df and relevant critical values when calculating the regressions, then do the f-tests/pval afterward. I have two questions:

    1) How can I get the SS, df, and critical values from the regression output?
    2) Is it possible to extract the p-value from the F-test specified above in Stata?

    Please let me know if I can clarify anything at all and I would be happy to do so.
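
A minimal sketch of both questions, assuming conventional (non-clustered) standard errors and using Stata's built-in auto data with `regress` in place of the poster's data and `reghdfe`. The variable choices are purely illustrative. `e(rss)` and `e(df_r)` are saved after each fit, `invFtail()` gives the critical value, and `Ftail()` gives the p-value.

```stata
* Illustrative data only: the built-in auto dataset, not the poster's panel
sysuse auto, clear

* Restricted (simple) model
regress price weight
scalar SS_simple = e(rss)
scalar df_simple = e(df_r)

* Unrestricted (full) model, which nests the simple one
regress price weight mpg foreign
scalar SS_full = e(rss)
scalar df_full = e(df_r)

* F statistic, 1% critical value, and p-value
scalar s    = df_simple - df_full
scalar F    = ((SS_simple - SS_full) / s) / (SS_full / df_full)
scalar crit = invFtail(s, df_full, 0.01)
scalar pval = Ftail(s, df_full, F)
display "F = " F "   critical value = " crit "   p-value = " pval

* Cross-check: with conventional standard errors, the built-in
* Wald test of the added regressors reports the same F and p-value
test mpg foreign
```

Scalars are used here instead of globals because they keep full numeric precision. Note that this equivalence with `test` is specific to the non-clustered case.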

  • #2
    Revising the code

    Code:
    g absorb_constant = 1
     *Model 1: simple model
    
    reghdfe gallons thresh_100_D thresh_200_D thresh_300_D thresh_400_D disc_0_D disc_50_D disc_75_D, absorb(absorb_constant) vce(cluster subject)
    
    global SS_simple = e(rss)
    global df_simple = e(df_r)  
    
    *Model 2: interaction model
    
    reghdfe gallons h_100_0D h_100_50D h_100_75D h_200_0D h_200_50D h_200_75D h_300_0D h_300_50D h_300_75D h_400_0D h_400_50D h_400_75D, ///
    absorb(absorb_constant) vce(cluster subject)
    
    global SS_full = e(rss)
    global df_full = e(df_r)
    The link below directed me to the function invFtail(), which can give me the critical values I need in order to determine significance.

    https://www.stata.com/statalist/arch.../msg00261.html

    So, hypothetically, I should be able to compute the following with positive values being significant and negative values being insignificant.

    Code:
    di ( ( ($SS_simple - $SS_full) / ($df_simple - $df_full ) ) / ( $SS_full / $df_full ) ) - invFtail($df_simple - $df_full, $df_full, 0.01)
    If I'm thinking about this correctly, then I have a problem: both regressions have the same number of clusters (read: subjects), so the difference in degrees of freedom in the denominator of the first fraction is 0 and the F statistic blows up to infinity. Am I misinterpreting this process because of the clustering aspect of the model specification?
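
The zero difference can be reproduced in a small sketch. With `vce(cluster)`, `regress` reports `e(df_r)` as the number of clusters minus one, regardless of how many regressors are in the model. The example below uses the nlswork example panel (it needs internet access for `webuse`) and assumes `reghdfe` follows the same convention, which matches what the post describes.

```stata
* Illustrative data only: Stata's nlswork example panel, not the poster's data
webuse nlswork, clear

* Smaller model, clustered on the person identifier
regress ln_wage age tenure, vce(cluster idcode)
scalar df_small = e(df_r)
display "clusters = " e(N_clust) "   e(df_r) = " e(df_r)

* Larger model on the same sample, same clustering
regress ln_wage age tenure i.race, vce(cluster idcode)
scalar df_large = e(df_r)
display "clusters = " e(N_clust) "   e(df_r) = " e(df_r)

* Both residual df equal (clusters - 1), so their difference is zero
* and cannot serve as the numerator degrees of freedom
display "df_small - df_large = " df_small - df_large
```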
    Last edited by Jack Reimer; 30 Jul 2019, 15:35.



    • #3
      Hi Jack,
      I think the problem comes from the assumptions in the model. The F statistic you are trying to compute is based on the assumption of homoskedasticity.
      I believe that, because you are clustering your standard errors, you should instead use an LM test.
      See, for example, Chapter 8 of Introductory Econometrics: A Modern Approach by Wooldridge (this is based on the 6th edition).
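
One related option, sketched here as an assumption and not as the LM test suggested above, is a cluster-robust Wald test: fit the full model with main effects plus interactions, then jointly test the interaction terms with `testparm`, which uses the clustered variance estimate. The example uses the nlswork example panel (it needs internet access for `webuse`), with `race` and `collgrad` standing in for the two treatment factors.

```stata
* Illustrative data only: Stata's nlswork example panel, not the poster's data
webuse nlswork, clear

* Full model: main effects plus all interactions of two categorical factors,
* with standard errors clustered on the person identifier
regress ln_wage i.race##i.collgrad, vce(cluster idcode)

* Joint cluster-robust Wald test that every interaction coefficient is zero
testparm i.race#i.collgrad

* The test leaves its results in r()
display "F = " r(F) "   df = " r(df) ", " r(df_r) "   p-value = " r(p)
```

Applied to the thread's setup, this would need categorical threshold and discount variables in place of the hand-built dummies. Those variables are hypothetical, since the posts show only the dummy names.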


      Fernando
