
  • Insignificant F-test, F-test in stricter regression even drops out - still able to interpret coefficients?

    Dear Statalist,

    I kindly ask your opinion on my regressions and whether I can interpret the coefficients. In the first regression, the dependent variable measures participants' behavior at simulated price changes of at least 10. The model is statistically insignificant, as you can see from my Prob > F of 0.1262. Nevertheless, I would like to know whether I can say anything about the coefficients; for example, it is interesting that the treatment's significance now drops out, whereas it was statistically significant at smaller deviations, such as 2 and 5.

    May I also ask if I need to alert the reader that my F-test in the first case is insignificant and missing in the second case?

    I am clustering my standard errors at the individual level, as I have repeated observations of the same individual. I am aware of the panel-data commands (xtset, xtreg); however, they are not commonly used in my Finance research department, and we are encouraged to use OLS with standard errors clustered at the individual level.
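    For concreteness, the two approaches look like this (a generic sketch with placeholder variable names; CASE is my individual identifier):
    Code:
     * pooled OLS with standard errors clustered at the individual level (what I use)
     regress depvar indepvars, vce(cluster CASE)

     * the panel-data alternative I mention (random effects shown as one possibility)
     xtset CASE
     xtreg depvar indepvars, re vce(cluster CASE)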

    Code:
    Linear regression                               Number of obs     =        652
                                                    F(17, 379)        =       1.41
                                                    Prob > F          =     0.1262
                                                    R-squared         =     0.0405
                                                    Root MSE          =     .49006
    
                                     (Std. err. adjusted for 380 clusters in CASE)
    ------------------------------------------------------------------------------
                 |               Robust
    behavior~10 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             T_C |
        6 years  |    .026645   .0504043     0.53   0.597    -.0724622    .1257522
                 |
     return_year |
           2021  |    -.02211   .0688569    -0.32   0.748    -.1574993    .1132793
           2020  |  -.1389711   .1048867    -1.32   0.186    -.3452038    .0672617
           2019  |  -.0250764    .068438    -0.37   0.714    -.1596421    .1094894
           2018  |  -.0592463    .061312    -0.97   0.335    -.1798007     .061308
           2017  |  -.0397833   .0863172    -0.46   0.645    -.2095039    .1299374
           2016  |  -.0768234   .1467341    -0.52   0.601    -.3653382    .2116914
           2015  |  -.1608509   .1003739    -1.60   0.110    -.3582104    .0365087
           2014  |   .1624146   .1786289     0.91   0.364    -.1888132    .5136423
           2013  |   -.037806   .0607149    -0.62   0.534    -.1571863    .0815742
                 |
            risk |
              2  |   .0063166    .042302     0.15   0.881    -.0768594    .0894926
              3  |   .0339861    .053213     0.64   0.523    -.0706436    .1386157
              4  |  -.1049715    .061563    -1.71   0.089    -.2260192    .0160763
                 |
           round |
              2  |   .0192849   .0522453     0.37   0.712    -.0834421     .122012
              3  |   .0145904    .049937     0.29   0.770    -.0835978    .1127786
              4  |  -.0097286   .0521156    -0.19   0.852    -.1122006    .0927434
                 |
        1.male |   .1529302   .0480718     3.18   0.002     .0584094    .2474509
           _cons |   .3655178   .0774047     4.72   0.000     .2133214    .5177142
    ------------------------------------------------------------------------------
    Code:
    Linear regression                               Number of obs     =        342
                                                    F(14, 252)        =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.0454
                                                    Root MSE          =     .49311
    
                                     (Std. err. adjusted for 253 clusters in CASE)
    ------------------------------------------------------------------------------
                 |               Robust
    behavior~15 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             T_C |
        6 years  |  -.0176953   .0852887    -0.21   0.836    -.1856648    .1502741
                 |
     return_year |
           2021  |  -.0076493    .100601    -0.08   0.939    -.2057752    .1904765
           2020  |   -.252593   .1335985    -1.89   0.060    -.5157049     .010519
           2019  |   .0225698   .1040235     0.22   0.828    -.1822965    .2274361
           2018  |  -.0421317   .0892726    -0.47   0.637    -.2179471    .1336837
           2017  |   .0395145   .1178768     0.34   0.738    -.1926348    .2716638
           2016  |  -.3958689   .0976411    -4.05   0.000    -.5881654   -.2035724
           2013  |  -.0702231   .0880004    -0.80   0.426    -.2435331    .1030869
                 |
            risk |
              2  |    .053997   .0608965     0.89   0.376    -.0659339    .1739279
              3  |   .0688219   .0839197     0.82   0.413    -.0964514    .2340951
              4  |  -.0984557   .0998575    -0.99   0.325    -.2951173    .0982058
                 |
           round |
              2  |  -.0749698   .0757854    -0.99   0.323    -.2242233    .0742837
              3  |   .0201311   .0708838     0.28   0.777     -.119469    .1597311
              4  |   .0572367   .0771545     0.74   0.459     -.094713    .2091864
                 |
        1.male |   .0936874   .0624712     1.50   0.135    -.0293448    .2167196
           _cons |   .3948466   .1294512     3.05   0.003     .1399026    .6497907
    ------------------------------------------------------------------------------
    I understand that, especially when analyzing changes of 15, the model's F-statistic drops out because not many observations reach a change of 15, compared with smaller changes such as 2 or 5. Would it be sufficient to interpret the coefficients while acknowledging the limitation of my data, namely that few observations reach a change of 15?

    Judging from this conversation (https://www.statalist.org/forums/for...ou-please-help), comments #10 and #11 in particular seem to indicate that a model test is not important (if my interpretation is correct). #14 reassures me by saying that reviewers might not even take note of an overall insignificant model:
    Originally posted by Clyde Schechter:

    Regarding your concern about reviewers, you can never predict what reviewers will do. Some are very sharp; there are others who are both ignorant and unaware of their ignorance. Suffice it to say that unless the omnibus hypothesis test of all coefficients equaling 0 (a very bizarre hypothesis in your context, I think) is part of your research goal, there is no legitimate reason for a reviewer to challenge it. If you encounter that problem, I would recommend appealing to the editor to either override the reviewer on the matter or get another opinion.
    However, other sites, particularly this one: https://stats.stackexchange.com/ques...t-can-i-use-it

    say that looking any further into a model with an insignificant F-test could be p-value hacking, which, of course, I want to avoid.

    I am most grateful for any advice!

  • #2
    It is really difficult, if not impossible, to say what is going on with regard to the differences between the two models. Apart from changing the outcome variable, you have also removed some of the year variables among the predictors, and the sample size has dropped considerably. In fact, in the second model you have reduced the sample size to the point that you no longer have enough observations to calculate a model F statistic using clustered standard errors. So that truly makes it impossible to comment on the model F statistics.

    p-hacking is the testing of multiple hypotheses until you come up with a "statistically significant" result. It is scientifically wrong, and is increasingly coming to be regarded as scientific misconduct. But, frankly, if you are working in the paradigm of null hypothesis significance testing, you are already on the slippery slope unless you commit to a specific hypothesis to test before you ever look at or touch your data. In fact, best practice is to commit your statistical analysis plan to writing before you ever get your hands on the data. Then you perform precisely those hypothesis tests that you committed to ahead of time, and no others. If peculiarities of the data make it impossible or infeasible to carry out the original plan (but not merely because you don't get the results you wanted), you can change the plan post hoc, but that requires a detailed explanation of why the original plan had to be abandoned and a justification for the alternative used. And any such alternative analysis is always suspect!

    With that context in mind, it will be very unusual if, in advance of looking at the data, you will say to yourself "I want to do a test of the omnibus null hypothesis for my model" because that hypothesis almost never corresponds to anything meaningful in the real world, at least not in the modern era where most analyses include adjustment for confounders (aka "control variables"). In fact, I would go so far as to say that for other than toy problems given as homework in a Regression 101 course, if you think that your overall F statistic is important you probably have misunderstood the problem, or you have serious misunderstandings of what hypothesis testing is. Yes, there are exceptions, but they are uncommon in real life and if you find yourself going down the overall model F-test route you really should run a quick double-check on your thinking. Usually the hypothesis you will set out in advance involves a single variable: the "treatment" or "intervention" or "policy change" or "event." And in that case you test only the coefficient of that variable and you just ignore the overall model F statistic. One a priori hypothesis and one test, specifically a test of that hypothesis. It's the antithesis of p-hacking.
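    In Stata terms, that is a single test that you can read directly off the regression output, or run explicitly; a minimal sketch, borrowing the T_C variable from #1 and assuming its non-base level is coded 1:
    Code:
     * after -regress-: test only the pre-specified treatment coefficient
     test 1.T_C = 0
    You never even glance at the overall model F statistic.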



    • #3
      Dear Clyde,

      Thank you very much for your comments. The dependent variables experimentally measure the participants' behavior when presented with different changes. Here, I posted the changes of 10 and 15, as this is where I discovered problems.

      I have used exactly the same command:
      Code:
       regress behaviorpartic10 i.T_C i.return_numb i.risk i.round i.male, vce(cluster CASE)
      and
      Code:
       regress behaviorpartic15 i.T_C i.return_numb i.risk i.round i.male, vce(cluster CASE)
      The sample-size reduction can be explained by the fact that only a few observations truly reach a difference of 15, while the majority reach a percentage-point difference of 2, 5, or possibly 10. Likewise, I think some return years drop out simply because no observations in those years show a change greater than 15.
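      To check this, I could tabulate the estimation sample by year; a minimal sketch:
      Code:
       * which return years actually contribute observations at the 15 threshold
       tab return_numb if !missing(behaviorpartic15)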

      I really want to use the insignificant results from regression15 to say that with large deviations, none of the independent variables influence the participants' behavior.

      This is why I wonder whether, in the paper, I can say anything about the insignificance of the independent variables in the regression, or whether I may not make any statements about the significance/insignificance of the coefficients because of the insignificant F-test. As the F-test is not reported when exporting the results from Stata to LaTeX, do I need to inform readers that the model is insignificant?

      Thank you in advance!



      • #4
        The -regress- commands you show in #3 cannot have been the sources for the output shown in #1 because neither command contains a year variable. Or is return_numb some clone of return_year?

        I really want to use the insignificant results from regression15 to say that with large deviations, none of the independent variables influence the participants' behavior.
        You need to write down exactly what equations correspond to this sentence. One null hypothesis that is compatible with this sentence is that all of the coefficients of the independent variables are zero. If that is your null hypothesis, then it is the overall model F test that you need to use, not the individual coefficient test statistics. Even then, the non-significance of the F-test does not enable you to say that your data imply that all of the coefficients are zero, only that the data are compatible with that being the case. After all, you cannot "accept" the null hypothesis; you can only fail to reject it. (You might have been able to strengthen the conclusion to actually asserting the coefficients are zero, or, at least, smaller than what the alternative hypothesis entailed, had you pre-specified an alternate hypothesis and done a power calculation before you got the data.)
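        For illustration only, an a priori power calculation of the kind meant here might have looked something like the following sketch; the hypothesized difference is invented, the SD is roughly your Root MSE, and the clustering is ignored:
        Code:
         * sample size needed to detect a difference of 0.1 between two groups,
         * assuming SD = 0.49, alpha = 0.05, power = 0.8 (all hypothetical)
         power twomeans 0 0.1, sd(0.49) power(0.8)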

        On the other hand, that same sentence could be interpreted as proposing a series of hypothesis tests, one for each variable separately. The overall model F statistic is irrelevant to that. You would need to carry out the individual F-tests associated with each variable. Moreover, since there would be five such tests, you would need to apply a correction for multiple hypothesis testing if you want your overall type 1 error rate to remain at 0.05.
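        Concretely, those per-variable tests could be run along the lines of this sketch (the relevant command, -testparm-, comes up again in #6):
        Code:
         * after the regression: joint F-test of each predictor's coefficients
         foreach v in T_C return_numb risk round male {
             testparm i.`v'
         }
        You would then apply the multiple-testing correction to the five resulting p-values.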

        Again, though, you were supposed to have made these decisions before you worked with the data. Evidently it is too late for that now, but bear it in mind for the future.




        • #5
          You are correct: return_numb is return_year; I just changed the label in the output for better understanding. I am sorry for not being thorough enough.

          Regarding the equations, I am not looking for all coefficients to be zero (i.e., I am not interested in the F-test); just the individual coefficients would be fine, as in the individual coefficient test statistics.

          So, in this case, I would like to make the following statement: neither the treatment condition (T_C), nor the return year (return_numb), nor risk, nor round, nor being male has a statistically significant effect on the participants' behavior when changes are at least 10, judging from the insignificant individual coefficient t-tests.

          When analyzing behavior in the presence of changes of 15, neither the treatment condition (T_C), nor the return years (all except 2016 and 2020), nor risk, nor round, nor being male has a statistically significant effect, again judging from the insignificant individual coefficient t-tests.

          Can I make these two statements?

          Why would I need individual F-tests for each variable? (After thinking about it, I believe it means, e.g., analyzing only the treatment condition in isolation from the other variables.) In contrast, the provided t-tests and p-values assess the significance of the treatment condition in the presence of the other independent variables.

          Now, you are asking me which test I need, and I think I would be fine with the provided t-stats and p-values, as the presence of the other independent variables does not bother me.

          Thank you for your note regarding power tests and proper planning; I have recently been made aware of these and will use them in future experiments. As you can probably tell, I am at the beginning of independent research and am most grateful for any guidance and advice.



          • #6
            Why would I need individual F-tests for each variable? (After thinking about it, I believe it means, e.g., analyzing only the treatment condition in isolation from the other variables.) In contrast, the provided t-tests and p-values assess the significance of the treatment condition in the presence of the other independent variables.
            Let's just take the risk variable as an example. It has four levels, three of which appear in the outputs, and one (level 1) of which is omitted as the reference category. The significance or non-significance of any of the three coefficients is a test about the difference between that level and the omitted level. It does not tell you that that level of risk has no effect on outcome. It tells you that the data do not distinguish its effect from that of level 1: but since the effect of level 1 is not estimated (and is not estimable) it could be that both of them have an effect. If you want to say that risk has no effect on the outcome, then that requires a different approach, the one I recommended in #4:
            Code:
            testparm i.risk
            which will give you an F test with 3 numerator df.

            The same reasoning applies to all the other variables, with the exception of T_C, which has only two levels, so the difference between the two levels is, by definition, the effect of that variable.

            All of that said, you have referred to this study as an "experiment," and your variable T_C has a name suggesting that it encodes treatment group vs. control group. In an experiment, we are usually interested only in the effect of the experimental treatment(s); any other variables in the regression are included to adjust for their nuisance effects (either confounding, aka omitted-variable bias, or to reduce outcome variance). So I have to say I was surprised by your response in #5: I expected to read that all that mattered is the T_C variable, and everything else is just a "control variable."



            • #7
              Thank you Clyde!
              I take it I need to run testparm on each independent variable after each regression in order to state whether that variable alone has any overall influence on the participants' behavior? Do I then take the coefficient from the OLS regression or from testparm? For example, i.male is statistically significant in behaviorpartic10; when I run
              Code:
               . testparm i.male
              
               ( 1)  1.male = 0
              
                     F(  1,   379) =   10.12
                          Prob > F =    0.0016
              How do I interpret the coefficient of 10.12? In the OLS regression (coefficient .15), I would have said men are 15 percentage points more likely than women to be impacted by price changes of at least 10.

              For completeness, I want to display all variables as well as their coefficients, and you are absolutely right: we are mainly interested in the treatment/control variable.

              With large changes (as in 10 and 15), I want to establish when the treatment condition is no longer significant, as it is significant at smaller deviations. This is why I am trying to find out whether I can rely on the insignificant coefficients in an insignificant model (as established by the F-test).

              Thank you so much for your help!



              • #8
                The -testparm- results are purely hypothesis tests and have no information about effect size. Effect sizes are estimated by coefficients, and confidence intervals or standard errors provide information about the uncertainty in those estimates.

                When I wrote about using -testparm-, I forgot that you had another dichotomous variable, male, in there. There is no need to run -testparm- with dichotomies. The F-test you get from -testparm- is 100% equivalent to the t-test that you can read off the regression coefficient output itself. Notice that the p-value you get is 0.0016 for male. Looking at the regression output itself, that p-value is 0.002, which looks different only because in the regression output they round it to 3 decimal places whereas -testparm- gives you four. If you have little or nothing better to do, consider re-running the regression with -set pformat %5.4f- so that you get the p-values to 4 places and you will see you get .0016 there, too. An F-test with 1 numerator degree of freedom is the same thing as a t-test with the same number of denominator degrees of freedom. (And the F statistic itself is the square of the corresponding t-statistic.)
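                To see that equivalence numerically (10.1124 rather than 10.12 only because t was rounded to 3.18 in the output):
                Code:
                 . display 3.18^2
                 10.1124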

                The 10.12 you refer to is not a coefficient. It is the F-statistic itself, and it is analogous to the overall model F statistic. In general, an F statistic is analogous to a t-statistic when more than one degree of freedom is involved (and, as already noted, is equivalent to a t-statistic when there is just 1 degree of freedom being tested). When you run the -testparm- command on one of the variables with 3 or more levels, like risk or round, you will get an F test of the null hypothesis that all of the levels jointly have no effect on the outcome variable. The F-statistic is analogous to a t-statistic, and you can interpret the p-value the same way you would for a t-test.

                With large changes (as in 10 and 15), I want to establish when the treatment condition is no longer significant, as it is significant at smaller deviations.
                This is a truly terrible idea. As you have already noted, the sample size and the number of model df change with the change parameter, so the change in statistical significance tells you NOTHING usable. Even in ideal circumstances (and your circumstances are about as far from ideal as you can get), the difference between statistically significant and not statistically significant is, itself, of no statistical importance. If you had an ideal situation in which the sample size and model variables were the same at all levels of the change factor, it would be nice to collect the T_C coefficient and its standard error (or 95% CI) at each interesting level of the change factor into a separate data set (or frame) and then tabulate, or, probably better, graph the coefficient on the vertical axis against the change factor on the horizontal axis, with error bars based on the standard errors or confidence intervals. You might then want to study those results to get a sense of the level of the change factor at which the T_C effect crosses the threshold of real-world meaningfulness (a threshold that might be zero, but usually isn't). The point at which it crosses the threshold of statistical significance is of no importance at all, and, in your real situation with dependence of sample size and model df on the change factor, it isn't even interpretable.
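                In code, that ideal-circumstances approach might look like the following sketch. It assumes outcomes behaviorpartic2 through behaviorpartic15 measured on comparable samples (which, again, is not your situation) and that the non-base level of T_C is coded 1:
                Code:
                 * collect the T_C coefficient and its SE at each level of the change factor
                 tempname h
                 postfile `h' change b se using tc_effects, replace
                 foreach c in 2 5 10 15 {
                     quietly regress behaviorpartic`c' i.T_C i.return_numb i.risk ///
                         i.round i.male, vce(cluster CASE)
                     post `h' (`c') (_b[1.T_C]) (_se[1.T_C])
                 }
                 postclose `h'

                 * graph the coefficients with normal-approximation error bars
                 use tc_effects, clear
                 generate lo = b - 1.96*se
                 generate hi = b + 1.96*se
                 twoway (rcap hi lo change) (scatter b change), ///
                     xtitle("change factor") ytitle("T_C coefficient")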



                • #9
                  Thank you Clyde. I take it you suggest I omit the 10 and 15 regressions from my paper and focus solely on the smaller deviations, where I have a significant F-test? As in, I need a significant overall model (as established by the F-stat) to interpret even insignificant coefficients even though I am interested in the individual coefficients, not the overall model?



                  • #10
                    No. I would probably be the last person in the world to ever advise somebody to select what to present based on what is statistically significant.

                    A research project is about posing one or more questions and seeking answers to them. The presentation should be driven by that structure. Each question corresponds to one (or sometimes more than one) analysis, and the results may provide the anticipated answer, the opposite answer, or be inconclusive. Statistically non-significant results are in most circumstances (and, specifically, in yours as you have set them out here) inconclusive. It's that simple. Assuming you had some good reason to do these different analyses in the first place, you should present their results. The fact that you didn't get any statistically significant findings doesn't alter the fact that asking those questions was worthwhile. In presenting them, you acknowledge that the results were inconclusive.

                    And I certainly did not say that you need a significant overall model to interpret coefficients (significant or otherwise). On the contrary, you made a clear statement that the primary interest is in the individual variables. So, with apologies for not being crystal clear earlier on: ignore the overall model F-test--it's an irrelevant distraction. Don't waste another second thinking about it.



                    • #11
                      Thank you very much! I apologize for my misunderstanding, and if I hurt you by suggesting you said something you clearly did not say! I am very grateful for your guidance!
                      To be on the safe side, since I would still like to report all analyses, can I say the following?

                      With large deviations of 10 and 15, I get insignificant individual coefficients in an overall insignificant or non-estimable model (denoted by the dots in F and Prob > F). This means I CANNOT conclude that any variables (e.g., the treatment condition) drop out, especially if they were significant at smaller deviations, because I cannot compare the models at smaller deviations with the models at larger deviations: the models do not have the same number of observations and clusters?

                      That is, at large deviations of 10 and 15 I get inconclusive results, as the variables are no longer significant, but I cannot draw comparisons? I could only draw comparisons if, as you say, "the sample size and model variables were the same" across the models. Since they are not the same, I can only make absolute statements (not relative to another model)?

                      Thank you so very much!



                      • #12
                        Yes, this is correct.

                        I apologize ... if I hurt you by suggesting you said something you clearly did not say!
                        No apology needed. In my view, if I say something and somebody else misunderstands it, the onus is on me to be clearer. I can remember a few occasions where despite several repeated efforts I could not get my point across to my interlocutor, and I was inclined to think that it was his problem at that point, not mine. But there is no circumstance like that where I would feel hurt.

                        Cheers!




                        • #13
                          Thank you very much for your support and guidance, Clyde! I appreciate your help greatly!



                          • #14
                            Dear Clyde,

                            Thank you very much for your help and guidance!

                            Please allow me to ask 1) how I would perform a correction for multiple hypothesis testing if I want my overall type 1 error rate to remain at 0.05 and 2) if this means I'm imposing a sort of overall F-test on my five separate hypotheses of the same model, i.e., I fail to accept the null hypothesis that jointly all five coefficients are equal to zero - i.e., they are different from zero?


                            Originally posted by Clyde Schechter:
                            The -regress- commands you show in #3 cannot have been the sources for the output shown in #1 because neither command contains a year variable. Or is return_numb some clone of return_year?

                            On the other hand, that same sentence could be interpreted as proposing a series of hypothesis tests, one for each variable separately. The overall model F statistic is irrelevant to that. You would need to carry out the individual F-tests associated with each variable. Moreover, since there would be five such tests, you would need to apply a correction for multiple hypothesis testing if you want your overall type 1 error rate to remain at 0.05.
                            Thank you so very much; I truly appreciate your help!



                            • #15
                              how I would perform a correction for multiple hypothesis testing if I want my overall type 1 error rate to remain at 0.05
                              There are a number of ways to do this. The simplest is the Bonferroni correction. Whatever p-value you get, multiply it by the number of tests to get a corrected p-value. (If the corrected p-value turns out to be > 1, just truncate to 1.0).
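                              For example, with your five tests and the male p-value from #7:
                              Code:
                               * Bonferroni: corrected p = min(1, number of tests * raw p)
                               display min(1, 5*0.0016)    // = .008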

                              if this means I'm imposing a sort of overall F-test on my five separate hypotheses of the same model, i.e., I fail to accept the null hypothesis that jointly all five coefficients are equal to zero - i.e., they are different from zero?
                              I don't understand the question.

