  • F-Test values

    Hello,
    I'm running some simple linear regressions (the reg command followed by the variables). However, the F statistic is always reported as >99999.00. This worries me because I can't tell whether it points to a problem with the dataset itself (the dataset has around 3 million observations and 30 variables). I would appreciate any guidance on this.
    Thank you!
    Best regards.
    Last edited by Kate Isabella; 17 Jun 2021, 06:14.

  • #2
    It is better to copy and paste exactly what you typed into Stata, and exactly what Stata returned, rather than asking abstract questions.

    A high F statistic is not in itself a sign of any problem that I know of.


    • #3
      Of course, thank you. Since I always get the same F-test value regardless of the variables I use, I wasn't sure whether such a value was cause for concern. Thank you for your attention.
      Attached Files
      Last edited by Kate Isabella; 17 Jun 2021, 07:28.


      • #4
        This smells fishy; you might want to write to Stata Technical Support about it.

        In bivariate regression, the F statistic for overall significance is exactly equal to the square of the predictor's t-statistic. But that is not so in your regression, as
        Code:
        . dis 684.99^2
        469211.3
        and t^2=469,211.3 is very different from F=99,999.

        Stata Corp might be capping our F-statistics at F=99,999 for our own good, because too much of a good thing is no good as it leads to excess and abuse :-).


        Here is how things should look normally:

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . reg price mpg
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(1, 72)        =     20.26
               Model |   139449474         1   139449474   Prob > F        =    0.0000
            Residual |   495615923        72  6883554.48   R-squared       =    0.2196
        -------------+----------------------------------   Adj R-squared   =    0.2087
               Total |   635065396        73  8699525.97   Root MSE        =    2623.7
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
               _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
        ------------------------------------------------------------------------------
        
        . dis "t-stat for mpg Squared = " (_b[mpg]/_se[mpg])^2
        t-stat for mpg Squared = 20.258353
        As you can see, t^2 = F in this bivariate regression, as it should be.




        • #5
          and t^2=469,211.3 is very different from F=99,999.

          The output shows F > 99,999. Since 469,211.3 > 99,999, there is nothing inconsistent. Stata simply chooses not to print very large numbers in the output.


          • #6
            Andrew makes a good point: the output does not say F = 99,999 but rather F > 99,999, which is correct.

            Then the question is: what do you see when you type the following after fitting your regression?

            Code:
            . dis e(F)
            20.258353
            (this is after the regression in #4).
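
            A minimal sketch of that check, using the auto dataset from #4 as a stand-in because the original poster's data are not shown. The header of the regression table is formatted for display, whereas the stored results e(F), e(df_m), and e(df_r) keep the full value. The %12.2f format is only an illustrative choice.

            ```stata
            * Refit the example from #4, then read the stored results directly
            sysuse auto, clear
            quietly regress price mpg

            * e(F) holds the F statistic regardless of what the header prints
            display %12.2f e(F)

            * Upper-tail p-value from the stored F and its degrees of freedom
            display Ftail(e(df_m), e(df_r), e(F))

            * List all stored estimation results
            ereturn list
            ```

            The same three commands after the regress call should work after any regression, including one whose header shows F > 99999.00.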


            • #7
              Thank you very much for your help, Mr. Joro Kolev, and once again, Mr. Andrew Musau. I was worried that this value could be an indicator of something very wrong. From your experience with Stata, which is certainly far greater than mine, could it be due to the structure of the dataset, as you explained to me in another topic, Mr. Andrew Musau?


              • #8
                Mr. Joro Kolev, after fitting the regression, the output shows 469213.01. As I understand it, such a value is perhaps unusual; can I still use the values obtained from these regressions in my analyses? I don't know whether you have further tips on how I should proceed, taking into account your previous comment.
                Attached Files


                • #9
                  I was worried that this value could be an indicator of something very wrong. From your experience with Stata, which is certainly far greater than mine, could it be due to the structure of the dataset, as you explained to me in another topic, Mr. Andrew Musau?

                  My comment on the data structure related to the error you got when declaring the panel identifier and time variable. As you are running OLS, you are not taking into account the panel structure of your data (if it is indeed a panel), so my comment does not apply here. As you have over 3.5 million observations, the high F-value and the high significance of your coefficient are to be expected, so there is nothing to worry about. However, with only one regressor, a lot of the variation in the outcome is left unexplained (approximately 88%). You probably want to find more independent variables to add to your model.
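
                  To illustrate why sample size alone pushes F up: for a fixed R-squared, F = (R2/k) / ((1 - R2)/(n - k - 1)), so it grows roughly in proportion to n. The sketch below uses the auto dataset, not your data, and stacks 1,000 copies of it with expand purely as an illustration; duplicating observations is an assumption made for the demonstration, not a valid way to gain information.

                  ```stata
                  * Baseline: 74 observations
                  sysuse auto, clear
                  quietly regress price mpg
                  display "n = " e(N) "  R2 = " %6.4f e(r2) "  F = " %12.2f e(F)

                  * Stack 1,000 copies of the same observations (illustration only)
                  expand 1000
                  quietly regress price mpg
                  display "n = " e(N) "  R2 = " %6.4f e(r2) "  F = " %12.2f e(F)

                  * Rebuild F from R-squared and the degrees of freedom
                  display %12.2f (e(r2)/e(df_m)) / ((1 - e(r2))/e(df_r))
                  ```

                  R-squared is the same in both fits, while F is roughly a thousand times larger in the second, so a very large F can sit alongside a modest R-squared.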
                  Last edited by Andrew Musau; 17 Jun 2021, 09:22.


                  • #10
                    Thank you very much, Andrew Musau! I will try to find more independent variables for this model.
                    Last edited by Kate Isabella; 17 Jun 2021, 10:01.


                    • #11
                      I think everything is fine. You just have a huge sample, and your predictor predicts the dependent variable so well that you get these huge t-statistics and F-statistics.

                      Carry on with your analysis, everything looks great.

                      Originally posted by Kate Isabella
                      Mr. Joro Kolev, after fitting the regression, the output shows 469213.01. As I understand it, such a value is perhaps unusual; can I still use the values obtained from these regressions in my analyses? I don't know whether you have further tips on how I should proceed, taking into account your previous comment.


                      • #12
                        Thank you very much for your help, Mr. Joro Kolev!
