
  • Prediction errors by groups

    Dear all

    Does anyone have an idea of how to calculate prediction errors or uncertainty by groups of observations?

    Say I have a dataset of individual observations in a country, and I predict grades (gpa) with the following linear regression:

    reg gpa age i.gender i.fathereducation
    predict yhat

    So with -tabstat yhat, by(region)- I get the mean predicted grade by region, but how do I calculate the s.e. or the confidence/prediction interval by region?

    Thanks a lot

  • #2
    -tabstat- has a -statistics()- option that lets you specify which statistics you want calculated for each group. mean is one of them (and is what you get by default if you don't specify anything), but it also has semean for the standard error of the mean. See -help tabstat- for details.

    Another approach altogether would be to use the -ci- command for this. See -help ci-.
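For illustration, a minimal sketch of both suggestions using Stata's shipped auto data, assuming -price- and -foreign- as stand-ins for -gpa- and -region- (the original data aren't available). -semean- is the standard error of each group's mean prediction, and -ci means- adds a t-based confidence interval for that mean; both describe only the spread of the predictions within each group.

```stata
* Sketch only: auto.dta stands in for the poster's gpa/region data
sysuse auto, clear
regress price mpg i.foreign
predict yhat, xb

* Mean of the predictions, its standard error, and the count per group
tabstat yhat, by(foreign) statistics(mean semean n)

* t-based confidence interval for each group mean of the predictions
* (the -ci means- syntax needs Stata 14 or newer)
by foreign, sort: ci means yhat
```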

    Added: It dawns on me that this may not be what you want. Those approaches represent the variation in the predictions among the observations in each group, but they do not account at all for uncertainty in the regression coefficients themselves. So maybe what you really want would be the output of -margins, over(region)- after your regression.
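A minimal sketch of that suggestion, again assuming the auto data as a stand-in. For a linear model, -margins, over(foreign)- averages the linear prediction over the estimation-sample observations in each group; its default delta-method standard errors are built from the coefficient covariance matrix e(V) with the covariate values treated as fixed, so they reflect uncertainty in the coefficients rather than the spread of the predictions. The last two lines compute the group means of the predictions over the same estimation sample, for comparison with the -margins- point estimates.

```stata
* Sketch only: auto.dta stands in for the poster's gpa/region data
sysuse auto, clear
regress price mpg i.foreign

* Average linear prediction within each level of -foreign-, with
* delta-method standard errors computed from e(V)
margins, over(foreign)

* Comparison: group means of the predictions over the same estimation sample
predict yhat if e(sample), xb
tabstat yhat, by(foreign) statistics(mean)
```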
    Last edited by Clyde Schechter; 28 Nov 2017, 09:02.



    • #3
      Hi Clyde.

      Thanks for your reply. Yes, I guess the problem is that your first solution doesn't take into account the differences between predicted and actual values (the residuals).

      I'll look into your added suggestion. Is -margins- a postestimation command as well?



      • #4
        Yes, -margins- is a postestimation command.



        • #5
          So I've tried -margins, over(region)-, but it's not the same as -tabstat yhat, by(region)-: the results are different. Why?

          Moreover, I'm a little confused about the s.e. from -margins-: how are these s.e. different from the s.e. I get from -regress-?



          • #6
            Mikkel:
            another approach that springs to my mind is:
            Code:
            . sysuse auto.dta
            (1978 Automobile Data)
            
            . regress price mpg i.foreign
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(2, 71)        =     14.07
                   Model |   180261702         2  90130850.8   Prob > F        =    0.0000
                Residual |   454803695        71  6405685.84   R-squared       =    0.2838
            -------------+----------------------------------   Adj R-squared   =    0.2637
                   Total |   635065396        73  8699525.97   Root MSE        =    2530.9
            
            ------------------------------------------------------------------------------
                   price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
                         |
                 foreign |
                Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
                   _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
            ------------------------------------------------------------------------------
            
            . predict predict, xb
            
            . predict se, stdp
            
            . bysort foreign: list predict se if _n<=10
            
            ---------------------------------------------------------------------------------------------------------
            -> foreign = Domestic
            
                 +---------------------+
                 |  predict         se |
                 |---------------------|
              1. | 5433.114   371.2582 |
              2. | 6904.091   384.6718 |
              3. | 5433.114   371.2582 |
              4. | 6021.504   351.1114 |
              5. | 7492.482   442.0976 |
                 |---------------------|
              6. | 6609.896   365.4288 |
              7. | 4256.332   491.3017 |
              8. | 6021.504   351.1114 |
              9. | 7198.287   410.6212 |
             10. |   6315.7   353.9875 |
                 +---------------------+
            
            ---------------------------------------------------------------------------------------------------------
            -> foreign = Foreign
            
                 +---------------------+
                 |  predict         se |
                 |---------------------|
              1. | 8671.384   691.7728 |
              2. |  6906.21   548.5566 |
              3. | 6317.819   539.7479 |
              4. |  6906.21   548.5566 |
              5. | 3375.864   784.5907 |
                 |---------------------|
              6. | 6612.015   541.3127 |
              7. | 7494.602   579.0627 |
              8. | 7494.602   579.0627 |
              9. | 6317.819   539.7479 |
             10. | 5435.232   568.7454 |
                 +---------------------+
            
            
            .
            Kind regards,
            Carlo
            (Stata 19.0)
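Since the original question also asks about prediction intervals: after -regress-, -predict- offers -stdf- (standard error of the forecast) alongside -stdp-. -stdp- reflects only the uncertainty in the coefficients, while -stdf- also adds the residual variance, which is what an interval for an individual observation needs. A minimal sketch on the same data; the variable and scalar names (xb, se_p, se_f, tcrit, ci_lo, ci_hi, pi_lo, pi_hi) are illustrative choices, not part of the original post.

```stata
sysuse auto, clear
regress price mpg i.foreign
predict xb, xb
predict se_p, stdp   // SE of the linear prediction (coefficient uncertainty only)
predict se_f, stdf   // SE of the forecast (also includes the residual variance)

* Two-sided 95% critical value from the t distribution with the residual df
scalar tcrit = invttail(e(df_r), 0.025)

* Confidence interval for the fitted mean, prediction interval for a new observation
generate ci_lo = xb - tcrit*se_p
generate ci_hi = xb + tcrit*se_p
generate pi_lo = xb - tcrit*se_f
generate pi_hi = xb + tcrit*se_f
bysort foreign: list xb ci_lo ci_hi pi_lo pi_hi if _n <= 5
```

By construction se_f^2 = se_p^2 + e(rmse)^2, so the prediction interval is always wider than the confidence interval.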



            • #7
              Hi Carlo. Thanks for your contribution.

              I guess your suggestion doesn't report the s.e. at the group level (foreign)? Or am I missing something here?



              • #8
                Mikkel:
                yes, you're correct.
                You may want to try:
                Code:
                forval i = 0/1 {
                  mean predict se if foreign==`i'
                }
                
                Mean estimation                   Number of obs   =         52
                
                --------------------------------------------------------------
                             |       Mean   Std. Err.     [95% Conf. Interval]
                -------------+------------------------------------------------
                     predict |   6072.423    193.515      5683.925    6460.921
                          se |   426.4441   13.84191      398.6553    454.2329
                --------------------------------------------------------------
                
                Mean estimation                   Number of obs   =         22
                
                --------------------------------------------------------------
                             |       Mean   Std. Err.     [95% Conf. Interval]
                -------------+------------------------------------------------
                     predict |   6384.682   414.6715      5522.325    7247.038
                          se |   636.3742     27.252      579.7006    693.0479
                --------------------------------------------------------------
                However, as you already noted, the SEs differ from the ones estimated via -margins-.
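As an aside, the loop can be collapsed into a single -mean- call with its -over()- option, which reports the group means for both levels of -foreign- in one table (sketch below, reusing the variable names from #6). As noted in #2, these standard errors describe how the predictions vary across observations within each group, with the coefficients treated as fixed, which is why they are not the same quantity as the -margins- standard errors.

```stata
* Same setup as in #6
sysuse auto, clear
regress price mpg i.foreign
predict predict, xb
predict se, stdp

* Means of -predict- and -se- for each level of -foreign- in a single table
mean predict se, over(foreign)
```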
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Thanks again. Yes, it differs from -margins-, and I'm not sure the averaged s.e. per group is a good measure of the group's prediction error. It's not easy!



                  • #10
                    Mikkel:
                    yes, I share your concerns.
                    And that's why I wouldn't have followed your research strategy.
                    I would have added an -i.region- predictor on the right-hand side of the regression equation, instead.
                    However, I'm sure you have good methodological reasons (or constraints) to act differently.
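For reference, a minimal sketch of that alternative, assuming -rep78- from the auto data as a stand-in for -region- (-rep78- has five missing values, so -regress- uses 69 of the 74 cars). -testparm- then tests whether the group indicators jointly add anything to the model.

```stata
* Sketch only: -rep78- stands in for -region-
sysuse auto, clear

* Group indicators enter the model as a factor-variable predictor
regress price mpg i.foreign i.rep78

* Joint test that all rep78 indicator coefficients are zero
testparm i.rep78
```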
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      That wouldn't work, because I'm not looking for an "everything else being equal" interpretation; it's for a benchmark analysis.
