
  • Linear regression: using Stata to estimate and predict

    Hello,

    Textbooks give formulas for estimating and predicting after a linear regression. In "A Second Course in Statistics: Regression Analysis, 7th ed." by William Mendenhall, I found the formulas for estimating the population mean (for group p) and for predicting the response variable (for group p). From my previous thread, "Confidence interval of variable derived from linear regression was different from by syntax -ci means-", and its replies, I understand that -margins- is not designed for this purpose. So how can this be done in Stata?

    [Attached image: Screen Shot 2018-01-03 at 9.29.38 PM.png (textbook formulas for estimation and prediction after linear regression)]
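
    For reference, if the attached screenshot shows the usual simple-regression intervals from Mendenhall, they have this form (my restatement, not copied from the attachment). With $\hat{y}_p = \hat\beta_0 + \hat\beta_1 x_p$, $s^2 = \mathrm{SSE}/(n-2)$, and $SS_{xx} = \sum_i (x_i - \bar{x})^2$:

    Confidence interval for the mean response $E(y \mid x_p)$:  $\hat{y}_p \pm t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$

    Prediction interval for a new observation at $x_p$:  $\hat{y}_p \pm t_{\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$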



  • #2
    Hello Tom. I think you'll find what you're looking for in this thread:

    Cheers,
    Bruce
    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)



    • #3
      That is too difficult for me to understand. Is there no built-in syntax for this?



      • #4
        The formulas you present seem to be for a univariate regression of y on x.

        Your question, and your previous posts, suggest that your independent variable x is a categorical variable (which defines the groups you are interested in) and that you are using factor-variable notation to construct indicator variables from it.

        Thus, you are running a multiple regression of y on a collection of indicator variables, and it is not clear to me that a formula for univariate regression will be relevant in this case.



        • #5
          If indeed you are regressing on a categorical variable, then the margins command is precisely what you want in this case. Look at the example below.

          Consider the group with rep78=5. For every observation in that group, the predicted price will be _cons + 5.rep78, so that value will be the mean predicted price for that group. We can use the lincom command to calculate _cons + 5.rep78. But in this case, since we have no covariates to adjust for, the margins command will give that result as well.

          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . drop if rep78==. | rep78==1
          (7 observations deleted)
          
          . regress price i.rep78
          
                Source |       SS           df       MS      Number of obs   =        67
          -------------+----------------------------------   F(3, 63)        =      0.12
                 Model |  3208652.94         3  1069550.98   Prob > F        =    0.9489
              Residual |   568163356        63  9018465.96   R-squared       =    0.0056
          -------------+----------------------------------   Adj R-squared   =   -0.0417
                 Total |   571372009        66  8657151.65   Root MSE        =    3003.1
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 rep78 |
                    3  |   461.6083   1194.958     0.39   0.701    -1926.324    2849.541
                    4  |    103.875   1276.062     0.08   0.935    -2446.131    2653.881
                    5  |    -54.625    1395.41    -0.04   0.969    -2843.129    2733.879
                       |
                 _cons |   5967.625   1061.748     5.62   0.000     3845.891    8089.359
          ------------------------------------------------------------------------------
          
          . lincom _cons + 5.rep78
          
           ( 1)  5.rep78 + _cons = 0
          
          ------------------------------------------------------------------------------
                 price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   (1) |       5913   905.4615     6.53   0.000      4103.58     7722.42
          ------------------------------------------------------------------------------
          
          . margins i.rep78
          
          Adjusted predictions                            Number of obs     =         67
          Model VCE    : OLS
          
          Expression   : Linear prediction, predict()
          
          ------------------------------------------------------------------------------
                       |            Delta-method
                       |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                 rep78 |
                    2  |   5967.625   1061.748     5.62   0.000     3845.891    8089.359
                    3  |   6429.233   548.2842    11.73   0.000     5333.575    7524.892
                    4  |     6071.5   707.8318     8.58   0.000     4657.011    7485.989
                    5  |       5913   905.4615     6.53   0.000      4103.58     7722.42
          ------------------------------------------------------------------------------
          
          .
           Note again that this is based on your post #1, which sets the stage as estimating and predicting after linear regression. That is different from your previous topic, which seemed to be about obtaining confidence intervals for the mean of the observed price within each group, under the assumption that the data were normally distributed within each group.

          Code:
          . bysort rep78: ci means price
          
          -----------------------------------------------------------------------------------------------
          -> rep78 = 2
          
              Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
          -------------+---------------------------------------------------------------
                 price |          8    5967.625    1265.494        2975.208    8960.042
          
          -----------------------------------------------------------------------------------------------
          -> rep78 = 3
          
              Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
          -------------+---------------------------------------------------------------
                 price |         30    6429.233    643.5995        5112.924    7745.542
          
          -----------------------------------------------------------------------------------------------
          -> rep78 = 4
          
              Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
          -------------+---------------------------------------------------------------
                 price |         18      6071.5    402.9585        5221.332    6921.668
          
          -----------------------------------------------------------------------------------------------
          -> rep78 = 5
          
              Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
          -------------+---------------------------------------------------------------
                 price |         11        5913    788.6821        4155.707    7670.293
          
          .
           Loosely speaking, the difference between the two approaches is that regress treats all observations as having the same variance and distinct means, while ci treats each group as having its own variance and its own mean.
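
           As an aside: if you wanted the margins-style output without the common-variance assumption, one rough sketch (not shown above) is to refit with heteroskedasticity-robust standard errors. The confidence limits still will not match ci means exactly, because the small-sample adjustments differ.

           Code:
           . sysuse auto, clear
           . drop if rep78==. | rep78==1
           . regress price i.rep78, vce(robust)   // robust SEs do not assume a common variance
           . margins i.rep78                      // group means with robust standard errors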



          • #6
            Originally posted by William Lisowski View Post
            The formulas you present seem to be for a univariate regression of y on x.

             Your question, and your previous posts, suggest that your independent variable x is a categorical variable (which defines the groups you are interested in) and that you are using factor-variable notation to construct indicator variables from it.

             Thus, you are running a multiple regression of y on a collection of indicator variables, and it is not clear to me that a formula for univariate regression will be relevant in this case.
             Yes, the formulas in the figure compute the conditional mean and prediction interval of the response variable only for a univariate (simple) linear regression. In my case, I actually ran a multiple linear regression. I am interested in knowing the conditional mean and prediction interval of the response variable when the predictor variables are set to specific values.

            Tom



            • #7
              Originally posted by Tom Hsiung View Post
               I am interested in knowing the conditional mean and prediction interval of the response variable when the predictor variables are set to specific values.
              Then as I wrote in post #5, the margins command is the tool you need to use.
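
               For example, a minimal sketch of what I mean, using the auto data (the covariates and the values in at() are placeholders, not your model):

               Code:
               . sysuse auto, clear
               . regress price mpg weight
               . * conditional mean of price, with its 95% CI, at specified predictor values
               . margins, at(mpg=20 weight=3000)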



              • #8
                Originally posted by William Lisowski View Post
                Then as I wrote in post #5, the margins command is the tool you need to use.
                 Sorry, I misled you. I want to know the conditional mean and its 95% confidence interval, in addition to the prediction interval. It looks like -margins- displays the observed mean ("yhat") and the prediction interval; the 95% confidence interval is derived from the observed mean ("yhat"). Thank you.
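
                 To make the distinction concrete, here is a rough sketch of the two quantities, using the auto data (the covariates are placeholders): stdp gives the standard error of the conditional mean, while stdf gives the standard error of the forecast for a new observation.

                 Code:
                 . sysuse auto, clear
                 . regress price mpg weight
                 . predict yhat, xb                               // conditional mean (fitted value)
                 . predict se_mean, stdp                          // SE of the conditional mean
                 . predict se_fcst, stdf                          // SE of the forecast for a new observation
                 . gen lb_ci = yhat - invttail(e(df_r), .025)*se_mean
                 . gen ub_ci = yhat + invttail(e(df_r), .025)*se_mean
                 . gen lb_pi = yhat - invttail(e(df_r), .025)*se_fcst
                 . gen ub_pi = yhat + invttail(e(df_r), .025)*se_fcst
                 . list price yhat lb_ci ub_ci lb_pi ub_pi in 1/5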

                And is it possible to compute the absolute difference in the conditional mean between two groups?

                 PS: Does margins r.variable output the absolute difference in predicted values between two groups?
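
                 If it helps, this is the contrast syntax I have in mind, applied to the earlier auto example (whether it gives what I want may depend on the model):

                 Code:
                 . sysuse auto, clear
                 . drop if rep78==. | rep78==1
                 . regress price i.rep78
                 . margins r.rep78              // differences in predicted group means versus the base group
                 . margins rep78, pwcompare     // all pairwise differences between groups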

