  • Show the formula that the -predict- command uses?

    Hello everyone!

    I'm new here and also very new to STATA, although I really enjoy statistics. I have several projects in mind and I'm willing to learn some STATA to accomplish them, so I thought this forum would be a good idea.

    I'm working with an Excel workbook that contains information on several past events. My goal is to predict what will happen in future events based on the information I have beforehand, so I run a regression (I still have to figure out which one is best). I import my Excel file into STATA and execute reg Y X_1 ... X_n, where Y is the variable I want to predict and the X_i are all the variables I think Y depends on. Once the regression is done, I can execute predict Ypred, and STATA will add a column named Ypred that predicts the Y outcome from all the X_i. Now, here comes my question: what command can I run if I want to know what explicit formula STATA is using to generate the Ypred column? I know I can use the Coef. column in the linear case, but not in the others. Is it possible to find this out?

    PS: If anyone knows how I can choose the best regression, any help is appreciated! (I have heard about Lasso but don't really get it.)

    Thanks very much in advance to everyone.
    Daniel

  • #2
    In general, the relevant manual entry has a subsection on "methods and formulas", and that is the place to look. For example, you used "reg", so type
    Code:
    help regress
    and click on "also see" (upper right), then on "regress post estimation", then on "view complete pdf manual entry", and finally on "methods and formulas".


    • #3
      Hi Rich,

      Thanks for your quick reply.

      I've already visited that page and similar ones, which cover the general theory behind each regression.
      My problem is that I want STATA to tell me the prediction formula used in my particular regression. It makes no sense to calculate it on my own when STATA is already using it.
      Maybe I didn't explain myself properly: I want to know how to ask STATA for the formula in my particular case.

      Thanks again.


      • #4
        If you take any particular regression -- here's a fairly silly one

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . regress price weight length
        
              Source |       SS           df       MS      Number of obs   =        74
        -------------+----------------------------------   F(2, 71)        =     18.91
               Model |   220725280         2   110362640   Prob > F        =    0.0000
            Residual |   414340116        71  5835776.28   R-squared       =    0.3476
        -------------+----------------------------------   Adj R-squared   =    0.3292
               Total |   635065396        73  8699525.97   Root MSE        =    2415.7
        
        ------------------------------------------------------------------------------
               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
              weight |   4.699065   1.122339     4.19   0.000     2.461184    6.936946
              length |  -97.96031    39.1746    -2.50   0.015    -176.0722   -19.84838
               _cons |   10386.54   4308.159     2.41   0.019     1796.316    18976.76
        ------------------------------------------------------------------------------
        -- then in the results Stata is almost telling you the formula it would use, as it's given by the Coef. column. So in this example

        10386.54 + 4.699065 weight - 97.96031 length

        is what it would use for prediction, almost. Why the "almost"? Because Stata would use more decimal places. Something like

        Code:
        gen double predicted = _b[_cons] + _b[weight] * weight + _b[length] * length
        would reach into Stata to get the coefficients in more detail. That would give you the predicted values, except that almost no-one ever does it this way, because as said that is what predict does.
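
        As a minimal sketch of that equivalence, the hand-built expression can be checked against predict directly. The variable names yhat and manual are illustrative only, and the tolerance in the assert is an arbitrary choice meant to allow for rounding error.

        ```stata
        sysuse auto, clear
        regress price weight length

        * Fitted values from predict (xb, the linear prediction, is the default after regress)
        predict double yhat

        * The same quantity built by hand from the stored coefficients
        generate double manual = _b[_cons] + _b[weight] * weight + _b[length] * length

        * Stops with an error if the two differ by more than rounding error
        assert reldif(yhat, manual) < 1e-10
        ```

        If the assert passes silently, the two columns agree to within rounding, which is the sense in which predict is "using" the Coef. column.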

        If that's not an answer, then I really don't understand the question. What's key is that Stata is hiding nothing of importance here. Why would there be any point to that?


        • #5
          Thanks for such detailed information.

          That's what I do with linear regressions. But when I fit a poisson, for example, the Coefs are no longer interpretable in the same way. That's why I was looking for a general command that makes STATA return the formula for my particular case, instead of me having to learn how to interpret the Coefs of every regression type.


          • #6
            There is no utterly general command that I know of. For example, with a Poisson regression you are expected to know that you need to exponentiate the expression implied by the predictors and their coefficients.
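
            A minimal sketch of what that exponentiation looks like, assuming a Poisson model with no exposure or offset. The choice of rep78 as the outcome is purely for illustration, and the names n_hat and manual are made up for this example.

            ```stata
            sysuse auto, clear
            poisson rep78 weight length

            * Predicted number of events: n is the default statistic after poisson
            predict double n_hat

            * By hand: exponentiate the linear combination of predictors and coefficients
            generate double manual = exp(_b[_cons] + _b[weight] * weight + _b[length] * length)

            * Stops with an error if the two differ by more than rounding error
            assert reldif(n_hat, manual) < 1e-10
            ```

            The coefficients play the same role as in the linear case; the only change is the exp() wrapped around the linear predictor.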

            I have to say that the question now is much more general than that in #1. That's fine by me, but it is not surprising that you didn't get the answer you want if you didn't ask your real question.

            Please note https://www.statalist.org/forums/help#spelling


            • #7
              Thanks for your help, and sorry about that.


              • #8
                It would probably be possible to write a prefix command that displays the required information before or after the prediction step. As I understand the question, some kind of database with information about the interpretation and prediction formulas would be enough.
                I like the idea of having a command which helps me interpret the output of other commands since you can easily forget some important details.
                Given the existing excellent documentation of Stata's own commands, I can understand why such a command does not exist.
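
                To make the idea concrete, here is a rough sketch of a small helper rather than a full prefix command. The name showxb is hypothetical (no such official command exists), and the sketch assumes a single-equation model without factor variables or interactions. It only prints the linear predictor from e(b); it knows nothing about link functions, so for a model such as poisson the user would still have to wrap the result in exp().

                ```stata
                * Hypothetical helper, not an official or community-contributed command
                capture program drop showxb
                program define showxb, rclass
                    version 14
                    tempname b
                    matrix `b' = e(b)                  // coefficient row vector of the last model
                    local names : colnames `b'         // predictor names, with _cons for the intercept
                    local k = colsof(`b')
                    local formula ""
                    forvalues j = 1/`k' {
                        local name : word `j' of `names'
                        local coef = trim(string(`b'[1, `j'], "%12.0g"))
                        if "`name'" == "_cons" {
                            local term "`coef'"
                        }
                        else {
                            local term "`coef'*`name'"
                        }
                        if `j' == 1 {
                            local formula "`term'"
                        }
                        else {
                            local formula "`formula' + `term'"
                        }
                    }
                    display as text "Linear predictor: " as result "`formula'"
                    return local formula "`formula'"
                end

                sysuse auto, clear
                regress price weight length
                showxb
                ```

                The displayed coefficients are rounded by the %12.0g format, so the printed formula is for reading, not for computing; _b[] or predict remains the way to get full precision.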
