  • Plotting fitted values from regression

    Hello everyone,

    As a follow-up to my previous question on regression, I'm now trying to plot fitted values.

    Basically, I've done a regression on a variable called YearlyEarnings using 6 independent variables:
    Code:
    reg YearlyEarnings age88 age88sq civstatus education potexp potexpsq if status==0
    Now I would like to plot the fitted values from this regression, but I'm unsure how to do it. I've tried using the predict command:
    Code:
    predict fitted_values
    and then plotting that over my potexp variable:
    Code:
    line fitted_values potexp
    This, however, produces a gazillion lines, which I assume is logical but unwanted: logical because of (correct me if I'm wrong) the 6 independent variables, but unwanted because I only want to see 1 line, the mean. Please find attached an image of the graph it is creating for me at the moment.

    To elaborate, as you can see from my regression command, I'm running it for those observations for which status is equal to 0. Status is a variable that can take values of 0 and 1, and I actually want to have a graph that shows fitted values for both groups, to be able to compare them. I want to have the YearlyEarnings on the y-axis and potexp on the x-axis.

    I've been reading up on several commands, including -coefplot-, -lgraph- but I don't seem to get there.

    In order to make helping me easier, I figured I'd use the 1978 automobile dataset as an example. These are the commands I've used there:
    Code:
    reg price mpg trunk weight length if foreign==0
    predict fitted_values
    line fitted_values mpg
    This produces a graph like the second attachment, which is similar to the one produced for my own dataset. If anyone could tell me how to properly plot the fitted values for when foreign is equal to 0 and when foreign is equal to 1, I'm sure I can translate it to my own data.

    Thanks very much in advance. If anything remains unclear, please tell.
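
A note on why the plot shows so many lines: the fitted values from -predict- depend on trunk, weight and length as well as mpg, so cars with the same mpg get different fitted values, and -line- connects the points in dataset order unless it is told to sort. Below is a minimal sketch of one way around this, assuming it is acceptable to hold the other covariates at their estimation-sample means. The variable name yhat_mpg and the locals m_trunk, m_weight and m_length are made up for illustration.

```stata
sysuse auto, clear
regress price mpg trunk weight length if foreign == 0

* Store the estimation-sample means of the covariates that are held fixed
foreach v in trunk weight length {
    quietly summarize `v' if e(sample)
    local m_`v' = r(mean)
}

* Fitted value as a function of mpg only (yhat_mpg is a hypothetical name)
generate yhat_mpg = _b[_cons] + _b[mpg]*mpg + _b[trunk]*`m_trunk' ///
    + _b[weight]*`m_weight' + _b[length]*`m_length' if e(sample)

* sort orders the points by mpg before they are connected
line yhat_mpg mpg, sort
```

Because only mpg varies in yhat_mpg, all points fall on a single straight line. Repeating the steps with foreign == 1 would give the second group's line.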

  • #2
    Take a look at margins and marginsplot.

    Code:
    sysuse auto,clear
    reg price foreign##(c.mpg c.trunk c.length ) 
    margins, at( mpg=(10(5)40) foreign = (0 1)) atmeans noatlegend
    marginsplot, noci
    [Attached image: Graph.png]



    • #3
      Hey Scott,

      Thanks for your suggestion. The -margins- command might indeed be what I'm looking for, but I'm not sure. And while you're using an interaction term in your example regression, I'm not using any so I'll just keep that out.

      Anyway, here's some trouble I'm having. Firstly, since I'm not including the 'status'-variable in my (OLS) regression, I'm getting an error when running the -margins- command:
      Code:
      'status' not found in list of covariates
      The error is logical, since I'm running a separate regression for each of my groups (one for status==0 and one for status==1). I'd rather keep it that way, unless that's going to be problematic for plotting fitted values.

      Secondly, I was not precise when I said I "only want to see 1 line, the mean". I'm not only running an OLS regression, but also quantile regressions for the 25th percentile, the median and the 75th percentile. So in essence, I want 4 plots: one with the fitted values from the OLS regression, one with fitted values from the .25 quantile regression, one with fitted values from the median regression and one with fitted values from the .75 quantile regression. In every plot, I would like to see one line for status==0 and one line for status==1.

      Sorry for any inconvenience this has caused - I figured it would be easier to explain without the quantile regressions.



      I'm also wondering about the following. Since there are going to be fitted values for each observation in the OLS regression, and fitted values for each observation in the quantile regressions, how would Stata create one graph per regression? Say there are 6 observations in the OLS regression with fitted values of 10, 12, 14, 16, 18 and 20 when potexp==1: will the data point in the graph be the average, 15? And if the median regression gives 6 observations fitted values of 6, 7, 8, 9, 14 and 16, will the data point at potexp==1 again be the average (10), or the median (8.5)?

      I hope I've made myself clearer now.

      Many thanks in advance.
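
On the averaging question: -twoway line- does no averaging of its own. Every observation is drawn as a separate point, which is why several fitted values at the same potexp show up as several lines. If a single summary value per potexp is wanted, it has to be computed explicitly. Here is a toy sketch using the six hypothetical fitted values from the post above (the variable names yhat_ols, yhat_med, mean_ols, mean_med and p50_med are made up for illustration):

```stata
clear
input potexp yhat_ols yhat_med
1 10 6
1 12 7
1 14 8
1 16 9
1 18 14
1 20 16
end

* One row per value of potexp, with the summaries named in the question
collapse (mean) mean_ols = yhat_ols mean_med = yhat_med ///
         (median) p50_med = yhat_med, by(potexp)
list
```

This reproduces the numbers in the question: a mean of 15 for the OLS values, and a mean of 10 and a median of 8.5 for the median-regression values. Which summary is appropriate is a modelling choice. Holding the other covariates fixed avoids the question altogether, since each potexp then has exactly one fitted value.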



      • #4
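        The interaction term was used so that both models could be estimated simultaneously. The fully interacted model reproduces the coefficients of the two separate regressions: the main effects are the foreign==0 coefficients, and each main effect plus its interaction term gives the foreign==1 coefficient. Compare: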
        Code:
        sysuse auto,clear
        qui {
            reg price mpg trunk length if fore == 0
            est store r1
            reg price mpg trunk length if fore == 1
            est store r2
        
            reg price foreign##(c.mpg c.trunk c.length )
        }
        
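        est table  r1 r2
        * foreign==1 coefficients from the interacted model: main effect plus interaction
        * (compare with column r2 of the table above)
        disp _b[mpg] + _b[1.foreign#c.mpg]
        disp _b[trunk] + _b[1.foreign#c.trunk]
        disp _b[length] + _b[1.foreign#c.length]
        disp _b[_cons] + _b[1.foreign]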
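
The same sums can also be obtained with -lincom-, which reports a standard error and confidence interval along with the point estimate. This is a minimal sketch on the auto data; the scalar name b_pooled is made up for illustration.

```stata
sysuse auto, clear
quietly regress price foreign##(c.mpg c.trunk c.length)

* Slope on mpg for foreign cars: main effect plus interaction
lincom mpg + 1.foreign#c.mpg
scalar b_pooled = r(estimate)

* Cross-check against the separate regression for foreign cars
quietly regress price mpg trunk length if foreign == 1
display "interacted model: " b_pooled "   separate model: " _b[mpg]
```

The point estimates agree, but the standard errors generally will not. The interacted model pools the residual variance across both groups, while the separate regressions estimate it group by group.
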
        As to constructing the graphs, here is one way:
        Code:
        clear*
        sysuse auto
        qui {
        forv j = 0/1 {
        
            * OLS for group j; keep the sample means of the other covariates
            reg price mpg trunk length if fore == `j'
            est store r`j'
            sum trunk if e(sample)
            local meantrunk = r(mean)
            sum length if e(sample)
            local meanlength = r(mean)
            * fitted values along mpg, with trunk and length held at their means
            predictnl yhat_r`j'= _b[_cons] + _b[mpg]*mpg+ `meantrunk'*_b[trunk] + `meanlength'*_b[length] if e(sample)
        
            * same for the 25th, 50th and 75th percentile regressions
            forv i = 25(25)75 {
                qreg price mpg trunk length if fore == `j', q(`i')
                est store q`j'_`i'
                predictnl yhat_q`j'_`i' = _b[_cons] + _b[mpg]*mpg+ `meantrunk'*_b[trunk] + `meanlength'*_b[length] if e(sample)
            }
        }
        }
        est table r* q*
        
        * black shades: foreign == 0; blue shades: foreign == 1
        line yhat* mpg, sort lw(medthick..) lc(black black*.75 black*.5 black*.25 ///
            blue blue*.75 blue*.5 blue*.25) ///
            legend(order(1 "OLS 0" 2 "Q25 0" 3 "Q50 0"  4 "Q75 0" ///
             5 "OLS 1" 6 "Q25 1" 7 "Q50 1"  8 "Q75 1") row(2) size(*.5))
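
If four separate plots are preferred (one per regression, each with a line for both groups), the same fitted values can be drawn panel by panel and then combined. This is a sketch under the same assumption as above, namely that trunk and length are held at their group means. The variable names yhat_ols_0, yhat_q25_0 and so on, and the graph names g_ols, g_q25 and so on, are made up for illustration.

```stata
sysuse auto, clear
forvalues j = 0/1 {
    * Group means of the covariates that are held fixed
    foreach v in trunk length {
        quietly summarize `v' if foreign == `j'
        local m_`v' = r(mean)
    }

    * OLS fitted values along mpg for group j
    quietly regress price mpg trunk length if foreign == `j'
    generate yhat_ols_`j' = _b[_cons] + _b[mpg]*mpg ///
        + _b[trunk]*`m_trunk' + _b[length]*`m_length' if e(sample)

    * Quantile-regression fitted values along mpg for group j
    foreach q in 25 50 75 {
        quietly qreg price mpg trunk length if foreign == `j', quantile(`q')
        generate yhat_q`q'_`j' = _b[_cons] + _b[mpg]*mpg ///
            + _b[trunk]*`m_trunk' + _b[length]*`m_length' if e(sample)
    }
}

* One panel per regression, each showing both groups
foreach m in ols q25 q50 q75 {
    twoway (line yhat_`m'_0 mpg, sort) (line yhat_`m'_1 mpg, sort), ///
        title("`m'") ytitle("Fitted price") ///
        legend(order(1 "foreign == 0" 2 "foreign == 1")) ///
        name(g_`m', replace) nodraw
}
graph combine g_ols g_q25 g_q50 g_q75
```

Adding the ycommon option to -graph combine- would put all four panels on the same y scale, which may make the regressions easier to compare.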
