  • How to compare two linear regressions?

    Hello,
    First-time poster here. I stumbled across this forum while searching Google for an answer to my question.

    I have recently begun using Stata for analysis and am teaching myself through online courses. I am stuck on a specific analysis: I am not sure what the correct approach is and would appreciate any help.

    I have data on trends for two events over the past 15 years, say 'Event A' and 'Event B', from 2005-2020. I have run a simple linear regression for each event using the 'regress' command. Both events increase significantly over time (p<0.05). Next, I want to compare the two trends to see whether the rate of change differs significantly between these two independent events over the period. My first thought was to compare the slopes of the two regressions with each other, but I am not sure this is the correct way, and I could be completely wrong in this approach. Is there a better way to approach my question, and can it be done in Stata?

    Thank you in advance!

  • #2
    Your explanation isn't entirely clear to me. I'm imagining you have some data set that gives the number of A events and the number of B events in each year, and you have run two regressions, one for each of the event variables against time. If you want to compare the rates of growth (or decline) in the number of events of the two types, you can do that through the -suest- command. See -help suest- for details of how to do that.

    That said, depending on circumstances, particularly if the growth is non-linear, models other than the simple linear model of -regress- might be more appropriate for the purpose. Without more information it is impossible to say. But have you at least looked at graphs of the data to satisfy yourself that a linear model is sensible?
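
    A minimal sketch of that graphical check, assuming a data set with one row per year and variables named time, event1, and event2 (the names are assumptions here, not something stated in the question). Each panel overlays the fitted line on the raw points so that curvature is easy to spot.

    ```stata
    * Assumed layout: one observation per year, variables time, event1, event2
    * Scatter each event against time with the least-squares line overlaid
    twoway (scatter event1 time) (lfit event1 time), name(g_event1, replace)
    twoway (scatter event2 time) (lfit event2 time), name(g_event2, replace)

    * Show the two panels side by side for comparison
    graph combine g_event1 g_event2
    ```

    If the points bend systematically away from the line, that is a sign that a straight-line model may not describe the trend well.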


    • #3
      Hello Dr. Schechter. Thank you for taking the time to answer my question. I am a medical student currently learning Stata and applications of statistics for clinical research, so I appreciate any help and advice you can offer.

      Below is my code using the -suest- command (the data are made up, but the overall trend is the same as in the data I am analyzing). Is this the correct approach to compare the rates of change of the two trends?

      Code:
      
      . list time event1 event2
      
           +------------------------+
           | time   event1   event2 |
           |------------------------|
        1. | 2000       23        5 |
        2. | 2001       44       12 |
        3. | 2002       67       25 |
        4. | 2003       89       47 |
        5. | 2004      113       78 |
           |------------------------|
        6. | 2005      145      102 |
        7. | 2006      156      143 |
        8. | 2007      177      184 |
        9. | 2008      196      206 |
       10. | 2009      210      226 |
           |------------------------|
       11. | 2010      223      242 |
           +------------------------+
      
      
      . regress event1 time
      
            Source |       SS           df       MS      Number of obs   =        11
      -------------+----------------------------------   F(1, 9)         =    960.17
             Model |  46844.5455         1  46844.5455   Prob > F        =    0.0000
          Residual |  439.090909         9  48.7878788   R-squared       =    0.9907
      -------------+----------------------------------   Adj R-squared   =    0.9897
             Total |  47283.6364        10  4728.36364   Root MSE        =    6.9848
      
      ------------------------------------------------------------------------------
            event1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
              time |   20.63636   .6659776    30.99   0.000     19.12982    22.14291
             _cons |  -41244.73   1335.287   -30.89   0.000    -44265.36    -38224.1
      ------------------------------------------------------------------------------
      
      . estimates store event1
      
      . regress event2 time
      
            Source |       SS           df       MS      Number of obs   =        11
      -------------+----------------------------------   F(1, 9)         =    444.50
             Model |  77672.0818         1  77672.0818   Prob > F        =    0.0000
          Residual |  1572.64545         9  174.738384   R-squared       =    0.9802
      -------------+----------------------------------   Adj R-squared   =    0.9779
             Total |  79244.7273        10  7924.47273   Root MSE        =    13.219
      
      ------------------------------------------------------------------------------
            event2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
              time |   26.57273   1.260369    21.08   0.000     23.72157    29.42388
             _cons |  -53162.86   2527.044   -21.04   0.000    -58879.43   -47446.29
      ------------------------------------------------------------------------------
      
      . estimates store event2
      
      
      . suest event1 event2
      
      Simultaneous results for event1, event2                     Number of obs = 11
      
      ------------------------------------------------------------------------------
                   |               Robust
                   | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
      event1_mean  |
              time |   20.63636   .6502892    31.73   0.000     19.36182    21.91091
             _cons |  -41244.73    1303.06   -31.65   0.000    -43798.68   -38690.78
      -------------+----------------------------------------------------------------
      event1_lnvar |
             _cons |   3.887482   .3840604    10.12   0.000     3.134737    4.640226
      -------------+----------------------------------------------------------------
      event2_mean  |
              time |   26.57273   1.281635    20.73   0.000     24.06077    29.08469
             _cons |  -53162.86   2571.367   -20.67   0.000    -58202.65   -48123.08
      -------------+----------------------------------------------------------------
      event2_lnvar |
             _cons |    5.16329   .2590664    19.93   0.000     4.655529    5.671051
      ------------------------------------------------------------------------------
      
      . test _b[event1_mean:_cons] = _b[event2_mean:_cons]
      
       ( 1)  [event1_mean]_cons - [event2_mean]_cons = 0
      
                 chi2(  1) =   16.15
               Prob > chi2 =    0.0001
      Thank you very much!


      • #4
        You are fine until the very end. The -test- command you used tests whether the intercepts are the same for the two events. But you want to test equality of the time trends. That would be
        Code:
        test _b[event1_mean:time] = _b[event2_mean:time]
        That said, I urge you not to do this as a null hypothesis significance test. Think about it for a minute. Although I still don't know what these events are, wouldn't it be a remarkable coincidence if the rates for the two events were exactly equal? If your answer to that question is "yes, it would be a remarkable coincidence," then the null hypothesis is just a straw man. Even if the answer is "no, it wouldn't be surprising at all given what these events are," wouldn't it be better to know what range of differences between the event rates is compatible with the data, rather than just a "could be zero" or "probably isn't zero" answer? So I suggest that you instead run the following command:
        Code:
        lincom _b[event1_mean:time] - _b[event2_mean:time]
        That will give you a point estimate of the difference between the two time trends along with a 95% confidence interval. (It will also give you the test statistic and p-value you would get from the -test- command, but I'm arguing that those are less informative than the estimate and confidence interval.)


        • #5
          Yes, I agree the analysis will be more meaningful if I include the 95% CI; I just didn't know what is and isn't possible in Stata. From our conversation, I have learned two new commands (-suest- and -lincom-) in addition to finding an answer to my question. Thank you!


          • #6
            Hello, would this approach work as well?

            Code:
            . list time rate type
            
                 +----------------------+
                 | time   rate     type |
                 |----------------------|
              1. | 2000     23   event1 |
              2. | 2001     44   event1 |
              3. | 2002     67   event1 |
              4. | 2003     89   event1 |
              5. | 2004    113   event1 |
                 |----------------------|
              6. | 2005    145   event1 |
              7. | 2006    156   event1 |
              8. | 2007    177   event1 |
              9. | 2008    196   event1 |
             10. | 2009    210   event1 |
                 |----------------------|
             11. | 2010    223   event1 |
             12. | 2000      5   event2 |
             13. | 2001     12   event2 |
             14. | 2002     25   event2 |
             15. | 2003     47   event2 |
                 |----------------------|
             16. | 2004     78   event2 |
             17. | 2005    102   event2 |
             18. | 2006    143   event2 |
             19. | 2007    184   event2 |
             20. | 2008    206   event2 |
                 |----------------------|
             21. | 2009    226   event2 |
             22. | 2010    242   event2 |
                 +----------------------+
            
            . regress rate c.time##i.type
            
                  Source |       SS           df       MS      Number of obs   =        22
            -------------+----------------------------------   F(3, 18)        =    375.43
                   Model |  125877.036         3  41959.0121   Prob > F        =    0.0000
                Residual |  2011.73636        18  111.763131   R-squared       =    0.9843
            -------------+----------------------------------   Adj R-squared   =    0.9816
                   Total |  127888.773        21  6089.94156   Root MSE        =    10.572
            
            ------------------------------------------------------------------------------
                    rate | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                    time |   20.63636   1.007982    20.47   0.000     18.51867    22.75406
                         |
                    type |
                 event2  |  -11918.14   2858.136    -4.17   0.001    -17922.86   -5913.416
                         |
             type#c.time |
                 event2  |   5.936364   1.425502     4.16   0.001     2.941494    8.931233
                         |
                   _cons |  -41244.73   2021.007   -20.41   0.000    -45490.71   -36998.75
            ------------------------------------------------------------------------------
            Code:
             type#c.time |
                 event2  |   5.936364   1.425502     4.16   0.001     2.941494    8.931233
            Is this analysis different from how -suest- was done previously?

            Thank you.


            • #7
              These approaches are equivalent, but you have to understand how to interpret the output. The coefficient of type#c.time for event2 is the difference between the two rates: the event2 rate minus the event1 rate. You can recover the event2 rate you got from -suest- by adding that coefficient to the coefficient for time (which is the rate for event1). -lincom- will calculate that sum for you and give you the inferential statistics for it as well. Or, simpler, if you run -margins type, dydx(time)- you will get the rates for both event types in a single table. Welcome to the world of interaction models.
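
              As a rough sketch of those two routes, assuming type is a numeric variable with value labels and is coded 1 for event1 and 2 for event2 (the coding is an assumption; check it with -label list- or -tabulate type, nolabel-):

              ```stata
              * Assumption: type is coded 1 = event1, 2 = event2
              regress rate c.time##i.type

              * Slope for event2 = slope for event1 + interaction coefficient
              lincom time + 2.type#c.time

              * Slopes for both event types in one table
              margins type, dydx(time)
              ```

              With the output shown in #6, the sum is 20.63636 + 5.936364, about 26.5727, which matches the -suest- slope for event2 up to rounding. The standard errors need not match exactly, since the -suest- table in #3 reports robust standard errors.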

              Because you described your data in a way that led me to believe it was laid out as in #3, and you said you had already done the separate regressions, -suest- was the most straightforward approach. Had you told me that the data look like what you show in #6 and had not yet done any analysis, I would have recommended the approach in #6.
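
              For moving from the layout in #3 to the layout in #6, one possible sketch uses -reshape long-. It assumes the wide data contain exactly the variables time, event1, and event2 with one row per year; the label name typelbl is made up for illustration.

              ```stata
              * Assumed wide layout: time, event1, event2 (one row per year)
              * The stub "event" becomes the outcome; the suffix 1/2 goes into type
              reshape long event, i(time) j(type)
              rename event rate

              * Attach readable labels to the numeric type codes (hypothetical label name)
              label define typelbl 1 "event1" 2 "event2"
              label values type typelbl

              * The interaction model from #6
              regress rate c.time##i.type
              ```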

              One word of caution. These models are completely equivalent only because apart from type and time, you have no other explanatory variables in the model. If you did have some, you would have to interact type with all of them in order to get results that are equivalent to -suest- following separate regressions.
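
              To illustrate that caution, here is a sketch with one extra continuous covariate. The variable x is hypothetical and does not appear anywhere in this thread's data; type is again assumed to be coded 1 and 2.

              ```stata
              * x is a hypothetical additional covariate, used only for illustration
              * Interact type with every other predictor so that each event type
              * gets its own intercept and its own coefficients for time and x
              regress rate i.type##(c.time c.x)

              * The difference in time trends is still the type-by-time interaction
              lincom 2.type#c.time
              ```

              Leaving x out of the interaction would force both event types to share a single coefficient for x, which separate regressions would not do.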
