  • Regression: difference between i.cohort and cohort==x

    Dear all,

    I am currently analyzing how similar mothers and fathers (spouses) are in their years of schooling, that is, the degree of assortative mating.

    I have nine different birth cohorts in my data, each spanning four years, and I want to use them to observe how this behavior changes over time.

    I first run a simple linear regression for each cohort separately:

    Code:
    reg years_schooling_G3_mother years_schooling_G3_father if (...) & cohort ==1
    reg years_schooling_G3_mother years_schooling_G3_father if (...) & cohort ==2
    reg years_schooling_G3_mother years_schooling_G3_father if (...) & cohort ==5

    and so on for all nine birth cohorts.
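The nine cohort-by-cohort regressions can also be written as one loop that stores each result for side-by-side comparison. This is a minimal sketch on simulated data: the seed, the sample sizes, and the made-up "true" slope that rises by 0.01 per cohort are all hypothetical, and the post's elided sample restriction `if (...)` is deliberately not reproduced.

```stata
* Sketch: cohort-by-cohort regressions in one loop (hypothetical simulated data;
* the post's elided sample restriction is not reproduced here).
clear
estimates clear
set seed 2021
set obs 9000
generate cohort = ceil(_n/1000)                  // nine cohorts, 1,000 couples each
generate years_schooling_G3_father = 8 + floor(10*runiform())   // 8 to 17 years
generate slope = 0.45 + 0.01*cohort              // made-up "true" slope per cohort
generate years_schooling_G3_mother = 5 + slope*years_schooling_G3_father + rnormal(0, 1.25)

forvalues c = 1/9 {
    quietly regress years_schooling_G3_mother years_schooling_G3_father if cohort == `c'
    estimates store cohort`c'
}

* slopes and constants of all nine stored models side by side
estimates table _all, b(%6.3f) se
```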

    Then I estimate two pooled regressions, one with cohort as a factor variable and one with cohort as a continuous variable:

    Code:
    reg years_schooling_G3_mother years_schooling_G3_father i.cohort if (...)
    reg years_schooling_G3_mother years_schooling_G3_father cohort if (...)

    However, I am quite surprised by the results. While the first model shows lower coefficients for later cohorts, the second and third (as far as I understand) show coefficients that increase over the cohorts.

    Can anybody explain what I am misunderstanding? Since all coefficients are significant, significance cannot be the explanation, so I may have made a logical error.


    Thank you!



    Code:
    --------------------------------------------------------------------------
                                       (1)             (2)             (3)
                                  cohort 1        cohort 2        cohort 5
    --------------------------------------------------------------------------
    years_schooling_G3_father     0.491***        0.467***        0.459***
                                   (18.89)         (19.88)         (21.70)

    _cons                         5.108***        5.457***        5.911***
                                   (19.69)         (22.87)         (25.85)
    --------------------------------------------------------------------------
    N                                 2522            3061            3623
    --------------------------------------------------------------------------
    Dependent variable: years_schooling_G3_mother
    t statistics in parentheses
    * p<0.05, ** p<0.01, *** p<0.001

    Code:
    -------------------------------------------------------------------------------------------
    years_schooling_G3_mother |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------------+----------------------------------------------------------------
    years_schooling_G3_father |   .4817556   .0055682    86.52   0.000     .4708416    .4926697
                              |
                       cohort |
                           2  |   .0991928   .0360512     2.75   0.006     .0285298    .1698558
                           3  |   .2796893   .0340529     8.21   0.000     .2129429    .3464356
                           4  |   .4186461   .0335497    12.48   0.000     .3528862    .4844059
                           5  |   .4601548   .0348339    13.21   0.000     .3918777    .5284318
                           6  |   .4614515   .0399617    11.55   0.000     .3831234    .5397795
                           7  |   .4049299   .0484335     8.36   0.000     .3099966    .4998631
                           8  |   .1815389   .0696623     2.61   0.009     .0449956    .3180822
                           9  |   -.000846    .155064    -0.01   0.996    -.3047831    .3030911
                              |
                        _cons |   5.203955   .0639659    81.36   0.000     5.078577    5.329333
    -------------------------------------------------------------------------------------------


    Code:
    -------------------------------------------------------------------------------------------
    years_schooling_G3_mother |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    --------------------------+----------------------------------------------------------------
    years_schooling_G3_father |   .4873492   .0055614    87.63   0.000     .4764484      .49825
                       cohort |   .0705557   .0051374    13.73   0.000      .060486    .0806254
                        _cons |   5.177953   .0626206    82.69   0.000     5.055212    5.300694
    -------------------------------------------------------------------------------------------

  • #2
    You're estimating different things in your regressions:
    (1) You are running a simple regression on different samples (one per cohort).
    (2) You are running over the entire sample but controlling for cohort as a discrete variable - you will have different interactions between each cohort and your other variables.
    (3) You are running over the entire sample but controlling for cohort as a continuous variable - again you will have different interactions between each cohort and your other variables.

    In terms of the pattern, in method (1) your constant is increasing over time. This might help explain what you observe under methods (2) and (3).
    Last edited by Rhys Williams; 06 Apr 2021, 11:17. Reason: Further details



    • #3
      Well, first I think you are not reading the output of the -regress .... i.cohort- model correctly. The effects seen do not show a consistent directional trend over time. Rather, they rise from cohort 1 through cohort 5, peak at cohort 6, and then decline again in cohorts 7 through 9. These facts alone tell me that a -regress .... cohort- model (without i.) would be a very bad specification of the cohort effect and should not be used.

      All of that said, the other thing you need to understand is that with -regress .... i.cohort-, you are not modeling changes in assortative mating over time. What you are capturing is changes in the overall level of schooling over time, but the model constrains the association between father and mother years of schooling to be the same in all cohorts.

      If you want to study whether the assortative mating varies across cohorts, then your first approach (cohort by cohort regression) is one way to do it. Another is a single regression with an interaction between cohort and father education:
      Code:
      regress years_schooling_G3_mother i.cohort##c.years_schooling_G3_father
      margins cohort, dydx(years_schooling_G3_father)
      Added: crossed with #2
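The interaction model above can be paired with a single joint test of whether the father-mother slope is the same in every cohort. A sketch on simulated data (the data and the rising "true" slope are hypothetical): -testparm- tests that all eight interaction coefficients are zero, that is, one common slope, and -margins- then reports the slope within each cohort.

```stata
* Sketch: interaction model, joint test of equal slopes, cohort-specific slopes
* (hypothetical simulated data; variable names follow the thread).
clear
set seed 2021
set obs 9000
generate cohort = ceil(_n/1000)
generate years_schooling_G3_father = 8 + floor(10*runiform())
generate slope = 0.45 + 0.01*cohort              // made-up "true" slope per cohort
generate years_schooling_G3_mother = 5 + slope*years_schooling_G3_father + rnormal(0, 1.25)

regress years_schooling_G3_mother i.cohort##c.years_schooling_G3_father

* H0: all interaction terms are zero, i.e. the slope is the same in every cohort
testparm i.cohort#c.years_schooling_G3_father

* slope of father's schooling within each cohort
margins cohort, dydx(years_schooling_G3_father)
```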



      • #4
        Maren:
        welcome to this forum.
        Rhys made a useful diagnosis of what's the matter with your model.
        Personally, I would go with:
        Code:
        reg years_schooling_G3_mother years_schooling_G3_father i.cohort if (...)
        Two asides:
        - please use CODE delimiters and post your output without major or minor surgery, so that interested listers can see the big picture;
        - trying different regression models in search of the most promising one is not that scientific: focus instead on the one that gives the fairest and truest view of the data-generating process you're interested in.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thank you all very much for your helpful answers! As Carlo said, I need to be clearer about the underlying data-generating process and derive the appropriate model from it.


          Clyde, thank you very much for the clarification and suggestion. I also see now why the interaction term is appropriate in this case.

          As I haven't used margins before, am I right in interpreting the output of your suggestion:


          Code:
           regress years_schooling_G3_mother i.cohort##c.years_schooling_G3_father
          
                Source |       SS           df       MS      Number of obs   =    36,158
          -------------+----------------------------------   F(17, 36140)    =    997.89
                 Model |  26954.7788        17  1585.57522   Prob > F        =    0.0000
              Residual |  57423.6324    36,140  1.58892176   R-squared       =    0.3195
          -------------+----------------------------------   Adj R-squared   =    0.3191
                 Total |  84378.4112    36,157  2.33366737   Root MSE        =    1.2605
          
          ----------------------------------------------------------------------------------------------------
                   years_schooling_G3_mother |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------------------------+----------------------------------------------------------------
                                      cohort |
                                          2  |   .1564449   .1703191     0.92   0.358    -.1773855    .4902753
                                          3  |  -.0780779   .1614139    -0.48   0.629    -.3944539    .2382981
                                          4  |  -.0310163   .1583773    -0.20   0.845    -.3414405     .279408
                                          5  |   .1041726   .1644846     0.63   0.527    -.2182221    .4265672
                                          6  |  -.1685443   .1855813    -0.91   0.364    -.5322892    .1952005
                                          7  |   .1490305   .2121998     0.70   0.482    -.2668874    .5649484
                                          8  |   .7102047   .2415963     2.94   0.003     .2366688    1.183741
                                          9  |   1.045491    .338933     3.08   0.002     .3811727     1.70981
                                             |
                   years_schooling_G3_father |   .4983694   .0121951    40.87   0.000     .4744667    .5222722
                                             |
          cohort#c.years_schooling_G3_father |
                                          2  |  -.0082846   .0165221    -0.50   0.616    -.0406684    .0240992
                                          3  |   .0270432   .0156023     1.73   0.083    -.0035378    .0576243
                                          4  |    .031492   .0152557     2.06   0.039     .0015905    .0613936
                                          5  |   .0221246   .0158127     1.40   0.162    -.0088686    .0531179
                                          6  |   .0503567    .017679     2.85   0.004     .0157053    .0850081
                                          7  |   .0184605   .0202436     0.91   0.362    -.0212174    .0581385
                                          8  |  -.0393446   .0230541    -1.71   0.088    -.0845314    .0058422
                                          9  |  -.0657494   .0319589    -2.06   0.040    -.1283897    -.003109
                                             |
                                       _cons |   4.838443    .125187    38.65   0.000     4.593073    5.083814
          ----------------------------------------------------------------------------------------------------
          
          .
          . margins cohort, dydx(years_schooling_G3_father)
          
          Average marginal effects                        Number of obs     =     36,158
          Model VCE    : OLS
          
          Expression   : Linear prediction, predict()
          dy/dx w.r.t. : years_schooling_G3_father
          
          -------------------------------------------------------------------------------------------
                                    |            Delta-method
                                    |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------------------+----------------------------------------------------------------
          years_schooling_G3_father |
                             cohort |
                                 1  |   .4983694   .0121951    40.87   0.000     .4744667    .5222722
                                 2  |   .4900849   .0111471    43.97   0.000     .4682361    .5119336
                                 3  |   .5254127    .009732    53.99   0.000     .5063376    .5444877
                                 4  |   .5298615    .009166    57.81   0.000     .5118959     .547827
                                 5  |   .5204941   .0100658    51.71   0.000     .5007649    .5402233
                                 6  |   .5487262   .0127995    42.87   0.000     .5236388    .5738135
                                 7  |     .51683    .016158    31.99   0.000     .4851598    .5485001
                                 8  |   .4590248   .0195646    23.46   0.000     .4206777     .497372
                                 9  |   .4326201   .0295406    14.64   0.000     .3747196    .4905206
          -------------------------------------------------------------------------------------------

          as follows: up to cohort six, a higher cohort has a stronger correlation between fathers' and mothers' years of schooling? The significance is only given under the margins estimation.



          Rethinking my approach, I also tested whether the coefficients from the first (cohort-by-cohort) regression model differ statistically:

          Code:
           quietly reg years_schooling_G3_mother years_schooling_G3_father if generation ==3 & cohort ==1
          
          . est sto cohortone
          
          . quietly reg years_schooling_G3_mother years_schooling_G3_father if generation ==3 & cohort ==2
          
          . est sto cohorttwo
          
          . quietly reg years_schooling_G3_mother years_schooling_G3_father if generation ==3 & cohort ==5
          
          . est sto cohortfive
          
          . suest cohortone cohorttwo cohortfive , cluster(fid)
          
          Simultaneous results for cohortone, cohorttwo, cohortfive
          
                                                          Number of obs     =      9,206
          
                                                       (Std. Err. adjusted for 4,673 clusters in fid)
          -------------------------------------------------------------------------------------------
                                    |               Robust
                                    |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          --------------------------+----------------------------------------------------------------
          cohortone_mean            |
          years_schooling_G3_father |   .4909852   .0259724    18.90   0.000     .4400801    .5418902
                              _cons |   5.107597   .2593029    19.70   0.000     4.599373    5.615822
          --------------------------+----------------------------------------------------------------
          cohortone_lnvar           |
                              _cons |   .4932834   .0485742    10.16   0.000     .3980797    .5884871
          --------------------------+----------------------------------------------------------------
          cohorttwo_mean            |
          years_schooling_G3_father |   .4671694   .0234861    19.89   0.000     .4211375    .5132012
                              _cons |   5.457492   .2385796    22.87   0.000     4.989885      5.9251
          --------------------------+----------------------------------------------------------------
          cohorttwo_lnvar           |
                              _cons |   .5627219   .0380087    14.81   0.000     .4882263    .6372175
          --------------------------+----------------------------------------------------------------
          cohortfive_mean           |
          years_schooling_G3_father |   .4590011   .0211425    21.71   0.000     .4175626    .5004396
                              _cons |   5.911131   .2286085    25.86   0.000     5.463067    6.359196
          --------------------------+----------------------------------------------------------------
          cohortfive_lnvar          |
                              _cons |    .635967   .0292302    21.76   0.000     .5786769    .6932572
          -------------------------------------------------------------------------------------------
          
          .
          . test [cohortone_mean]years_schooling_G3_father - [cohorttwo_mean]years_schooling_G3_father =0
          
           ( 1)  [cohortone_mean]years_schooling_G3_father - [cohorttwo_mean]years_schooling_G3_father = 0
          
                     chi2(  1) =    0.46
                   Prob > chi2 =    0.4964
          
          .
          . test [cohortone_mean]years_schooling_G3_father - [cohortfive_mean]years_schooling_G3_father =0
          
           ( 1)  [cohortone_mean]years_schooling_G3_father - [cohortfive_mean]years_schooling_G3_father = 0
          
                     chi2(  1) =    0.91
                   Prob > chi2 =    0.3396

          And I interpret this as meaning that there is no statistically significant difference in the father-mother relationship across the cohorts. Am I right?



          I would be very grateful if you could clarify the interpretation. Choosing the model may require a deeper understanding of the data itself on my part.

          Last edited by Maren Haug; 06 Apr 2021, 13:30. Reason: clarification on interaction term



          • #6
            up to cohort six, a higher cohort has a stronger correlation between fathers' and mothers' years of schooling?
            More or less. There is a little downward wiggle from cohort 4 to cohort 5, but it's tiny and I wouldn't worry about it.

            The significance is only given under the margins estimation.
            I'm not sure what you mean by this. The outputs of that -margins- command are the only place where the marginal effects of father's education are calculated. If you are comparing the p-values of those results to the p-values of the cohort coefficients or of cohort#years_schooling_G3_father in the regression output, those are different things and are not directly comparable.
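Writing the interaction model out may make this point concrete. A sketch in generic notation (the symbols α, δ_k, β, γ_k and the cohort indicators D_k are introduced here for illustration; cohort 1 is the base category), with the numbers taken from the output posted in #5:

```latex
\text{mother}_i = \alpha + \sum_{k=2}^{9} \delta_k D_{ki}
  + \beta\,\text{father}_i
  + \sum_{k=2}^{9} \gamma_k D_{ki}\,\text{father}_i + \varepsilon_i

\left.\frac{\partial\,\mathrm{E}[\text{mother}]}{\partial\,\text{father}}\right|_{\text{cohort}=k}
  = \beta + \gamma_k, \qquad \gamma_1 \equiv 0

\text{cohort 2: } 0.4983694 + (-0.0082846) = 0.4900848
\text{cohort 6: } 0.4983694 + 0.0503567 = 0.5487261
\text{cohort 9: } 0.4983694 + (-0.0657494) = 0.4326200
```

These agree with the -margins- column (.4900849, .5487262, .4326201) up to rounding in the last displayed digit. The t-test on each γ_k asks whether the slope in cohort k differs from the slope in cohort 1, whereas each -margins- test asks whether the slope in cohort k differs from zero, which is why the two sets of p-values answer different questions.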

            And I interpret this as meaning that there is no statistically significant difference in the father-mother relationship across the cohorts. Am I right?
            Well, what you show says that the differences between cohort 2 and cohort 1 and between cohort 5 and cohort 1 are not "statistically significant."
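Building on this, the three cohort slopes can be compared in one joint Wald test rather than in separate pairwise tests. This is a suggestion, not something proposed in the thread: a sketch on simulated data, where the family identifier `fid` and the data-generating process are hypothetical stand-ins, and `vce(cluster fid)` is the -vce()- form of the `cluster(fid)` option used in #5.

```stata
* Sketch: joint test of equal slopes across cohorts 1, 2 and 5 after -suest-
* (hypothetical simulated data; fid is a made-up family identifier).
clear
estimates clear
set seed 2021
set obs 9000
generate fid = ceil(_n/2)                        // two couples per hypothetical family
generate cohort = ceil(_n/1000)
generate years_schooling_G3_father = 8 + floor(10*runiform())
generate slope = 0.45 + 0.01*cohort              // made-up "true" slope per cohort
generate years_schooling_G3_mother = 5 + slope*years_schooling_G3_father + rnormal(0, 1.25)

foreach c in 1 2 5 {
    quietly regress years_schooling_G3_mother years_schooling_G3_father if cohort == `c'
    estimates store c`c'
}
suest c1 c2 c5, vce(cluster fid)

* H0: the three slopes are equal (2 restrictions, tested jointly)
test [c1_mean]years_schooling_G3_father = [c2_mean]years_schooling_G3_father ///
    = [c5_mean]years_schooling_G3_father
```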



            • #7
              Thank you, Clyde, for the clarification and also for reminding me to interpret only what I estimated and not to jump to a general conclusion!



              • #8
                I have one last question about your suggestion, Clyde.


                In many papers, for example

                Pekkala, S. and Lucas, R.E., 2007. Differences across cohorts in Finnish intergenerational income mobility. Industrial Relations: A Journal of Economy and Society, 46(1), pp.81-111.
                (https://doi.org/10.1111/j.1468-232X.2007.00458.x)

                I find the same approach of using interaction terms; however, they do not use -margins-.

                Instead of:

                Code:
                regress years_schooling_G3_mother i.cohort##c.years_schooling_G3_father
                margins cohort, dydx(years_schooling_G3_father)

                they estimate only:

                Code:
                regress years_schooling_G3_mother i.cohort##c.years_schooling_G3_father

                The results differ a lot between the first and the second approach. I was not able to find out why the -margins- command needs to be applied. I would be grateful for an explanation or for some literature suggestions. Thanks!



                • #9
                  Well, the effects of years_schooling_G3_father in each cohort are not directly shown in the -regress- output. If the papers you have read are not showing the -margins- command as part of their methods I can think of four possible explanations:

                  1. They have used the -margins- command but do not mention it.
                  2. They have performed calculations equivalent to what -margins- does using other methods such as a loop with -lincom- commands, and they may or may not bother to mention that.
                  3. They are reporting incorrect results based on misinterpreting their regression outputs.
                  4. They are reporting differences between the marginal effects in the different cohorts (which are the interaction coefficients in the regression output) without reporting the marginal effects themselves.
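Explanation 2 can be illustrated directly: after the interaction model, each cohort's slope is the main-effect coefficient plus that cohort's interaction coefficient, which -lincom- can compute one cohort at a time. A sketch on simulated, hypothetical data; the estimates and standard errors it prints should agree with those from `margins cohort, dydx(years_schooling_G3_father)` run on the same model.

```stata
* Sketch: each cohort's slope from the interaction model via -lincom-
* (hypothetical simulated data; variable names follow the thread).
clear
set seed 2021
set obs 9000
generate cohort = ceil(_n/1000)
generate years_schooling_G3_father = 8 + floor(10*runiform())
generate slope = 0.45 + 0.01*cohort              // made-up "true" slope per cohort
generate years_schooling_G3_mother = 5 + slope*years_schooling_G3_father + rnormal(0, 1.25)

regress years_schooling_G3_mother i.cohort##c.years_schooling_G3_father

* cohort 1 is the base category: its slope is the main-effect coefficient
lincom _b[years_schooling_G3_father]

* cohorts 2-9: main effect plus that cohort's interaction coefficient
forvalues c = 2/9 {
    lincom _b[years_schooling_G3_father] + _b[`c'.cohort#c.years_schooling_G3_father]
}
```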

                  The link you have given is to the abstract only, and the article itself is behind a paywall, so I cannot tell you anything more about it.

