  • Sign flip when using different measure of same variable

    Hello!

    I am running a probit model in which I use math scores in one specification and Chinese scores in another, as two measures of the same variable.
    The problem is that another variable (land size) is negative and significant at conventional levels in the specification with the math score, but positive and significant in the specification with the Chinese score.

    I am wondering whether switching between these two scores could be what changes the sign of the land size variable, in the sense that the score variable happens to be correlated with land size (not for any theoretical reason, just by coincidence).

    Please note that the Chinese and math scores have both positive and negative values while the land size variable is always positive.
    The dependent variable is a binary one where 1= migration and 0 = no migration.

    Below are the estimation results:
    • With math score
    Code:
    Probit regression                               Number of obs     =      1,767
                                                    Replications      =      1,000
                                                    Wald chi2(19)     =     321.07
                                                    Prob > chi2       =     0.0000
    Log likelihood = -552.38318                     Pseudo R2         =     0.3151
    
    ----------------------------------------------------------------------------------------
                           |   Observed   Bootstrap                         Normal-based
                       M_P |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
                    Income |   1.877601    .573929     3.27   0.001     .7527208    3.002481
            Income squared |  -3.157901   1.203905    -2.62   0.009    -5.517512     -.79829
                Test score |   .1218852   .0331745     3.67   0.000     .0568643     .186906
        Test score squared |  -.0078734   .0024398    -3.23   0.001    -.0126553   -.0030915
       Income x Test score |  -.0901189   .0713313    -1.26   0.206    -.2299257    .0496879
                 Land size |   -.085268   .0131477    -6.49   0.000    -.1110371   -.0594989
                 Other Xs here
    ----------------------------------------------------------------------------------------
    Note: 9 failures and 0 successes completely determined.
    • with Chinese score
    Code:
    Probit regression                               Number of obs     =      1,767
                                                    Replications      =      1,000
                                                    Wald chi2(19)     =     242.84
                                                    Prob > chi2       =     0.0000
    Log likelihood = -248.33463                     Pseudo R2         =     0.6921
    
    -----------------------------------------------------------------------------------------
                            |   Observed   Bootstrap                         Normal-based
                        M_P |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------------+----------------------------------------------------------------
                     Income |   1.442293   .6379811     2.26   0.024     .1918728    2.692713
             Income squared |  -3.478919   1.687054    -2.06   0.039    -6.785485   -.1723529
                 Test score |   .3454658   .0360982     9.57   0.000     .2747147    .4162169
         Test score squared |  -.0140432   .0039948    -3.52   0.000    -.0218729   -.0062134
        Income x Test score |  -.1937659   .1092818    -1.77   0.076    -.4079543    .0204225
                  Land size |   .0557323   .0171623     3.25   0.001     .0220948    .0893699
                 
                   Other Xs here
    -----------------------------------------------------------------------------------------
    Note: 136 failures and 0 successes completely determined.
    Now, if I run the simple regressions where I only include the test score and land size variables, I get the following:
    • With Chinese score
    Code:
    . reg Land_size_08 Test score
    
          Source |       SS           df       MS      Number of obs   =     1,767
    -------------+----------------------------------   F(1, 1765)      =     42.05
           Model |  1139.82304         1  1139.82304   Prob > F        =    0.0000
        Residual |  47840.4371     1,765  27.1050635   R-squared       =    0.0233
    -------------+----------------------------------   Adj R-squared   =    0.0227
           Total |  48980.2602     1,766  27.7351417   Root MSE        =    5.2063
    
    ---------------------------------------------------------------------------------------
             Land_size_08 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------------+----------------------------------------------------------------
               Test score |  -.1800086   .0277587    -6.48   0.000     -.234452   -.1255651
                    _cons |   3.257215   .2408442    13.52   0.000     2.784845    3.729585
    ---------------------------------------------------------------------------------------
    
    
    . reg Land_size_08 Test score Test score squared
    
          Source |       SS           df       MS      Number of obs   =     1,767
    -------------+----------------------------------   F(2, 1764)      =     47.49
           Model |  2502.54688         2  1251.27344   Prob > F        =    0.0000
        Residual |  46477.7133     1,764    26.34791   R-squared       =    0.0511
    -------------+----------------------------------   Adj R-squared   =    0.0500
           Total |  48980.2602     1,766  27.7351417   Root MSE        =     5.133
    
    -----------------------------------------------------------------------------------------
               Land_size_08 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------+----------------------------------------------------------------
                 Test score |   .1917083   .0584856     3.28   0.001     .0769998    .3064167
         Test score squared |   .0298946   .0041568     7.19   0.000     .0217418    .0380474
                      _cons |   3.772802    .248043    15.21   0.000     3.286313    4.259291
    -----------------------------------------------------------------------------------------
    • With math score
    Code:
    . reg Land_size_08 Test score
    
          Source |       SS           df       MS      Number of obs   =     1,767
    -------------+----------------------------------   F(1, 1765)      =    386.44
           Model |  8797.76244         1  8797.76244   Prob > F        =    0.0000
        Residual |  40182.4977     1,765  22.7662877   R-squared       =    0.1796
    -------------+----------------------------------   Adj R-squared   =    0.1792
           Total |  48980.2602     1,766  27.7351417   Root MSE        =    4.7714
    
    --------------------------------------------------------------------------------------
            Land_size_08 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
              Test score |   .5048024   .0256792    19.66   0.000     .4544376    .5551672
                   _cons |   8.820562   .2430064    36.30   0.000     8.343952    9.297173
    --------------------------------------------------------------------------------------
    
    . reg Land_size_08 Test score Test score squared
    
          Source |       SS           df       MS      Number of obs   =     1,767
    -------------+----------------------------------   F(2, 1764)      =    498.23
           Model |  17680.6472         2  8840.32358   Prob > F        =    0.0000
        Residual |   31299.613     1,764  17.7435448   R-squared       =    0.3610
    -------------+----------------------------------   Adj R-squared   =    0.3603
           Total |  48980.2602     1,766  27.7351417   Root MSE        =    4.2123
    
    ----------------------------------------------------------------------------------------
              Land_size_08 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
                Test score |   1.404657    .046167    30.43   0.000     1.314109    1.495204
        Test score squared |    .062689   .0028018    22.37   0.000     .0571938    .0681841
                     _cons |   10.73609   .2309831    46.48   0.000     10.28306    11.18912
    ----------------------------------------------------------------------------------------
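    The same comparison can be made in a single table of pairwise correlations instead of separate regressions. This is only a minimal sketch: math_score and chinese_score are hypothetical variable names (only Land_size_08 appears in the output above), and the squared terms are built with factor-variable operators rather than by hand.

    ```stata
    * Hypothetical variable names: math_score, chinese_score
    * Pairwise correlations with significance levels and sample sizes
    pwcorr Land_size_08 math_score chinese_score, sig obs

    * The auxiliary regressions above, with the quadratic term generated
    * by factor-variable notation
    regress Land_size_08 c.math_score##c.math_score
    regress Land_size_08 c.chinese_score##c.chinese_score
    ```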
    Thank you for your help.

  • #2
    Marry:
    1) it's absolutely possible: different regression specifications produce different coefficients. By the way, looking at the pseudo R2 as an indirect way to compare the two models (as nothing similar to the adjusted R-squared is available from -probit-), I would trust your second code.
    2) I'm totally unclear about your use of OLS as a test of some kind of robustness (with respect to what?).
    3) please use -fvvarlist- notation instead of creating a squared term by hand.
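
    As an illustration of points 1) and 3), the sketch below writes the squares and the interaction with factor-variable operators and then lists information criteria for the two fits next to each other. The variable names income, score_math, score_chinese and land_size are assumptions, since the real names are not shown in #1, and the bootstrap used in #1 is left out to keep the example short.

    ```stata
    * Hypothetical variable names: income, score_math, score_chinese, land_size.
    * M_P is the dependent variable from #1; the remaining covariates
    * from #1 would be added to each command.

    * Squares and the interaction built with factor-variable operators
    probit M_P c.income##c.income c.score_math##c.score_math ///
        c.income#c.score_math land_size
    estimates store m_math
    * Average marginal effects that take the squared and interaction
    * terms into account
    margins, dydx(income score_math land_size)

    probit M_P c.income##c.income c.score_chinese##c.score_chinese ///
        c.income#c.score_chinese land_size
    estimates store m_chinese
    margins, dydx(income score_chinese land_size)

    * Log likelihood, AIC and BIC for the two fits side by side
    estimates stats m_math m_chinese
    ```

    With -margins- the reported effect of each variable already reflects its squared and interaction terms, which a hand-made squared variable would not allow.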
    Last edited by Carlo Lazzaro; 20 May 2022, 09:14.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Carlo Lazzaro, thank you so much for your answer.

      In fact, I regressed land size on the test score by OLS only to see the correlation between the two variables under the different measures of the test score (math or Chinese test scores).

      I am convinced that the land size variable can take different signs under the different specifications, but I am trying to investigate why the change happened and whether there are any particular reasons for it.
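
      One possible way to trace where the sign changes is to add the score terms step by step and compare the land size coefficient across fits. This is only a sketch under assumed variable names (land_size, score_math, score_chinese); the other covariates from #1 are omitted, and the 1e-8 cutoff for "completely determined" observations is an arbitrary illustration.

      ```stata
      * Hypothetical variable names: land_size, score_math, score_chinese
      * 1. Land size without any score terms (baseline sign)
      probit M_P land_size
      estimates store base

      * 2. Add each score measure in turn
      probit M_P land_size c.score_math##c.score_math
      estimates store with_math
      probit M_P land_size c.score_chinese##c.score_chinese
      estimates store with_chinese

      * 3. Compare the land size coefficient across the three fits
      estimates table base with_math with_chinese, keep(land_size) b se

      * 4. Count observations whose fitted probability is numerically 0,
      *    which is what the "completely determined" notes in #1 refer to
      estimates restore with_chinese
      predict double p_hat, pr
      count if p_hat < 1e-8
      ```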

      Thank you!


      • #4
        What is the main purpose of the study? What hypothesis are you testing? The relationship between migration and academic performance?


        • #5
          Dear Jeff Wooldridge, thank you for your answer.

          I am looking at the effect of child human capital differential on migration of parents, where the child human capital differential is the difference between the predicted test score if the parent migrates and the predicted test score if the parent does not migrate.

          So, the following empirical strategy is adopted:
          • Estimate the reduced form migration equation, where the child human capital differential is replaced by the explanatory variables used in the child human capital differential equation
          • Get the inverse Mills ratios (IMRs)
          • Run the child human capital differential equation for the sample where the parent migrates, including the IMRs
          • Predict for the whole sample the test score if the parent migrates
          • Run the child human capital differential equation for the sample where the parent does not migrate, including the IMRs
          • Predict for the whole sample the test score if the parent does not migrate
          • Compute the child human capital differential variable = the test score if the parent migrates - the test score if the parent does not migrate
          • Run the structural migration equation including the computed child human capital differential variable (the results are presented in the tables in #1).
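
          The steps above can be sketched roughly as follows. All names are hypothetical (migrate, score, x1, x2, z1), the covariate lists are placeholders, and removing the selection term from the whole-sample predictions is an assumption of this sketch rather than something stated above. The bootstrap standard errors in #1 suggest the sequence was wrapped in a bootstrap; that wrapper is left out here.

          ```stata
          * Hypothetical names throughout: migrate (0/1), score,
          * x1 x2 (score equation covariates), z1 (migration-only covariate)
          local xs x1 x2
          local zs z1

          * Step 1: reduced form migration probit
          probit migrate `xs' `zs'
          predict double xb_rf, xb

          * Step 2: inverse Mills ratios for each regime
          generate double imr_mig  =  normalden(xb_rf) / normal(xb_rf)
          generate double imr_stay = -normalden(xb_rf) / normal(-xb_rf)

          * Steps 3-4: score equation for migrant parents, then predict for
          * everyone; the selection term is removed from the prediction
          * (an assumption of this sketch)
          regress score `xs' imr_mig if migrate == 1
          predict double score_mig, xb
          replace score_mig = score_mig - _b[imr_mig] * imr_mig

          * Steps 5-6: the same for non-migrant parents
          regress score `xs' imr_stay if migrate == 0
          predict double score_stay, xb
          replace score_stay = score_stay - _b[imr_stay] * imr_stay

          * Step 7: child human capital differential
          generate double hc_diff = score_mig - score_stay

          * Step 8: structural migration probit with the differential
          probit migrate hc_diff `zs'
          ```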
          Thank you for any remarks.
