  • Testing coefficients from two different samples using xtreg

    Dear All,
    I have seen several postings on testing coefficients coming from two different xtreg models. I have the two models below and would like to test whether the coefficient of lfx in Model 1 is smaller than the coefficient of lfx in Model 2.

    Model 1:
    xtreg expshare lfx dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001 & dummy_highimport==1, vce(robust)

    Model 2:
    xtreg expshare lfx dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001 & dummy_highimport==0, vce(robust)

    I followed the recommendations in those postings and obtained the output below. As far as I understand, I should be interested in the p-value of the 2.group#c.lfx interaction (0.976). This implies that the difference between the coefficients is not statistically significant. I would really appreciate it if you could confirm whether I am on the right track.

    Have a great day!

    gen group=1 if dummy_highimport==1
    replace group=2 if dummy_highimport==0
    tab group
    xtreg expshare i.group##(c.lfx c.dllabprod c.VIX c.Dolratetota c.col c.leverage2 c.lFGDP_s c.ipsectoralgrowth c.log_GDP) if year>2001, fe robust

    xtreg expshare i.group##(c.log_industry_rer_96_99_cst c.dllabprod c.VIX c.Dolratetota c.col c.leverage2 c.lFGDP_s c.ipsectoralgrowth c.log_GDP) if year>2001, fe robust
    note: 2.group omitted because of collinearity

    Fixed-effects (within) regression               Number of obs     =     28,430
    Group variable: id                              Number of groups  =      4,872

    R-sq:                                           Obs per group:
         within  = 0.0191                                         min =          1
         between = 0.0029                                         avg =        5.8
         overall = 0.0027                                         max =          9

                                                    F(18,4871)        =      12.45
    corr(u_i, Xb)  = -0.9792                        Prob > F          =     0.0000

    (Std. Err. adjusted for 4,872 clusters in id)
    ----------------------------------------------------------------------------------------------
    | Robust
    expshare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -----------------------------+----------------------------------------------------------------
    2.group | 0 (omitted)
    lfx | -.0531907 .0362823 -1.47 0.143 -.1243204 .0179389
    dllabprod | .0047292 .0035266 1.34 0.180 -.0021846 .011643
    VIX | .0010271 .000228 4.51 0.000 .0005803 .001474
    Dolratetotal | .0440244 .0068111 6.46 0.000 .0306716 .0573773
    col | -.0466193 .0200281 -2.33 0.020 -.0858835 -.0073551
    leverage2 | .0000302 .0002133 0.14 0.887 -.0003879 .0004482
    lFGDP_s | -.2160109 .116787 -1.85 0.064 -.4449661 .0129443
    ipsectoralgrowth | -.009061 .0087774 -1.03 0.302 -.0262686 .0081466
    log_GDP | .1570772 .0616418 2.55 0.011 .0362314 .277923
    |
    group#|
    c.lfx |
    2 | .0013076 .0435208 0.03 0.976 -.0840129 .086628
    |
    group#c.dllabprod |
    2 | .0072897 .0043555 1.67 0.094 -.0012491 .0158285
    |
    group#c.VIX |
    2 | -.0011931 .000308 -3.87 0.000 -.0017969 -.0005894
    |
    group#c.Dolratetotal |
    2 | .0019018 .0098426 0.19 0.847 -.0173941 .0211977
    |
    group#c.col |
    2 | .0029947 .0271973 0.11 0.912 -.0503242 .0563137
    |
    group#c.leverage2 |
    2 | .0117649 .0067593 1.74 0.082 -.0014864 .0250162
    |
    group#c.lFGDP_s |
    2 | .1813865 .1796721 1.01 0.313 -.1708517 .5336248
    |
    group#c.ipsectoralgrowth |
    2 | -.0188761 .0160302 -1.18 0.239 -.0503025 .0125503
    |
    group#c.log_GDP |
    2 | -.2035728 .0829576 -2.45 0.014 -.366207 -.0409385
    |
    _cons | 3.030952 2.336514 1.30 0.195 -1.54967 7.611573
    -----------------------------+----------------------------------------------------------------
    sigma_u | 1.3683954
    sigma_e | .1249707
    rho | .99172847 (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------------

  • #2
    This looks correct to me.

    I will take this opportunity to note that, in my opinion, significance testing is not the best approach in this situation. More informative, in my view, is to report the group difference in lfx coefficients (given by the group#c.lfx coefficient) and its 95% confidence interval or standard error.



    • #3
      Originally posted by Clyde Schechter:
      This looks correct to me.

      I will take this opportunity to note that, in my opinion, significance testing is not the best approach in this situation. More informative, in my view, is to report the group difference in lfx coefficients (given by the group#c.lfx coefficient) and its 95% confidence interval or standard error.
      Thank you, Clyde. I understand that I should report the parts in red (shown again below). Why do you think that is more appropriate?
      Code:
                     |             Robust
            expshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
       2.group#c.lfx |   .0013076   .0435208     0.03   0.976    -.0840129     .086628



      • #4
        Well, if you do a significance test, you are testing the null hypothesis that there is zero difference between the two groups. But in most research contexts (and perhaps yours is an exception--there are some) that null hypothesis is a ridiculous straw-man. Of course, two different groups are going to differ on almost anything you measure to some degree. So if the null hypothesis is a straw man, why bother testing it? After all, you already know it's false. What you really want to know is something more like "is the difference large enough to matter?" Well, a p-value doesn't tell you that because the p-value is a mashup into a single statistic of the sample size, the noisiness of the data, and the true effect size. By contrast, a 95% confidence interval tells you, in this case, that your best estimate of the mean difference between groups is 0.0013076, and, given the vagaries of noise, and sample size, the proposition that the true value is somewhere between -0.0840129 and +0.086628 is plausible, because 95% of intervals calculated in this way actually do contain the correct value. So now you have an estimate of the group difference along with a sense of how precise that estimate is. If all of the values contained in the confidence interval are big enough to matter for practical purposes, then you can confidently assert that there is a meaningful group difference. If none of them are large enough to matter, then you can confidently assert that the difference between the groups is too small to matter. If the confidence interval spans differences that include both large enough to matter and too small to matter, then your conclusion must be tentative: my best estimate matters (or doesn't, as the case may be), but the data are compatible with the opposite being true.

        The p-value gives you none of that; it just tells you whether something you already know to be false is incompatible with your data.
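
        The arithmetic behind that interval is just estimate ± (critical value) × (standard error). A quick sketch in Python (not Stata) that rebuilds the reported bounds from the coefficient and robust standard error above, using the 1.96 normal critical value as an approximation to Stata's t quantile with 4,871 degrees of freedom:

```python
# Rebuild the 95% CI for the 2.group#c.lfx interaction from the reported
# coefficient and robust standard error.
coef = 0.0013076   # interaction coefficient (group 2 vs. group 1)
se = 0.0435208     # robust standard error

z = 1.96           # normal critical value, approximating t(4871)
lower = coef - z * se
upper = coef + z * se

print(f"95% CI: ({lower:.7f}, {upper:.7f})")
# Agrees with Stata's (-.0840129, .086628) up to the normal-vs-t
# approximation in the 5th decimal place.
```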



        • #5
          An interesting and recent article on the relevant topic wisely highlighted by Clyde is: https://besjournals.onlinelibrary.wi...041-210X.13159
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            I guess I see your point, but my output implies that I should accept the null hypothesis that the difference between the coefficients is zero. So should I still worry about the confidence interval when p is large?



            • #7
              You cannot accept the null hypothesis unless you have first done a formal a priori power analysis showing that you have adequate data to support that conclusion. Without that, you can only reject the null hypothesis (if p is sufficiently small) or fail to reject the null (which is not the same as accepting the null--it is more like being agnostic about it). So, your output does not, by itself, imply that you must accept the null hypothesis. By itself, it just says you cannot be confident that the null hypothesis is false--but that does not come close to making it true.

              And really, ask yourself: if you had no data available, would you consider it at all reasonable that there might be no difference whatsoever between the two groups? If so, then testing that null hypothesis takes on a small bit of reasonableness. But if not (the more usual case), then testing the null hypothesis, though often done in practice because people don't think about it, makes no real sense--what you would really want to know is whether the difference matters. That leads you back to my line of reasoning in #4.



              • #8
                Thank you very much for the explanation.
                Given my results,
                1) I fail to reject the null, which implies that the difference between the coefficients may be small...
                2) In a different setup (with a confidence interval containing only large, or only small, values), I would have more to say... but not in this situation.



                • #9
                  Nazlika:

                  Code:
                                 |             Robust
                        expshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                   2.group#c.lfx |   .0013076   .0435208     0.03   0.976    -.0840129     .086628
                  1) As per your results, you fail to reject the null because the bounds of the 95% CI cross zero. This means that you cannot rule out that the difference is 0 (i.e., no difference). Please note that the high p-value is simply the other face of the coin of having a 95% CI that crosses zero. Your 95% CI tells you that, if you were to select, say, 100 random samples from the population your original sample was drawn from, 95 out of 100 such 95% CIs would contain the real, fixed and unknown value of the difference you are investigating (this is a frequentist notion; Bayesians have a different take, which includes treating the population parameters as random).
                  Your wide 95% CI may reflect too small a sample size, a genuine absence of any difference in the population from which your sample was drawn, the role of other predictors, or other factors.
                  2) I fail to get your second statement.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
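
                  That coverage interpretation can be illustrated with a short simulation (a sketch in Python with made-up parameters: repeated samples from a normal population and a known-variance z-interval for the mean):

```python
# Simulate the frequentist coverage of 95% confidence intervals: draw many
# samples, build a z-interval for the mean from each, and count how often
# the interval contains the true (fixed, unknown-in-practice) mean.
import math
import random

random.seed(12345)

true_mean, sigma, n = 0.0, 1.0, 50
n_reps = 2000
half_width = 1.96 * sigma / math.sqrt(n)   # known-variance z-interval

covered = 0
for _ in range(n_reps):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    if xbar - half_width <= true_mean <= xbar + half_width:
        covered += 1

coverage = covered / n_reps
print(f"empirical coverage: {coverage:.3f}")   # close to 0.95
```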



                  • #10
                    I want to ask one more thing, if possible:
                    What's the difference between A and B?

                    A)
                    gen group=1 if dummy_highimport==1
                    replace group=2 if dummy_highimport==0
                    tab group
                    xtreg expshare i.group##(c.lfx c.dllabprod c.VIX c.Dolratetota c.col c.leverage2 c.lFGDP_s c.ipsectoralgrowth c.log_GDP) if year>2001, fe robust
                    and look at the coefficient (p-value, confidence interval, etc.) of c.lfx to see whether the impact of lfx is higher in high-import than in low-import firms?


                    B)
                    gen IO_lfx=dummy_highimport*lfx
                    xtreg expshare lfx dllabprod IO_lfx VIX Dolratetota lrsale col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001,fe robust
                    . lincom IO_lfx

                    ( 1) IO_lfx = 0

                    ------------------------------------------------------------------------------
                    expshare | Coef. Std. Err. t P>|t| [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                    (1) | .0858202 .0161961 5.30 0.000 .0540749 .1175656
                    ------------------------------------------------------------------------




                    • #11
                      A) is a valid model incorrectly interpreted. To see if the effect of lfx differs between high and low import you have to look at the coefficient of group#lfx, not at the coefficient of lfx.

                      B) is probably an invalid model. I say "probably" because I do not know exactly how the variable dummy_highimport is constructed. If dummy_highimport is constant within id, then the model is OK. But if dummy_highimport can vary over time within an id, then the model is invalid because dummy_highimport itself has been omitted from it. Assuming that dummy_highimport is, in fact, a time-invariant attribute of each id, so that the model is valid, the results of lincom IO_lfx will match the findings for group#lfx in model A.
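
                      The algebra behind that equivalence can be checked numerically: when a single regressor is interacted with a two-level group (and nothing else differs between the specifications), the interaction coefficient equals the difference between the two group-specific slopes. A minimal sketch in Python with made-up data, using closed-form simple-regression slopes rather than xtreg's fixed-effects estimator:

```python
# With one continuous regressor fully interacted with a two-level group,
# the pooled model's group#c.x coefficient equals (slope in group 2) minus
# (slope in group 1) from two separate per-group regressions.

def ols_slope(xs, ys):
    """Closed-form OLS slope of y on x (with an intercept)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

# Made-up data for the two groups (illustrative values only).
x1, y1 = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]   # group 1
x2, y2 = [1.0, 2.0, 3.0, 4.0], [0.9, 2.5, 4.1, 5.4]   # group 2

b1 = ols_slope(x1, y1)   # slope estimated in group 1 alone
b2 = ols_slope(x2, y2)   # slope estimated in group 2 alone
interaction = b2 - b1    # what 2.group#c.x reports in the pooled model

print(f"group 1 slope = {b1:.4f}, group 2 slope = {b2:.4f}")
print(f"implied interaction coefficient = {interaction:.4f}")
```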



                      • #12
                        The dummy is constant within id. I guess I made a mistake in the previous model, as I had interacted all the independent variables. In this version, I get the same results from both models.

                        xtreg expshare i.group##(c.log_industry_rer_96_99_cst) dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001

                        Random-effects GLS regression                   Number of obs     =     28,430
                        Group variable: id                              Number of groups  =      4,872

                        R-sq:                                           Obs per group:
                             within  = 0.0155                                         min =          1
                             between = 0.1339                                         avg =        5.8
                             overall = 0.1079                                         max =          9

                                                                        Wald chi2(11)     =     925.27
                        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                        ----------------------------------------------------------------------------------------------
                        expshare | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                        -----------------------------+----------------------------------------------------------------
                        2.group | .5864936 .0745983 7.86 0.000 .4402835 .7327036
                        log_industry_rer_96_99_cst | .0441343 .0223026 1.98 0.048 .0004219 .0878466
                        |
                        group#|
                        c.log_industry_rer_96_99_cst |
                        2 | -.1267491 .016303 -7.77 0.000 -.1587024 -.0947958
                        |
                        dllabprod | .0089485 .0014072 6.36 0.000 .0061904 .0117065
                        VIX | .0002622 .0001179 2.22 0.026 .0000312 .0004932
                        Dolratetotal | .0766487 .0032862 23.32 0.000 .0702078 .0830896
                        col | -.0517889 .0078533 -6.59 0.000 -.0671811 -.0363967
                        leverage2 | .000576 .00067 0.86 0.390 -.0007372 .0018891
                        lFGDP_s | .1066267 .0097353 10.95 0.000 .0875459 .1257076
                        ipsectoralgrowth | -.0296929 .0071745 -4.14 0.000 -.0437547 -.0156312
                        log_GDP | -.0600679 .0173869 -3.45 0.001 -.0941457 -.0259901
                        _cons | -2.34548 .2685857 -8.73 0.000 -2.871898 -1.819062
                        -----------------------------+----------------------------------------------------------------
                        sigma_u | .24569202
                        sigma_e | .12510531
                        rho | .79410442 (fraction of variance due to u_i)
                        ----------------------------------------------------------------------------------------------

                        xtreg expshare log_industry_rer_96_99_cst IO_RER_PS dummy_highimport dllabprod VIX Dolratetota col leverage2 lFGDP_s ipsectoralgrowth log_GDP if year>2001

                        Random-effects GLS regression                   Number of obs     =     28,430
                        Group variable: id                              Number of groups  =      4,872

                        R-sq:                                           Obs per group:
                             within  = 0.0155                                         min =          1
                             between = 0.1339                                         avg =        5.8
                             overall = 0.1079                                         max =          9

                                                                        Wald chi2(11)     =     925.27
                        corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                        --------------------------------------------------------------------------------------------
                        expshare | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                        ---------------------------+----------------------------------------------------------------
                        log_industry_rer_96_99_cst | -.0826149 .0201291 -4.10 0.000 -.1220672 -.0431625
                        IO_RER_PS | .1267491 .016303 7.77 0.000 .0947958 .1587024
                        dummy_highimport | -.5864936 .0745983 -7.86 0.000 -.7327036 -.4402835
                        dllabprod | .0089485 .0014072 6.36 0.000 .0061904 .0117065
                        VIX | .0002622 .0001179 2.22 0.026 .0000312 .0004932
                        Dolratetotal | .0766487 .0032862 23.32 0.000 .0702078 .0830896
                        col | -.0517889 .0078533 -6.59 0.000 -.0671811 -.0363967
                        leverage2 | .000576 .00067 0.86 0.390 -.0007372 .0018891
                        lFGDP_s | .1066267 .0097353 10.95 0.000 .0875459 .1257076
                        ipsectoralgrowth | -.0296929 .0071745 -4.14 0.000 -.0437547 -.0156312
                        log_GDP | -.0600679 .0173869 -3.45 0.001 -.0941457 -.0259901
                        _cons | -1.758987 .2681726 -6.56 0.000 -2.284595 -1.233378
                        ---------------------------+----------------------------------------------------------------
                        sigma_u | .24569202
                        sigma_e | .12510531
                        rho | .79410442 (fraction of variance due to u_i)
                        --------------------------------------------------------------------------------------------



                        • #13
                          I have another question. For another specification, I run a difference GMM regression instead of xtreg. I divide the sample into three categories based on firm characteristics that are constant within ids. I want to test whether the coefficients of fx in samples 1 & 2, samples 2 & 3, and samples 1 & 3 are statistically different from each other.

                          I assume that
                          1) the GMM estimates are asymptotically normally distributed, and
                          2) the three samples are independent.
                          Then I calculate the t-statistic as
                          t = (c1 - c2) / sqrt(s1^2 + s2^2),
                          where c1 and c2 are the coefficients of fx, and s1 and s2 are their standard errors.
                          I then compare the t-statistic with 1.96 to decide whether the coefficients are statistically different from each other.
                          I would appreciate it if you could share your thoughts on this approach.
                          Thanks again for your time,
                          Nazlı
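
                          That comparison is straightforward to script. A sketch in Python of the same formula (the numbers are hypothetical, for illustration only; se1 and se2 are taken to be the estimated standard errors of the two coefficients, and the normal approximation rests on the asymptotic-normality and independent-samples assumptions stated above):

```python
# Two-sided z-test for the difference between coefficients estimated on
# independent samples: z = (c1 - c2) / sqrt(se1^2 + se2^2).
import math
from statistics import NormalDist

def compare_coefs(c1, se1, c2, se2):
    """Return (z statistic, two-sided p-value) for H0: c1 == c2."""
    z = (c1 - c2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical numbers, for illustration only.
z_stat, p_value = compare_coefs(c1=0.25, se1=0.08, c2=0.10, se2=0.06)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
# |z| > 1.96 (equivalently p < 0.05) would indicate a difference at the
# 5% level.
```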



                          • #14
                            I have a related question about testing whether coefficients of two regression models are statistically different. I'm looking at the effects of unemployment (and related characteristics) on perceived job security, and whether unemployment effects differ by mental health status. I first tried estimating two separate models for those in poor and good mental health, and then using the suest command to test for differences in the unemployment coefficient. However, I'm using fixed effects regressions and I got an error message saying that "xtreg is not supported by suest". My code is as follows:

                            Code:
                            xtreg jobsec unemp male1 age tenure contract fire1 hire1 i.j1 if mh9_q1==0, fe i(id) cluster(id)
                            est store e4
                            xtreg jobsec unemp male1 age tenure contract fire1 hire1 i.j1 if mh9_q1==1, fe i(id) cluster(id)
                            est store e5
                            
                            suest e4 e5
                            Is there an alternative command that I can use in this case to test whether the 'unemp' coefficients are statistically different in the two models? Or is the feasible alternative to use an interaction model as follows, and look at the coefficient on the mh9_q1#c.unemp interaction?

                            Code:
                            xtreg jobsec mh9_q1##(c.unemp male1 c.age c.tenure contract c.fire1 c.hire1) i.j1, cluster(id)



                            • #15
                              Ashani:
                              your last code is the way to go.
                              Then you can check what you're after via -test- and -lincom-.
                              Kind regards,
                              Carlo
                              (Stata 19.0)
