Non estimable marginal effect with margins, even though all coefficients are estimated

Arthur Carvalho Brito

Join Date: Jan 2021
Posts: 45

09 Nov 2023, 15:55

Hi,

I am estimating the following linear regression: a dummy for China interacted with year dummies, plus the same for a Hong Kong dummy.

Code:

reghdfe unit_value i.china_dummy##i.year i.hk_dummy##i.year, absorb(hs8_group##ym state_group) vce(robust)

This is the output:

Code:

. reghdfe unit_value i.china_dummy##i.year i.hk_dummy##i.year, absorb(hs8_group##ym state_group) vce(robust)
(dropped 73 singleton observations)
(MWFE estimator converged in 7 iterations)
note: 2013bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2014bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2015bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2016bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2017bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2018bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2019bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2020bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2021bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
note: 2022bn.year is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)

HDFE Linear regression                            Number of obs   =    130,799
Absorbing 2 HDFE groups                           F(  22, 129253) =      21.19
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.2008
                                                  Adj R-squared   =     0.1912
                                                  Within R-sq.    =     0.0006
                                                  Root MSE        =     3.5441

----------------------------------------------------------------------------------
                 |               Robust
      unit_value | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-----------------+----------------------------------------------------------------
   1.china_dummy |   -.493448   .1374035    -3.59   0.000    -.7627565   -.2241395
                 |
            year |
           2013  |          0  (omitted)
           2014  |          0  (omitted)
           2015  |          0  (omitted)
           2016  |          0  (omitted)
           2017  |          0  (omitted)
           2018  |          0  (omitted)
           2019  |          0  (omitted)
           2020  |          0  (omitted)
           2021  |          0  (omitted)
           2022  |          0  (omitted)
                 |
china_dummy#year |
         1 2013  |   .7311467   .3358236     2.18   0.029     .0729384    1.389355
         1 2014  |   .7689996    .443334     1.73   0.083    -.0999272    1.637926
         1 2015  |   .8111661   .1675883     4.84   0.000      .482696    1.139636
         1 2016  |    .560081   .1590176     3.52   0.000     .2484093    .8717527
         1 2017  |   .4758682   .1577632     3.02   0.003      .166655    .7850813
         1 2018  |   .8301727   .1627107     5.10   0.000     .5112626    1.149083
         1 2019  |    .866174   .1661845     5.21   0.000     .5404552    1.191893
         1 2020  |   .6821976   .1717615     3.97   0.000      .345548    1.018847
         1 2021  |   .7371718   .1756435     4.20   0.000     .3929136     1.08143
         1 2022  |   .2713062   .2159342     1.26   0.209    -.1519211    .6945334
                 |
      1.hk_dummy |   .0622282   .0493742     1.26   0.208    -.0345443    .1590007
                 |
   hk_dummy#year |
         1 2013  |   .4297135   .0622894     6.90   0.000     .3076274    .5517997
         1 2014  |   .3260669   .0652209     5.00   0.000      .198235    .4538987
         1 2015  |   .0318809   .0638135     0.50   0.617    -.0931924    .1569543
         1 2016  |   .0770146   .0625716     1.23   0.218    -.0456247    .1996538
         1 2017  |  -.0555324   .0649348    -0.86   0.392    -.1828035    .0717387
         1 2018  |    .027625   .0657469     0.42   0.674    -.1012377    .1564876
         1 2019  |  -.2082516   .0677319    -3.07   0.002     -.341005   -.0754983
         1 2020  |  -.3769615   .0644718    -5.85   0.000     -.503325    -.250598
         1 2021  |  -.3950385   .0692449    -5.70   0.000    -.5307573   -.2593197
         1 2022  |  -.3629789   .0930993    -3.90   0.000    -.5454518    -.180506
                 |
           _cons |   4.576375   .0106752   428.69   0.000     4.555452    4.597298
----------------------------------------------------------------------------------

Absorbed degrees of freedom:
--------------------------------------------------------+
    Absorbed FE | Categories  - Redundant  = Num. Coefs |
----------------+---------------------------------------|
   hs8_group#ym |      1500           0        1500     |
    state_group |        25           1          24     |
--------------------------------------------------------+

.
end of do-file

I am also calculating the marginal effects for each time period. In this case they are just the coefficient on China plus the China interaction term for each year (likewise for Hong Kong).

Code:

margins 1.china_dummy 1.hk_dummy, dydx(i.year)

Even though my regression output does give me estimated coefficients for all interactions and intercepts, the margins command says that everything is not estimable:

Code:

Average marginal effects                               Number of obs = 130,799
Model VCE: Robust

Expression: Linear prediction, predict()
dy/dx wrt:  2013.year 2014.year 2015.year 2016.year 2017.year 2018.year 2019.year 2020.year 2021.year 2022.year

-------------------------------------------------------------------------------
              |            Delta-method
              |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
2012.year     |  (base outcome)
--------------+----------------------------------------------------------------
2013.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2014.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2015.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2016.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2017.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2018.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2019.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2020.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2021.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
--------------+----------------------------------------------------------------
2022.year     |
1.china_dummy |          .  (not estimable)
   1.hk_dummy |          .  (not estimable)
-------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

.
end of do-file

Can you help me figure out why, please?

Last edited by Arthur Carvalho Brito; 09 Nov 2023, 15:56. Reason: margins

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

09 Nov 2023, 16:48

Even though my regression output does give me estimated coefficients for all interactions and intercepts, the margins command says that everything is not estimable:

No, your output does not give you estimated coefficients for the year intercepts, and you are asking for the marginal effects of these year variables. Not possible.
Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#3

09 Nov 2023, 16:55

Clyde Schechter thanks for clarifying. Maybe I am inputting the wrong effect in margins, then?

I want to answer the following: holding all else equal, what is the increase in unit value associated with China in a given year? Say 2013.

So I want the coefficient on the China dummy plus the coefficient on the interaction between 1.china_dummy and 2013.year, i.e. -0.49 + 0.73.

How could I rewrite the margins call to get that?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#4

10 Nov 2023, 05:54

Are the china and hk dummies exhaustive? If so, you have to drop one set of interactions and use the other as the base group. But if they are, I’m surprised coefficients are given on both. Perhaps you’re absorbing in a way that leaves them in but knocks out the year dummies.
Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#5

10 Nov 2023, 09:44

Jeff Wooldridge they are not exhaustive, if I understood correctly what you mean by that: there are observations with china_dummy = 0 and hk_dummy = 0. It is never the case that china_dummy = 1 and hk_dummy = 1.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

10 Nov 2023, 10:05

How could I rewrite the margins call to get that?

Code:

margins 2013.year, dydx(china_dummy)

to get it for just year 2013. If you want it for every year:

Code:

margins year, dydx(china_dummy)

Having responded to that, bear in mind that Jeff Wooldridge's question bears on how you interpret this. The marginal effect of the china_dummy indicator variable that you are estimating is basically the difference, in 2013 (or in each year, for the second command), between the expected value of unit_value in the observations where china_dummy == 1 and its expected value in the observations where china_dummy == 0. The meaning of the latter category, china_dummy == 0, is what is in question here.
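
As a sketch in symbols (the notation is introduced here for illustration and does not come from the thread): write \beta_C for the coefficient on 1.china_dummy and \delta_t for the coefficient on the 1.china_dummy#t.year interaction. Holding hk_dummy and the absorbed fixed effects fixed, the quantity that margins is being asked for in year t is

```latex
\[
\mathrm{ME}_t
  = E[\text{unit\_value} \mid \text{china\_dummy}=1,\ \text{year}=t]
  - E[\text{unit\_value} \mid \text{china\_dummy}=0,\ \text{year}=t]
  = \beta_C + \delta_t ,
  \qquad t = 2013,\dots,2022 .
\]
```

For the base year 2012 there is no interaction term, so the expression reduces to \beta_C. Whether these sums are actually estimable in this model is a separate matter.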

Last edited by Clyde Schechter; 10 Nov 2023, 10:09.
Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#7

10 Nov 2023, 15:19

Clyde Schechter thank you. I want to compare differences in Chinese and Hong Kong unit values across time. So I think comparing these two marginal effects through time gives me what I need.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

10 Nov 2023, 15:35

Oh, I think there's a better way to do that. Instead of having separate indicators for China and Hong Kong, create a new variable:

Code:

gen location = 1 if china_dummy == 1
replace location = 2 if hk_dummy == 1
replace location = 3 if missing(location)
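
A quick sanity check on the new variable can catch coding slips before the regression is rerun. The sketch below is only a suggestion: the label name location_lbl is made up for illustration, and the asserts rely on the earlier statement that china_dummy and hk_dummy are never both 1.

```stata
* Hypothetical value label (location_lbl is an illustrative name) so output is readable
label define location_lbl 1 "China" 2 "Hong Kong" 3 "Other"
label values location location_lbl

* location should reproduce the original dummies exactly
assert (location == 1) == (china_dummy == 1)
assert (location == 2) == (hk_dummy == 1)

* Every observation should fall in exactly one of the three groups
assert !missing(location)
tabulate location
```

With ib1.location in the regression, level 1 (China) is the base category, so dydx(2.location) is read as Hong Kong relative to China.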

Now rerun your regression, modified as follows:

Code:

reghdfe unit_value ib1.location##i.year, absorb(hs8_group##ym state_group) vce(robust)
margins year, dydx(2.location)
Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#9

10 Nov 2023, 15:40

Clyde Schechter both of the suggested solutions still give me the "not estimable" result.

Last edited by Arthur Carvalho Brito; 10 Nov 2023, 15:43.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

10 Nov 2023, 15:55

Hmm, that gives me pause. There are situations where Stata will declare things not estimable when they are, in fact, estimable. But they aren't common. I'm now wondering exactly what collinearity caused all of the year indicator variables to be dropped. If the collinearity did not involve anything that is itself related to the location variable (or the original china and hk dummies), then these marginal effects should be estimable. To overcome Stata's reticence, I would simply go back to doing it the "old fashioned" way.

Code:

* sum of the 2.location main effect and its interaction with each year
forvalues y = 2013/2022 {
    lincom _b[2.location] + _b[2.location#`y'.year]
}

But I worry that we're missing something here. Stata may be picking up on an unidentifiability in the model that you and I are missing. What exactly are the variables hs8_group, ym, and state_group? Are any of these time-invariant attributes of the location variable (or the original china or hk dummies)? Are any of them location-invariant attributes of years? If so, we may be dealing with an actual non-estimability of these marginal effects--that is, it may actually be mathematically impossible to estimate those marginal effects from this model, and the numbers you would get using the code above would just be nonsense.
Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#11

10 Nov 2023, 16:08

Clyde Schechter I think I may have found the issue.

The ym fixed effect is a year-month FE, so would that cause the year FE to be absorbed, and Stata to then conclude that the effects are not estimable? If I remove that FE, your code works.

Here is a better explanation of my setup.

An observation is a product (hs8) sold from a state in Brazil (state_group) to a given destination (China mainland, Hong Kong, or other) in a given month-year.

To fix ideas, assume there is a single year of interest, so that year is a dummy (T). What I want to do is get:

E[y | China = 1, T = 1] - E[y | Hong Kong = 1, T = 1]

So if I run a regression like this

y = A + b_1*China*T + b_2*T + b_3*China + b_4*HK*T + b_5*HK + fixed effects + epsilon

I want to mainly get (b_1 + b_3) and (b_4 + b_5)
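
For reference, a sketch of how the target contrast maps onto the coefficients of the regression written above. It assumes, purely for illustration, that the fixed effects (written FE below) and the conditional mean of epsilon are the same in both groups:

```latex
\begin{align*}
E[y \mid \text{China}=1,\ \text{HK}=0,\ T=1] &= A + b_1 + b_2 + b_3 + \text{FE},\\
E[y \mid \text{China}=0,\ \text{HK}=1,\ T=1] &= A + b_2 + b_4 + b_5 + \text{FE},\\
E[y \mid \text{China}=1,\ T=1] - E[y \mid \text{HK}=1,\ T=1] &= (b_1 + b_3) - (b_4 + b_5).
\end{align*}
```

The year main effect b_2 appears in both conditional means and cancels in the difference.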
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

10 Nov 2023, 17:28

So, you do have an authentically intractable problem here. The ym fixed effect is the culprit, and as long as you have those in the model, you will be unable to get valid estimates of those location effects by year. As you've observed, if you take ym out of the fixed effects, the code I suggested works. That's because once the ym effects are included, the model is unidentifiable. The -lincom- code I showed you produces meaningless results because the coefficients themselves are meaningless in this model. So ym has to go, or you have to give up on this quest.

The same is true of the model you write out at the end of #11. If the fixed effects include ym, then the coefficient estimates you get for b_1, b_2, and b_4 are all bogus; they are meaningless artifacts of the way -reghdfe- resolves the collinearity problem.
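
To see why the year indicators are collinear with ym, here is a minimal sketch on made-up data (it does not use the poster's dataset, and it assumes ym is a Stata monthly date). Every year-month cell lies inside exactly one calendar year, so year never varies within a ym cell:

```stata
* Toy data: two observations per month for 48 months starting January 2012
clear
set obs 96
gen ym = tm(2012m1) + floor((_n - 1) / 2)
format ym %tm

* Calendar year implied by each year-month
gen year = yofd(dofm(ym))

* year is constant within every ym cell, so once ym is absorbed
* there is no variation left to identify the year main effects
bysort ym (year): assert year[1] == year[_N]
```

This matches the reghdfe notes in #1 reporting that each year indicator is "probably collinear with the fixed effects".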

Last edited by Clyde Schechter; 10 Nov 2023, 17:32.
Arthur Carvalho Brito

Join Date: Jan 2021

Posts: 45
#13

10 Nov 2023, 17:38

Clyde Schechter but then is there a way to answer the following question: what is the difference in outcome y across two different sets of units:

sellers from state s of product h to China in month-year ym vs. sellers from state s of product h to Hong Kong in month-year ym?

I would like to know this for every year.

This is what I was trying to capture.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#14

10 Nov 2023, 18:03

Is there a way to answer the following question: what is the difference in outcome y across two different sets of units:

Yes, there is a way to approach that question, but you cannot do it with a fixed-effects model. A random effects model would not raise these difficulties. Of course, I'm sure you know that in economics there is an aversion to such models because they may be inconsistent. And that is true: if the treatment whose effects you seek to measure is not randomized, one typically has to start adding more covariates in the hope of achieving independence of the residuals from the regressors. And, of course, you can never be sure that you have achieved that.

But if you stay within the realm of fixed effects models, then, no, there is no way to answer that question in such a model when the fixed effects are collinear with year.