coefficient larger than mean of dependent variable

Marry Lee

Join Date: Nov 2020
Posts: 186

16 Mar 2024, 11:57

Dear all,

I have the following model:

Code:

reghdfe Y X1 X2 X3, absorb(school_code Year) vce(cluster school_code#Year)

where Y is a binary variable with a mean of 0.05 (only 5% of the observations have the value 1).

I have the following result:

Code:

HDFE Linear regression                            Number of obs   =     20,001
Absorbing 2 HDFE groups                           F(   3,    186) =       1.12
Statistics robust to heteroskedasticity           Prob > F        =     0.3408
                                                  R-squared       =     0.0062
                                                  Adj R-squared   =     0.0048
                                                  Within R-sq.    =     0.0002
Number of clusters (school_code#Year) =        187
                                                  Root MSE        =     0.2235

                             (Std. err. adjusted for 187 clusters in school_code#Year)
--------------------------------------------------------------------------------------
                     |               Robust
                   Y | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
                  X1 |  -.1149805   .0667706    -1.72   0.087    -.2467057    .0167446
                  X2 |  -.0194905   .1469087    -0.13   0.895     -.309312    .2703309
                  X3 |   .0116331   .0122768     0.95   0.345    -.0125866    .0358528
               _cons |    .047886   .0097145     4.93   0.000     .0287213    .0670508
--------------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
 school_code |        17           0          17     |
        Year |        11           1          10     |
-----------------------------------------------------+

My question is: why is the coefficient on X1 larger in magnitude than the mean of Y? Do you have any ideas, please?
Note that I cannot use probit or logit models because they do not allow an interaction in the cluster specification.

All the best

Clyde Schechter

Join Date: Apr 2014

Posts: 29790
#2

16 Mar 2024, 12:19

First, you are using a linear probability model, so predicted probabilities outside the 0 to 1 interval are possible. When they occur, they suggest that a linear probability model may not be appropriate for these data.

But before reaching a conclusion of that nature, it is important to consider the scale of variable X1. The coefficient represents the expected change in Y associated with a 1-unit difference in X1. If the scale of variable X1 is very small, such that a unit difference in X1 is not even possible, or would be extraordinarily unusual, in the real world, then there is nothing surprising about such a coefficient. If, for example, X1 ranges only between 0.5 and 0.6, then the maximum possible difference in values of X1 is only 0.1, and correspondingly the maximum associated difference in Y would be 0.1*-.1149805 = -.01149805, which is perfectly reasonable.
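
One way to check this is to look at the observed range of X1 and scale the coefficient by it. The following is only a sketch: it assumes the reghdfe model from post #1 has just been run, so that _b[X1] is still in memory, and it uses the variable names from that post.

```stata
* Inspect the observed range of X1
summarize X1

* Implied difference in Y across the full observed range of X1
* (summarize leaves r(min) and r(max) behind; _b[X1] comes from the last estimation)
display "Range of X1:               " r(max) - r(min)
display "Implied difference in Y:   " (r(max) - r(min)) * _b[X1]

* The hypothetical from the paragraph above: a 0.1 difference in X1
display 0.1 * -.1149805
```

If the implied difference across the whole observed range is small relative to the mean of Y, the size of the raw coefficient is just a consequence of the units of X1.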
Marry Lee

Join Date: Nov 2020

Posts: 186
#3

17 Mar 2024, 08:32

Clyde Schechter thank you so much for your answer.
You are totally right. X1 varies only between 0 and 0.33. So in my text, should I interpret the results in terms of a 0.1 change in X1?
Clyde Schechter

Join Date: Apr 2014

Posts: 29790
#4

17 Mar 2024, 12:01

Yes, in a situation like this, it is probably better to describe your results in terms of the difference in Y associated with a 0.1 difference in X1, as a unit difference in X1 never occurs, and perhaps doesn't even make sense at any level.
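
Two ways this could be done in Stata are sketched below, reusing the model from post #1. The variable name X1_tenths is made up for illustration; everything else is taken from the original command.

```stata
* Option 1: keep the model as is and report the estimate for a
* 0.1 difference in X1, with its standard error and confidence interval
reghdfe Y X1 X2 X3, absorb(school_code Year) vce(cluster school_code#Year)
lincom 0.1*X1

* Option 2: rescale X1 so that one unit equals 0.1 on the original scale
* (X1_tenths is a hypothetical variable name)
generate X1_tenths = 10 * X1
reghdfe Y X1_tenths X2 X3, absorb(school_code Year) vce(cluster school_code#Year)
```

Both approaches give the same point estimate, one tenth of the original X1 coefficient, and the t statistic and p-value are unchanged. Only the units in which the result is reported differ.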