Difference-in-differences model with different numbers of pre- and post-treatment observations

Oscar Jones

Join Date: May 2016
Posts: 10


22 May 2016, 08:46

Hi all,

For my master's thesis, I am looking at the effect of cross-listing on a firm's leverage ratio. I have formed a treatment group (=cross-listed firms) and a control group (=non-cross-listed firms), based on supposedly similar characteristics such as firm size, market-to-book ratio, and cost of capital. Because data are missing for numerous observations, I have unbalanced panel data. I am a novice Stata user and not very experienced with statistics in general.

I have run a difference-in-differences analysis, where the time dummy comprises the 5 years prior to (=0) and 5 years after (=1) the treatment:

Code:

. diff lvg, t(d_cl) p(d_time5)

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 2967
            Baseline       Follow-up
   Control: 673            1020        1693
   Treated: 574            700         1274
            1247           1720
------------------------------------------------------
 Outcome var.   | lvg     | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | 0.214   |         |       | 
   Treated      | 0.235   |         |       | 
   Diff (T-C)   | 0.022   | 0.011   | 2.01  | 0.045**
Follow-up       |         |         |       | 
   Control      | 0.204   |         |       | 
   Treated      | 0.216   |         |       | 
   Diff (T-C)   | 0.012   | 0.009   | 1.31  | 0.191
                |         |         |       | 
Diff-in-Diff    | -0.009  | 0.014   | -0.66 | 0.508
------------------------------------------------------
R-square:    0.00
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1

I notice that the outcome is rather puzzling: a pre-treatment statistical difference in leverage ratios between treated and control group, and a smaller and insignificant difference post-treatment.
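As a check on how to read the table: the Diff-in-Diff row is the follow-up gap (treated minus control) minus the baseline gap. A minimal sketch using the rounded means printed above, so the last digit can differ slightly from the reported rows; the scalar names are only illustrative:

```stata
* Diff-in-diff from the four cell means of the 5-year window (rounded values)
* Baseline gap: treated minus control
scalar gap_pre  = 0.235 - 0.214
* Follow-up gap: treated minus control
scalar gap_post = 0.216 - 0.204
* Follow-up gap minus baseline gap
display "Diff-in-Diff = " %6.3f scalar(gap_post) - scalar(gap_pre)
```

This gives about -0.009, in line with the Diff-in-Diff row.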

When I change the time period to only look at the two years before and after the treatment, the outcome seems to be slightly more satisfying, but not yet usable for my research:

Code:

. diff lvg, t(d_cl) p(d_time2)

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 1461
            Baseline       Follow-up
   Control: 371            455         826
   Treated: 308            327         635
            679            782
------------------------------------------------------
 Outcome var.   | lvg     | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | 0.205   |         |       | 
   Treated      | 0.230   |         |       | 
   Diff (T-C)   | 0.025   | 0.015   | 1.66  | 0.098*
Follow-up       |         |         |       | 
   Control      | 0.215   |         |       | 
   Treated      | 0.209   |         |       | 
   Diff (T-C)   | -0.006  | 0.014   | -0.42 | 0.673
                |         |         |       | 
Diff-in-Diff    | -0.031  | 0.021   | -1.49 | 0.135
------------------------------------------------------
R-square:    0.00
- Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1

Ideally, I want to find no statistical differences in leverage pre-treatment, and a statistical difference (I suspect the cross-listing to have a negative effect on leverage ratio) post-treatment.

Can an explanation for the lack of desired outcome be the difference in numbers of observations between the pre- and post-treatment period?

I have read that differences between size of control and treatment group should not matter, but I have never read about pre- and post-treatment differences.

Thanks!

Oscar

Tags: difference-in-differences, unbalanced panel data

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

22 May 2016, 10:03

The number of pre- and post-treatment observations does not need to be the same. It doesn't matter. In general, when doing comparisons of groups with different numbers of observations (whether over time or number of people), the "effective sample size" (in terms of statistical power) is closer to that of the smaller group than the larger. (It's the harmonic mean, actually.) But looking at your output, the magnitude of the effect you are focusing on appears to be very, very small, and your sample size is respectable for finding effects that are large enough to matter practically. So I don't think you have a statistical power issue here. I think the effect you hoped to find is just much smaller than you imagined.
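To put a number on the harmonic-mean remark, here is a rough sketch using the baseline (1247) and follow-up (1720) observation counts from the first table. It treats the comparison as a simple two-group one, which is a simplification of the four-cell diff-in-diff design:

```stata
* Harmonic mean of the two period sizes: the equal-sized groups that would
* give the same precision for a simple difference in means
display 2 / (1/1247 + 1/1720)
```

The result is roughly 1446, below the arithmetic mean of 1483.5 and therefore pulled toward the smaller group.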
Oscar Jones

Join Date: May 2016

Posts: 10
#3

22 May 2016, 10:27

Dear Mr. Schechter,

It indeed looks like the effect is not as big as I hoped. Hopefully, when I break down my sample into subsets (e.g. based on country of origin), I will be able to find an outcome that is more relevant to present in my thesis.

Thank you for your reply and the clear explanation!
Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#4

22 May 2016, 10:32

You could also consider using additional covariates to control for differences between the control and treatment groups (see the -cov- option of the -diff- command). The -diff- command also allows you to use propensity score matching to create a matched sample before applying the diff-in-diff approach. You may consider this as well if it is not beyond the scope of your thesis (there is also an ongoing debate about whether using matched samples is actually a good idea).

Oscar Jones

Join Date: May 2016
Posts: 10

23 May 2016, 02:23

That is a good idea Mr. Geiger, thanks!

So for the -cov- option, I would add fixed effects such as domestic country (cntry) and industry (sic1)?

I read about the -addcov- option as well. Is this option for those variables on which the control group is formed (firm size, market-to-book, etc.)? Does the fact that I have missing values for some of these variables lead to issues when adding them in the regression?

I have also improved my control group by eliminating firms that do not have observations for at least the two periods before and after the treatment date, and run the regression again. The results look more promising in terms of statistical and economic significance. Including -cov- leads to interesting pre- and post-treatment differences between the two groups compared to excluding it, while leaving the DiD estimator essentially unchanged:

Code:

. diff lvg if d_yr1_2==1, t(d_cl) p(d_time3) cluster(id)

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 1245
            Baseline       Follow-up
   Control: 269            282         551
   Treated: 341            353         694
            610            635
------------------------------------------------------
 Outcome var.   | lvg     | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | 0.185   |         |       | 
   Treated      | 0.230   |         |       | 
   Diff (T-C)   | 0.045   | 0.022   | 2.09  | 0.037**
Follow-up       |         |         |       | 
   Control      | 0.199   |         |       | 
   Treated      | 0.200   |         |       | 
   Diff (T-C)   | 0.001   | 0.023   | 0.06  | 0.956
                |         |         |       | 
Diff-in-Diff    | -0.044  | 0.019   | -2.30 | 0.023**
------------------------------------------------------
R-square:    0.01
- Means and Standard Errors are estimated by linear regression
- Clustered Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1

. diff lvg if d_yr1_2==1, t(d_cl) p(d_time3) cluster(id) cov(cntry sic1)
DIFFERENCE-IN-DIFFERENCES WITH COVARIATES

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 1245
            Baseline       Follow-up
   Control: 269            282         551
   Treated: 341            353         694
            610            635
------------------------------------------------------
 Outcome var.   | lvg     | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | 0.219   |         |       | 
   Treated      | 0.270   |         |       | 
   Diff (T-C)   | 0.051   | 0.021   | 2.39  | 0.018**
Follow-up       |         |         |       | 
   Control      | 0.233   |         |       | 
   Treated      | 0.240   |         |       | 
   Diff (T-C)   | 0.007   | 0.022   | 0.32  | 0.749
                |         |         |       | 
Diff-in-Diff    | -0.043  | 0.019   | -2.26 | 0.025**
------------------------------------------------------
R-square:    0.02
- Means and Standard Errors are estimated by linear regression
- Clustered Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1

Am I correct in including the cluster(id) option because of heteroskedasticity?

Thanks so much for the help!


Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#6

23 May 2016, 05:43

As far as I understand the diff command, the addcov() option is only relevant when you are performing a combination of propensity score matching and diff-in-diff estimation. In this case, the cov() variables are used in the propensity score matching and the addcov() covariates in the diff-in-diff estimation. However, since you are only using a diff-in-diff estimation, cov() should be all right, i.e. you should put all covariates of your model in this option.

Are the variables cntry and sic1 dummy variables? If so, your approach is correct. However, if they are categorical/factor variables, you need to generate dummies for each category (leaving one out as your reference category). Unfortunately, the diff command does not support factor-variable notation (i.). Alternatively, you could simply use the standard reg command to perform your diff-in-diff estimation. For that you need only a dummy indicating the base period (=0) and the follow-up period (=1) as well as a dummy for the treated (=1) and non-treated (=0). Since you already have them, you just need to generate their interaction term:

Code:

gen diff_in_diff = d_cl * d_time3

Now you can run the regression:

Code:

reg lvg d_cl d_time3 diff_in_diff i.cntry i.sic1, vce(cluster id)

If the variables cntry and sic1 are dummies you can omit the i. notation. The coefficient of the diff_in_diff variable should be the same as the value for "Diff-in-Diff" in the output of the -diff- command. If not, this should raise red flags ;-).

Clustering the standard errors should be appropriate in your estimation, because you have multiple observations for each firm in the sample. This leads to intraclass correlation, which biases the unadjusted standard errors.

Missing values are only problematic if they a) are systematic (e.g. smaller firms have missing values while big firms do not) and/or b) reduce the sample size significantly.

PS: You don't have to call me Mr. Geiger. This makes me look older than I am ;-).

Oscar Jones

Join Date: May 2016
Posts: 10

23 May 2016, 11:07

All right, thanks a lot Sebastian (thought to play it safe at first;-) )!

The variables cntry and sic1 are indeed categorical as you assume. However, for the scope of my thesis I'm not looking into country or sic-code specific results. Or is that what -cov- does?

I thought it meant to separately run the regression on the subgroups based on the specified factors (country and sic), as in 'telling' Stata that it should compare firms on a country- and sic-level, but then still reporting one outcome: the DiD-estimator. Or is that not correct?

If not, is there a way to calculate that? Or is that the regression below? I'm getting a bit confused now.

Running your command leads to:

Code:

. reg lvg d_cl d_time3 diff_in_diff3 i.cntry i.sic1

      Source |       SS           df       MS      Number of obs   =     2,015
-------------+----------------------------------   F(19, 1995)     =      9.39
       Model |  6.12096308        19  .322155952   Prob > F        =    0.0000
    Residual |  68.4538033     1,995  .034312683   R-squared       =    0.0821
-------------+----------------------------------   Adj R-squared   =    0.0733
       Total |  74.5747663     2,014  .037028186   Root MSE        =    .18524

-------------------------------------------------------------------------------
          lvg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         d_cl |   .0164483   .0127446     1.29   0.197    -.0085459    .0414425
      d_time3 |   .0053963   .0111623     0.48   0.629    -.0164948    .0272874
diff_in_diff3 |  -.0292504   .0167893    -1.74   0.082    -.0621769     .003676
              |
        cntry |
      Brazil  |   .1310015   .0198752     6.59   0.000     .0920231    .1699798
       Chile  |   .0427062   .0254079     1.68   0.093    -.0071225    .0925349
       China  |  -.0180593   .0174077    -1.04   0.300    -.0521986    .0160799
      France  |  -.0084356   .0220481    -0.38   0.702    -.0516752     .034804
     Germany  |  -.0005315   .0209675    -0.03   0.980     -.041652    .0405889
       Japan  |   .0628139   .0236069     2.66   0.008     .0165172    .1091106
      Mexico  |   .1340764   .0243393     5.51   0.000     .0863433    .1818095
        U.K.  |   .0079611    .016674     0.48   0.633    -.0247392    .0406614
              |
         sic1 |
           1  |   .1063235   .0846518     1.26   0.209    -.0596918    .2723387
           2  |     .13539   .0837618     1.62   0.106    -.0288798    .2996598
           3  |   .1362238   .0837978     1.63   0.104    -.0281166    .3005642
           4  |   .0926459   .0857151     1.08   0.280    -.0754546    .2607464
           5  |   .1207486   .0853536     1.41   0.157     -.046643    .2881401
           7  |   .0986554   .0841225     1.17   0.241    -.0663218    .2636327
           8  |   .1049014   .0871013     1.20   0.229    -.0659176    .2757203
           9  |  -.0638105   .0952559    -0.67   0.503     -.250622    .1230011
              |
        _cons |   .0644306   .0852752     0.76   0.450    -.1028072    .2316684
-------------------------------------------------------------------------------

When I first tried with the -vce(cluster id)-, Stata tells me that the VCE "is not of sufficient rank to perform the model test" (-help j_robustsingular-).

Now it shows coefficients per country and per sic code (for both having removed the first value, I'm aware of that and how to change it). Is this then what would be referred to as adding fixed effects to a model? I guess quite a beginner's question..

Thanks again!


Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#8

23 May 2016, 14:44

The -cov- option does NOT compute country- or sector-specific treatment effects. It basically adjusts your sample for differences in the treatment and control group. It also takes into account that the leverage ratio may change due to changes in the covariates (e.g. firm size) rather than due to the change between not cross-listed to cross-listed.

Using country or sector dummies may be seen as using country and sector fixed effects (others would argue that only firm-specific fixed effects are "real" fixed effects, but this kind of fixed effect is not possible within a diff-in-diff approach). Non-technically speaking, if we use these dummies, the regression only compares the firms within one country and within one sector. Even though I'm not familiar with this research area, I think you need to include this kind of dummy to avoid treating firms from, for example, China and Germany as essentially similar (which they are probably not).

If you just put the variables cntry and sic1 in the cov() option, they will be assumed to be continuous/metric. For example, firms from France (=4) are assumed to be two times "higher" than firms from Chile (=2). This, however, is not a correct interpretation of what this variable actually contains. In fact, it just indicates which country the firm is from: it is a nominal variable (i.e. you cannot put the values in any meaningful order). Therefore, you need to either use the factor notation (i.), as I did in the reg command, or generate separate dummies beforehand (diff does not support factor notation). One quick way to do so is to use -tab-:

Code:

tab cntry, gen(cntry_)

This will generate a new dummy variable for each category: cntry_1, cntry_2 and so on. You can put all of these variables except one in the cov() option.

Ultimately, the two approaches should yield the same results in the diff-in-diff estimator, because the -diff- command does exactly the same as the -reg- command.

Code:

gen diff_in_diff3 = d_cl * d_time3
reg lvg d_cl d_time3 diff_in_diff3 i.cntry i.sic1 if d_yr1_2==1, vce(cluster id)

Code:

tab sic1, gen(sic1_)
tab cntry, gen(cntry_)
* cntry_9 and sic1_9 are left out as the reference categories
diff lvg if d_yr1_2==1, t(d_cl) p(d_time3) cluster(id) cov(cntry_1 cntry_2 cntry_3 cntry_4 cntry_5 cntry_6 cntry_7 cntry_8 sic1_1 sic1_2 sic1_3 sic1_4 sic1_5 sic1_6 sic1_7 sic1_8)

I don't quite understand why the vce(cluster id) option returns an error. Maybe someone else has an idea.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

23 May 2016, 15:22

To diagnose the vce(cluster id) error: how many distinct values of "id" are there? How many in each of the four treatment/period groups? (The total of these four might be greater than the overall distinct number, if some firms had data in two time periods.)
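One way to get these counts, as a sketch only: it assumes the variable names used earlier in the thread (id, d_cl, d_time3, d_yr1_2), and the tag variables cell_tag and firm_tag are hypothetical names.

```stata
* Tag one observation per firm within each treatment/period cell
egen byte cell_tag = tag(id d_cl d_time3) if d_yr1_2 == 1
tab d_cl d_time3 if cell_tag

* Tag one observation per firm overall, then count the distinct firms
egen byte firm_tag = tag(id) if d_yr1_2 == 1
count if firm_tag
```

The cell counts from the first table can sum to more than the overall count, because a firm observed both before and after treatment is tagged once in each period.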

Last edited by Steve Samuels; 23 May 2016, 15:34.

Steve Samuels
Statistical Consulting

Stata 14.2
Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#10

23 May 2016, 15:45

Isn't that exactly the reason for clustering the standard errors, namely that the same firms appear at different points in time? In addition, if we had observations for each firm at just one point in time, the diff-in-diff approach would not work.

Oscar Jones

Join Date: May 2016
Posts: 10

#11

23 May 2016, 16:10

Wow, the -tab- command could have saved me a lot of time in the past! Thanks!

After including your suggestions, Sebastian, the results get even stranger:

Code:

. diff lvg if d_yr1_2==1, t(d_cl) p(d_time3) cluster(id) cov( cntry_1 cntry_2 cntry_3 cntry_5 cntry_6 cntry_7 cntry_8 sic_1 sic_2 sic_3 sic_4 sic_5 sic_6 sic_7 sic_8 )
DIFFERENCE-IN-DIFFERENCES WITH COVARIATES

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 1242
            Baseline       Follow-up
   Control: 266            282         548
   Treated: 341            353         694
            607            635
------------------------------------------------------
 Outcome var.   | lvg     | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | -0.034  |         |       | 
   Treated      | 0.005   |         |       | 
   Diff (T-C)   | 0.039   | 0.021   | 1.81  | 0.072*
Follow-up       |         |         |       | 
   Control      | -0.019  |         |       | 
   Treated      | -0.022  |         |       | 
   Diff (T-C)   | -0.002  | 0.021   | -0.11 | 0.915
                |         |         |       | 
Diff-in-Diff    | -0.041  | 0.019   | -2.11 | 0.036**
------------------------------------------------------
R-square:    0.13
- Means and Standard Errors are estimated by linear regression
- Clustered Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1

. diff lvg if d_yr1_2==1, t(d_cl) p(d_time3) cluster(id) cov( cntry_1 cntry_2 cntry_3 cntry_4 cntry_5 cntry_6 cntry_7 sic_1 sic_2 sic_3 sic_4 sic_5 sic_6 sic_7 sic_8 )
DIFFERENCE-IN-DIFFERENCES WITH COVARIATES

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 1242
            Baseline       Follow-up
   Control: 266            282         548
   Treated: 341            353         694
            607            635
------------------------------------------------------
 Outcome var.   | lvg     | S. Err. |   t   |  P>|t|
----------------+---------+---------+-------+---------
Baseline        |         |         |       | 
   Control      | 0.001   |         |       | 
   Treated      | 0.043   |         |       | 
   Diff (T-C)   | 0.043   | 0.022   | 1.97  | 0.050**
Follow-up       |         |         |       | 
   Control      | 0.016   |         |       | 
   Treated      | 0.018   |         |       | 
   Diff (T-C)   | 0.002   | 0.021   | 0.10  | 0.920
                |         |         |       | 
Diff-in-Diff    | -0.040  | 0.019   | -2.09 | 0.038**
------------------------------------------------------
R-square:    0.13
- Means and Standard Errors are estimated by linear regression
- Clustered Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1

These are just two examples, with different SICs and countries out and in. I've tried a bit more with leaving out different ones every time, and the results change a lot. These two I find quite strange, given the negative or really low leverage ratios, which is not possible (not one leverage ratio in my dataset is negative).

I think I have made an error in the way I organised the data for the regression: the time dummy comprises multiple years prior to and after the cross-listing. So in these regressions, I have data on 3 time periods before and 3 time periods after the cross-listing. Can that be the cause of these strange results?

I have read, but not fully understood, about how to solve this: something with creating dummies for every year, I guess pretty similar to what we've done with the SIC codes and countries now? Actually, adding them in -cov- doesn't change the outcome, so I guess that's not it.

[QUOTE]How many distinct values of "id" are there?[/QUOTE]

Steve, I have 171 treated firms, and 170 control firms, all with a unique id. However, the number of firms I use in the regression changes, depending on:
(1) The time period of the observations: -d_time5- shows the obs. from t-5 to t+5, -d_time4- only shows obs. from t-4 to t+4, etc. to d_time1
(2) The obs. per firm: -d_yr1_2- includes only those firms for which at least data for t-2, t-1, t0, t+1 and t+2 is available; for -d_yr1_3-, t-3 and t+3 are also required

Perhaps it is worth mentioning that I always exclude time period t0, because I have yearly data points, so I can't tell whether the treatment was given at the beginning or the end of the year. A firm's leverage ratio needs time to develop, so this shouldn't matter. At least, short-term effects don't matter that much.

Hope the explanation makes sense!


Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#12

23 May 2016, 17:19

[QUOTE]These are just two examples, with different SICs and countries out and in. I've tried a bit more with leaving out different ones every time, and the results change a lot. These two I find quite strange, given the negative or really low leverage ratios, which is not possible (not one leverage ratio in my dataset is negative).[/QUOTE]

The interpretation of the diff-in-diff coefficient is not that cross-listed firms have a negative leverage ratio, the interpretation would be more like "after becoming cross-listed the leverage ratio increased (positive sign) or decreased (negative sign) by x percentage points". How to translate the diff-in-diff coefficient into percentage points depends on the scale of the lvg variable (is it 0 to 100 or 0 to 1?).

Or do you mean the values displayed below "baseline" and "follow-up"? Since we use additional covariates, these are not actual means of the lvg variable, but rather expected values for the reference group. It may seem puzzling that the value becomes negative (because you do not have negative lvg values), but this comes from the estimation of the effects that your additional dummy variables have. If you choose a country or industry sector with a rather large mean of lvg as the reference category, the negative sign should disappear. I would not pay too much attention to these values. The interesting effect is the diff-in-diff coefficient, and this one should be (quite) similar in each estimation, regardless of your choice of reference category.
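To make "expected values for the reference group" more concrete, here is a minimal sketch with the equivalent reg specification. It assumes the interaction variable diff_in_diff3 already exists, and that -diff- builds its table from the same regression, which I have not verified in detail. Each cell is a sum of coefficients with all country and sector dummies set to zero, i.e. for a firm in the omitted country and sector:

```stata
* Refit the diff-in-diff regression (standard errors are not needed here)
quietly regress lvg d_cl d_time3 diff_in_diff3 i.cntry i.sic1 if d_yr1_2==1

* Cell values for the reference country and sector (all dummies = 0)
display "Control, baseline:  " _b[_cons]
display "Treated, baseline:  " _b[_cons] + _b[d_cl]
display "Control, follow-up: " _b[_cons] + _b[d_time3]
display "Treated, follow-up: " _b[_cons] + _b[d_cl] + _b[d_time3] + _b[diff_in_diff3]
```

When exactly one category per variable is omitted, switching the reference category moves only the constant, so all four levels shift together while the gaps between them, and hence the diff-in-diff, stay the same.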
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#13

23 May 2016, 17:49

It seems that you have enough firms overall, and you don't seem to be seeing the vce(cluster id) error in your most recent runs. Because I haven't been following too closely, I can't tell what has changed.

If the problem recurs, count the relevant number of firms with:

Code:

codebook firm if *

where * is a clause that defines exactly what firms would have been analyzed; you can include a condition to define the treatment/period as well in *

Sebastian's statement [QUOTE]if we use these dummies, the regression only compares the firms within one country and within one sector[/QUOTE] may hold a clue.

Code:

egen firmtag = tag(firm) // identify one observation for counting purposes
tab country sic if firmtag & *

Perhaps there are just too few firms in some combinations.

Last edited by Steve Samuels; 23 May 2016, 18:11.

Steve Samuels
Statistical Consulting

Stata 14.2
Sebastian Geiger

Join Date: Oct 2015

Posts: 124
#14

23 May 2016, 18:07

If I understood Oscar correctly, the error appears only with the -reg- command which I suggested to mimic what the -diff- command does when specifying the cluster() option. This is quite puzzling because I'm pretty sure that -diff- does exactly the same.

The source code (ado code) of -diff- shows:

Code:

if "`cluster'" != "" {
    local clust cluster(`cluster')
}
[...]
`bsp' reg `output' `period' `treated' _diff `cov' `if' `in' [`weight'`exp'], `robust' `clust'

That is, if the cluster option is specified, it creates a local macro that is inserted into the regression command. I did nothing different.
Oscar Jones

Join Date: May 2016

Posts: 10
#15

24 May 2016, 01:44

Steve and Sebastian,

Thanks for your comments!

[QUOTE]Perhaps there are just too few firms in some combinations.[/QUOTE]

Steve, this may be true, see the table below:

Code:

tab cntry1 sic1 if firmtag & d_yr1_1

           |                         sic1
    cntry1 |     0     1     2     3     4     5     7     8     9 |  Total
-----------+-------------------------------------------------------+-------
 Australia |     0    16     8     3     0     4     2     0     0 |     33
    Brazil |     0     6    12     5    16     4     0     0     2 |     45
     Chile |     0     0     8     4     0     4     0     0     0 |     16
     China |     1     4    18    21     2     0    13     4     0 |     63
    France |     0     3     0    14     0     0     7     4     0 |     28
   Germany |     0     0     9    11     0     0     8     0     0 |     28
     Japan |     0     0     0    14     0     3     4     0     0 |     21
    Mexico |     0     0     5     9     2     1     1     0     0 |     18
      U.K. |     0    11    33    18     2     8    14     4     1 |     91
-----------+-------------------------------------------------------+-------
     Total |     1    40    93    99    22    24    49    12     3 |    343

I could drop the sic values of 0 and 9 for example. Alternatively, I could add either countries or sic, or neither, instead of both in -cov()-; I am not interested in country or industry specific values, just the DiD-estimator in general (and some other related stuff but that's maybe for another post). But, if not accounting for industry- and country-specifics would distort the results, I guess I have to include it in the -diff-?

For my thesis (and in general), I am more interested in the time effects; I suspect that a firm's leverage ratio needs some time to develop, and it would be interesting if I could prove that. E.g.: in t+1 leverage decreased by 8%, in t+2 by 6%, and in t+3 by 2%, something like that. I think what I'm still missing is a way to account for changes per year.

Also Sebastian, leverage ratio is indeed ranging from 0-1, so 0.2 = 20% (equity, 80% debt).

[QUOTE]Is this not the reason for clustering the standard errors because there are some identical firms for different points of time? In addition, if we have observations for firms just at one point of time, the diff-in-diff approach would not work.[/QUOTE]

Sebastian, I think the issue is that I have observations for different points of time BEFORE the treatment, and observations for different points of time AFTER the treatment (so one firm could have 4 data points, 2 pre-treatment and 2 post-treatment). I don't know if Stata understands that and takes it into account.

[QUOTE]Or do you mean the values displayed below "baseline" and "follow-up"? Since we use additional covariates, these are not actual means of the lvg variable, but rather expected values for the reference group.[/QUOTE]

Sebastian, what do you mean by 'expected values for the reference group'? If the control group has a value of -.034 pre-treatment, can we conclude that it has a 3.4% lower leverage than ...? The pre-treatment treatment group? The post-treatment control group? Something else?

Hopefully someone knows a way to show leverage development over time / yearly coefficients; that would be great!

Thanks for the help so far!