How to estimate the effect of an event on entry into dataset

William Park

Join Date: Jan 2024

Posts: 5
#1

19 Jan 2024, 10:38

I have an annual dataset of companies.

I define entry as follows: a company is considered to have started business in year t if the business first appeared in my dataset in year t.

In 2000, there was an event that might have affected entry of companies.

I want to learn how the effect of this 2000 event on company entry differed across states according to their Republican vote share.

Can I just run this?

1{Enter}_i,s,t = beta*1{year>=2000}*RepVoteShare_s + FE_s,t + error

where
i = company
s = state
t = year
1{Enter}_i,s,t = 1 if company i, located in state s, first appears in the dataset in year t (i.e., it is absent in all years before t).
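
To make the definition concrete, here is a minimal sketch of how the entry indicator could be built from a company-year panel. It uses plain Python, and the company IDs, states, and years are made up for illustration; none of them come from the actual dataset.

```python
# Hypothetical company-year panel: one (company, state, year) row for every
# year in which a company is observed. All values are made up.
rows = [
    ("A", "TX", 1998), ("A", "TX", 1999), ("A", "TX", 2000),
    ("B", "TX", 2000), ("B", "TX", 2001),
    ("C", "CA", 1999), ("C", "CA", 2001),  # C has a gap in 2000 but entered in 1999
]

# First year in which each company appears in the data.
first_year = {}
for company, _state, year in rows:
    first_year[company] = min(year, first_year.get(company, year))

# Entry indicator: 1 in the first observed year, 0 in every later year.
panel = [
    (company, state, year, int(year == first_year[company]))
    for company, state, year in rows
]

for record in panel:
    print(record)
```

Under this construction, any company already present in the first year of the data is also flagged as an entrant, so that first year may need separate treatment.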

This looks OK at first glance.

But on reflection, companies that did not enter are not included in my dataset. Is that problematic, in the sense that "beta" no longer captures what I want it to capture? Or do the fixed effects remove this concern?
Tags: causal, difference-in-differences
Daniel Schaefer

Join Date: Mar 2020

Posts: 807
#2

19 Jan 2024, 13:06

Yes, you can estimate a model with an indicator variable that equals 1 for years greater than or equal to 2000. The coefficient on the indicator gives the change in the model intercept for measurements taken in 2000 or later. Essentially, this is the average difference in the outcome between the periods before and after 2000, holding all else equal (at zero). Suppose you mean-center all of your other variables. Then the intercept is the average outcome holding all else at the mean, and the coefficient on the dummy variable indicates the change in the mean of the outcome after 2000, holding all else at the mean. If that's all you want to capture, great! It's not a problem that you are using an FE model here.
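
As a small illustration of that interpretation, the sketch below checks that, in a regression with only an intercept and a post-2000 dummy, the OLS coefficient on the dummy equals the difference in mean outcomes after versus before 2000. The yearly outcome values are invented for the example, and no other regressors or fixed effects are included.

```python
from statistics import mean

# Made-up yearly outcomes (e.g., an entry rate); purely illustrative.
data = [(1997, 0.10), (1998, 0.12), (1999, 0.11),
        (2000, 0.15), (2001, 0.17), (2002, 0.16)]

post = [1 if year >= 2000 else 0 for year, _ in data]
y = [value for _, value in data]

# OLS slope for y = a + b * post, computed as cov(post, y) / var(post).
post_bar, y_bar = mean(post), mean(y)
slope = (sum((d - post_bar) * (v - y_bar) for d, v in zip(post, y))
         / sum((d - post_bar) ** 2 for d in post))
intercept = y_bar - slope * post_bar

# Difference in group means: after minus before.
mean_after = mean(v for d, v in zip(post, y) if d == 1)
mean_before = mean(v for d, v in zip(post, y) if d == 0)
diff = mean_after - mean_before

print(slope, diff, intercept)
```

Here the intercept is the pre-2000 mean and the slope is the post-2000 shift, which is the sense in which the dummy coefficient moves the intercept.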

I think you're really looking for a difference-in-differences (DiD) model. I'm not an expert in DiD models (though a few people here are); I just know enough to have the intuition that this is a modeling route you may want to go down.
William Park

Join Date: Jan 2024

Posts: 5
#3

19 Jan 2024, 20:08

The key here is that the dependent variable is entry.

If entry outcome is 1, then they exist in the sample, but if entry outcome is 0, then they don't even exist in the sample.

That is, whether or not an observation exists in the sample depends on the outcome variable of that observation.

I am not sure whether this is problematic or OK in this setting.
Daniel Schaefer

Join Date: Mar 2020

Posts: 807
#4

21 Jan 2024, 10:17

"If entry outcome is 1, then they exist in the sample, but if entry outcome is 0, then they don't even exist in the sample."

Yes, that could pose a problem. I see two possible issues. First, it sounds like you might be saying that entry is 1 whenever a company is in the dataset. That clearly won't work, because there would be no variation in your outcome. Second, suppose you have measurements for each company in each year, so that entry equals 1 in the year the company enters the market and 0 in every other year. Now the issue is that most company-years will be zero, and you are stuck with a rare-events analysis (and all of the associated problems).
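
The two cases can be sketched with a toy panel. The entry years and the end year below are hypothetical, and each company is assumed to be observed in every year from entry onward.

```python
# Hypothetical entry years; each company is assumed to be observed
# in every year from its entry through last_year.
entry_years = {"A": 1996, "B": 1998, "C": 2000, "D": 2001, "E": 2003}
last_year = 2005

# Company-year rows: (company, year, entry indicator).
panel = [
    (company, year, int(year == entered))
    for company, entered in entry_years.items()
    for year in range(entered, last_year + 1)
]

# Case 1: only the first-appearance row of each company is kept.
first_rows = [row for row in panel if row[2] == 1]
outcomes_case1 = {row[2] for row in first_rows}  # distinct outcome values

# Case 2: the full company-year panel.
share_entry = sum(row[2] for row in panel) / len(panel)

print("distinct outcomes, first rows only:", outcomes_case1)
print("share of company-years with entry = 1:", share_entry)
```

In the first case the outcome takes a single value, so nothing can be estimated. In the second, the share of ones shrinks as companies are followed for more years.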

There is another problem worth thinking about: you can't observe companies that could have entered the market but didn't.

One option, if you have country-level data, is to aggregate the data to the state level and use the number of companies that entered the market in a given state in each year as the outcome. Then use a Poisson or negative binomial fixed-effects regression to model that count.
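
A minimal sketch of the aggregation step, assuming one record per company with its state and entry year (all values below are made up), might look like this:

```python
from collections import Counter

# Hypothetical entry records: one (company, state, entry_year) per company.
entries = [
    ("A", "TX", 1999), ("B", "TX", 2000), ("C", "TX", 2000),
    ("D", "CA", 1999), ("E", "CA", 2001),
]
states = ["CA", "TX"]
years = range(1999, 2002)

# Number of entrants in each state-year cell.
counts = Counter((state, year) for _company, state, year in entries)

# Balanced state-year table: cells with no entrants become explicit zeros,
# which the company-level data cannot represent.
state_year = [
    {"state": s, "year": t, "n_entrants": counts[(s, t)], "post": int(t >= 2000)}
    for s in states
    for t in years
]

for row in state_year:
    print(row)
```

The n_entrants column could then serve as the outcome in a count model with state effects, year effects, and the interaction 1{year>=2000}*RepVoteShare_s. Note that a full state-by-year effect would absorb that interaction, because the interaction varies only by state and year.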