Matching Treatment/Control Groups on DiD with Repeated Cross-Sectional Data

Panagiotis Vorgias

Join Date: Nov 2017

Posts: 23
#1

12 Apr 2025, 21:20

Dear Statalist users,

I am trying to estimate the effect of a patent policy change on citations. My sample consists of patents matched to firms over the period 1970-2000; for each patent I have the total number of forward citations plus covariates. The policy applied to all patents that received public funding after 1983. My intuition is that this can be done with a DiD.

However, I am confused about whether I should treat the data as a panel at the firm level or as a cross-section at the patent level.

I could construct a balanced sub-sample of firms that I can follow over time, but this would significantly reduce the sample size.
My main concern is that no patent is observed at more than one point in time, since each patent is granted only once. So this is not literally a panel, right?
If I treat it as a cross-section, how do I match the control and treatment groups? Do I match once for the treated patents, using pre-1983 'treated' patents as the baseline for the treatment group, and then again using patents that act as the baseline for the control group? And what about the post-treatment control group?

I apologise in advance for the abstract question. I just cannot wrap my mind around this issue, I haven't found anything on it, and time is running out before a presentation.

Currently, I use treated and non-treated firms as the groups that define my pre-treatment baselines (i.e., the pre-treatment observations for the treatment group are patents filed before 1983 by firms whose post-1983 patents were treated, and correspondingly for the controls). My problem with this is that the firm FE then cause the treatment dummy to be dropped, and I would like to report it.

Any thoughts? If you are confident about how to handle this, any ideas would help greatly. Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

12 Apr 2025, 22:17

I'm not sure I fully understand what you've written. But it sounds like there are firms that are in a "treatment" group and others in a "control" group, and, among those in the treatment group, an intervention takes place starting in 1983. You are trying to do a difference-in-differences analysis, and you are concerned because the treatment group indicator ("dummy") gets omitted due to collinearity with the firm-level fixed effects, and you want to report its coefficient.

If I have that right, if you somehow coaxed Stata into retaining the treatment indicator, because this is a fixed-effects model, whatever coefficient you got for the treatment indicator would just be a meaningless artifact of the particular way you constrained the model to get the treatment indicator preserved. It is not something you would want to report. In a fixed effects model, effects of time-invariant attributes of the panel variable are inherently unidentifiable--this is linear algebra and there is no way around it. You can create the appearance of estimating the effect by imposing some identifying constraint on the model, but it is not reality: the "effect" you get that way says nothing about treatment and is just an epiphenomenon of the particular constraint used. In fact, if you pick a number that you would like that "effect" to be, there is always a constraint you can impose on the model that will produce that result. It has no connection to any actual treatment effect in the real world.

If you are concerned about having adequate adjustment for the "treatment" effect, there is no need to worry about that. In a fixed effect model, although the effects of time-invariant attributes of the panel variable are unidentifiable, they are automatically fully adjusted for, even if they haven't been measured.

So, you are tormenting yourself about a non-problem. The only effect you really need is the DID estimate of the causal treatment effect, and that comes from the treatment#pre-post interaction term. So focus on that and start preparing your presentation.
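A small simulation may make the point concrete. The sketch below uses made-up data: the variable names (firmid, treated, post, y), the number of firms and years, and the effect sizes are all assumptions chosen for illustration, not anything from the original poster's data. It builds a firm-level panel with a time-invariant treatment group indicator and fits a firm fixed-effects DiD.

```stata
* Illustration only: simulated data, hypothetical names and effect sizes
clear
set seed 12345
* 200 firms, the first 100 in the (time-invariant) treatment group
set obs 200
gen firmid  = _n
gen treated = firmid <= 100
* unobserved firm effect
gen alpha   = rnormal()
* 10 yearly observations per firm, 1978 to 1987
expand 10
bysort firmid: gen year = 1977 + _n
gen post = year >= 1983
* true DiD effect is 2; the treated main effect is 0.5
gen y = 1 + alpha + 0.5*treated + 0.3*post + 2*treated*post + rnormal()

xtset firmid year
* firm fixed effects absorb treated; the interaction remains identified
xtreg y i.treated##i.post, fe vce(cluster firmid)
```

In the output, treated should be reported as omitted because it is collinear with the firm effects, while the coefficient on 1.treated#1.post should land close to the value of 2 built into the simulation.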

Added: I want to be clear, however, that I do not quite understand the setup here. Is the treatment directed at the firm level or the patent level, and is the effect exerted by the actions of the firm or by the properties of the patent? These are questions that bear on whether an analysis with fixed-effects regression or an OLS analysis is more appropriate for your problem. So my remarks above are directed only at how to interpret a fixed-effects analysis should you ultimately settle on that approach. I do not have the information or understanding necessary to decide on which model is most appropriate.

Last edited by Clyde Schechter; 12 Apr 2025, 23:01.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#3

13 Apr 2025, 07:36

Like Clyde, I'm a bit unsure of the setup, but I think his added comment gets at the crux of the issue. It seems like the patent receives the treatment, and there aren't treated or control firms because any firm can get a patent that had public funding. Now, of course, firms likely vary a lot on how much they use public funding for patents, so you will want to control for firm fixed effects.

So I think the unit of observation is a patent, which makes this like a repeated cross section. But you will have to control for firm differences via fixed effects. Assuming you have a dummy variable "public" and year dummy variables d1970 to d2000, you can estimate a separate effect in each period as

Code:

reghdfe cites c.public#(c.d1983 c.d1984 ... c.d2000) public, absorb(firmid year) vce(cluster firmid)

Or, for a single effect, replace c.public#(c.d1983 c.d1984 ... c.d2000) with c.public#c.post, where post is a post-1983 dummy variable.
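For readers who do not yet have these variables, here is a minimal sketch of how the year dummies and the post dummy might be built before running the single-effect version. It assumes the dataset contains variables named cites, public, year, and firmid as in the post, and it assumes that post should equal 1 from 1983 onward, to match the first interaction term (d1983) in the command above.

```stata
* Sketch only: assumes variables cites, public, year, firmid exist
* reghdfe is community-contributed: ssc install reghdfe (if not already installed)

* year dummies d1970, d1971, ..., d2000
forvalues y = 1970/2000 {
    gen byte d`y' = (year == `y') if !missing(year)
}

* post dummy: 1 for 1983 onward (assumed cutoff), missing if year is missing
gen byte post = (year >= 1983) if !missing(year)

* single-effect version: one coefficient on public x post
reghdfe cites c.public#c.post public, absorb(firmid year) vce(cluster firmid)
```

The `if !missing(year)` qualifier matters for post, because Stata treats a missing value as larger than any number, so `year >= 1983` would otherwise evaluate to 1 for observations with a missing year.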

The above is sometimes called a "lags only" approach because it doesn't estimate pre-treatment placebo effects. A "leads and lags" event study approach is

Code:

reghdfe cites c.public#(c.d1970 ... c.d1981 c.d1983 c.d1984 ... c.d2000) public, absorb(firmid year) vce(cluster firmid)

which uses 1982 as the reference period. This is the most common event study estimator.
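Written out as an equation, the leads-and-lags specification can be sketched as follows. The notation is an assumption introduced here for illustration: $i$ indexes patents, $f$ firms, and $t$ years; $\alpha_f$ and $\lambda_t$ are the absorbed firm and year effects; and $d_{s,t}$ equals one when $t = s$ and zero otherwise.

```latex
\text{cites}_{ift} = \alpha_f + \lambda_t + \beta\,\text{public}_i
  + \sum_{\substack{s = 1970 \\ s \neq 1982}}^{2000} \tau_s \,\bigl(\text{public}_i \times d_{s,t}\bigr)
  + u_{ift}
```

Here $\beta$ captures the gap between publicly funded and other patents in the reference year 1982, the $\tau_s$ for $s \le 1981$ are the pre-treatment placebo effects, and the $\tau_s$ for $s \ge 1983$ are the period-specific treatment effects, each measured relative to 1982.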

I think you can get jwdid to do this as follows (but I'm less sure because it's a repeated cross section):

Code:

gen cohort = 0
replace cohort = 1983 if public & year >= 1983
jwdid cites, tvar(year) gvar(cohort) group(firmid) vce(cluster firmid)

The Stata command hdidregress should also handle this case.