Hi all -
This is my first post here, so I hope I'm in the right place. I'm finishing up an article on my data about patent litigation. I've got all the cases associated with a set of patents - those that led to an invalidity judgment, as well as those that didn't (usually because they settled early). This is relatively novel, as most folks discard every case that doesn't reach a judgment on the merits, even though relatively few cases get that far. So, I have a set of cases coded by patent number and by whether the patent was invalidated, plus other information.
What I want to do is estimate the likelihood that a patent will be held invalid in a case, based on independent variables relating to a) the patent (e.g. how many times it is cited) and b) the parties/litigation (e.g. number of defendants). I already have a reasonable plain logistic regression for this.
But the issue I'm thinking about should be clear - there's a sort of selection effect going on. First, which cases select into getting a ruling? Second, once selected, is there anything about the patent that tells us which will be held invalid? This is important because some plaintiffs may get selected into rulings more often (that is, the defendants fight), OR it may be that their patents are worse once they get rulings. For some variables, a single logistic coefficient can't tell these two stories apart.
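To make the two questions concrete, here is a rough sketch of how the pieces could be written in Stata. The variable names (invalid, ruling, fwd_cites, n_defendants, patent_id) are placeholders for illustration, not my actual coding, and I'm not claiming this two-part setup is the right estimator - it's just the structure I have in mind.

```stata
* Hypothetical variable names; substitute the actual coding.
*   invalid      = 1 if the patent was held invalid in the case, 0 otherwise
*   ruling       = 1 if the case reached a merits ruling, 0 otherwise (e.g. settled)
*   fwd_cites    = number of times the patent is cited
*   n_defendants = number of defendants in the case
*   patent_id    = patent number (several cases can share one patent)

* Single logit on all cases: the coefficients mix selection and quality.
logit invalid fwd_cites n_defendants, vce(cluster patent_id)

* Part 1, selection: which cases reach a merits ruling? (all cases)
logit ruling fwd_cites n_defendants, vce(cluster patent_id)

* Part 2, quality: among cases with a ruling, which patents are held invalid?
logit invalid fwd_cites n_defendants if ruling == 1, vce(cluster patent_id)
```

A variable that matters in part 1 but not in part 2 would look like a selection effect, and the reverse would look like a quality effect. Note that fitting the two parts separately assumes the unobserved factors driving selection are unrelated to those driving invalidity, which may or may not be reasonable here.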
The question is, can I test the selection separately? I thought about Heckman selection, but everything I've read (including other patent studies) says no, because that's for situations where the dependent variable is unobserved for some observations - I don't have that. I've got all the observations, and within those, some selected to push for a merits ruling and some did not.
And here's the Stata question: I've read a couple of articles on two-stage regression, but I can't figure out how to make it happen in Stata. Any thoughts are appreciated, including whether I'm overthinking this and can simply drop variables to show whether the effects are selection based or quality based (I've done that already).
Any input is much appreciated.