What Estimation Method to use? (PSM, Probit, Logit, Heckman)

Vincent Maurits

Join Date: Dec 2022

Posts: 3
#1

22 Dec 2022, 03:46

Hi all,

I'm currently writing my thesis in Finance on the explanatory power of ESG scores for firms' delisting decisions. For this study I have collected deal data with the ISIN codes of target firms, as well as E, S and G pillar scores, Tobin's Q, log assets (size), firm age and industry classifications for each firm. I have now created two samples: one with delisted firms (around 400) and one with listed firms (around 800). My dependent variable, delisting, is a binary response variable.

My hypothesis is that smaller firms with low ESG scores have a higher probability of delisting from the exchange. My supervisor told me that, because of the way I collected my sample, I first need to use propensity score matching to match each delisted firm with a listed firm of similar size and industry. However, I am not sure which test to use after the propensity score matching. Another professor mentioned the heckman command in Stata, which could address the selection bias that might be present in the current model. My econometrics textbook has an example of a probit model explaining whether a firm is taken over by another firm in a given year, using a specification similar to mine. However, I am unsure what kind of sample the textbook uses.

I am not getting clear answers from my supervisors, so I have turned to this forum to ask people with more expertise which method suits this study best and how to implement it in Stata.

Hope to hear from you soon.

Kind regards,

V. Maurits
Felix Bittmann

Join Date: Aug 2018

Posts: 662
#2

22 Dec 2022, 04:54

There are many options here, and all are probably fine. I cannot say anything about the Heckman approach, but logit and probit will give very similar results. 1:1 matching is probably no longer state of the art; modern approaches use either kernel matching or entropy balancing. An ado that provides these is kmatch in Stata. You could also try several statistical approaches and see how they perform. Once the data are prepared, running the commands is very fast.
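The reason logit and probit usually agree can be seen from their link functions alone. This minimal Python sketch (standard library only, not a Stata command) compares the standard normal CDF used by probit with the logistic CDF used by logit. Once the logit argument is rescaled by roughly 1.6 to 1.7, the two curves nearly coincide. That is why logit coefficients tend to be about 1.6 times the probit ones, while the fitted probabilities are almost identical. The grid and the scale factors tried are illustrative choices.

```python
import math


def probit_cdf(z: float) -> float:
    """Standard normal CDF, the link function used by probit."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z: float) -> float:
    """Logistic CDF, the link function used by logit."""
    return 1.0 / (1.0 + math.exp(-z))


def max_gap(scale: float, lo: float = -6.0, hi: float = 6.0, steps: int = 12001) -> float:
    """Largest |Phi(z) - Lambda(scale * z)| over an evenly spaced grid of z."""
    gap = 0.0
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        gap = max(gap, abs(probit_cdf(z) - logit_cdf(scale * z)))
    return gap


if __name__ == "__main__":
    # Scale 1.0 compares the raw curves; 1.6 and 1.702 are the usual rescalings.
    for scale in (1.0, 1.6, 1.702):
        print(f"scale {scale}: max gap {max_gap(scale):.4f}")
```

A gap below 0.01 at scale 1.702 is the classic normal/logistic approximation result.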

Best wishes

(Stata 16.1 MP)
George Ford

Join Date: Aug 2014

Posts: 3120
#3

23 Dec 2022, 15:07

CEM (coarsened exact matching) might work too. Match on the "treated" firms using size and similar variables. Then run probit/logit with the CEM weights, or on the matched sample if you do 1:1 matching (the k2k option in the command). You can also get several matches per treated firm by excluding the controls matched in the first k2k round and matching again (but not the treated units, of course).
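The CEM weighting step can be sketched as follows, assuming delisted firms are the "treated" units. This minimal Python sketch follows the weighting scheme of Iacus, King and Porro. Covariates are coarsened into bins, and firms are grouped into strata by their bin combination. Strata lacking either a treated or a control firm are dropped. Treated firms get weight 1, and controls in stratum s get (m_C / m_T) * (m_T^s / m_C^s). The covariate names, cutpoints and data are hypothetical. Stata's cem command does all of this itself (and can choose cutpoints automatically), and the resulting weights would then go into the probit/logit fit.

```python
from collections import defaultdict


def coarsen(value, cutpoints):
    """Return the bin index of value given a sorted list of cutpoints."""
    for i, cut in enumerate(cutpoints):
        if value < cut:
            return i
    return len(cutpoints)


def cem_weights(firms, cutpoints):
    """Coarsened exact matching weights for a list of firms.

    firms: list of dicts with a 0/1 "treated" flag (1 = delisted) plus covariates.
    cutpoints: dict mapping covariate name -> sorted cutpoints (hypothetical bins).
    Returns weights aligned with firms; unmatched firms get weight 0.
    """
    strata = defaultdict(lambda: {"treated": [], "control": []})
    for i, firm in enumerate(firms):
        # A stratum is the combination of coarsened bins across all covariates.
        key = tuple(coarsen(firm[var], cuts) for var, cuts in sorted(cutpoints.items()))
        group = "treated" if firm["treated"] else "control"
        strata[key][group].append(i)

    # Keep only strata that contain at least one treated and one control firm.
    matched = [s for s in strata.values() if s["treated"] and s["control"]]
    m_t = sum(len(s["treated"]) for s in matched)
    m_c = sum(len(s["control"]) for s in matched)

    weights = [0.0] * len(firms)
    for s in matched:
        for i in s["treated"]:
            weights[i] = 1.0
        for i in s["control"]:
            # Controls are reweighted so each stratum mirrors the treated distribution.
            weights[i] = (m_c / m_t) * (len(s["treated"]) / len(s["control"]))
    return weights


if __name__ == "__main__":
    # Hypothetical firms: log assets and firm age as matching covariates.
    firms = [
        {"treated": 1, "log_assets": 5.2, "age": 12},
        {"treated": 1, "log_assets": 7.9, "age": 40},
        {"treated": 0, "log_assets": 5.5, "age": 15},
        {"treated": 0, "log_assets": 5.8, "age": 18},
        {"treated": 0, "log_assets": 7.5, "age": 35},
        {"treated": 0, "log_assets": 9.9, "age": 60},
    ]
    cuts = {"log_assets": [6.0, 8.0], "age": [20, 50]}
    print(cem_weights(firms, cuts))
```

With these weights, the control weights sum to the number of matched controls. The unmatched large firm is dropped with weight 0.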
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#4

23 Dec 2022, 15:55

I think probit and logit are appropriate here, where the delisting dummy is the dependent variable and the main explanatory variables are the ESG variables and size.

You can also match delisted to non-delisted firms on size in any way you know, and then check whether the ESG variables differ between delisted and non-delisted firms in the matched sample.
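The "match first, then compare" idea can be sketched in a few lines. This minimal Python version assumes a single matching covariate (size) and a hypothetical (size, ESG) tuple layout. It pairs each delisted firm with the listed firm closest in size (with replacement) and averages the ESG difference across pairs. A real analysis would match on more covariates (for example exactly on industry) and would need standard errors that account for the matching.

```python
def match_on_size(delisted, listed):
    """1:1 nearest-neighbour matching on size, with replacement.

    delisted, listed: lists of (size, esg) tuples (hypothetical layout).
    Returns a list of (delisted_esg, matched_listed_esg) pairs.
    """
    pairs = []
    for size, esg in delisted:
        # Pick the listed firm closest in size; ties go to the first one found.
        match = min(listed, key=lambda firm: abs(firm[0] - size))
        pairs.append((esg, match[1]))
    return pairs


def mean_esg_gap(pairs):
    """Average ESG difference (delisted minus matched listed) across pairs."""
    return sum(d - c for d, c in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical data: (log assets, ESG score).
    delisted = [(5.0, 40.0), (7.0, 55.0)]
    listed = [(4.8, 50.0), (6.9, 60.0), (9.0, 70.0)]
    pairs = match_on_size(delisted, listed)
    print(pairs, mean_esg_gap(pairs))
```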
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

24 Dec 2022, 06:27

You shouldn't be matching on the basis of your outcome variable, Y. Matching is done on conditioning variables -- but you don't have a binary "treatment" so it's not clear what purpose that would serve. If you're treating your explanatory variables as exogenous, use logit or probit. A linear model might give similar results.

There is one potential issue depending on how you obtained your sample. Did you sample on the basis of Y? Or is this a representative sample from your population? If the former, then you have a choice-based sampling issue. It can be fixed by weighting if you know the frequencies in the population. But maybe you did just take a random sample, which is easiest to analyze. There's no need to think of having two samples in this case. You have a single sample and variation in your outcome Y, which you clearly need.
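The weighting fix for choice-based sampling can be sketched as below for a binary outcome. Each observation with outcome j gets weight Q_j / H_j, where H_j is the share of outcome j in the sample and Q_j its share in the population (Manski and Lerman's weighting idea). For logit specifically, there is an alternative. The unweighted slopes stay consistent under choice-based sampling, so only the intercept needs correcting. The population delisting share of 0.02 used below is a hypothetical placeholder. The sample share of roughly one third comes from the 400/800 split described in the first post.

```python
import math


def choice_based_weights(sample_share_1, pop_share_1):
    """Weights Q_j / H_j for a binary outcome under choice-based sampling.

    sample_share_1: H1, share of Y = 1 (delisted) in the sample.
    pop_share_1:    Q1, share of Y = 1 in the population (hypothetical here;
                    it has to come from outside data).
    Returns (weight for Y = 1 observations, weight for Y = 0 observations).
    """
    w1 = pop_share_1 / sample_share_1
    w0 = (1.0 - pop_share_1) / (1.0 - sample_share_1)
    return w1, w0


def corrected_logit_intercept(b0_sample, sample_share_1, pop_share_1):
    """Intercept correction for a logit fitted on a choice-based sample.

    Sampling on Y multiplies the odds by (H1/Q1) / (H0/Q0), which in a logit
    only shifts the intercept by the log of that factor.
    """
    shift = math.log((sample_share_1 / pop_share_1) /
                     ((1.0 - sample_share_1) / (1.0 - pop_share_1)))
    return b0_sample - shift


if __name__ == "__main__":
    h1 = 400 / 1200   # sample share of delisted firms
    q1 = 0.02         # hypothetical population share of delisted firms
    print(choice_based_weights(h1, q1))
    print(corrected_logit_intercept(0.0, h1, q1))
```

The weights would then be passed to the estimator as sampling weights (for example pweights in Stata's logit). The intercept-only correction does not carry over to probit.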
Vincent Maurits

Join Date: Dec 2022

Posts: 3
#6

28 Dec 2022, 04:28

Thanks all for your thoughts.

To Joro Kolev: indeed, my supervisor's idea was to match the delisted to the non-delisted firms on variables like firm size and firm age and then compare the ESG scores.

To Jeff Wooldridge: you are right about the issue of how I obtained the sample. The sample is not randomly chosen. I used specific criteria in Zephyr's M&A deal database to get the list of delisted firms first, and then the listed firms in a separate Excel sheet. I selected on deal type (public takeovers), period (2007 to now), country (US only) and deal value (> 1 m EUR). This is the selection bias I was talking about, and I was wondering whether it could be solved with the Heckman model in Stata.
Vincent Maurits

Join Date: Dec 2022

Posts: 3
#7

29 Dec 2022, 13:36

Jeff Wooldridge, what is your view on the best solution to the sample selection bias: propensity score matching with a probit/logit model, or the Heckman model?
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#8

29 Dec 2022, 23:56

The Heckman model is not a solution to this by-design "selection bias" that you have.

If you want to do inference for deals that are "deal value > 1 m EUR", there is no selection bias. There would be a selection bias if you had collected only big deals but wanted to do inference about small deals too.
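Joro's distinction can be illustrated with a small simulation. For simplicity, it uses a linear model rather than the binary delisting model, an assumption that keeps the code short. It also assumes the selection variable (like deal value) can be treated as exogenous. Data follow y = 1 + 2x + e. Keeping only observations with x > 0 mimics a selection rule based on an exogenous variable, while keeping only y > 1 mimics selecting on the outcome. The first slope should stay close to 2, and the second is pulled toward zero. With a binary outcome the details differ (that is the choice-based sampling case), so this is meant only as intuition.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def simulate(n=20000, seed=1):
    """Compare OLS slopes after selecting on x versus selecting on y.

    True model: y = 1 + 2 x + e, with x and e independent standard normals.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, y))

    on_x = [(x, y) for x, y in data if x > 0.0]  # selection rule uses the regressor
    on_y = [(x, y) for x, y in data if y > 1.0]  # selection rule uses the outcome
    slope_x = ols_slope([x for x, _ in on_x], [y for _, y in on_x])
    slope_y = ols_slope([x for x, _ in on_y], [y for _, y in on_y])
    return slope_x, slope_y


if __name__ == "__main__":
    print(simulate())
```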
Joro Kolev

Join Date: Aug 2018

Posts: 3047
#9

29 Dec 2022, 23:59

Originally posted by Jeff Wooldridge:

You shouldn't be matching on the basis of your outcome variable, Y. Matching is done on conditioning variables -- but you don't have a binary "treatment" so it's not clear what purpose that would serve. If you're treating your explanatory variables as exogenous, use logit or probit. A linear model might give similar results.

There is one potential issue depending on how you obtained your sample. Did you sample on the basis of Y? Or is this a representative sample from your population? If the former, then you have a choice-based sampling issue. It can be fixed by weighting if you know the frequencies in the population. But maybe you did just take a random sample, which is easiest to analyze. There's no need to think of having two samples in this case. You have a single sample and variation in your outcome Y, which you clearly need.

Why should one not match the way I described, i.e., matching each delisted firm to a comparison group (or a single) non-delisted firm, where the matching is done on exogenous variables such as firm size?