Control Function with sample selection

Ibai Ostolozaga Falcon

Join Date: May 2021

Posts: 36
#1


17 Feb 2025, 04:17

Dear Statalist,

I would like to describe a problem that I am encountering in my current research.
I have a database with information on 1,000 firms. In this database I can see whether a firm had contact with the Public Administration or not (dichotomous variable). If it had contact, I can observe whether it paid a bribe or not (dichotomous variable). But if it did not have contact with the Public Administration, I cannot observe whether it paid a bribe. In my research, I want to study the effect of firm bribery on labor productivity, but as you can see I have a sample selection issue. This could be handled with a Heckman selection model. However, the main problem is that, according to the literature in my field, bribery is also an endogenous variable because of simultaneity. So I have both a sample selection problem and a simultaneity problem. I have approached the problem in the following way:

Code:

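* Step 1: selection equation (contact with the Public Administration)
probit contact_with_PA W CONTROLS
predict xb if e(sample), xb
gen imr = normalden(xb) / normal(xb)

* Step 2: probit for the endogenous bribe dummy (observed only for firms with contact)
probit bribe_payment Z CONTROLS
predict u if e(sample), score

* Step 3: outcome equation including both control-function terms
reg labor_productivity bribe_payment imr u CONTROLS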

Basically, in my regression of interest (the last one), I include the inverse Mills ratio from the first probit and the generalized residuals from the second one (as in Wooldridge 2015). Here W is a selection variable that can influence whether a firm is in contact with the Public Administration, and Z is the instrument for bribe_payment.
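To make the second control-function term concrete, here is a minimal sketch of what the generalized residual from the bribe_payment probit looks like when computed by hand. It assumes bribe_payment is coded 0/1 and that it is run right after the second probit above; zb and u_check are illustrative names that are not part of my actual code.

```stata
* Run immediately after: probit bribe_payment Z CONTROLS
* zb and u_check are hypothetical names used only for this check
predict double zb if e(sample), xb

* Generalized residual of a probit:
*   y = 1:  normalden(zb) / normal(zb)
*   y = 0: -normalden(zb) / (1 - normal(zb)), written with normal(-zb)
gen double u_check = cond(bribe_payment == 1, ///
    normalden(zb) / normal(zb), ///
    -normalden(zb) / normal(-zb)) if e(sample)

* u comes from "predict u if e(sample), score"; the two should agree
* up to storage precision
compare u u_check
```

The /// line continuations require running this from a do-file rather than typing it interactively.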

I would like to ask you whether this approach is correct or if I am missing something relevant.
Thank you in advance,
Ibai

Last edited by Ibai Ostolozaga Falcon; 17 Feb 2025, 04:31.
Harald Tauchmann

Join Date: Aug 2017

Posts: 26
#2

17 Feb 2025, 06:00

Dear Ibai, not sure. Isn't it a problem that your probit model explaining bribe_payment is already subject to a selection problem? An alternative would be to estimate all three submodels simultaneously (e.g. assuming joint normality and using cmp).

Code:

* cmp setup defines the $cmp_cont, $cmp_probit, ... macros used in ind()
cmp setup
gen selectvar = contact_with_PA
cmp (labor_productivity = bribe_payment CONTROLS) ///
    (bribe_payment = Z CONTROLS) ///
    (selectvar = W CONTROLS), ///
    ind(selectvar*$cmp_cont selectvar*$cmp_probit $cmp_probit) nolr qui

Best wishes,
Harald
Ibai Ostolozaga Falcon

Join Date: May 2021

Posts: 36
#3

17 Feb 2025, 06:59

Hello Harald,

Yes, you are right, so I do not know whether it would be right to also include imr from the first step in the probit model explaining bribe_payment. I have used cmp as you suggested. However, with this methodology I have a further question: suppose that my endogenous variable bribe_payment is interacted with another exogenous variable X. How should I proceed using cmp?
Thank you.

Last edited by Ibai Ostolozaga Falcon; 17 Feb 2025, 07:08.
Ibai Ostolozaga Falcon

Join Date: May 2021

Posts: 36
#4

18 Feb 2025, 00:30

Please, any suggestion? I am a little bit confused about this issue.

Thank you.
Harald Tauchmann

Join Date: Aug 2017

Posts: 26
#5

18 Feb 2025, 06:20

Is the variable you want bribe_payment to interact with itself binary or continuous (or of some other type)?
Ibai Ostolozaga Falcon

Join Date: May 2021

Posts: 36
#6

19 Feb 2025, 02:08

Hello Harald,

I want to interact bribe_payment (which is a binary variable) with an exogenous continuous variable.

Thank you.

Last edited by Ibai Ostolozaga Falcon; 19 Feb 2025, 02:59.
Daniel Santos Torres

Join Date: Feb 2023

Posts: 15
#7

19 Feb 2025, 08:11

Hi Ibai,

Maybe this doesn't make any sense, but I think the main issue stems from the fact that you don't observe bribery outcomes for firms with no contact with the Public Administration (PA), right? If you could observe bribery outcomes for the firms that are not in contact with the PA, you might be able to use the contact-with-PA dummy as an instrument for bribery, as it probably would not affect labor productivity directly but only indirectly through its effect on bribery.

Therefore, the problem I see is that for firms with "0" in the variable contact_with_PA you don't have information about bribery. Is there any way you could credibly infer the most likely bribery outcomes for the firms with no contact with the Public Administration?

Best regards,
Daniel
Harald Tauchmann

Join Date: Aug 2017

Posts: 26
#8

19 Feb 2025, 08:26

Dear Ibai, as I understand it, cmp still specifies the joint likelihood function correctly if you just add i.bribe_payment#c.X to the right-hand side of the equation explaining labor_productivity (and probably X to the right-hand sides of the other two equations). But I could be wrong. Best, Harald
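
A minimal sketch of that suggestion, assuming cmp accepts factor-variable notation in its equations and that X is the exogenous continuous variable. X, W, Z, and CONTROLS are placeholders from this thread, not actual variable names, and whether this identifies the interaction effect in the way intended is not verified here.

```stata
* Sketch only: X, W, Z and CONTROLS are placeholders from the thread
cmp setup
gen selectvar = contact_with_PA

* i.bribe_payment#c.X gives a separate slope on X for each level of
* bribe_payment; X also enters the other two equations as a regressor
cmp (labor_productivity = bribe_payment i.bribe_payment#c.X CONTROLS) ///
    (bribe_payment = Z X CONTROLS) ///
    (selectvar = W X CONTROLS), ///
    ind(selectvar*$cmp_cont selectvar*$cmp_probit $cmp_probit) nolr qui
```

The /// line continuations require running this from a do-file. Comparing the two X slopes after estimation (for example with test or lincom) would show whether the effect of X differs by bribe status.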
Ibai Ostolozaga Falcon

Join Date: May 2021

Posts: 36
#9

20 Feb 2025, 00:45

Thank you for your answers!