  • Control Function with sample selection

    Dear Statalist,


    I would like to describe a problem that I am encountering in my current research.
    I have a database with information on 1,000 firms. In this database I can observe whether a firm had contact with the Public Administration or not (dichotomous variable). If it had contact, I can then observe whether it paid a bribe or not (dichotomous variable). But if it did not have contact with the Public Administration, I cannot observe whether it paid a bribe. In my research, I want to study the effect of firm bribery on labor productivity, but as you can see I have a sample selection issue. This could be handled by using a Heckman selection model. However, the main problem is that, according to the literature in my field, bribery is also an endogenous variable because of simultaneity. So I have both a sample selection problem and a simultaneity problem. I have tried to address them as follows:

    Code:
    probit contact_with_PA W CONTROLS
    predict xb if e(sample), xb
    gen imr = normalden(xb) / normal(xb)
    
    probit bribe_payment Z CONTROLS
    predict u if e(sample), score
    
    reg labor_productivity bribe_payment imr u CONTROLS
    Basically, in my regression of interest (the last one), I include the inverse Mills ratio from the first probit and the generalized residuals from the second one (as in Wooldridge 2015), where W is a selection variable that can influence whether a firm is in contact with the Public Administration and Z is the instrument for bribe_payment.

    I would like to ask you whether this approach is correct or if I am missing something relevant.
    Thank you in advance,
    Ibai
    Last edited by Ibai Ostolozaga Falcon; 17 Feb 2025, 05:31.
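
For reference, the two correction terms computed in the code of post #1 can be written out explicitly. This is only a sketch under the usual probit assumptions. The symbols are notation introduced here, not taken from the post: $s_i$ stands for contact_with_PA, $b_i$ for bribe_payment, $y_i$ for labor_productivity, $w_i$ for (W, CONTROLS), $z_i$ for (Z, CONTROLS), and $x_i$ for CONTROLS.

```latex
% Inverse Mills ratio from the first probit (gen imr = normalden(xb)/normal(xb))
\hat{\lambda}_i = \frac{\phi(w_i\hat{\gamma})}{\Phi(w_i\hat{\gamma})}

% Generalized residual from the second probit (predict u, score)
\widehat{gr}_i = b_i \,\frac{\phi(z_i\hat{\delta})}{\Phi(z_i\hat{\delta})}
               - (1 - b_i)\,\frac{\phi(z_i\hat{\delta})}{1 - \Phi(z_i\hat{\delta})}

% Final regression, estimable only for firms with s_i = 1
y_i = \beta\, b_i + x_i\theta + \rho_1 \hat{\lambda}_i + \rho_2 \widehat{gr}_i + e_i
```

Because $b_i$ is observed only when $s_i = 1$, both the second probit and the final regression run on the selected sample, which is the concern raised in the first reply.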

  • #2
    Dear Ibai, not sure. Isn't it a problem that your probit model explaining bribe_payment is already subject to a selection problem? An alternative would be to estimate all three submodels simultaneously (e.g. assuming joint normality and using cmp).
    Code:
    gen selectvar = contact_with_PA
    cmp (labour_productivity = bribe_payment CONTROLS) (bribe_payment = Z CONTROLS) (selectvar = W CONTROLS), ind(selectvar*$cmp_cont selectvar*$cmp_probit $cmp_probit) nolr qui
    Best wishes,
    Harald

    • #3
      Hello Harald,

      Yes, you are right, so I do not know whether it would be correct to also include imr from the first step in the probit model explaining bribe_payment. I have used cmp as you suggested. However, with this methodology I have a further question: suppose that my endogenous variable bribe_payment is interacted with another exogenous variable X. How should I proceed using cmp?
      Thank you.
      Last edited by Ibai Ostolozaga Falcon; 17 Feb 2025, 08:08.

      • #4
        Any suggestions, please? I am a bit confused about this issue.

        Thank you.

        • #5
          Is the variable you want bribe_payment to interact with itself binary or continuous (or of some other type)?

          • #6
            Hello Harald,

            I want to interact bribe_payment (which is a binary variable) with an exogenous continuous variable.

            Thank you.
            Last edited by Ibai Ostolozaga Falcon; 19 Feb 2025, 03:59.

            • #7
              Hi Ibai,

              Maybe this doesn't make any sense, but I think the main issue stems from the fact that you don't observe bribery outcomes for firms with no contact with the Public Administration (PA), right? If you could observe bribery outcomes for firms that are not in contact with the PA, you might be able to use the contact dummy as an instrument for bribery, since it probably would not affect labor productivity directly, only indirectly through its effect on bribery.

              Therefore, the problem I see is that for firms with "0" in contact_with_PA you have no information about bribery. Is there any way to credibly infer the most likely bribery outcomes for firms with no contact with the Public Administration?

              Best regards,
              Daniel

              • #8
                Dear Ibai, as I understand it, cmp still specifies the joint likelihood function correctly if you just add i.bribe_payment#c.X to the right-hand-side of the equation explaining labour_productivity (and probably X to the right-hand-sides of the other two equations). But I could be wrong. Best, Harald
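
The suggestion in #8 could look roughly like the following. This is a sketch only, not a tested specification: it assumes cmp is installed from SSC, that `cmp setup` has been run so the `$cmp_cont` and `$cmp_probit` globals exist, and that cmp accepts the factor-variable term as the reply presumes. `X`, `W`, `Z` and `CONTROLS` are the placeholders used in this thread, and `X` is assumed not to be part of `CONTROLS` already.

```stata
* One-time install if needed (assumption: cmp comes from SSC)
* ssc install cmp

* Defines the $cmp_cont, $cmp_probit, ... globals used in ind()
cmp setup

* Selection indicator: 1 if the firm had contact with the PA, 0 otherwise
gen selectvar = contact_with_PA

* Outcome equation gains the interaction of the binary endogenous
* regressor with the continuous exogenous X; X also enters the
* bribe and selection equations, as suggested in #8
cmp (labour_productivity = bribe_payment i.bribe_payment#c.X CONTROLS) ///
    (bribe_payment = Z X CONTROLS)                                     ///
    (selectvar = W X CONTROLS),                                        ///
    ind(selectvar*$cmp_cont selectvar*$cmp_probit $cmp_probit) nolr qui
```

With `i.bribe_payment#c.X`, Stata estimates a separate slope on X for bribe_payment = 0 and bribe_payment = 1, so X should not be entered again as a main effect in the outcome equation.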

                • #9
                  Thank you for your answers!
