Hello everyone,
I am a new Statalist user and I really hope you can help me with this problem.
My PhD research focuses on evaluating the impact of a firm's network on the probability of recording a green patent. To do this I would like to estimate a random-effects probit model on a panel dataset. However, this probability is influenced by the probability that the firm records any patent at all, which generates a sample selection bias. I therefore estimated a Heckman-type model with the xteprobit command and its select() option, but it failed to converge, so I thought the cmp command could be useful for the Heckman model, as described in Roodman, D. (2011), "Fitting fully observed recursive mixed-process models with cmp", Stata Journal 11(2): 159-206.
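For reference, the panel probit with sample selection that failed to converge was specified roughly as follows (a sketch using the same variable names as in the dataex example below; the time variable year used to xtset the panel is an assumption, since it does not appear in the extract):
Code:
* Sketch of the -xteprobit- specification that did not converge
* ("year" is an assumed time variable, not shown in the dataex extract)
xtset ID year
xteprobit green_patent i.network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1, ///
    select(patent = i.network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1 ln_z1_lag1)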
As I am not really confident with this estimation strategy, I would like to know whether I used the command correctly and how the output should be interpreted. The code sections below show, first, an example generated by dataex that illustrates the main variables used in the model and, second, the command I used for the estimation together with its output.
Variables explanation: green_patent is a dummy equal to 1 if the firm records a green patent and 0 if the firm records another type of patent; patent is a dummy equal to 1 if the firm records a patent and 0 if it records no patent at all; network_lag2 is a lagged dummy equal to 1 if the firm is in a network and 0 otherwise; ln_x2_lag1 is a lagged measure of firm revenues; ln_x3_lag1 and ln_x4_lag1 are lagged control variables; ln_z1_lag1 is an instrumental variable that directly influences the probability of recording a patent but does not directly influence the probability of recording a green patent.
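To be explicit about the lag structure, network_lag2 is a two-period lag and the ln_*_lag1 variables are one-period lags; in Stata time-series notation the construction is roughly the following (a sketch only: the time variable year and the raw variables network and x2 are illustrative names that do not appear in the dataex extract):
Code:
* Hypothetical construction of the lagged regressors; "year", "network" and
* "x2" are illustrative names, not variables from the dataex extract
xtset ID year
gen byte network_lag2 = L2.network
gen ln_x2_lag1 = ln(L1.x2)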
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(ID green_patent patent network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1 ln_z1_lag1)
148 . 0 .         .         .          .          .
148 . 0 . 11.008137   10.6766  -1.017247 -4.4598875
148 . 0 0 11.089849 10.685332 -1.1712191 -4.5404754
148 . 0 0  10.91624 10.643757 -1.0999008 -3.6860235
148 . 0 0  10.88623 10.693308 -1.1215011  -3.206731
148 . 0 0 10.854965 10.663826 -1.1383761  -2.920066
148 1 1 0 10.967755  10.73531  -1.287427  -3.025512
148 0 1 0   11.0672 10.775618 -1.4089557  -3.237444
148 . 0 0 11.084599  10.80649 -1.4042463 -3.4748454
148 . 0 0 11.114367 10.814565 -1.4681163 -3.6652496
148 0 1 0 11.176088  10.83506 -1.2296044 -3.6670656
end
Code:
cmp (green_patent = i.network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1 || ID:) ///
    (patent = i.network_lag2 ln_x2_lag1 ln_x3_lag1 ln_x4_lag1 ln_z1_lag1 || ID:), ///
    indicators($cmp_probit $cmp_probit)

For quadrature, defaulting to technique(bhhh) for speed.

Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
      For exact fits of each equation alone, run cmp separately on each.

Iteration 0:   log likelihood = -1927.4803
Iteration 1:   log likelihood = -1900.6792
Iteration 2:   log likelihood = -1900.4961
Iteration 3:   log likelihood =  -1900.496

Probit regression                                       Number of obs =   7,527
                                                        LR chi2(4)    =   53.97
                                                        Prob > chi2   =  0.0000
Log likelihood = -1900.496                              Pseudo R2     =  0.0140

--------------------------------------------------------------------------------
  green_patent | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
1.network_lag2 |   .1632603   .1027586     1.59   0.112    -.0381429    .3646636
    ln_x2_lag1 |   .0749908   .0129474     5.79   0.000     .0496144    .1003671
    ln_x3_lag1 |   .0441979   .0735625     0.60   0.548    -.0999819    .1883778
    ln_x4_lag1 |   .0077381    .017801     0.43   0.664    -.0271513    .0426274
         _cons |  -2.706493   .7381765    -3.67   0.000    -4.153292   -1.259694
--------------------------------------------------------------------------------

Warning: regressor matrix for green_patent equation appears ill-conditioned.
(Condition number = 202.87697.) This might prevent convergence. If it does, and
if you have not done so already, you may need to remove nearly collinear
regressors to achieve convergence. Or you may need to add a nrtolerance(#) or
nonrtolerance option to the command line. See cmp tips.

Iteration 0:   log likelihood = -41181.737
Iteration 1:   log likelihood = -32833.349
Iteration 2:   log likelihood =     -31322
Iteration 3:   log likelihood = -31216.574
Iteration 4:   log likelihood = -31215.833
Iteration 5:   log likelihood = -31215.833

Probit regression                                       Number of obs = 702,527
                                                        LR chi2(5)    = 19931.81
                                                        Prob > chi2   =  0.0000
Log likelihood = -31215.833                             Pseudo R2     =  0.2420

--------------------------------------------------------------------------------
        patent | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
1.network_lag2 |   .1536963   .0297688     5.16   0.000     .0953506     .212042
    ln_x2_lag1 |   .3267265   .0036255    90.12   0.000     .3196207    .3338324
    ln_x3_lag1 |   .3174369   .0169649    18.71   0.000     .2841864    .3506874
    ln_x4_lag1 |   .0070056   .0037633     1.86   0.063    -.0003703    .0143814
    ln_z1_lag1 |   .1367399   .0027505    49.72   0.000     .1313491    .1421307
         _cons |  -7.848525   .1660105   -47.28   0.000      -8.1739   -7.523151
--------------------------------------------------------------------------------
Note: 531 failures and 0 successes completely determined.

Warning: regressor matrix for patent equation appears ill-conditioned.
(Condition number = 198.30252.) This might prevent convergence. If it does, and
if you have not done so already, you may need to remove nearly collinear
regressors to achieve convergence. Or you may need to add a nrtolerance(#) or
nonrtolerance option to the command line. See cmp tips.

Fitting constant-only model for LR test of overall model fit.
Fitting full model.
Random effects/coefficients modeled with Gauss-Hermite quadrature with 12 integration points.

Iteration 0:   log likelihood = -32953.079
Iteration 1:   log likelihood = -30996.281
Iteration 2:   log likelihood = -28848.869
Iteration 3:   log likelihood = -27797.233
Iteration 4:   log likelihood = -27507.701
Iteration 5:   log likelihood = -27485.384
Iteration 6:   log likelihood = -27463.847
Iteration 7:   log likelihood =  -27458.71
Iteration 8:   log likelihood = -27456.006
Performing Naylor-Smith adaptive quadrature.
Iteration 9:   log likelihood = -27453.801
Iteration 10:  log likelihood = -27452.067
Iteration 11:  log likelihood = -27450.414
Iteration 12:  log likelihood = -27449.003
Iteration 13:  log likelihood = -27448.141
Iteration 14:  log likelihood = -27447.015
Iteration 15:  log likelihood = -27446.366
Iteration 16:  log likelihood = -27445.892
Iteration 17:  log likelihood = -27445.339
Iteration 18:  log likelihood = -27444.425
Iteration 19:  log likelihood = -27443.872
Iteration 20:  log likelihood = -27443.818
Iteration 21:  log likelihood = -27443.772
Iteration 22:  log likelihood = -27443.733
Iteration 23:  log likelihood = -27443.696
Iteration 24:  log likelihood = -27443.646
Iteration 25:  log likelihood = -27443.576
Iteration 26:  log likelihood = -27443.509
Iteration 27:  log likelihood = -27443.466
Iteration 28:  log likelihood = -27443.456
Adaptive quadrature points fixed.
Iteration 29:  log likelihood = -27443.443
Iteration 30:  log likelihood = -27443.435
Iteration 31:  log likelihood = -27443.428
Iteration 32:  log likelihood = -27443.428
Iteration 33:  log likelihood = -27443.428
Iteration 34:  log likelihood = -27443.428
Iteration 35:  log likelihood = -27443.427
Iteration 36:  log likelihood = -27443.427
Iteration 37:  log likelihood = -27443.427
Iteration 38:  log likelihood = -27443.427
Iteration 39:  log likelihood = -27443.427
Iteration 40:  log likelihood = -27443.427
Iteration 41:  log likelihood = -27443.427
Iteration 42:  log likelihood = -27443.427
Iteration 43:  log likelihood = -27443.427
Iteration 44:  log likelihood = -27443.427
Iteration 45:  log likelihood = -27443.427
Iteration 46:  log likelihood = -27443.426
Iteration 47:  log likelihood = -27443.426
Iteration 48:  log likelihood = -27443.426
Iteration 49:  log likelihood = -27443.426

Mixed-process multilevel regression                     Number of obs = 702,626
                                                        LR chi2(9)    = 7584.44
Log likelihood = -27443.426                             Prob > chi2   =  0.0000

--------------------------------------------------------------------------------
               | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
green_patent   |
1.network_lag2 |   .3931953   .2002509     1.96   0.050     .0007108    .7856798
    ln_x2_lag1 |   .0210191   .0402051     0.52   0.601    -.0577815    .0998197
    ln_x3_lag1 |   .1517612   .1453396     1.04   0.296    -.1330991    .4366214
    ln_x4_lag1 |   .0002584   .0365224     0.01   0.994    -.0713242     .071841
         _cons |  -4.150078   1.656941    -2.50   0.012    -7.397623   -.9025338
---------------+----------------------------------------------------------------
patent         |
1.network_lag2 |   .2047174   .0559212     3.66   0.000     .0951138     .314321
    ln_x2_lag1 |   .4954168   .0091743    54.00   0.000     .4774355    .5133981
    ln_x3_lag1 |   .2787427   .0292022     9.55   0.000     .2215076    .3359779
    ln_x4_lag1 |   .0325308    .007655     4.25   0.000     .0175274    .0475342
    ln_z1_lag1 |   .1816804    .005461    33.27   0.000      .170977    .1923838
         _cons |  -10.01032   .2889907   -34.64   0.000    -10.57674   -9.443913
---------------+----------------------------------------------------------------
   /lnsig_1_1  |   .4271777   .0768412     5.56   0.000     .2765716    .5777837
   /lnsig_1_2  |   .2160637   .0152018    14.21   0.000     .1862688    .2458587
/atanhrho_1_12 |  -.0301985   .0641565    -0.47   0.638    -.1559428    .0955459
  /atanhrho_12 |  -.2726755   .1104226    -2.47   0.014    -.4890998   -.0562511
--------------------------------------------------------------------------------

------------------------------------------------------------------------------------
          Random effects parameters |   Estimate   Std. Err.   [95% Conf. Interval]
------------------------------------+-----------------------------------------------
Level: ID                           |
  green_patent                      |
    Standard deviations             |
      _cons                         |   1.532925   .1177918     1.318601    1.782084
  patent                            |
    Standard deviations             |
      _cons                         |   1.241181   .0188682     1.204746    1.278719
  Cross-eq correlation              |
    green_patent, patent            |
      _cons, _cons                  |  -.0301893    .064098    -.1546909    .0952562
------------------------------------+-----------------------------------------------
Level: Observations                 |
  Standard deviations               |
    green_patent                    |          1  (constrained)
    patent                          |          1  (constrained)
  Cross-eq correlation              |
    green_patent, patent            |  -.2661126   .1026029    -.4535017   -.0561919
------------------------------------------------------------------------------------
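Regarding the interpretation of the auxiliary parameters, my current understanding (please correct me if I am wrong) is that the /atanhrho terms are inverse hyperbolic tangents of the cross-equation correlations, so the correlations themselves can be recovered with tanh(), which matches the values reported in the random-effects table:
Code:
* Back out the correlations from the atanhrho point estimates printed above
display tanh(-.0301985)   // ID-level cross-eq correlation, about -0.0302
display tanh(-.2726755)   // observation-level cross-eq correlation, about -0.2661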