  • GSEM for heckprob

    Hello,
    I want to use the gsem command to estimate three equations. The first two are a selection equation and a probit outcome model; if I can account for the selection effect in these two equations, I will add the third equation later.
    I would like to know how, in gsem, we can do what the "heckprob" command does, or anything else that estimates the two probabilities (selection and outcome dummy) together.

    The Stata help for gsem (page 433, "Heckman selection model") explains how to reproduce the results of "heckman" when the outcome is a continuous variable.

    Below, I have tried to follow the same procedure as for "heckman" in gsem, for the case where the dependent variables of both the selection and the outcome equation are dummies. I tried to transform the gsem results into what "heckprob" gives, but the transformation of the estimated parameters does not reproduce the "heckprob" results.
    I was wondering whether there is a procedure that produces, in the gsem framework, the same results as the "heckprob" command.
    I greatly appreciate any help.

    We can use the data set from the example in the Stata help for the "heckprob" command to compare the results from gsem and "heckprob":

    use http://www.stata-press.com/data/r13/school
    *preparing data for heckman method in gsem:

    gen vote_selec=0 if vote==1 /*Creates missing values for vote!=1 */
    gen vote_noselec=0 if vote==0
    gen private1=private
    replace private1=. if vote==0

    *gsem:
    gsem (private1 <- years logptax L) (vote_selec <- years loginc logptax L@1, family(gaussian, udepvar(vote_noselec))), var(L@1 e.private1@a e.vote_selec@a)

    *heckprob
    heckprob private years logptax, select(vote=years loginc logptax)

    Using the guidelines in the "Heckman selection model" section on page 433 of the gsem help in Stata, I cannot obtain the same "rho" and other parameters by applying the functions introduced there to transfer the results from gsem to "heckman".
    Please let me know if there is any way to carry out, for "heckprob", the same procedure in gsem that works for "heckman".
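
    For concreteness, this is the kind of transformation I have been trying to apply, based on my reading of the Heckman selection example in the gsem help (only a sketch; the parameter names below assume gsem's default labeling, so please check them with "gsem, coeflegend" after estimation):

    *back-transformation to the "heckman" parameterization, with kappa = _b[private1:L] and a = var(e.private1) = var(e.vote_selec):
    nlcom (sigma: sqrt(_b[private1:L]^2 + _b[/var(e.private1)]))                                               ///
          (rho:   _b[private1:L] / sqrt((_b[private1:L]^2 + _b[/var(e.private1)]) * (1 + _b[/var(e.vote_selec)])))
    *the selection-equation coefficients would also have to be divided by sqrt(1 + var(e.vote_selec))
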
    I also tried the following command, using probit for both equations and linking them through a latent variable "L":

    gsem (private <- years logptax L, probit) (vote <- years loginc logptax L@1, probit), var(L@1)

    But I do not know whether this is a correct way to do it. How can we extract the selection issue (the correlation between the two errors) from this model? And is there any way to compare it with "heckprob"?
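    My tentative understanding is that, because L is constrained to variance one and each probit error has variance one, the implied error correlation in this second setup would be the loading on L in the outcome equation divided by the square root of 2*(loading^2 + 1), i.e. something like the line below (only a guess; the parameter name may need to be checked with "gsem, coeflegend"):

    nlcom (rho: _b[private:L] / sqrt(2*(_b[private:L]^2 + 1)))
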
    Any reference or guideline is appreciated.

    Hossein

  • #2
    Would you please let me know if there is a way to do "heckprob" in the context of gsem, using, for example, the following data set?

    use http://www.stata-press.com/data/r13/school

    *heckprob
    heckprob private years logptax, select(vote=years loginc logptax)



    • #3
      Dear Hossein,

      Heckman-selection probit models can be fit with -heckprobit-, but also with -ssm-, a wrapper for -gllamm-. Here's an example:

      Code:
      webuse nlswork, clear
      
      * set some observations to missing
      replace ln_wage=. if union==.
      
      * generate binary high-wage variable
      gen highwage = (ln_wage > 2)
      replace highwage=. if ln_wage==.
      
      * generate selection variable
      gen selvar = (ln_wage!=.)
      
      * keep analytic sample
      keep if !missing(selvar,ttl_exp,collgrad,nev_mar,south)
      
      * rename variables (for simplicity)
      rename highwage y
      rename ttl_exp x1
      rename collgrad x2
      rename selvar s
      rename nev_mar z1
      keep y x1 x2 s z1
      
      * fit model using -heckprobit-
      heckprobit y x1 x2, select(s = x1 x2 z1)
      
      * fit the same model using -ssm-
      ssm y x1 x2, switch(s = x1 x2 z1) select family(binomial) link(probit) trace
      // Note that you get two output tables.
      // They are the same estimates, but transformed.
      // Check this: they have exactly the same log-likelihood, but the lower-table estimates are a factor 1.0627 times the upper-table estimates.
      Now, let's repeat this using -gsem-.
      Code:
      * duplicate all observations
      gen id = _n
      expand 2, gen(select)
      
      * now create two sets of variables: one for the outcome sample (_o) and one for the selection sample (_s).
      * that is, we are "stacking" two models in one.
      foreach x in x1 x2 z1 {
          gen `x'_o = `x'*(select==0)
          gen `x'_s = `x'*(select==1)
      }
      gen cons_o = (select==0)
      gen cons_s = (select==1)
      
      * set the dependent variable to the outcome in the outcome sample, and to the selection variable in the selection sample.
      gen resp=y
      replace resp=s if select==1
      
      * drop observations that are not needed
      drop if resp==.
      
      * we fit a "stacked" probit model.
      * each sample has its own constant, but the constants are correlated through a random effect at the id-level (remember that this identifies the original, non-duplicated observations)
      * for scaling, the random effect is set to 1 in the selection equation, and its variance is set to 1 as well.
      gsem (resp <- x1_o x2_o cons_o c.cons_o#L[id], nocons probit ) (resp <- x1_s x2_s z1_s cons_s c.cons_s#L[id]@1, nocons probit), var(L[id]@1) vce(cluster id)
      The -gsem- output is nearly identical to the upper table from the -ssm- command (slightly different because of a different estimator). It only needs to be transformed, and I don't know how, to obtain the -heckprobit- estimates. Hopefully a more knowledgeable person can help us further.



      • #4
        The -ssm- manual indicates how to rescale the parameter estimates into the usual scaling of a -heckprobit- model. Estimates from the outcome equation must be divided by the square root of λ²+1 (where λ is the random-effect loading), and estimates from the selection equation must be divided by the square root of 2. Standard errors can be obtained using the delta method. Let's try this:

        Code:
        foreach x in x1_o x2_o cons_o {
            local coefs_o `coefs_o' (_b[resp:`x']/sqrt(_b[resp:c.cons_o#L[id]]^2 + 1))
        }
        foreach x in x1_s x2_s z1_s cons_s {
            local coefs_s `coefs_s' (_b[resp:`x']/sqrt(2))
        }
        nlcom `coefs_o' `coefs_s'
        You can see that the rescaled estimates from -gsem- are nearly identical to those from -heckprobit- and -ssm-.
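
        If you also want the error correlation that -heckprobit- reports (rho), my understanding is that it can be recovered from the same loading: the composite errors are λ*L + ε for the outcome and L + ε for the selection, with each ε standard normal, so their correlation is λ/sqrt((λ²+1)*2). A sketch, reusing the coefficient name from the code above:

        Code:
        nlcom (rho: _b[resp:c.cons_o#L[id]] / sqrt(2*(_b[resp:c.cons_o#L[id]]^2 + 1)))
        This should come close to the rho reported by -heckprobit- in #3.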
        Last edited by Bram Hogendoorn; 05 Feb 2021, 12:08.

