  • How to work with censored data

    I am working with the rates of returns to education. I wish to look at just the first stage to see the impact of a policy (change in compulsory schooling laws) variable on the years in education for individuals. My sample includes individuals between the ages of 14 and 65. I want to do something along the lines of survival analysis using the Cox method, where the individuals who are still in education are right-censored.

    So, my equation is: E=a+b*P+c*X+ u

    Where E is the years in education, P refers to the policy, and X includes other controls such as age, gender, ethnicity, etc

    I used the following commands:
    gen policy=0
    replace policy=1 if age<25   // the policy affects only those below the age of 25
    gen censor=1
    replace censor=0 if main_activity==student
    stset years_in_education, failure(censor)
    stcox policy age age2 urban sinhalese female married if age>14 & age<65

    I want to then obtain a predicted value for the years in education which I could then substitute back into the main regression, which is:
    ln_earnings=E_PREDICTED+c*X+ e

    Can I please know how to obtain an appropriate predicted value?

    I also tried another route (instead of the Cox method), a censored regression using the "cnreg" command, but that command is no longer supported by Stata.

    Any advice is appreciated

  • #2
    The help file for cnreg states that the alternative to cnreg is intreg. The functionality offered by cnreg did not disappear; it was just taken over by the command intreg. So that is where I would start looking.
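
    As a rough sketch of what that could look like with the variables from post #1: intreg takes two dependent variables, a lower and an upper bound, and a right-censored observation is one whose upper bound is missing. The names educ_lo, educ_hi and educ_hat are made up for illustration, and the sketch assumes censor==0 marks people still in education, as in the original code.

    ```stata
    * Sketch only: educ_lo, educ_hi and educ_hat are hypothetical names.
    * Assumes censor==0 flags individuals who are still in education.
    gen double educ_lo = years_in_education
    gen double educ_hi = years_in_education

    * Right-censored observations: lower bound known, upper bound missing.
    replace educ_hi = . if censor == 0

    * Interval regression with the same covariates and sample as the stcox call.
    intreg educ_lo educ_hi policy age age2 urban sinhalese female married if age>14 & age<65

    * Linear prediction of (latent) years in education.
    predict double educ_hat, xb
    ```

    Observations where educ_lo equals educ_hi are treated as uncensored, so no separate censoring indicator is passed to the command.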
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      I am not familiar with survival analysis or the Cox method, so I can't comment on your code.

      However, you said
      also tried another route (instead of the Cox method) to have a censored regression
      If you want to do a censored regression, I advise you to take a look at the descriptions of the Tobit and Heckman models (the ones in Microeconometrics Using Stata, by Cameron & Trivedi, are very good), which are designed for censored data and selection models, respectively.

      Hope this helped.

      Charlie



      • #4
        Yes, that is useful. I tried this before and it didn't work. After you suggested it, I went back to it, and it did work! Thank you for the advice.



        • #5
          I have been reading up on the tobit model. I will also look at the Heckman model. Thank you



          • #6
            I second the suggestions that you look into Tobit and interval regression models (think intreg, as Maarten said), as they are designed to handle data in which the outcome variable is censored. (I would not recommend "Heckman" models in this context, as they are for situations when you have sample truncation, not censoring. Crudely speaking, censoring is when you have all the sample but incomplete information on the variable of interest; truncation is when some are not present at all.)

            Your description of your research problem suggests to me that there are several other issues you need to address. (a) "years of education" is typically an integer variable with a rather unusual distribution -- clusters round minimum education completion, an upper bound, and potentially spikes at zero or other very low values if you have developing country data. This may need special treatment in your analysis. (b) you seek a two-stage estimation method, but seem unaware that getting good estimates at the second-stage needs to take into account the uncertainty associated with the predicted variables. Google on "Murphy Topel" to see literature (some of which is in the Stata Journal), and also look at the latest Stata Blog posting http://blog.stata.com/2014/12/08/usi...tion-problems/
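
            To make point (b) concrete, here is a minimal sketch of one common way to carry first-stage uncertainty into the second stage: bootstrapping both stages together. This is not the Murphy-Topel correction itself, just a simpler substitute. The program name twostage and the variables educ_lo, educ_hi and educ_hat are hypothetical, censor==0 is assumed to mark people still in education, and the covariate list is simply copied from post #1.

            ```stata
            * Sketch only: bootstrap the two stages jointly (not Murphy-Topel).
            * Hypothetical names: twostage, educ_lo, educ_hi, educ_hat.
            gen double educ_lo = years_in_education
            gen double educ_hi = years_in_education
            replace educ_hi = . if censor == 0     // right-censored: still in education

            capture program drop twostage
            program define twostage, rclass
                * First stage: interval regression for years in education.
                capture drop educ_hat
                intreg educ_lo educ_hi policy age age2 urban sinhalese female married if age>14 & age<65
                predict double educ_hat, xb
                * Second stage: earnings equation using the predicted value.
                regress ln_earnings educ_hat age age2 urban sinhalese female married if age>14 & age<65
                return scalar b_educ = _b[educ_hat]
            end

            * Re-run both stages on each bootstrap sample.
            bootstrap b_educ = r(b_educ), reps(200) seed(12345): twostage
            ```

            Because the first stage is re-estimated in every replication, the bootstrap standard error for b_educ reflects the sampling variability of the predicted variable as well as that of the second-stage regression.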

            In addition, I don't fully understand from your description how you will utilise data on "all persons 16-65" to examine the change in compulsory schooling law (which presumably occurred in a particular year). There is a huge literature in economics that exploits changes in schooling laws to look at returns to education. If you are mimicking one of these studies, then it might be as well to cite it here (using full bibliographic reference) and explain the methods that they used. My recollection is the methods differ from what you outline.

