
  • education to occupation mismatch using mode (realized matches)

    Dear all,

    I'm trying to compute the level of education-to-occupation mismatch for the individuals in my dataset. The idea is to take the mode of education within each occupation, i.e. the most common level of education, and then to classify individuals more than one standard deviation above that level as overeducated and those more than one standard deviation below it as undereducated. The problem is that I don't know how to compute the mode of one variable within the categories of the other; the two need to be combined somehow, and this is beyond my Stata skills. Would you happen to have an idea of how to go about this?
    Thank you very much in advance.


  • #2
    magda:
    As the mode is the most frequent value of a given variable, you may want to go this way (the data are entirely made up):
    Code:
    . set obs 10
    obs was 0, now 10
    
    . 
    . g education=runiform() in 1/9
    (1 missing value generated)
    
    . 
    . replace education=education[9] if education==.
    (1 real change made)
    
    . 
    . tab edu
    
      education |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       .0610638 |          1       10.00       10.00
       .1086679 |          1       10.00       20.00
       .1369841 |          1       10.00       30.00
       .5552388 |          2       20.00       50.00
       .5578017 |          1       10.00       60.00
       .6047949 |          1       10.00       70.00
       .6184582 |          1       10.00       80.00
       .6432207 |          1       10.00       90.00
        .684176 |          1       10.00      100.00
    ------------+-----------------------------------
          Total |         10      100.00
    
    . 
    . g mode=.5552388
    Kind regards,
    Carlo
    (StataNow 18.5)
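
    Rather than reading the mode off the table and typing it in by hand, the same toy example could let Stata compute it. This is only a sketch on made-up data: the seed is an arbitrary assumption, and egen's mode() function is swapped in for the hand-copied value, which tabulate rounds for display and which may therefore not match the stored value exactly.

    ```stata
    clear
    set seed 12345            // arbitrary seed, only so the toy data are reproducible
    set obs 10
    generate education = runiform() in 1/9
    * copy observation 9 into observation 10 so that one value occurs twice
    replace education = education[9] if missing(education)
    * egen stores the most frequent value of education in every observation
    egen mode = mode(education)
    list education mode in 9/10
    ```

    Because the mode is computed from the stored values, a later comparison such as education == mode behaves as expected.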



    • #3
      Code:
      sysuse nlsw88, clear
      bys occupation : egen mode = mode(grade)
      bys occupation : egen sd = sd(grade)
      gen lb = mode - sd
      gen ub = mode + sd
      gen edfit = cond(grade < lb, 1, ///
                  cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)
      label variable edfit "education to occupation match"
      label define edfit 1 "undereducated" ///
                         2 "fit"           ///
                         3 "overeducated"
      label value edfit edfit
      tab edfit
      
      // alternative definition of over- and undereducated
      bys occupation : egen lb2 = pctile(grade), p(20)
      bys occupation : egen ub2 = pctile(grade), p(80)
      gen edfit2 = cond(grade < lb2, 1, ///
                   cond(grade < ub2, 2, 3)) if !missing(grade,ub2,lb2)
      label variable edfit2 "education to occupation match 2"
      label value edfit2 edfit
      tab1 edfit edfit2, miss
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
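
      One thing that may be worth checking with the code above: when two or more values of grade are equally frequent within an occupation, egen's mode() function sets the mode to missing by default, so lb, ub, and edfit end up missing for everybody in that occupation. A minimal sketch of how one could detect this and, if desired, break ties with the minmode option follows. Picking the lowest mode is an assumption made here for illustration, not part of the answer above; maxmode would be the opposite choice.

      ```stata
      sysuse nlsw88, clear
      * default behaviour: mode is missing in occupations with tied modes
      bysort occupation: egen mode = mode(grade)
      * with minmode the lowest of the tied values is used instead
      bysort occupation: egen mode_min = mode(grade), minmode
      * number of observations whose occupation had no unique mode
      count if missing(mode) & !missing(mode_min)
      ```

      If the count is zero, the two versions agree and the tie-breaking rule does not matter for this dataset.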



      • #4
        Maarten:
        outstanding as usual!
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Hello Maarten,
          I just tried to apply your code in Stata and it seems to work well, thank you.
          I have one question: I don't understand the code from this point on. Could you please explain what is happening there?

          gen lb = mode - sd
          gen ub = mode + sd
          gen edfit = cond(grade < lb, 1, ///
          cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)

          Thanks
          Anja




          • #6
            gen lb = mode - sd
            Create a new variable called lb, containing the modal grade for that occupation minus the standard deviation of grade for that occupation.

            gen ub = mode + sd
            As above, but adding the standard deviation instead of subtracting it.

            gen edfit = cond(grade < lb, 1, ///
            cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)

            Create a new variable edfit, which gets a
            • 1 if the person's grade is less than the modal grade minus the standard deviation for their occupation (under-educated)
            • 2 if the person's grade is at least the modal grade minus the standard deviation and less than the modal grade plus the standard deviation for their occupation (normal-educated)
            • 3 if the person's grade is at or above the modal grade plus the standard deviation for their occupation (over-educated)
            The if !missing(grade,ub,lb) qualifier leaves edfit missing whenever grade, ub, or lb is missing. Maybe you got confused by the cond() function. If that is the case, see help cond()
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------
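
            To make the nested cond() more concrete, the same classification can be sketched with one replace per category. The variable name edfit_steps is hypothetical and only used for the comparison; the rest repeats the code from #3.

            ```stata
            sysuse nlsw88, clear
            bysort occupation: egen mode = mode(grade)
            bysort occupation: egen sd = sd(grade)
            gen lb = mode - sd
            gen ub = mode + sd
            gen edfit = cond(grade < lb, 1, ///
                        cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)

            * step-by-step equivalent: start with missing, then fill each category
            gen edfit_steps = .
            replace edfit_steps = 1 if grade < lb               & !missing(grade,ub,lb)
            replace edfit_steps = 2 if grade >= lb & grade < ub & !missing(grade,ub,lb)
            replace edfit_steps = 3 if grade >= ub              & !missing(grade,ub,lb)

            * the two variables agree in every observation, including the missing ones
            assert edfit == edfit_steps
            ```

            The inner cond() is only evaluated for people who are not below lb, which is why the second category does not need to repeat the grade >= lb condition.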



            • #7
              Hi Maarten,

              I am trying to do something similar. However, I want to control for selection bias using the Heckman approach. When I generate the occupation mismatch for the full sample, I have individuals who are not working or whose wages are not reported. How do I construct this measure in such a scenario?
              I would appreciate your help.
              Thanks in advance



              • #8
                With a Heckman correction you can get bias-corrected estimates of the mean and standard deviation (if all assumptions are correct, and that is far from trivial in this case). I don't think that this is worth it in this case. One SD above and below the mode is at its core an arbitrary choice of definition for over- and undereducated. By doing something like a Heckman correction you suggest more precision than actually exists. I would probably use an absolute measure that does not depend on the distribution: have experts rate a set of occupations on what the required level of education is, and everybody in that occupation who has one level more is overeducated.
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------
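
                The absolute measure described above could look roughly like this. Everything in the sketch is hypothetical: the toy data, the variable names educ_level, required, and edfit_abs, and the required levels themselves, which stand in for real expert ratings that would normally be merged in from a separate file. The sketch also treats anybody below the required level as undereducated, which goes beyond what is stated above.

                ```stata
                clear
                * toy data: occupation code and attained education level (made up)
                input byte occupation byte educ_level
                1 2
                1 3
                1 4
                2 1
                2 2
                2 3
                end
                * hypothetical expert ratings of the level each occupation requires
                generate byte required = cond(occupation == 1, 3, 2)
                * 1 = below the required level, 2 = at it, 3 = above it
                generate byte edfit_abs = cond(educ_level < required, 1, ///
                                          cond(educ_level == required, 2, 3)) ///
                                          if !missing(educ_level, required)
                label define edfit_abs 1 "undereducated" 2 "fit" 3 "overeducated"
                label values edfit_abs edfit_abs
                tabulate occupation edfit_abs
                ```

                Because required does not depend on who happens to be observed in each occupation, the classification stays the same whether or not non-working individuals are in the sample.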

