education to occupation mismatch using mode (realized matches)

magda ulceluse

Join Date: Jun 2015

Posts: 1
#1


24 Jun 2015, 02:01

Dear all,

I'm trying to compute the level of education-to-occupation mismatch for the individuals in my dataset. The idea is to take the mode of education for a particular occupation, i.e. the most common level of education, and then treat individuals more than one standard deviation above that level as overeducated and those more than one standard deviation below it as undereducated. The thing is, I don't know how to compute the mode of education within each occupation; the two variables need to be combined somehow, and this eludes my Stata skills. Would you happen to have an idea of how to go about this?
Thank you very much in advance.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17607

24 Jun 2015, 02:36

magda:
as the mode is the most frequent value of a given variable, you may want to go this way (data absolutely not real):

Code:

. set obs 10
obs was 0, now 10

. 
. g education=runiform() in 1/9
(1 missing value generated)

. 
. replace education=education[9] if education==.
(1 real change made)

. 
. tab edu

  education |      Freq.     Percent        Cum.
------------+-----------------------------------
   .0610638 |          1       10.00       10.00
   .1086679 |          1       10.00       20.00
   .1369841 |          1       10.00       30.00
   .5552388 |          2       20.00       50.00
   .5578017 |          1       10.00       60.00
   .6047949 |          1       10.00       70.00
   .6184582 |          1       10.00       80.00
   .6432207 |          1       10.00       90.00
    .684176 |          1       10.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. 
. g mode=.5552388

Kind regards,
Carlo
(StataNow 18.5)


Maarten Buis

Join Date: Mar 2014
Posts: 3407

24 Jun 2015, 02:44

Code:

sysuse nlsw88, clear
bys occupation : egen mode = mode(grade)
bys occupation : egen sd = sd(grade)
gen lb = mode - sd
gen ub = mode + sd
gen edfit = cond(grade < lb, 1, ///
            cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)
label variable edfit "education to occupation match"
label define edfit 1 "undereducated" ///
                   2 "fit"           ///
                   3 "overeducated"
label value edfit edfit
tab edfit

// alternative definition of over- and undereducated
bys occupation : egen lb2 = pctile(grade), p(20)
bys occupation : egen ub2 = pctile(grade), p(80)
gen edfit2 = cond(grade < lb2, 1, ///
             cond(grade < ub2, 2, 3)) if !missing(grade,ub2,lb2)
label variable edfit2 "education to occupation match 2"
label value edfit2 edfit
tab1 edfit edfit2, miss
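
A note on ties: if an occupation has several equally common values of grade, egen's mode() function returns missing for that group unless a tie-breaking option is given. A minimal sketch, assuming you are happy to break ties with the lowest or the highest tied value (mode_min, mode_max and tied are names made up for this illustration):

```stata
sysuse nlsw88, clear

// minmode/maxmode pick the lowest/highest of several tied modes
// instead of setting the mode to missing for that occupation
bysort occupation : egen mode_min = mode(grade), minmode
bysort occupation : egen mode_max = mode(grade), maxmode

// flag occupations where the choice of tie-breaker changes the mode
gen byte tied = mode_min != mode_max if !missing(mode_min, mode_max)
tabulate occupation tied
```

If tied is 1 for an occupation, the bounds lb and ub for that occupation depend on which tie-breaker you choose, so it is worth reporting that choice.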

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17607
#4

24 Jun 2015, 03:01

Maarten:
outstanding as usual!

Kind regards,
Carlo
(StataNow 18.5)
Anja Wunder

Join Date: Mar 2017

Posts: 3
#5

21 Nov 2018, 04:57

Hello Maarten,
I just tried your code in Stata and it seems to work well, thank you.
I have one question: I don't understand the code from this point on. Could you please explain what is happening there?

gen lb = mode - sd
gen ub = mode + sd
gen edfit = cond(grade < lb, 1, ///
cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)

Thanks
Anja
Maarten Buis

Join Date: Mar 2014

Posts: 3407
#6

21 Nov 2018, 06:03

gen lb = mode - sd
creates a new variable called lb, containing the modal grade for that occupation minus the standard deviation of grade for that occupation

gen ub = mode + sd
as above, only plus instead of minus

gen edfit = cond(grade < lb, 1, ///
cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)
creates a new variable edfit, which gets a
1 if the person's grade is less than the modal grade minus the standard deviation for their occupation (under-educated)

2 if the person's grade is at least the modal grade minus the standard deviation but less than the modal grade plus the standard deviation for their occupation (normal-educated)

3 if the person's grade is at or above the modal grade plus the standard deviation for their occupation (over-educated)

Maybe you got confused by the cond() function. If that is the case, see help cond()
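
To see the nested cond() at work, here is a small made-up example (the numbers are invented purely for illustration, with lb fixed at 10 and ub at 14). It also shows why the if !missing() part matters: Stata treats a missing value as larger than any number, so without it a missing grade would end up in category 3.

```stata
clear
input grade lb ub
 8 10 14
10 10 14
12 10 14
14 10 14
16 10 14
 . 10 14
end

// inner cond() is only evaluated when grade < lb is false
gen edfit = cond(grade < lb, 1, ///
            cond(grade < ub, 2, 3)) if !missing(grade,ub,lb)

// same expression without the guard, to show what goes wrong
gen noguard = cond(grade < lb, 1, cond(grade < ub, 2, 3))

list
// edfit:   1, 2, 2, 3, 3, . (missing)
// noguard: 1, 2, 2, 3, 3, 3
```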

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Mudabira Fayaz

Join Date: Apr 2023

Posts: 3
#7

19 Feb 2024, 05:20

Hi Maarten,

I am trying to do something similar. However, I want to control for selection bias using the Heckman approach. When I generate the occupation mismatch for the full sample, it includes individuals who are not working or whose wages are not reported. How do I construct this measure in such a scenario?
I would appreciate your help.
Thanks in advance
Maarten Buis

Join Date: Mar 2014

Posts: 3407
#8

19 Feb 2024, 13:52

With a Heckman correction you can get a bias correction for the mean and standard deviation (if all assumptions are correct, and that is far from trivial in this case). I don't think that this is worth it in this case. 1 SD above and below the mode is at its core an arbitrary choice for a definition of over- and undereducated. By doing something like a Heckman correction you suggest more precision than actually exists. I would probably use an absolute measure that does not depend on the distribution. Have experts rate a set of occupations on what the required level of education is, and everybody in that occupation who has one level more is overeducated.
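
A rough sketch of what such an absolute measure could look like. Everything here is hypothetical: occ, edlevel and required are invented names, the values are made up, and in practice required would come from merging in an expert rating per occupation. Treating people below the required level as undereducated is an added assumption, by symmetry with the overeducated case.

```stata
clear
// occ: occupation, edlevel: person's level of education,
// required: expert-rated level for that occupation (all made up)
input occ edlevel required
1 2 2
1 3 2
1 1 2
2 3 3
2 4 3
end

// compare each person's level with the required level for their occupation
gen byte edfit_abs = cond(edlevel < required, 1, ///
                     cond(edlevel == required, 2, 3)) ///
                     if !missing(edlevel, required)
label define edfit_abs 1 "undereducated" 2 "fit" 3 "overeducated"
label values edfit_abs edfit_abs
tabulate occ edfit_abs
```

Because required does not depend on who happens to be observed in each occupation, this classification does not change when non-working individuals are added to or dropped from the sample.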

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
