Generate a new variable based on the values of a second variable

Marcus Eklund

Join Date: Dec 2021

Posts: 29
#1

27 Dec 2021, 09:28

Hi,

I'm trying to generate a continuous variable using a comparative dataset where I want the countries to be sorted based on the share of highly educated within the country. Does anyone know which command is best to use? I'm a bit confused - sorry for a beginner question:-)

All the best,
Marcus
Marcus Eklund

Join Date: Dec 2021

Posts: 29
#2

27 Dec 2021, 09:34

I don't want the countries simply to take the values 1, 2, 3, etc.; I want each country's value to be its share of highly educated.
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#3

27 Dec 2021, 10:33

Speaking in general terms, the -generate- command is the command that is most often used to create new variables. But your question provides zero information about what data you have to start with, so maybe there is some other way in your case. Who knows? If you want more specific advice, please post back showing example data, using the -dataex- command to do so. And be sure to also explain which variables are the ones from which we might figure out the share of highly educated in each country.

If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
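
As a minimal sketch of what a -dataex- call can look like: the example below uses the auto dataset that ships with Stata purely as a stand-in (an assumption for illustration, not the poster's data). With your own data you would name your own variables instead of make and foreign.

```stata
* Sketch only: the shipped auto dataset stands in for your own data.
sysuse auto, clear

* Show the first 10 observations of two variables in a form
* that others can copy, paste and run to recreate the example.
dataex make foreign in 1/10
```

The output can be pasted into a forum post as is, so that others can recreate the example exactly.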
Marcus Eklund

Join Date: Dec 2021

Posts: 29
#4

28 Dec 2021, 10:38

Thank you for clarifying! Unfortunately, the example below shows only one country, but the full sample (ESS data) includes more than 30 countries. I want to generate a variable based on the share of highly educated within each country. I have tried the code "by cntry, sort: egen share = mean(educat)", but this only gives me the mean educational level in the country, not the share of highly educated (those with the value 3 on the education variable). If you have any idea of how to help, please let me know!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double educat str2 cntry
1 "AT"
2 "AT"
1 "AT"
2 "AT"
2 "AT"
1 "AT"
2 "AT"
1 "AT"
3 "AT"
1 "AT"
end
label values educat labedu
label def labedu 1 "Low", modify
label def labedu 2 "Medium", modify
label def labedu 3 "High", modify
label var educat "RECODE of edulvla (Highest level of education)"
label var cntry "Country"
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#5

28 Dec 2021, 10:55

Code:

by cntry, sort: egen wanted = mean(3.educat)

Added: You don't specify how you want to handle missing values of the variable educat. Perhaps there aren't any and it doesn't matter. But if there are, the above code simply omits them from consideration altogether. That is, it excludes those observations from both numerator and denominator in calculating the proportion that are highly educated.

If you want to count a missing value of educat in the denominator (but, obviously, not in the numerator), then it would be:

Code:

by cntry, sort: egen wanted = mean(educat == 3)
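
To make the difference between the two commands concrete, here is a small sketch with made-up data: four hypothetical observations for one country, one of which has a missing value of educat. The variable names share_nonmiss and share_all are invented for this illustration. If the commands behave as described above, the first share should be 2/3 (missing value left out of the denominator) and the second 2/4 (missing value counted in the denominator).

```stata
* Made-up data: two highly educated, one low, one missing.
clear
input double educat str2 cntry
1 "AT"
3 "AT"
3 "AT"
. "AT"
end

* 3.educat is missing when educat is missing, so that observation
* drops out of both numerator and denominator.
by cntry, sort: egen share_nonmiss = mean(3.educat)

* (educat == 3) is 0 when educat is missing, so that observation
* stays in the denominator.
by cntry, sort: egen share_all = mean(educat == 3)

list
```

Which of the two is appropriate depends on whether respondents with unknown education should count toward the country's total.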

Last edited by Clyde Schechter; 28 Dec 2021, 10:58.
Marcus Eklund

Join Date: Dec 2021

Posts: 29
#6

28 Dec 2021, 12:12

Thank you so much! It worked. Would it be possible to add yet another variable to the syntax? I have divided the variable age (in years) into three groups that represent different cohorts. How do I make a variable that measures the percentage of highly educated among the youngest generation divided by the percentage highly educated among the oldest generation for each country? Would it be something like this:
by cntry, sort: egen wanted = mean(educat == 3 if cohort == 1 / educat == 3 if cohort == 3)
Clyde Schechter

Join Date: Apr 2014

Posts: 29799
#7

28 Dec 2021, 12:18

Code:

by cntry cohort, sort: egen wanted2 = mean(educat == 3)
by cntry (cohort): gen wanted_ratio = wanted2[1]/wanted2[_N] if cohort[1] == 1 & cohort[_N] == 3

Here wanted2 is the proportion of highly educated calculated separately for each cohort within country. Then wanted_ratio calculates, for each country, the ratio between wanted2 in cohort 1 and wanted2 in cohort 3. Note that if for some country your data do not include at least some observations for both cohort 1 and cohort 3, wanted_ratio will be missing for that country.
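
A worked sketch of this code on made-up data may help. The observations below are hypothetical: country "AT" has all three cohorts, while "BE" has no observations in cohort 3. Only the data are invented; the two commands are the ones shown above.

```stata
* Made-up data: AT has cohorts 1-3, BE lacks cohort 3.
clear
input double educat str2 cntry byte cohort
3 "AT" 1
3 "AT" 1
1 "AT" 1
2 "AT" 1
2 "AT" 2
3 "AT" 3
1 "AT" 3
1 "AT" 3
2 "AT" 3
3 "BE" 1
1 "BE" 1
2 "BE" 2
end

* Share of highly educated within each country-cohort cell.
by cntry cohort, sort: egen wanted2 = mean(educat == 3)

* Within country, sorted by cohort: first observation is cohort 1,
* last is cohort 3, provided both cohorts are present.
by cntry (cohort): gen wanted_ratio = wanted2[1]/wanted2[_N] if cohort[1] == 1 & cohort[_N] == 3

list, sepby(cntry)
```

For AT the cohort 1 share is 2/4 and the cohort 3 share is 1/4, so wanted_ratio is 2 on every AT observation. For BE the last cohort present is 2, so the -if- condition fails and wanted_ratio is missing.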