Summing total students based on different age cateogry and education level

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

Summing total students based on different age cateogry and education level

06 Apr 2024, 16:56

Hello,

the line column gives you full time and part time student numbers depending on the column lstudy which has three different values for undergraduate , first time professional and graduate. The vlaue of lstudy ==2 doesn't show up in this data sample because this unitid doesn't have lstuddy==2.

how can i sum the number of students for different age category of different values of lstudy for each unique unitid ?

label def label_line 1 "Full-time, under 18", modify: This line indicates that the label "Full-time, under 18" corresponds to the row representing the number of full-time students under 18 years old in that unique unitid.

label def label_line 3 "Full-time, 20-21", modify: This label represents the row for the number of full-time students aged 20-21 years old.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long unitid byte(line lstudy xefage01) int efage01 byte xefage02 int efage02
100654  1 1 10   36 10   62
100654  2 1 10  814 10  886
100654  3 1 10  617 10  763
100654  4 1 10  366 10  348
100654  5 1 10  139 10  103
100654  6 1 10   31 10   51
100654  7 1 10   11 10   22
100654  8 1 10   15 10   37
100654  9 1 10    2 10    5
100654 10 1 10    2 10    0
100654 11 1 10    3 10    1
100654 12 1 10 2036 10 2278
100654 13 1 10    2 10    0
100654 14 1 10   13 10   10
100654 15 1 10   26 10   25
100654 16 1 10   38 10   40
100654 17 1 10   31 10   32
100654 18 1 10   16 10   17
100654 19 1 10    8 10   25
100654 20 1 10   19 10   25
100654 21 1 10    6 10   10
100654 23 1 10    5 10    9
100654 24 1 10  164 10  193
100654 25 1 10 2200 10 2471
100654  3 3 10    0 10    1
100654  4 3 10   47 10   92
100654  5 3 10   46 10  115
100654  6 3 10   30 10   56
100654  7 3 10   15 10   25
100654  8 3 10   24 10   52
100654  9 3 10    7 10   12
100654 10 3 10    0 10    1
100654 11 3 10    1 10    7
100654 12 3 10  170 10  361
100654 15 3 10    1 10    0
100654 16 3 10   18 10   47
100654 17 3 10   56 10  112
100654 18 3 10   41 10   84
100654 19 3 10   32 10   62
100654 20 3 10   36 10  113
100654 21 3 10    7 10   36
100654 23 3 10    2 10    0
100654 24 3 10  193 10  454
100654 25 3 10  363 10  815
end
label values line label_line
label def label_line 1 "Full-time, under 18", modify
label def label_line 2 "Full-time, 18-19", modify
label def label_line 3 "Full-time, 20-21", modify
label def label_line 4 "Full-time, 22-24", modify
label def label_line 5 "Full-time, 25-29", modify
label def label_line 6 "Full-time, 30-34", modify
label def label_line 7 "Full-time, 35-39", modify
label def label_line 8 "Full-time, 40-49", modify
label def label_line 9 "Full-time, 50-64", modify
label def label_line 10 "Full-time, 65 and over", modify
label def label_line 11 "Full-time, age unknown", modify
label def label_line 12 "Total full-time", modify
label def label_line 13 "Part-time, under 18", modify
label def label_line 14 "Part-time, 18-19", modify
label def label_line 15 "Part-time, 20-21", modify
label def label_line 16 "Part-time, 22-24", modify
label def label_line 17 "Part-time, 25-29", modify
label def label_line 18 "Part-time, 30-34", modify
label def label_line 19 "Part-time, 35-39", modify
label def label_line 20 "Part-time, 40-49", modify
label def label_line 21 "Part-time, 50-64", modify
label def label_line 23 "Part-time, age unknown", modify
label def label_line 24 "Total part-time", modify
label def label_line 25 "Grand total, all students", modify
label values lstudy label_lstudy
label def label_lstudy 1 "Undergraduate", modify
label def label_lstudy 3 "Graduate", modify

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

06 Apr 2024, 18:38

I'm sorry, but I don't understand how what you want relates to the data you have. In the example data shown, the combination of unitid, line, and lstudy uniquely identifies observations. Then you have these other variables, xefage01, efage01, xefage02, and efage02, which you do not explain and whose names do not suggest to me what they might be (except that perhaps they have something to do with age.) At a guess, do those variables give number of students who are in the unitid, line, and lstudy grouping and have some additional attributes such that the definitions of these *age* variables make them mutually exclusive and exhaustive? If so, then I think all you need is:

Code:

egen wanted = rowtotal(*age*)

If that doesn't do what you want, please post back with a clearer explanation of how you might derive what you are looking for from the data you have if you were doing it by hand.
Comment
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#3

06 Apr 2024, 19:49

I'm sorry. Let me gather my thoughts and get back to you with clear explanation of what I'm trying to do. apologies this the data from 2002-2008 is much chearly defined and I dealth it with myself but this data is from 2001. Unfortunately this is all over the place
Comment

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

06 Apr 2024, 22:00

efage01 stabnds for number of male students in that age category and efage02 stands for number of female students in that row or age category.
xefage01 and xefage02 are not necessary for this calculation.

Code:

egen wanted = total(efage09), by(efbage unitid)

tabdisp efbage unitid, c(wanted)

-----------------------------------------
|                unitid
efbage | 100654 100663
-------------------------+---------------
All age categories total | 12152 33122
Age under 25 total | 8772 19074
Age under 18 | 1886 22
Age 18-19 | 3398 4652
Age 20-21 | 1778 6102
Age 22-24 | 1710 8298
Age 25 and over total | 2670 14048
Age 25-29 | 1096 6954
Age 30-34 | 520 2956
Age 35-39 | 328 1578
Age 40-49 | 524 899
Age 50-64 | 198
Age 65 and over | 4
Age unknown | 710
-----------------------------------------

I want to get the total number of students ( male + female ) for specified age category. The trick part here the line column (besides the age category ) includes the part time and full time here so that makes it difficult. Otherwise i cold have used the following command

Code:

egen wanted = total(efage09), by(efbage unitid)

Last edited by Tariq Abdullah; 06 Apr 2024, 22:06.

Comment

Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#5

06 Apr 2024, 22:05

This is my another post where my dataset way a little more clean. This particular data posted here is little all over the place.

https://www.statalist.org/forums/for...t-age-category
Comment

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

06 Apr 2024, 22:25

For the dataset in my another post this is what helped me to get the eventual form I wanted to make for my intended goal of all the ages for each unique unit id

Code:

drop if               efbage == 1
drop if               efbage == 2 

drop if               efbage == 7 


* Sort the data by unitid and efbage

sort unitid efbage

*Create a new variable to store the sum for each unitid and age category

egen sum_efage09 = total(efage09), by(unitid efbage)


* Now, for each unique unitid, calculate the sum of efage09 for each age category

bysort unitid: egen total_age_under_18 = total(efage09) if efbage == 3 | efbage == 4


bysort unitid: egen total_age_20_to_24 = total(efage09) if efbage == 5 | efbage == 6


bysort unitid: egen total_age_25_to_29 = total(efage09) if efbage == 8


bysort unitid: egen total_age_30_to_34 = total(efage09) if efbage == 9


bysort unitid: egen total_age_35_to_39 = total(efage09) if efbage == 10


bysort unitid: egen total_age_40_to_49 = total(efage09) if efbage == 11


bysort unitid: egen total_age_50_to_64 = total(efage09) if efbage == 12


bysort unitid: egen total_age_65_and_over = total(efage09) if efbage == 13


drop efbage            lstudy efage09
drop                      sum_efage09                   
*year


collapse (sum) total_age_under_18 total_age_20_to_24 total_age_25_to_29 total_age_30_to_34 total_age_35_to_39 total_age_40_to_49 total_age_50_to_64 total_age_65_and_over, by(unitid)

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

07 Apr 2024, 09:53

The main problem here is that you don't have a variable that is purely age group. You have it inextricably attached to full-time/part-time status. Separating those working only from the numeric codes in label_line would be feasible, but tedious and error-prone. Fortunately, the labels have a highly regular structure, with the age group offset from the full/part-time status part by a comma. So converting line to a string variable makes it simpler here.

The observations where the value of line is a total or grand total are also in the way and have to be excluded from the calculations to avoid double counting. (More on this below.)

Starting from the data example originally posted in #1:

Code:

decode line, gen(edit) replace edit = "" if strpos(lower(edit), "total") split edit, parse(",") gen(edits) drop edit edits1 rename edits2 age_group by age_group, sort: egen wanted = total(efage01 + efage02) replace wanted = . if missing(age_group) sort unitid lstudy line

Note: This Stata data set is organized like a spreadsheet, and that contributes to the difficulties here. It is also a trap, setting you up for mistakes when you try to calculate things. In particular, a Stata data set should never contain both raw data observations and summary observations like totals and subtotals, because then you have to exclude those totals and subtotals from all the calculations in order to avoid double-counting things. (That -replace edit = "" if ...- command is a workaround for this problem in this case.)
1 like
Comment
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#8

08 Apr 2024, 20:04

Mr. Clyde,

Thanks so much for helping me neavigate these very complicated step. I was simply at a loss how to proceed with these messy data and your coding really helped resolving the issue.

Again, much appreciate the mentoring and kindness!
Comment

Announcement

Summing total students based on different age cateogry and education level

Comment

Comment

Comment

Comment

Comment

Comment

Comment