Division of the total values of variables

Jorge Martin

Join Date: Apr 2024

Posts: 19
#1

03 Jun 2024, 07:40

Hi everyone,

I need help calculating the per capita education expenditure per grade level in Stata. The formula is expenditure per grade level (e.g., teducpreprimary) divided by the number of currently attending students per grade level (e.g., observations with educ_group == 1 & lc08_cursch == 1).

I have three relevant variables:

1. educ_group, where each grade level is classified from 1 to 5 (e.g., 1=preprimary).
2. lc08_cursch, indicating if a student is currently attending school (1=yes, 2=no).
3. teducpreprimary, teducprimary, teducsecondary, teducpostsec, teductertiary, indicating the expenditure per grade level.

clear
input long(teducpreprimary teducprimary teducsecondary teducpostsec teductertiary) float educ_group byte lc08_cursch
0 0 0 0 0 1 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 1
0 0 0 0 0 3 1
0 0 0 0 0 3 1
0 0 0 0 0 1 1
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 10000 3 .
0 0 0 0 10000 4 2
0 0 0 0 10000 0 .
0 0 0 0 10000 5 2
0 0 0 0 0 3 .
0 0 0 0 0 5 .
0 0 0 0 0 5 2
0 0 0 0 0 3 .
0 0 0 0 0 5 .
0 0 0 0 0 1 1
0 0 0 0 0 5 .
0 0 0 0 0 5 .
0 0 0 0 0 5 .
0 0 0 0 0 0 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 5 .
0 0 0 0 0 5 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 1
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 5 .
0 0 0 0 0 2 1
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 1
0 0 0 0 0 3 1
0 0 0 0 0 3 1
0 0 0 0 0 3 1
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 1
0 0 0 0 0 3 1
0 0 0 0 0 3 .
0 0 0 0 78000 3 .
0 0 0 0 78000 3 .
0 0 0 0 78000 5 1
0 0 0 0 78000 5 1
0 0 0 0 78000 5 1
0 0 0 0 78000 5 2
0 0 0 0 0 3 .
0 0 0 0 0 5 .
0 0 0 0 0 3 1
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 2
0 0 0 0 0 3 .
0 12000 0 0 0 3 .
0 12000 0 0 0 3 .
0 12000 0 0 0 5 .
0 12000 0 0 0 5 .
0 12000 0 0 0 3 1
0 0 0 0 0 5 .
0 0 0 0 0 3 1
0 0 0 0 0 0 .
0 0 0 0 0 5 .
0 0 0 0 0 3 .
0 0 0 0 0 5 .
0 0 0 0 0 3 1
0 0 0 0 0 3 1
0 0 0 0 0 5 .
0 0 0 0 0 3 .
0 0 0 0 0 3 1
0 0 0 0 9980 5 .
0 0 0 0 9980 3 .
0 0 0 0 9980 5 1
0 0 0 0 9980 3 2
0 0 0 0 0 5 .
0 0 0 0 0 3 .
0 0 0 0 0 3 1
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 4 2
0 0 0 0 0 3 2
0 0 0 0 0 5 2
0 0 0 0 0 0 .
0 0 0 0 0 1 .
0 0 0 0 0 5 .
0 0 0 0 0 5 .
0 0 0 0 0 5 .
0 0 0 0 0 3 .
0 0 0 0 0 3 .
0 0 0 0 0 4 .
0 0 0 0 0 3 .
0 0 0 0 0 3 1
end
label values educ_group educ
label def educ 1 "Pre-Primary", modify
label def educ 2 "Primary", modify
label def educ 3 "Secondary", modify
label def educ 4 "Post Secondary & Non-Tertiary", modify
label def educ 5 "Tertiary", modify
label values lc08_cursch LC08_CURSCH
label def LC08_CURSCH 1 "yes", modify
label def LC08_CURSCH 2 "no", modify

How can I compute this in Stata?

Thank you.

Last edited by Jorge Martin; 03 Jun 2024, 07:42.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

03 Jun 2024, 13:10

I don't understand your data organization. First, what is the unit of analysis here--is each observation a single student? Next, on the one hand each observation is classified by the variable educ_group as pre-primary, primary, secondary, post secondary & non-tertiary, or tertiary. Yet you also have separate teduc* variables corresponding to these same 5 classifications. And then, for example, in observations 11 through 14, we see a cluster of observations with non-zero values of teductertiary, but, of these observations, only one of them has educ_group designated as tertiary. So what is the meaning of a non-zero value of a teducsome_level variable in an observation where educ_group specifies a different level?
Jorge Martin

Join Date: Apr 2024

Posts: 19
#3

03 Jun 2024, 17:01

I'm sorry, there was some missing information in my previous message. The data is per individual student. The educ_group variable corresponds to the highest grade level of the student (e.g., 1=preprimary). The teducpreprimary, teducprimary, teducsecondary, teducpostsec, and teductertiary variables correspond to the tuition fee (expenditure) of one student at each respective level. Non-zero variables indicate that it is not applicable or there is no value under that variable for that specific student.

Thank you.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

03 Jun 2024, 18:11

"Non-zero variables indicate that it is not applicable or there is no value under that variable for that specific student."

I'm sorry, but that confuses me even more. If non-zero values are not applicable or no value, and only zero values are valid, then there is no data at all.
Jorge Martin

Join Date: Apr 2024

Posts: 19
#5

03 Jun 2024, 18:35

I mean that the "." values in the dataset are the ones that have no value.
Jorge Martin

Join Date: Apr 2024

Posts: 19
#6

03 Jun 2024, 21:02

May I know what you mean here?

Last edited by Jorge Martin; 03 Jun 2024, 21:04.
Jorge Martin

Join Date: Apr 2024

Posts: 19
#7

03 Jun 2024, 21:04

Originally posted by Clyde Schechter

So what is the meaning of a non-zero value of a teducsome_level variable in an observation where educ_group specifies a different level?

May I know what you mean here?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

03 Jun 2024, 21:22

Take a look at the 11th observation in your example data in #1. It has educ_group designated as "Secondary." But it also has teductertiary = 10000. What does this mean? What is the relevance of the variable teductertiary in an observation where educ_group is "Secondary" (or anything other than "Tertiary")? There are many similar instances of a value being specified for a teduc* variable that refers to an education level different from the value of educ_group in the same observation. I don't understand what that means and why those numbers are there.

In short, I would have expected there to be only one teduc variable in the data set, and it would always contain the value of total educational expenditures for just the category of education referred to in the educ_group variable.

Thinking about this more, I also don't understand why the value of, say, teducpreprimary can differ from one observation to the next. You refer in #1 to "The formula is expenditure per grade level (e.g., teducpreprimary) divided by the number of currently attending students per grade level." This language leads me to expect that there is a single number that represents expenditure per grade level. But evidently this is not the case. Do you mean to total up the values of teducpreprimary (among those observations where educ_group is pre-primary and lc08_cursch is yes?) and then divide by the number of currently attending students in that grade level?

And, finally, what is going on when the value of teducX is zero and educ_group is designated as X and the student is currently in school? Does it really mean we have a student for which no expenditures were made on that student's education? That doesn't seem possible.
Jorge Martin

Join Date: Apr 2024

Posts: 19
#9

03 Jun 2024, 22:25

Originally posted by Clyde Schechter

Do you mean to total up the values of teducpreprimary (among those observations where educ_group is pre-primary and lc08_cursch is yes?) and then divide by the number of currently attending students in that grade level?

Yes, the formula is the total expenditure per grade level (e.g., teducpreprimary) divided by the total number of currently attending students at that level (where educ_group is preprimary and lc08_cursch is yes).
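
Written out, the formula confirmed here might read as follows. The symbols are introduced only for illustration: g is a grade level, i indexes students, and teduc_{i,g} stands for the level-g expenditure variable of student i (e.g., teducpreprimary for g = 1). Restricting the numerator to the same currently attending students as the denominator is an assumption taken from the quoted question, not something stated in #1.

```latex
\[
\text{per capita expenditure}_g
  = \frac{\displaystyle\sum_{i \,:\, \texttt{educ\_group}_i = g,\ \texttt{lc08\_cursch}_i = 1} \texttt{teduc}_{i,g}}
         {\#\{\, i : \texttt{educ\_group}_i = g,\ \texttt{lc08\_cursch}_i = 1 \,\}}
\]
```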
Jorge Martin

Join Date: Apr 2024

Posts: 19
#10

03 Jun 2024, 23:25

Do you know a way to do this in Stata? Thank you.

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

#11

04 Jun 2024, 08:59

Well, you haven't answered most of my questions. So I'm going to proceed based on what I imagine would be the answers to those questions. The first stage is to reorganize the data so that there is only one teduc variable and it provides the expenditure for that student for the grade level that the student is in. Other grade levels' values of teduc are discarded. Then I calculate the mean for those where lc08_cursch == 1.

Code:

//    REORGANIZE THE DATA
gen `c(obs_t)' obs_no = _n
reshape long teduc, i(obs_no) j(temp) string
label define grade_level    1    preprimary ///
                            2    primary ///
                            3    secondary ///
                            4    postsec ///
                            5    tertiary
encode temp, gen(grade_level) label(grade_level)
drop temp
keep if grade_level == educ_group
drop grade_level obs_no

//    CALCULATE PER CAPITA EXPENDITURE BY GRADE LEVEL
by educ_group, sort: egen per_cap_expenditure = ///
    mean(cond(lc08_cursch == 1, teduc, .))
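
As a cross-check on the code above, here is a minimal sketch of an alternative that avoids -reshape- and computes the ratio explicitly as a total divided by a count. It runs on a small made-up toy data set, not the example data from #1. The names own_teduc, attending, total_exp, n_attending, and per_cap_check are hypothetical, and the sketch assumes, as above, that only the teduc variable matching each student's educ_group is relevant.

```stata
* Made-up toy data for illustration only (not the data from #1)
clear
input long(teducpreprimary teducprimary teducsecondary teducpostsec teductertiary) float educ_group byte lc08_cursch
500 0    0 0     0 1 1
300 0    0 0     0 1 1
  0 0 1200 0     0 3 1
  0 0  800 0     0 3 2
  0 0    0 0 10000 3 1
  0 0    0 0 78000 5 1
end

* Pick each student's expenditure at his or her own grade level
* (own_teduc is a hypothetical name; levels 1-5 follow the coding in #1)
gen double own_teduc = .
local levels preprimary primary secondary postsec tertiary
forvalues g = 1/5 {
    local lev : word `g' of `levels'
    replace own_teduc = teduc`lev' if educ_group == `g'
}

* Numerator and denominator, both restricted to currently attending students
gen byte attending = lc08_cursch == 1
egen double total_exp = total(cond(attending, own_teduc, .)), by(educ_group)
egen n_attending = total(attending), by(educ_group)
gen double per_cap_check = total_exp / n_attending

tabstat per_cap_check, by(educ_group) statistics(mean)
```

On the toy data this gives 400 for pre-primary (800 / 2), 600 for secondary (1200 / 2), and 78000 for tertiary. Unlike the -keep if- step, this version leaves all observations in memory, and a level with no currently attending students gets a missing value rather than being dropped.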
