
  • Division of the total values of variables

    Hi everyone,

    I need help calculating the per capita education expenditure per grade level in Stata. The formula is expenditure per grade level (e.g., teducpreprimary) divided by the number of currently attending students per grade level (e.g., observations with educ_group == 1 & lc08_cursch == 1).

    I have three relevant variables:

    1. educ_group, where each grade level is classified from 1 to 5 (e.g., 1=preprimary).
    2. lc08_cursch, indicating if a student is currently attending school (1=yes, 2=no).
    3. teducpreprimary, teducprimary, teducsecondary, teducpostsec, and teductertiary, indicating the expenditure per grade level.


    clear
    input long(teducpreprimary teducprimary teducsecondary teducpostsec teductertiary) float educ_group byte lc08_cursch
    0 0 0 0 0 1 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 1
    0 0 0 0 0 3 1
    0 0 0 0 0 3 1
    0 0 0 0 0 1 1
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 10000 3 .
    0 0 0 0 10000 4 2
    0 0 0 0 10000 0 .
    0 0 0 0 10000 5 2
    0 0 0 0 0 3 .
    0 0 0 0 0 5 .
    0 0 0 0 0 5 2
    0 0 0 0 0 3 .
    0 0 0 0 0 5 .
    0 0 0 0 0 1 1
    0 0 0 0 0 5 .
    0 0 0 0 0 5 .
    0 0 0 0 0 5 .
    0 0 0 0 0 0 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 5 .
    0 0 0 0 0 5 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 1
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 5 .
    0 0 0 0 0 2 1
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 1
    0 0 0 0 0 3 1
    0 0 0 0 0 3 1
    0 0 0 0 0 3 1
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 1
    0 0 0 0 0 3 1
    0 0 0 0 0 3 .
    0 0 0 0 78000 3 .
    0 0 0 0 78000 3 .
    0 0 0 0 78000 5 1
    0 0 0 0 78000 5 1
    0 0 0 0 78000 5 1
    0 0 0 0 78000 5 2
    0 0 0 0 0 3 .
    0 0 0 0 0 5 .
    0 0 0 0 0 3 1
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 2
    0 0 0 0 0 3 .
    0 12000 0 0 0 3 .
    0 12000 0 0 0 3 .
    0 12000 0 0 0 5 .
    0 12000 0 0 0 5 .
    0 12000 0 0 0 3 1
    0 0 0 0 0 5 .
    0 0 0 0 0 3 1
    0 0 0 0 0 0 .
    0 0 0 0 0 5 .
    0 0 0 0 0 3 .
    0 0 0 0 0 5 .
    0 0 0 0 0 3 1
    0 0 0 0 0 3 1
    0 0 0 0 0 5 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 1
    0 0 0 0 9980 5 .
    0 0 0 0 9980 3 .
    0 0 0 0 9980 5 1
    0 0 0 0 9980 3 2
    0 0 0 0 0 5 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 1
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 4 2
    0 0 0 0 0 3 2
    0 0 0 0 0 5 2
    0 0 0 0 0 0 .
    0 0 0 0 0 1 .
    0 0 0 0 0 5 .
    0 0 0 0 0 5 .
    0 0 0 0 0 5 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 .
    0 0 0 0 0 4 .
    0 0 0 0 0 3 .
    0 0 0 0 0 3 1
    end
    label values educ_group educ
    label def educ 1 "Pre-Primary", modify
    label def educ 2 "Primary", modify
    label def educ 3 "Secondary", modify
    label def educ 4 "Post Secondary & Non-Tertiary", modify
    label def educ 5 "Tertiary", modify
    label values lc08_cursch LC08_CURSCH
    label def LC08_CURSCH 1 "yes", modify
    label def LC08_CURSCH 2 "no", modify

    How can I compute this in Stata?

    Thank you.
    Last edited by Jorge Martin; 03 Jun 2024, 08:42.

  • #2
    I don't understand your data organization. First, what is the unit of analysis here--is each observation a single student? Next, on the one hand each observation is classified by the variable educ_group as pre-primary, primary, secondary, post secondary & non-tertiary, or tertiary. Yet you also have separate teduc* variables corresponding to these same 5 classifications. And then, for example, in observations 11 through 14, we see a cluster of observations with non-zero values of teductertiary, but, of these observations, only one of them has educ_group designated as tertiary. So what is the meaning of a non-zero value of a teducsome_level variable in an observation where educ_group specifies a different level?
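One way to locate every such observation is to loop over the five levels and flag any observation where a teduc* variable is nonzero for a level other than the one recorded in educ_group. This is a sketch that assumes educ_group code k corresponds to the k-th variable in the order preprimary, primary, secondary, postsec, tertiary; the flag variable mismatch is a made-up name. The eight rows are copied from the example data in #1 only so the snippet runs on its own.

```stata
* Eight rows copied from the -dataex- example in #1
clear
input long(teducpreprimary teducprimary teducsecondary teducpostsec teductertiary) float educ_group byte lc08_cursch
0     0 0 0     0 1 1
0     0 0 0     0 2 1
0 12000 0 0     0 3 1
0     0 0 0 10000 3 .
0     0 0 0 78000 5 1
0     0 0 0 78000 5 1
0     0 0 0  9980 5 1
0     0 0 0 78000 5 2
end

* Assumed mapping: educ_group == k goes with the k-th word of this list
local levels preprimary primary secondary postsec tertiary

* Hypothetical flag: 1 if some teduc* is nonzero for a level other than educ_group
generate byte mismatch = 0
forvalues k = 1/5 {
    local lev : word `k' of `levels'
    replace mismatch = 1 if teduc`lev' != 0 & !missing(teduc`lev') & educ_group != `k'
}
list if mismatch, noobs
```

On these rows, the two flagged observations are the secondary student with teducprimary = 12000 and the secondary student with teductertiary = 10000.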



    • #3


      I'm sorry, there was some missing information in my previous message. The data is per individual student. The educ_group variable corresponds to the highest grade level of the student (e.g., 1=preprimary). The teducpreprimary, teducprimary, teducsecondary, teducpostsec, and teductertiary variables correspond to the tuition fee (expenditure) of one student at each respective level. Non-zero variables indicate that it is not applicable or there is no value under that variable for that specific student.

      Thank you.



      • #4
        Non-zero variables indicate that it is not applicable or there is no value under that variable for that specific student.
        I'm sorry, but that confuses me even more. If non-zero values are not applicable or no value, and only zero values are valid, then there is no data at all.



        • #5
          I mean that the "." values in the dataset are the ones I described as having no value.



          • #6


            May I know what you mean here?
            Last edited by Jorge Martin; 03 Jun 2024, 22:04.



            • #7
              Originally posted by Clyde Schechter
              So what is the meaning of a non-zero value of a teducsome_level variable in an observation where educ_group specifies a different level?
              May I know what you mean here?



              • #8
                Take a look at the 11th observation in your example data in #1. It has educ_group designated as "Secondary." But it also has teductertiary = 10000. What does this mean? What is the relevance of the variable teductertiary in an observation where educ_group is "Secondary" (or anything other than "Tertiary")? There are many similar instances of a value being specified for a teduc* variable that refers to an education level different from the value of educ_group in the same observation. I don't understand what that means and why those numbers are there.

                In short, I would have expected there to be only one teduc variable in the data set, and it would always contain the value of total educational expenditures for just the category of education referred to in the educ_group variable.

                Thinking about this more, I also don't understand why the value of, say, teducpreprimary can differ from one observation to the next. You refer in #1 to "The formula is expenditure per grade level (e.g., teducpreprimary) divided by the number of currently attending students per grade level." This language leads me to expect that there is a single number that represents expenditure per grade level. But evidently this is not the case. Do you mean to total up the values of teducpreprimary (among those observations where educ_group is pre-primary and lc08_cursch is yes?) and then divide by the number of currently attending students in that grade level?

                And, finally, what is going on when the value of teducX is zero and educ_group is designated as X and the student is currently in school? Does it really mean we have a student for which no expenditures were made on that student's education? That doesn't seem possible.
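To see how common that situation is, one could copy each student's expenditure for his or her own level into a single variable and count currently attending students for whom it is zero. This is a sketch under the same assumed mapping of educ_group codes 1 to 5 onto the teduc* variables; own_teduc is a made-up name, and the rows are copied from #1 so the snippet runs on its own.

```stata
* Eight rows copied from the -dataex- example in #1
clear
input long(teducpreprimary teducprimary teducsecondary teducpostsec teductertiary) float educ_group byte lc08_cursch
0     0 0 0     0 1 1
0     0 0 0     0 2 1
0 12000 0 0     0 3 1
0     0 0 0 10000 3 .
0     0 0 0 78000 5 1
0     0 0 0 78000 5 1
0     0 0 0  9980 5 1
0     0 0 0 78000 5 2
end

local levels preprimary primary secondary postsec tertiary

* Hypothetical variable: expenditure at the level recorded in educ_group
* (stays missing when educ_group is 0 or missing)
generate double own_teduc = .
forvalues k = 1/5 {
    local lev : word `k' of `levels'
    replace own_teduc = teduc`lev' if educ_group == `k'
}

* Currently attending students with zero expenditure at their own level
count if lc08_cursch == 1 & own_teduc == 0
```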



                • #9
                  Originally posted by Clyde Schechter
                  Do you mean to total up the values of teducpreprimary (among those observations where educ_group is pre-primary and lc08_cursch is yes?) and then divide by the number of currently attending students in that grade level?
                  Yes, the formula is the total expenditure per grade level (e.g., teducpreprimary) divided by the total number of currently attending students at that level (where educ_group is preprimary and lc08_cursch is yes).
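The formula as stated here can also be computed directly on the wide data, without reshaping: for each level, sum that level's teduc variable over the currently attending students at that level and divide by their count. A minimal sketch, assuming the same mapping of educ_group codes 1 to 5 onto the teduc* variables; the scalar names pc_* are made up, and the rows are copied from #1 so the snippet runs on its own. Note that summarize's r(N) counts only nonmissing values, so an attending student with a missing teduc value would drop out of the denominator.

```stata
* Eight rows copied from the -dataex- example in #1
clear
input long(teducpreprimary teducprimary teducsecondary teducpostsec teductertiary) float educ_group byte lc08_cursch
0     0 0 0     0 1 1
0     0 0 0     0 2 1
0 12000 0 0     0 3 1
0     0 0 0 10000 3 .
0     0 0 0 78000 5 1
0     0 0 0 78000 5 1
0     0 0 0  9980 5 1
0     0 0 0 78000 5 2
end

local levels preprimary primary secondary postsec tertiary
forvalues k = 1/5 {
    local lev : word `k' of `levels'
    * Total and count over currently attending students at level k
    quietly summarize teduc`lev' if educ_group == `k' & lc08_cursch == 1
    if r(N) > 0 {
        scalar pc_`lev' = r(sum) / r(N)
        display "`lev': total = " r(sum) ", students = " r(N) ", per capita = " scalar(pc_`lev')
    }
    else {
        scalar pc_`lev' = .
        display "`lev': no currently attending students"
    }
}
```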



                  • #10
                    Do you know a way to do this in Stata? Thank you.



                    • #11
                      Well, you haven't answered most of my questions, so I'm going to proceed based on what I imagine the answers would be. The first stage is to reorganize the data so that there is only one teduc variable, holding the expenditure for that student at the grade level the student is in; values of teduc for other grade levels are discarded. Then, within each educ_group, I calculate the mean of teduc over the observations where lc08_cursch == 1. Since a mean is a total divided by a count, this is the total expenditure divided by the number of currently attending students, as in #9.

                      Code:
                      //    REORGANIZE THE DATA
                      gen `c(obs_t)' obs_no = _n
                      reshape long teduc, i(obs_no) j(temp) string
                      label define grade_level    1    preprimary ///
                                                  2    primary ///
                                                  3    secondary ///
                                                  4    postsec ///
                                                  5    tertiary
                      encode temp, gen(grade_level) label(grade_level)
                      drop temp
                      keep if grade_level == educ_group
                      drop grade_level obs_no
                      
                      //    CALCULATE PER CAPITA EXPENDITURE BY GRADE LEVEL
                      by educ_group, sort: egen per_cap_expenditure = ///
                          mean(cond(lc08_cursch == 1, teduc, .))
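To confirm that the egen result matches total expenditure divided by the number of attending students, one could collapse to one row per level and compare. This sketch starts from data already in the long layout the reshape above leaves behind (one teduc per student, for the student's own level); the rows are eight observations from #1 reduced that way, and total_exp, n_students, and check are made-up names. Because collapse replaces the data in memory, wrap it in preserve/restore if you need to keep the student-level data.

```stata
* Long layout, as left by the reshape: one teduc per student
clear
input float educ_group byte lc08_cursch long teduc
1 1     0
2 1     0
3 1     0
3 .     0
5 1 78000
5 1 78000
5 1  9980
5 2 78000
end

* Per capita expenditure, computed as in the code above
by educ_group, sort: egen per_cap_expenditure = mean(cond(lc08_cursch == 1, teduc, .))

* Cross-check: total expenditure / number of attending students, per level
collapse (sum) total_exp = teduc (count) n_students = teduc ///
    (first) per_cap_expenditure if lc08_cursch == 1, by(educ_group)
generate double check = total_exp / n_students
list, noobs
```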
