  • How to create a new ordered variable using two variables that measure education level of individual

    I have two variables, educgb1 and edubgb2, that measure the level of education of an individual, as seen below. I will group these options into levels of education (e.g. secondary, tertiary), and then I want to create a new ordered variable running from the lowest level of education to the highest.

    However, my issue is that when an individual answers both questions, for example because they have both a secondary qualification and tertiary education, I don't know how to create a variable that only uses the highest value. So in this case, if for educgb1 they have answered option 1 and for edubgb2 they have put option 2, then the new variable will use option 2.

    This new variable will have ordered levels of education from 1 - 6 (lowest to highest), but I'm not sure how to create it after grouping the options into levels of education so that it only uses the highest education level for each individual, rather than, for example, adding the values from each variable together, as an index would.





  • #2
    [Attached image: C9123E92-4D1E-462C-B17E-BCD5A67C8463.jpeg]


    • #3
      I don't follow completely, but I think for your first step you want the -egen- command with the max function; see
      Code:
      help egen
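
      A note on which egen function this would be: in Stata, egen's max() takes the maximum across observations, while the maximum across several variables within each observation is rowmax(). A minimal sketch, assuming ed1 and ed2 are hypothetical recoded versions of the two education variables in which higher values mean more education (whether a row-wise maximum is meaningful at all depends on the coding scheme, as discussed later in the thread):

      ```stata
      * Sketch only: ed1 and ed2 are hypothetical variable names, assumed to be
      * coded on one common scale where a higher value means more education.
      * rowmax() ignores missing values within a row; if every variable in the
      * list is missing for an observation, the result is missing.
      egen educ_max = rowmax(ed1 ed2)
      ```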



      • #4
        What rules do you want to apply? We can tell you how to implement those rules in Stata. However, we cannot tell you what a reasonable categorization of education would be, or how that would relate to your variables. You are the specialist on that.
        Last edited by Maarten Buis; 13 Dec 2021, 15:07.
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          Originally posted by Rich Goldstein
          I don't follow completely, but I think for your first step you want the -egen- command with the max function; see
          Code:
          help egen
          I don't think that works. The first question seems to be about some form of general secondary education, while the second variable covers some form of vocational secondary and tertiary education. So you can't just take the lowest of the two (confusingly, those variables are coded with the lowest value being the highest level) and hope you get the highest attained level of education.


          • #6
            So for educgb1, I want to categorise the options as follows: option 5,4,3 = 1 and option 2,1 = 2
            for edubgb2, I want to categorise the options as follows: option 10,9,8 = 3 and option 7,6,5 = 4 and option 4,3 = 5 and option 2 = 6 and option 1 = 7

            I know how to do that step of recoding the two variables, it's just that I want to generate a new variable that creates a list from 1 to 7 but only uses the highest option that the individual has put across the two variables. As you can see below, most individuals have answered both questions, but I want the new variable to only use the highest value of the two options if that makes sense.


            [Attached image: 5ABF2A1F-A6E0-4852-9828-A197D4E88FC0.jpeg]


            • #7
              With your rules the first variable is always lower than the second, so to find the highest value you only need to recode the second variable and ignore the first.


              • #8
                But I also want to include individuals who have the GCSE qualification, which is not in the second variable but is in the first. How would I apply the egen max function in this scenario? I want it to use the maximum of the two values, because I will also add a category equal to 0 for individuals who have no more than secondary education and so would not have answered the second variable, edubgb2.



                • #9
                  Originally posted by Ajay Joshi
                  But I also want to include individuals who have the GCSE education qualification which is not within the second variable but is in the first. How would I apply the egen max function in this scenario?
                  Forget about the egen max function, that is never going to work for education variables.

                  Your coding scheme does not allow for including GCSE (whatever that is). Your coding scheme sets GCSE to 1 and all values on the second variable are higher than 1, so the highest level of education will always be determined by the second variable. If you think that is wrong, then you need to adjust your coding scheme. Related to that, your coding scheme is incomplete: what do you want to do with the people who answered "none of these"? You cannot give a perfect answer to that question, but you do have to make a choice.

                  My first step would be to make a cross tabulation of the two educational variables, and make sure you include the missing values (option missing). Stare at that table for a long time. Look at the entries that make sense for your education system. Hopefully, that covers most observations. Look at those observations that don't make sense. What might be going on there? Maybe there is an educational path that you have overlooked. Maybe people don't use the education system in the way it was intended. Maybe the answer categories are awkward, and the respondents answered as best they could but the question combined two or more educational levels that should not have been combined. Maybe the respondents or the interviewer made a mistake. You stare at the table, you dig into the data, you re-examine what is known about your education system (maybe it changed over time, so that the question is not a good fit for the older generation). You continue until you are confident that you understand what is going on with those two variables.
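
                  The cross tabulation described above could be produced as follows. This is a minimal sketch that assumes the dataset is already in memory and that the two variables are named educgb1 and edubgb2, as in the question:

                  ```stata
                  * Two-way table of the two education variables.
                  * The missing option treats missing values as categories,
                  * so respondents who skipped one question stay visible.
                  tabulate educgb1 edubgb2, missing
                  ```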

                  Once you have done that, you will typically have an idea of how you want to recode those variables. Make your first attempt and look at it. First, check your variable with cross tabulations against the original two variables (don't forget the missing values!). Second, look at the distribution of your new variable. Does that make sense? Does it work for what you want to do? For example, in Germany about 2/3 of the somewhat older generation have vocational education. If you want to make a variable 1 "less than vocational", 2 "vocational", 3 "tertiary", then that distribution obviously won't work, so you might want to look for a way to break up the huge block of vocational education into different types of vocational education. In short, expect this to be a long and iterative process. We cannot help you with that: we don't know what country this is, and so which educational system these data represent. We don't know what you want to use this variable for. So we cannot help you with those choices.
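
                  The two checks described above might look like this in practice. This is only a sketch: educ is a hypothetical name for the newly created education variable, and educgb1 and edubgb2 are the original variables from the question.

                  ```stata
                  * Check 1: compare the new variable (hypothetical name: educ)
                  * with each original variable, keeping missing values in view.
                  tabulate educ educgb1, missing
                  tabulate educ edubgb2, missing

                  * Check 2: look at the distribution of the new variable itself.
                  tabulate educ, missing
                  ```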

                  This leaves the question how to create such a variable in Stata once you have a coding scheme. I already told you to forget about egen max, but what should you do instead?

                  I can only give you general advice, as you have not yet given us the correct coding scheme and you have not given us the data.

                  Typically I would start by creating simpler versions of the two variables using recode.
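
                  As an illustration of that recode step only, here is a sketch that applies the provisional rules from post #6. Those rules are not a recommended scheme; they are used here just to show the syntax. The new variable names ed1 and ed2 are hypothetical.

                  ```stata
                  * Provisional rules from post #6, shown only to illustrate the syntax.
                  * ed1 and ed2 are hypothetical names for the simplified variables.
                  * Each rule is matched against the original value, and the first
                  * matching rule wins. Values not named in any rule (including
                  * missing values) are copied over unchanged, so decide explicitly
                  * what to do with categories such as "none of these".
                  recode educgb1 (3 4 5 = 1) (1 2 = 2), generate(ed1)
                  recode edubgb2 (8 9 10 = 3) (5 6 7 = 4) (3 4 = 5) (2 = 6) (1 = 7), generate(ed2)
                  ```

                  Using generate() leaves the original variables untouched, which makes it easier to cross tabulate old against new afterwards.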

                  After that I will create the final education variable step by step, something like:

                  Code:
                  generate educ = 1 if var1 == 1 & var2 == 9
                  replace educ = 2 if var1 == 2 | inlist(var2,2,3)
                  replace educ = 3 if ....
                  ...