  • education years frequency

    Hello, how do I find the modal number of completed education years among household members?
    If a household has two modes, what is recommended in that case?

    HHPBASE is the unique person identifier, HHBASE is the unique household identifier, hhsize is the household size, and ED6 is the completed years of education.



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float hhsize double(HHBASE HHPBASE) int ED6
     5 1010201010 101020101002  0
     5 1010201010 101020101006  2
     5 1010201010 101020101005  7
     5 1010201010 101020101003  8
     5 1010201010 101020101001  0
    14 1010201020 101020102005  7
    14 1010201020 101020102004  0
    14 1010201020 101020102009 12
    14 1010201020 101020102010  2
    14 1010201020 101020102007  0
    14 1010201020 101020102015  0
    14 1010201020 101020102014  0
    14 1010201020 101020102013 12
    14 1010201020 101020102006  3
    14 1010201020 101020102011  0
    14 1010201020 101020102008 15
    14 1010201020 101020102012 15
    14 1010201020 101020102001 10
    14 1010201020 101020102003 15
     7 1010201030 101020103002  0
     7 1010201030 101020103003  9
     7 1010201030 101020103005  5
     7 1010201030 101020103007  0
     7 1010201030 101020103006  1
     7 1010201030 101020103004  6
     7 1010201030 101020103001  9
     5 1010201040 101020104002  0
     5 1010201040 101020104004  1
     5 1010201040 101020104003  6
     5 1010201040 101020104005  0
     5 1010201040 101020104001  0
     7 1010201050 101020105008  5
     7 1010201050 101020105003  0
     7 1010201050 101020105010  0
     7 1010201050 101020105006  0
     7 1010201050 101020105007  8
     7 1010201050 101020105009  4
     7 1010201050 101020105002  0
     5 1010201060 101020106014  0
     5 1010201060 101020106001  0
     5 1010201060 101020106013  0
     5 1010201060 101020106015  0
     5 1010201060 101020106012  0
     8 1010201070 101020107008  1
     8 1010201070 101020107003  0
     8 1010201070 101020107009  1
     8 1010201070 101020107006  0
     8 1010201070 101020107007  0
     8 1010201070 101020107002  0
     8 1010201070 101020107004  7
     8 1010201070 101020107001  0
     6 1010201080 101020108002  0
     6 1010201080 101020108004  2
     6 1010201080 101020108003  0
     6 1010201080 101020108006  0
     6 1010201080 101020108005  1
     6 1010201080 101020108001  0
     8 1010201090 101020109002  0
     8 1010201090 101020109008  0
     8 1010201090 101020109003  0
     8 1010201090 101020109009  0
     8 1010201090 101020109007  1
     8 1010201090 101020109005  0
     8 1010201090 101020109006  4
     8 1010201090 101020109004  8
    11 1010201100 101020110005 10
    11 1010201100 101020110007  0
    11 1010201100 101020110006  9
    11 1010201100 101020110008 11
    11 1010201100 101020110004 11
    11 1010201100 101020110002  0
    11 1010201100 101020110009  7
    11 1010201100 101020110003 12
    11 1010201100 101020110011  5
    11 1010201100 101020110010  6
    11 1010201100 101020110001 12
     1 1010201110 101020111001  0
     4 1010201120 101020112002 12
     4 1010201120 101020112001 15
     4 1010201120 101020112004  0
     4 1010201120 101020112003  0
     7 1010201130 101020113002  0
     7 1010201130 101020113005  8
     7 1010201130 101020113007  4
     7 1010201130 101020113004  9
     7 1010201130 101020113003  9
     7 1010201130 101020113006  4
     7 1010201130 101020113001  8
     7 1010201140 101020114004  3
     7 1010201140 101020114002  0
     7 1010201140 101020114007  0
     7 1010201140 101020114001  0
     7 1010201140 101020114005  1
     7 1010201140 101020114006  0
     7 1010201140 101020114003  3
     8 1010201160 101020116005  4
     8 1010201160 101020116004  0
     8 1010201160 101020116006  3
     8 1010201160 101020116007  1
     8 1010201160 101020116002  0
    end
    label values ED6 ED6
    label def ED6 0 "none, <1 0", modify
    label def ED6 1 "1st class 1", modify
    label def ED6 2 "2nd class 2", modify
    label def ED6 3 "3rd class 3", modify
    label def ED6 4 "4th class 4", modify
    label def ED6 5 "5th class 5", modify
    label def ED6 6 "6th class 6", modify
    label def ED6 7 "7th class 7", modify
    label def ED6 8 "8th class 8", modify
    label def ED6 9 "9th class 9", modify
    label def ED6 10 "Secondary 10", modify
    label def ED6 11 "11th Class 11", modify
    label def ED6 12 "High Secondary 12", modify
    label def ED6 15 "Bachelors 15", modify
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 298513 observations
    Use the count() option to list more

  • #2
    Suppose I run this command:

    Code:
    egen modedu = mode(ED6) if ED6 !=., by( HHBASE ) minmode
    Instead of taking the minimum value of ED6 as the mode when there are two modes, I want to assign the household head's education years as the mode for the hhid, provided that his education years is one of the modes. Here hhhead is an indicator for the person being the household head. How can I do that?
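
    One possible way to do this, offered as a sketch rather than a definitive answer: count how often each value of ED6 occurs within a household, treat every value with the highest count as a mode, and use the head's value when it is among them. This assumes hhhead is coded 1 for the head and that each household has exactly one head; freq, maxfreq, headmode, minmodedu and modedu2 are hypothetical variable names.

    Code:
    * frequency of each ED6 value within the household (missing ED6 excluded)
    bysort HHBASE ED6: gen long freq = _N if !missing(ED6)
    * highest frequency in the household; a value is a mode if freq == maxfreq
    egen maxfreq = max(freq), by(HHBASE)
    * head's education years, kept only when it is one of the modes
    egen headmode = max(cond(hhhead == 1 & freq == maxfreq, ED6, .)), by(HHBASE)
    * otherwise fall back to the smallest mode, as in the command above
    egen minmodedu = mode(ED6), by(HHBASE) minmode
    gen modedu2 = cond(!missing(headmode), headmode, minmodedu)

    Because every value with the highest count is treated as a mode, this also covers households with three or more tied modes. Unlike the command above, modedu2 is filled in for all members of a household, including those with missing ED6.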



    • #3
      Typically the maximum is used in situations like these: if one person in the household has a university degree, then the entire household benefits from that. This has the added advantage that you won't have to worry about ties.
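
      A minimal sketch of taking the household maximum, using the variable names from #1 (maxedu is a hypothetical name):

      Code:
      * highest completed education years among household members
      egen maxedu = max(ED6), by(HHBASE)

      egen's max() ignores missing values, so a household gets a missing maxedu only if ED6 is missing for all of its members.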
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Raphael:
        I cannot find -hhid- in your example.
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          HHBASE is the unique household identifier.



          • #6
            I think Maarten Buis might agree that taking the higher value is a social science judgment for education data based on some mix of arguments, including what the researcher is trying to measure, what is closest to the goals of the analysis, or what has most predictive power. And that's all positive and not a criticism.

            In other contexts, a researcher could jump to a different conclusion, even that occurrence of ties renders taking modes useless.
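
            To judge how much ties matter in a given dataset, one could first count the households in which they occur. A rough sketch, assuming the variables from #1 (mode_lo, mode_hi, tied and hhtag are hypothetical names):

            Code:
            * smallest and largest mode per household; they differ only when modes are tied
            egen mode_lo = mode(ED6), by(HHBASE) minmode
            egen mode_hi = mode(ED6), by(HHBASE) maxmode
            gen byte tied = mode_lo != mode_hi if !missing(mode_lo)
            * tag one observation per household so each household is counted once
            egen hhtag = tag(HHBASE)
            tabulate tied if hhtag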



            • #7
              Nick Cox: absolutely. How to "add up" individual resources like education to arrive at a household-level measure is an area of active research, partly because it can be of substantive interest: does the mother influence the daughters and the father the sons, and/or does the highest-educated parent matter, and/or do both parents matter, and if so, do they matter to the same extent? You can test that empirically, which gives a hint about how these resources influence the outcome.

              So I would not expect to get the same results for different outcomes. There is not one answer for all situations, but if I don't care too much about that variable and just want one number per household, I would take the highest. If I care a little more, I might use the sum, which is equivalent to adding the education of all household members but constraining the effects to be the same (see this Stata tip: https://www.stata-journal.com/articl...article=st0261 ). If I care a whole lot about this, I will test what the best specification is for this case (for example, this article: https://maartenbuis.nl/publications/fam_backgr.html ). I would only do that in cases where this is the point of the article.
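
              The sum option could be sketched as follows, again with the variable names from #1 (sumedu is a hypothetical name):

              Code:
              * total completed education years across household members
              egen sumedu = total(ED6), by(HHBASE) missing

              By default egen's total() treats missing values as zero; the missing option makes sumedu missing for households in which ED6 is missing for every member.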
