  • Data construction and management

    I am working with child-level data and am having difficulty creating a variable that combines birth order with whether the child has older brothers or sisters, for the child with id = 0.
    I want something like the following coding:

    0=child is firstborn
    1=child is secondborn and has an older brother
    2=child is secondborn and has an older sister
    3=child is thirdborn or higher birth order and has at least one older brother
    4=child is thirdborn or higher birth order and has no older brothers.

    Here is a sample of the data as an example:
    Code:
    * Example generated by -dataex-. For more info, type help dataex clear input str8 childid byte(id age sex relate) "010001" 0 4 1 0 "010001" 1 31 1 1 "010001" 2 23 2 1 "010001" 3 1 2 7 "010002" 0 5 1 0 "010002" 1 45 1 1 "010002" 2 36 2 1 "010003" 0 4 1 0 "010003" 2 34 1 1 "010003" 3 26 2 1 "010003" 4 3 2 7 "010004" 0 5 2 0 "010004" 1 39 1 1 "010004" 2 31 2 1 "010004" 3 10 2 7 "010004" 4 9 2 7 "010005" 0 5 2 0 "010005" 3 42 1 1 "010005" 4 31 2 1 "010006" 0 5 1 0 end label values sex memsex label def memsex 1 "male", modify label def memsex 2 "female", modify label values relate relate label def relate 0 "yl child", modify label def relate 1 "biological parent", modify label def relate 7 "brother/sister (both parents the same)", modify
    Thank you in advance

  • #2
    The variable names are a little bit confusing. Here is what I assume is the structure of your data: this is a survey of households and the variable called "childid" is actually an identifier for the entire household. Within those households, there are various people, each with a distinct value of id, starting from 0. 0 is the id of the child who is the focus of the survey, and also has the value of relate == 0 (yl child)--the one you wish to characterize. On those assumptions, the following should do it:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 childid byte(id age sex relate)
    "010001" 0 4 1 0
    "010001" 1 31 1 1
    "010001" 2 23 2 1
    "010001" 3 1 2 7
    "010002" 0 5 1 0
    "010002" 1 45 1 1
    "010002" 2 36 2 1
    "010003" 0 4 1 0
    "010003" 2 34 1 1
    "010003" 3 26 2 1
    "010003" 4 3 2 7
    "010004" 0 5 2 0
    "010004" 1 39 1 1
    "010004" 2 31 2 1
    "010004" 3 10 2 7
    "010004" 4 9 2 7
    "010005" 0 5 2 0
    "010005" 3 42 1 1
    "010005" 4 31 2 1
    "010006" 0 5 1 0
    end
    label values sex memsex
    label def memsex 1 "male", modify
    label def memsex 2 "female", modify
    label values relate relate
    label def relate 0 "yl child", modify
    label def relate 1 "biological parent", modify
    label def relate 7 "brother/sister (both parents the same)", modify
    
    assert relate == 0 if id == 0
    
    gen byte is_child = (relate != "biological parent":relate)
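    * Sort oldest first (via negated age) so that the firstborn gets birth_order == 1
    gen int neg_age = -age
    by childid (neg_age), sort: gen int birth_order = sum(is_child) if is_child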
    by childid (id), sort: egen has_older_brother = ///
        max(_n > 1 & is_child & sex == "male":memsex & age > age[1])
    by childid (id): egen has_older_sister = ///
        max(_n > 1 & is_child & sex == "female":memsex & age > age[1])
        
    by childid (id): gen status = 0 if birth_order[1] == 1
    by childid (id): replace status = 1 if birth_order[1] == 2 & has_older_brother
    by childid (id): replace status = 2 if birth_order[1] == 2 & has_older_sister
    by childid (id): replace status = 3 if birth_order[1] >= 3 & has_older_brother
    by childid (id): replace status = 4 if birth_order[1] >= 3 & !has_older_brother
    Note that the five categories of status you outlined in #1 are not exhaustive of the possibilities. A child might be of birth order 2 but have no older sibling at all--this doesn't fit any of those descriptions.
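
    A quick way to see whether any index child falls outside the five categories is to count and list those left with a missing status. This is only a sketch, and it assumes the variables created by the code above.
    Code:
    * Index children (id == 0) that none of the five categories captured
    count if id == 0 & missing(status)
    list childid age sex birth_order has_older_brother has_older_sister ///
        if id == 0 & missing(status), noobs sepby(childid)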

    By the way, somehow your -dataex- output got mangled in posting and was all one long line. Please review what you post after it appears, and if there are problems like that, go back and fix them. Manually parsing a long line like that is tedious and time-consuming.


    • #3
      Your data example wasn't posted correctly, so I've fixed it and post it here for the benefit of anyone who wants to work on this.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str8 childid byte(id age sex relate)
      "010001" 0  4 1 0
      "010001" 1 31 1 1
      "010001" 2 23 2 1
      "010001" 3  1 2 7
      "010002" 0  5 1 0
      "010002" 1 45 1 1
      "010002" 2 36 2 1
      "010003" 0  4 1 0
      "010003" 2 34 1 1
      "010003" 3 26 2 1
      "010003" 4  3 2 7
      "010004" 0  5 2 0
      "010004" 1 39 1 1
      "010004" 2 31 2 1
      "010004" 3 10 2 7
      "010004" 4  9 2 7
      "010005" 0  5 2 0
      "010005" 3 42 1 1
      "010005" 4 31 2 1
      "010006" 0  5 1 0
      end
      label values sex memsex
      label def memsex 1 "male", modify
      label def memsex 2 "female", modify
      label values relate relate
      label def relate 0 "yl child", modify
      label def relate 1 "biological parent", modify
      label def relate 7 "brother/sister (both parents the same)", modify
      Can you explain the variable relate more fully? I suspect there are other values possible and that only 0, 1, and 7 appeared in your sample. Show the output of
      Code:
      label list relate
      and we should then be able to know the complete coding.

      What does "yl child" mean?
      Last edited by William Lisowski; 02 Mar 2022, 17:12.


      • #4
        Thank you very much, Clyde. You have pinpointed the structure of the data.
        I checked that the code works for the index child (relate == 0). How can the code be tweaked so that it classifies ANY child in the household, using the same categories? The variable relate records the type of relationship each household member has to the index child, who is identified by relate == 0 or, equivalently, id == 0.

        0=child is firstborn
        1=child is secondborn and has an older brother
        2=child is secondborn and has an older sister
        3=child is thirdborn or higher birth order and has at least one older brother
        4=child is thirdborn or higher birth order and has no older brothers.

        Thank you!

        Here is the corrected data format from dataex

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str8 childid byte(id sex relate) int age
        "IN010001" 0 1 0  7
        "IN010001" 1 1 1 34
        "IN010001" 2 2 1 26
        "IN010001" 3 2 7  4
        "IN010002" 0 1 0  8
        "IN010002" 1 1 1 48
        "IN010002" 2 2 1 39
        "IN010003" 0 1 0  8
        "IN010003" 2 1 1 37
        "IN010003" 3 2 1 29
        "IN010003" 4 2 7  6
        "IN010004" 0 2 0  8
        "IN010004" 1 1 1 42
        "IN010004" 2 2 1 34
        "IN010004" 3 2 7 13
        "IN010004" 4 2 7 12
        "IN010005" 0 2 0  8
        "IN010005" 3 1 1 45
        "IN010005" 4 2 1 34
        "IN010005" 5 1 7  3
        end
        label values sex memsex
        label def memsex 1 "male", modify
        label def memsex 2 "female", modify
        label values relate aa
        label def aa 0 "YL child", modify
        label def aa 1 "Biological Parent", modify
        label def aa 7 "brother/sisters", modify
        label values age age


        • #5
          Thank you, William. I should have checked the preview before posting. Apologies for that. Here is the corrected data. The variable relate records the relationship each household member has to the index child, who is identified by id == 0 or relate == 0.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str8 childid byte(id sex relate) int age
          "IN010001" 0 1 0  7
          "IN010001" 1 1 1 34
          "IN010001" 2 2 1 26
          "IN010001" 3 2 7  4
          "IN010002" 0 1 0  8
          "IN010002" 1 1 1 48
          "IN010002" 2 2 1 39
          "IN010003" 0 1 0  8
          "IN010003" 2 1 1 37
          "IN010003" 3 2 1 29
          "IN010003" 4 2 7  6
          "IN010004" 0 2 0  8
          "IN010004" 1 1 1 42
          "IN010004" 2 2 1 34
          "IN010004" 3 2 7 13
          "IN010004" 4 2 7 12
          "IN010005" 0 2 0  8
          "IN010005" 3 1 1 45
          "IN010005" 4 2 1 34
          "IN010005" 5 1 7  3
          end
          label values sex memsex
          label def memsex 1 "male", modify
          label def memsex 2 "female", modify
          label values relate aa
          label def aa 0 "YL child", modify
          label def aa 1 "Biological Parent", modify
          label def aa 7 "brother/sisters", modify
          label values age age


          • #6
            So, you want to apply these classifications to every child in the household, not just the "YL child." That's slightly different:
            Code:
            gen byte is_child = (relate != "Biological Parent":aa)
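            * Sort oldest first (via negated age) so that the firstborn gets birth_order == 1
            gen int neg_age = -age
            by childid (neg_age), sort: gen int birth_order = sum(is_child) if is_child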
            by childid (age), sort: egen int age_oldest_male = ///
                max(cond(sex == "male":memsex & is_child, age, .))
            by childid (age), sort: egen int age_oldest_female = ///
                max(cond(sex == "female":memsex & is_child, age, .))
            gen byte has_older_brother = is_child & age_oldest_male > age & !missing(age_oldest_male)
            gen byte has_older_sister = is_child & age_oldest_female > age & !missing(age_oldest_female)
                
                
            gen byte status = 0 if birth_order == 1
            replace status = 1 if birth_order == 2 & has_older_brother
            replace status = 2 if birth_order == 2 & has_older_sister
            replace status = 3 if birth_order >= 3 & !missing(birth_order) & has_older_brother
            replace status = 4 if birth_order >= 3 & !missing(birth_order) & !has_older_brother
            Added: I notice that the value label for the relate variable has changed from the earlier example. The code relies on this value label, so the code shown here works with the data shown in #5, but will not work with the data shown in earlier posts in this thread.
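
            If the label name is likely to keep changing, one option is to test the numeric code instead of the label text. A minimal sketch, intended as a replacement for the first line of the code above, assuming that 1 is the only parent code in relate (true of the examples in this thread, but possibly not of the full coding):
            Code:
            * Same flag as the first line above, but independent of the value label's name and text.
            * Assumption: relate == 1 is the only non-child code; any other non-child
            * codes in the full data would have to be excluded here as well.
            gen byte is_child = (relate != 1)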

            I also notice that your variable age has a value label named age attached to it, although apparently no such value label exists. While this is a legal configuration, it is potentially dangerous and I worry that it reflects some error in the creation of this data set. If so, there may be other errors made along the way as well. It is a dangerous situation because age, being a true numeric variable, should not have a value label. Now, if in the larger data set there is some coding like 99 for missing value, or where there is a "topcoding" such that, say, age = 65 actually means 65 and older, then applying a value label to handle that situation makes sense. (Actually 99 for missing value should be handled by replacing it with a real Stata missing value.) But if the values of the variable age are the true numeric ages, there is no reason to have a value label, and there is the risk that you will inadvertently create such a value label and it will trick you into thinking the variable is something other than what it really is. So I would look into this carefully before proceeding.
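
            A short sketch of how one might look into this. The value 99 as a missing-value code is only the hypothetical example from the paragraph above, not something seen in the posted data.
            Code:
            * See which value label, if any, is attached to age
            describe age
            * Try to list the label; -capture noisily- reports the error without
            * stopping a do-file when no such label is defined
            capture noisily label list age
            * Inspect the range of values for suspicious codes such as 99
            summarize age, detail
            * Only if 99 really is a missing-value code (hypothetical): make it a Stata missing value
            mvdecode age, mv(99)
            * If age holds plain numeric ages, detach the value label
            label values age .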
            Last edited by Clyde Schechter; 03 Mar 2022, 10:47.


            • #7
              Thank you very much, Clyde!
              Yes, the value label for relate differs slightly from the one I posted earlier. I will check the value label for age. Obviously, a numeric variable should not have a value label. Thank you for flagging this.


              • #8
                Hi Clyde,

                I have come back with a question.
                The previous code worked, but there is one issue: the birth order keeps changing from round to round when household members drop in and out of a given round. Here is an example.
                In round 4, a member dropped out and the birth order changed from the previous two rounds.

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input long iid byte id float(round birth_order) int age byte(sex relate)
                4 3 2 1 10 0 7
                4 4 2 2  9 0 7
                4 0 2 3  5 0 0
                4 3 3 1 13 0 7
                4 4 3 2 12 0 7
                4 0 3 3  8 0 0
                4 4 4 1 16 0 7
                4 0 4 2 12 0 0
                4 3 5 1 20 0 7
                4 4 5 2 19 0 7
                4 0 5 3 15 0 0
                end
                label values iid iid
                label def iid 4 "IN010004", modify
                label values round rnd
                label def rnd 2 "round 2", modify
                label def rnd 3 "round 3", modify
                label def rnd 4 "round 4", modify
                label def rnd 5 "round 5", modify
                label values sex mf
                label def mf 0 "female", modify
                label values relate rel
                label def rel 0 "YL child", modify
                label def rel 7 "Brother/sister", modify
                How is it possible to tweak the code so the birth order follows the natural order and does not depend on the movement of members?
                Thanks in advance.


                • #9
                  Yes, I see. I can think of some approaches to this, but in order for them to work, you need to have either actual dates of birth, rather than age, or dates for each round of the survey. Those dates (whether of birth or of survey) would need to be accurate to at least the month. Otherwise, if one person is present in rounds 1 and 2 at ages, say, 8 and 10, and another person in the household is only present in rounds 3 and 4, with ages 11 and 12, that person could be either older or younger than the first one depending on how much time has elapsed between rounds 2 and 4.
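
                  To illustrate the date-of-birth option, here is a minimal sketch. The variable dob is hypothetical (it does not appear in the data shown in #8). The sketch also assumes that dob and the member id are constant within person across rounds, and that only children are in the data, as in #8.
                  Code:
                  * One observation per person: household iid, member id, hypothetical dob
                  preserve
                  keep iid id dob
                  duplicates drop
                  * Rank once per household, earliest date of birth first
                  by iid (dob id), sort: gen int birth_order_fixed = _n
                  tempfile order
                  save `order'
                  restore
                  * Attach the round-invariant birth order to every round
                  merge m:1 iid id using `order', nogenerate
                  Because the ranking is built from every person ever observed in the household, it does not change when a member is absent from a round. If dob is inconsistent within a person, the m:1 merge stops with an error, which doubles as a data check.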

                  One other thing: each time you post example data, the data organization keeps changing. In #8 you now have households identified by a new variable iid, which is a value-labeled integer rather than the string variable childid we had before. The value label for sex has changed from memsex to mf, and the value label for relationship has changed from relate to aa and now to rel. It will make things much easier for me if we stick to one set of variable names and one set of data storage types and one set of value labels throughout the thread. This will enable me to start with code already posted and focus on just modifying it to make the substantive change called for, as opposed to having to rewrite from scratch because everything is different now.


                  • #10
                    Hi Clyde, thank you very much, and I apologize for the inconsistency.
                    I should have been more consistent and paid attention to the data organization (variable names, value labels, and label names). I was sharing data from different rounds, and I only recoded and relabeled them in the final construction, which is what #8 shows.

                    At the moment all I have is survey dates in years, and there is a gap of 3-4 years between rounds. Let me see what I can do and get back for advice.
                    Thank you again.
