
  • Help with household rosters

    Hello, Stata community

    I am struggling with a problem. I have a household roster listing all members of a household and their relationship to each other. I'm particularly interested in marriage relations within households. I have a household identifier (serial), a variable indicating a person's number in the household, the person's sex, their relationship to the household head, and a variable giving the person number of the person's spouse. For example, the head of the household might have person number 1 and his wife might be listed 4th; the spouse variable would then take the value 4 for the household head and 1 for the wife.

    The thing is, some households are polygamous, meaning that multiple women would report being married to the same husband (who is not always listed 1st in the household).

    What I would like to do is, per household, create a variable indicating whether a woman is in a polygamous relationship (i.e. the woman takes a value of 1 if two or more women report being married to the same husband in the household). In addition, per household, I would like to calculate how many wives are in the household.

    Does anyone have an idea how I can go about doing this?

    P.S. I apologize about the suboptimal nature of my post. I tried extracting the data with dataex, but it didn't seem to work. I hope that my explanation is helpful regardless.




    Last edited by Christiaan de Swardt; 27 Nov 2024, 09:31.

  • #2
    With imaginary data, the best you can hope for is imaginary code. I'm guessing that the problem you had with -dataex- is that you have more variables than -dataex- will handle. For your problem, the only variables you need to show are the household identifier, the person number variable, the sex variable, and the variable that indicates who the person is married to. So just run -dataex- with those four variables and you will get results, which you can then copy/paste here.



    • #3
      Dear Clyde,

      Thank you for your response. Here is the dataex output as I receive it:


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long serial byte(pernum sex sploc)
      1000  1 1  2
      1000  2 2  1
      1000  3 2  0
      1000  4 2  1
      1000  5 1  0
      1000  6 2  0
      1000  7 1  0
      1000  8 1  0
      1000  9 1  0
      1000 10 2  0
      1000 11 1  0
      1000 12 1 13
      1000 13 2 12
      end
      label values sex SEX
      label def SEX 1 "male", modify
      label def SEX 2 "female", modify

      The code is for household 1000. As we can see, the first person in the household (in column 2) is married to person number 2 (as shown in column 4). However, both persons 2 and 4 are married to person 1. This is what I would like to use to indicate that persons 1, 2, and 4 are in a polygamous relationship. Furthermore, I would like to capture that person 1 has 2 wives.

      As an added complication, this household includes another, monogamous couple, persons number 12 and 13. For them, I would like to indicate that they are in a monogamous relationship and that person 12 (the male) only has one wife.

      Since it is easier working with individual households, I could generate some code for an individual household and then loop it over households using preserve and restore, but given the number of households I have in my dataset (10% random sample of census data), this would be quite inefficient.

      Thank you so much in advance for any advice/ideas that the community might have!



      • #4
        The following code will do what you ask, and on the assumption that the variable named serial is the household id, it can be used without modification for your entire data set without any loops.

        Code:
        by serial (sploc), sort: gen byte polygamous_couple = _N > 1 & sploc != 0
        rangestat (count) n_wives = pernum, by(serial) interval(sploc pernum pernum)
        replace n_wives = . if sex != "male":SEX
        sort serial pernum

        "I could generate some code for an individual household and then loop it over households using preserve and restore, but..."

        It is almost never necessary to do this kind of looping in Stata. Moreover, when it is necessary to loop over groups of observations, it is certainly not necessary to use -preserve- and -restore-, as the commands in the loop can be conditioned with -if- clauses.

        But the -by- prefix, as used above, accomplishes this with far greater efficiency than looping. You should make it your business to learn about -by- before you proceed any further in your use of Stata: it is essential for getting anything beyond toy problems done in Stata.
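
To illustrate the contrast between looping and -by-, here is a minimal sketch that counts the married members of each household both ways. The data are a cut-down version of the example in #3 plus a hypothetical second household (2000), and the variable names n_married_loop and n_married are made up for the illustration.

```stata
clear
input long serial byte(pernum sex sploc)
1000 1 1 2
1000 2 2 1
1000 3 2 0
1000 4 2 1
2000 1 1 2
2000 2 2 1
2000 3 1 0
end

* loop version: one -count- per household, restricted with -if-
* (no -preserve- or -restore- needed)
gen n_married_loop = .
levelsof serial, local(households)
foreach h of local households {
    count if serial == `h' & sploc != 0
    replace n_married_loop = r(N) if serial == `h'
}

* -by- version: the same result in one command, with no loop
bysort serial: egen n_married = total(sploc != 0)

* both approaches should agree in every observation
assert n_married == n_married_loop
list, sepby(serial)
```

The loop issues two commands per household, each scanning the whole dataset, which is why it scales poorly to a census sample; the -by- version sorts once and processes every household in a single pass.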

        -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.



        • #5
          Thank you Clyde, that worked great!

          I have one final question related to this. I'm also interested in the age difference between husbands and their wives. There is another variable called 'age' that captures the age of each respondent. So, I would simply want to subtract the age of the wife from the age of the husband. However, I'm not sure how to operationalize this using the sploc variable.

          For illustration, we can imagine that the age of pernum 1 (the husband) is 32. The ages of pernum 2 and 4 (the two wives) are 28 and 27, respectively. Then, I would like to create a variable capturing their age difference, so having values of 4 years for pernum 2, and 5 years for pernum 4. Do you perhaps have an idea of how this can be achieved?

          Thank you so much for your assistance!



          • #6
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long serial byte(pernum sex sploc) float age
            1000  1 1  2 32
            1000  2 2  1 28
            1000  3 2  0  .
            1000  4 2  1 27
            1000  5 1  0  .
            1000  6 2  0  .
            1000  7 1  0  .
            1000  8 1  0  .
            1000  9 1  0  .
            1000 10 2  0  .
            1000 11 1  0  .
            1000 12 1 13  .
            1000 13 2 12  .
            end
            label values sex SEX
            label def SEX 1 "male", modify
            label def SEX 2 "female", modify
            
            rangestat husb_age=age, int(pernum sploc sploc) by(serial)
            
            gen age_diff = husb_age - age if sex == 2
            
            list
            
                 +--------------------------------------------------------------+
                 | serial   pernum      sex   sploc   age   husb_age   age_diff |
                 |--------------------------------------------------------------|
              1. |   1000        1     male       2    32         28          . |
              2. |   1000        2   female       1    28         32          4 |
              3. |   1000        3   female       0     .          .          . |
              4. |   1000        4   female       1    27         32          5 |
              5. |   1000        5     male       0     .          .          . |
                 |--------------------------------------------------------------|
              6. |   1000        6   female       0     .          .          . |
              7. |   1000        7     male       0     .          .          . |
              8. |   1000        8     male       0     .          .          . |
              9. |   1000        9     male       0     .          .          . |
             10. |   1000       10   female       0     .          .          . |
                 |--------------------------------------------------------------|
             11. |   1000       11     male       0     .          .          . |
             12. |   1000       12     male      13     .          .          . |
             13. |   1000       13   female      12     .          .          . |
                 +--------------------------------------------------------------+
            EDIT:

            Note that this also works, at least for the data example:

            Code:
            bysort serial (pernum) : gen wanted = age[sploc] - age if sex == 2
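
The subscript version relies on pernum matching each person's position within the household once the data are sorted by serial and pernum, so that age[sploc] really is the spouse's age. That is true of the data example but is an assumption about the full dataset. A minimal sketch of a guard, using the first four people from the example above:

```stata
clear
input long serial byte(pernum sex sploc) float age
1000 1 1 2 32
1000 2 2 1 28
1000 3 2 0  .
1000 4 2 1 27
end

* stop with an error if person numbers are not 1, 2, 3, ... within a household
bysort serial (pernum): assert pernum == _n

* under -by-, age[sploc] is the age of the observation in position sploc
* of the same household; sploc == 0 yields missing, as there is no observation 0
by serial (pernum): gen wanted = age[sploc] - age if sex == 2

list
```

If the -assert- fails, for example because some person numbers are skipped, the -rangestat- solution is the safer choice, since it matches on the value of pernum rather than on position.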
            Last edited by Nick Cox; 29 Nov 2024, 06:00.



            • #7
              I would like to return to my question of how to code individuals as being in a polygamous relationship when more than one of them report being married to the same husband.

              I have run the code that Clyde has suggested, namely:

              by serial (sploc), sort: gen byte polygamous_couple = _N > 1 & sploc != 0

              That produced the data below. Persons 1, 2, and 4 are correctly indicated as being in a polygamous relationship. The problem is that persons 12 and 13 are indicated as being in a polygamous relationship as well, even though they are monogamous (as they have unique values for sploc).

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input long serial byte(pernum sploc sex polygamous_couple)
              1000  1  2 1 1
              1000  2  1 2 1
              1000  3  0 2 0
              1000  4  1 2 1
              1000  5  0 1 0
              1000  6  0 2 0
              1000  7  0 1 0
              1000  8  0 1 0
              1000  9  0 1 0
              1000 10  0 2 0
              1000 11  0 1 0
              1000 12 13 1 1
              1000 13 12 2 1
              end
              label values sex SEX
              label def SEX 1 "male", modify
              label def SEX 2 "female", modify
              My thinking is to generate the variable polygamous_couple and set its value to 1 if there is a duplicate value of sploc within a serial number (as a duplicate value would indicate that two wives are married to the same husband in a household). Since the sploc values for persons 12 and 13 are unique, they would then not be listed as polygamous. However, I am struggling to work out how to operationalize this, particularly since it doesn't seem like the 'duplicate' command can be combined with 'by'. Do you perhaps have any insight into how I could move forward with this?



              • #8
                Working backwards:

                1.

                The duplicates command (not duplicate) indeed can't be combined syntactically with by: but (as its putative author) I don't see that as a limitation.

                If you want to identify duplicates on x y within groups of identical z, that means you want duplicates on x y z.


                2.

                That said, the duplicates command is just a wrapper for all sorts of fooling around with sorting, by: and _N -- so you can always do without it.

                Here is some technique:


                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input long serial byte(pernum sex sploc)
                1000  1 1  2
                1000  2 2  1
                1000  3 2  0
                1000  4 2  1
                1000  5 1  0
                1000  6 2  0
                1000  7 1  0
                1000  8 1  0
                1000  9 1  0
                1000 10 2  0
                1000 11 1  0
                1000 12 1 13
                1000 13 2 12
                end
                label values sex SEX
                label def SEX 1 "male", modify
                label def SEX 2 "female", modify
                
                bysort serial sploc : gen wanted = sploc != 0 & _N >= 2 
                
                duplicates tag serial sploc if sploc, gen(WANTED)
                
                list if inlist(1, wanted, WANTED)
                
                     +----------------------------------------------------+
                     | serial   pernum      sex   sploc   wanted   WANTED |
                     |----------------------------------------------------|
                  9. |   1000        2   female       1        1        1 |
                 10. |   1000        4   female       1        1        1 |
                     +----------------------------------------------------+
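
One detail about the two variables above: wanted is a 0/1 indicator, whereas -duplicates tag- records how many other observations share the same values. With two wives both happen to equal 1, but they diverge with more. A sketch with a hypothetical household of three wives (made-up data, same variable names as above):

```stata
clear
input long serial byte(pernum sex sploc)
2000 1 1 2
2000 2 2 1
2000 3 2 1
2000 4 2 1
2000 5 1 0
end

bysort serial sploc : gen wanted = sploc != 0 & _N >= 2
duplicates tag serial sploc if sploc, gen(WANTED)

* wanted flags the three wives with 1;
* WANTED counts the other observations with the same serial and sploc,
* so it is 2 for each of the three wives
list, sepby(sploc)
```

So a condition such as WANTED >= 1 is the safer way to turn the tag into an indicator, and WANTED + 1 gives the number of wives reporting the same husband.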
                3.

                Any reply to #6?



                • #9
                  Dear Nick,

                  Thank you for the information. I did not know about that functionality of the duplicates command. This should be very helpful in the future!

                  I am glad that my intuition regarding 'duplicates' was correct. The code that you provided in #8 and #6 did exactly what I wanted it to do. Thank you and Clyde so much for the advice and support.



                  • #10
                    Re #7. You are right. My apologies for the error in the code. It should be:
                    Code:
                    by serial sploc, sort: gen byte polygamous_couple = _N > 1 & sploc != 0
                    // N.B.  NO PARENTHESES AROUND sploc
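
The difference between the two versions comes down to what defines the by-groups. With sploc in parentheses, the groups are households and sploc only fixes the sort order within them, so _N is the household size. Without parentheses, the groups are defined by serial and sploc together, so _N counts the people naming the same spouse. A sketch on part of the example data (n_in_household and n_same_spouse are illustrative names):

```stata
clear
input long serial byte(pernum sex sploc)
1000  1 1  2
1000  2 2  1
1000  3 2  0
1000  4 2  1
1000 12 1 13
1000 13 2 12
end

* groups = households; sploc only determines the order within each household
by serial (sploc), sort: gen n_in_household = _N

* groups = distinct combinations of serial and sploc
by serial sploc, sort: gen n_same_spouse = _N

sort serial pernum
list, sepby(serial)
```

Here n_in_household is 6 for everyone, which is why the first version flagged every married person in a household of more than one, while n_same_spouse is 2 only for persons 2 and 4.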



                    • #11
                      Dear Clyde and Nick

                      Thank you once again for the responses. After further investigating the data, I realized that the code, while doing exactly what we want it to do, doesn't account for some aspects of the data that I would like it to. Consider the example below:


                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input long serial byte(pernum sploc) int marstd float wanted
                      75000 1 2 217 0
                      75000 2 1 216 0
                      75000 3 0 100 0
                      75000 4 0 100 0
                      75000 5 0 400 0
                      end
                      label values marstd MARSTD
                      label def MARSTD 100 "single/never married", modify
                      label def MARSTD 216 "married, monogamous", modify
                      label def MARSTD 217 "married, polygamous", modify
                      label def MARSTD 400 "widowed", modify
                      Here, the husband is coded as being polygamous. The wife is reported as being married to the polygamous husband through the sploc value of 1. However, 'wanted' shows up as 0, because she is the only woman in the household reporting to be married to the husband. For now, let's assume that the other wife lives in another household or is away, such that 'marstd' actually provides the correct household status.

                      What we could try to do is to then just mark women as being in a polygamous marriage if they are married to a husband who reports being polygamous. I would want to code this as something like:

                      by serial: replace marstd = 217 if sploc (wife) == pernum (husband) & marstd (husband) == 217 & sex == 2

                      Of course, sploc (wife), pernum (husband), and marstd (husband) are not valid syntax. But this is the intuition of how I think this problem can be tackled. What do you think?
                      Last edited by Christiaan de Swardt; 12 Dec 2024, 08:46.



                      • #12
                        Sure. Your example data didn't include a sex variable. I'm guessing from what you wrote that the variable is coded 1 for males and 2 for females.

                        Code:
                        frame put serial pernum if marstd == 217 & sex == 1, into(polygamous_males)
                        frlink m:1 serial sploc, frame(polygamous_males serial pernum)
                        replace marstd = 217 if !missing(polygamous_males)
                        drop polygamous_males
                        frame drop polygamous_males
                        The logic here is to create a new frame with the identifiers of all the polygamous males. Then link everybody to that, but use the sploc instead of the pernum in the linkage, so any woman married to a polygamous male finds a match there. Then change the marital status variable to polygamous for anyone who matched to a polygamous male.
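
The same logic can also be sketched without frames, along the lines of the subscripting shown in #6. This is an alternative under two assumptions, not the method above: pernum must equal each person's position within the household, and the sex variable (hypothetical here, coded 1 = male and 2 = female, since it is not in the example data) must be available.

```stata
clear
input long serial byte(pernum sploc) int marstd byte sex
75000 1 2 217 1
75000 2 1 216 2
75000 3 0 100 2
75000 4 0 100 1
75000 5 0 400 2
end

* guard: subscripting by sploc is valid only if pernum is 1, 2, 3, ... per household
bysort serial (pernum): assert pernum == _n

* within each household, sex[sploc] and marstd[sploc] belong to the reported spouse;
* sploc == 0 yields missing values, so people without a spouse are never changed
by serial (pernum): replace marstd = 217 if sex == 2 & sex[sploc] == 1 & marstd[sploc] == 217

list
```

Only women are changed and only men's values are read, so the order in which -replace- works through the observations does not affect the result. The frames approach has the advantage of not depending on the position assumption.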

                        Added: This question raises a problem with the data structure you have. In the example you show, there is no information within the household (serial) that identifies pernum 1 as polygamous except for marstd == 217. But how did you get marstd == 217 in the first place? You have stated that he is polygamously married to somebody in another household. But your data structure doesn't appear to enable you to identify cross-household marriages. But even assuming that you can do that using other variables not shown, if the same male is married to two women in different households, he will not be identified as polygamous in either household using only data in the household, and so this further extension will not have marstd == 217, and the polygamous marriage will be missed. Is there something in your data that makes it possible to fix this? It seems that the problem is that this same male has two different IDs in the data set: one in household 75000 and the other in some different household. So how can you tell it is the same person?
                        Last edited by Clyde Schechter; 12 Dec 2024, 09:52.



                        • #13
                          Odd how polygamy turns out to mean polygyny.
