
  • Generating an ID based on multiple variables

    Hello Statalisters!

    I am working with the Birth Recode survey of the Demographic and Health Surveys (DHS). It has individual panel data on women's birth histories: a unique ID for the mother (uid), the mother's year of birth, and, for each child, the year of birth (yobchild), the birth order (bidx), the sex (b4), etc. In other words, all of a mother's births are recorded as a panel. For example, mother 1 had child 1 in 1972 (female), mother 1 had child 2 in 1974 (male), and so on.

    I want to create a dummy that identifies whether the mother had her first child in the 1980s and keep it switched on until the survey in the 1990s. I wrote the following code, but it does not carry the flag over to children born after the first child.

    Code:
    
    gen jan180 = 0 
    * first child born in or after January 1980; bidx is the birth order, yobchild is the child's year of birth
    replace jan180 = 1 if yobchild >= 1980 & bidx == 1
    So the variable jan180 is 1 if the first child was born on or after 1980 and zero otherwise.

    My question is: how do I set jan180 = 1 for the subsequent children (bidx 2-10), conditional on the mother's first child having been born in or after 1980? This variable would then indicate that the mother was fertile only after January 1980.

    Thank you!

    Lori

  • #2
    It's hard to provide exact code because I cannot see how your data is structured exactly, but I'm going to make a guess here:
    Code:
    bysort uid (yobchild): gen wanted = sum(jan180)



    • #3
      Originally posted by Wouter Wakker
      It's hard to provide exact code because I cannot see how your data is structured exactly, but I'm going to make a guess here:
      Code:
      bysort uid (yobchild): gen wanted = sum(jan180)
      I am not certain if this code would ensure that wanted = 1 for all subsequent children the mother has.
      Code:
      . gen jan180 = 0
      . replace jan180 = 1 if yobchild >= 1980 & bidx == 1
      (143,338 real changes made)
      . bysort uid (yobchild): gen wanted = sum(jan180)
      . tab wanted

           wanted |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |    400,082       73.54       73.54
                1 |    143,969       26.46      100.00
      ------------+-----------------------------------
            Total |    544,051      100.00

      I have here a sample of what my data looks like using -dataex-. As you can see, I want to have a dummy for mothers whose first child was born on or after 1980 and then that dummy should remain = 1 for all subsequent children in the panel.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(uid yobchild) byte bidx
       1 1988 1
       1 1985 2
       2 1992 1
       2 1989 2
       2 1985 3
       2 1977 4
       3 1977 1
       3 1973 2
       3 1967 3
       4 1989 1
       5 1991 1
       5 1986 2
       5 1982 3
       6 1964 1
       6 1962 2
       6 1958 3
       7 1988 1
       7 1985 2
       7 1984 3
       7 1982 4
       7 1980 5
       8 1991 1
       8 1986 2
       8 1985 3
       9 1991 1
       9 1989 2
       9 1986 3
       9 1984 4
      10 1971 1
      10 1970 2
      10 1967 3
      10 1965 4
      10 1964 5
      10 1961 6
      11 1991 1
      12 1986 1
      12 1985 2
      12 1982 3
      13 1976 1
      13 1971 2
      13 1969 3
      13 1967 4
      13 1964 5
      13 1961 6
      14 1991 1
      14 1988 2
      15 1983 1
      16 1991 1
      17 1992 1
      17 1989 2
      17 1986 3
      18 1987 1
      18 1982 2
      18 1979 3
      19 1965 1
      20 1978 1
      21 1986 1
      21 1983 2
      21 1979 3
      22 1965 1
      22 1964 2
      22 1963 3
      22 1962 4
      22 1961 5
      22 1960 6
      22 1959 7
      23 1988 1
      23 1984 2
      24 1988 1
      24 1984 2
      24 1981 3
      25 1988 1
      26 1989 1
      26 1986 2
      26 1983 3
      27 1982 1
      27 1980 2
      27 1971 3
      27 1968 4
      27 1962 5
      27 1960 6
      28 1978 1
      28 1973 2
      28 1966 3
      28 1963 4
      28 1959 5
      28 1956 6
      29 1987 1
      29 1984 2
      29 1983 3
      30 1987 1
      30 1985 2
      30 1982 3
      30 1981 4
      31 1974 1
      31 1973 2
      31 1970 3
      31 1969 4
      31 1967 5
      32 1990 1
      end



      • #4
        In #1 you stated that bidx==1 means the first child. In the example data however, bidx==1 is the last born (youngest) child, which is only the first child if the mother only had one child. The jan180 variable should therefore be created in a different way. I think this does what you want:
        Code:
        bysort uid (bidx): gen jan180 = yobchild >= 1980 & _n == _N // tag the first child (largest bidx) if born in 1980 or later
        bysort uid (yobchild): gen wanted = sum(jan180) // carry over to subsequent observations
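
The running sum above relies on the tagged first-born sorting first within uid. If two births of the same mother share a yobchild (twins, or two births in one calendar year), Stata's ordering of the tied rows is arbitrary, so a sibling could land before the tagged row and keep a 0. A minimal sketch of an alternative that does not depend on that ordering follows. It assumes the variables from the -dataex- example in #3 (uid, yobchild, bidx) and that the largest bidx within uid is the first-born; firstyob and firstborn80 are illustrative names, not from the thread.

```stata
* Sketch only: assumes uid, yobchild and bidx as in the -dataex- example,
* with the largest bidx within uid marking the mother's first child.

* Year of birth of each mother's first child, copied to all of her rows.
bysort uid (bidx): gen firstyob = yobchild[_N]

* 1 on every row of a mother whose first child was born in 1980 or later,
* 0 otherwise; left missing if the first child's year is missing.
gen byte firstborn80 = firstyob >= 1980 if !missing(firstyob)

* Sanity check: the flag must be constant within mother.
bysort uid (firstborn80): assert firstborn80[1] == firstborn80[_N]
```

On the example data, mother 2 (first child born in 1977) gets firstborn80 = 0 on all four of her rows, and mother 7 (first child born in 1980) gets firstborn80 = 1 on all five.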
