  • #16
    On reflection, my idea of creating a new variable -seq- is not workable. Smoking status can change over time in these data, and there is no guarantee that the generated value of seq matches a child's record for a given year with the parent's record for that same year, so the correspondence could break. You actually need the original year variable.

    And fixing it appears a bit more complicated than the original problem. So please post another example of data that includes multiple records for the same person, contains the year variable, and also, for at least one person and year, the data example should contain the mother and father records for that year as well.

    • #17
      I hope this is OK; if not, please let me know.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(pid mpid fpid) float year byte smoker
      10002251        0        0 1991  2
      10004491        0        0 1992  1
      10004491        0        0 1991  1
      10004521        0        0 1992  1
      10004521        0        0 1993  1
      10004521        0        0 1991  1
      10007857        0        0 1998  2
      10007857        0        0 1991  2
      10007857        0        0 1999  .
      10007857        0        0 1993  2
      10007857        0        0 1992  2
      10007857        0        0 2001  2
      10007857        0        0 2000  2
      10014578        0        0 1993  2
      10014578        0        0 1998  2
      10014578        0        0 1995 -7
      10014578        0        0 2005  2
      10014578        0        0 2002  2
      10014578        0        0 2000  2
      10014578        0        0 1996  2
      10014578        0        0 2007  2
      10014578        0        0 1992  2
      10014578        0        0 2004  2
      10014578        0        0 1991  2
      10014578        0        0 2003  2
      10014578        0        0 1999  .
      10014578        0        0 2008  2
      10014608        0        0 1991  1
      10014608        0        0 2000  2
      10014608        0        0 1993  2
      10014608        0        0 1999  .
      10014608        0        0 2007  2
      10014608        0        0 2002  2
      10014608        0        0 1996  2
      10014608        0        0 1998  2
      10014608        0        0 2005  2
      10014608        0        0 2008  2
      10014608        0        0 1992  2
      10014608        0        0 1995  2
      10014608        0        0 2004  2
      10014608        0        0 2003  2
      10016813        0        0 1991  1
      10016813        0        0 2000 -7
      10016813        0        0 2004 -7
      10016813        0        0 1997  1
      10016813        0        0 1993 -7
      10016813        0        0 1994  1
      10016813        0        0 1998 -7
      10016813        0        0 1999  .
      10016813        0        0 1992  1
      10016813        0        0 2002 -7
      10016848        0        0 1999  .
      10016848        0        0 2001  1
      10016848        0        0 1993 -7
      10016848        0        0 2004  1
      10016848        0        0 1994  1
      10016848        0        0 1998  1
      10016848        0        0 2000  1
      10016848        0        0 2003  1
      10016848        0        0 1991  1
      10016848        0        0 1992  1
      10016848        0        0 2005  1
      10016848        0        0 1997  1
      10016848        0        0 2007  1
      10016848        0        0 2008  1
      10016848        0        0 2002  1
      10016872        0 10016813 2007  2
      10016872        0 10016813 2001  2
      10016872        0        0 2008  2
      10016872        0 10016813 1998 -7
      10016872        0 10016813 2002  1
      10016872        0 10016813 1999  .
      10016872        0 10016813 2000 -7
      10016872        0 10016813 2004 -7
      10016872        0 10016813 2003  2
      10016872        0 10016813 2005  2
      10017933        0        0 1994  2
      10017933        0        0 1991  2
      10017933        0        0 1992  2
      10017933        0        0 1996  2
      10017933        0        0 1998  2
      10017933 90001451        0 2001  2
      10017933 90001451        0 2005  2
      10017933 90001451        0 1999  .
      10017933        0        0 1993  2
      10017933 90001451        0 2000  2
      10017933 90001451        0 2004  2
      10017933        0        0 2007  2
      10017933 90001451        0 2002  2
      10017933        0        0 1997  2
      10017933        0        0 2008  2
      10017933 90001451        0 2003  2
      10017933        0        0 1995  2
      10017968        0        0 1992  2
      10017968        0        0 1994  2
      10017968        0        0 1991  2
      10017968        0        0 1993  2
      10017968        0        0 1995 -7
      10017992 10017933        0 1997  2
      10017992 10017933        0 2001  2
      end
      label values mpid ampid
      label def ampid 0 "mother not in hh", modify
      label values fpid afpid
      label def afpid 0 "father not in hh", modify
      label values smoker asmoker
      label def asmoker 1 "yes", modify
      label def asmoker 2 "no", modify

      • #18
        I think this will do it:
        Code:
        set more off
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long(pid mpid fpid) float year byte smoker
        10002251        0        0 1991  2
        10004491        0        0 1992  1
        10004491        0        0 1991  1
        10004521        0        0 1992  1
        10004521        0        0 1993  1
        10004521        0        0 1991  1
        10007857        0        0 1998  2
        10007857        0        0 1991  2
        10007857        0        0 1999  .
        10007857        0        0 1993  2
        10007857        0        0 1992  2
        10007857        0        0 2001  2
        10007857        0        0 2000  2
        10014578        0        0 1993  2
        10014578        0        0 1998  2
        10014578        0        0 1995 -7
        10014578        0        0 2005  2
        10014578        0        0 2002  2
        10014578        0        0 2000  2
        10014578        0        0 1996  2
        10014578        0        0 2007  2
        10014578        0        0 1992  2
        10014578        0        0 2004  2
        10014578        0        0 1991  2
        10014578        0        0 2003  2
        10014578        0        0 1999  .
        10014578        0        0 2008  2
        10014608        0        0 1991  1
        10014608        0        0 2000  2
        10014608        0        0 1993  2
        10014608        0        0 1999  .
        10014608        0        0 2007  2
        10014608        0        0 2002  2
        10014608        0        0 1996  2
        10014608        0        0 1998  2
        10014608        0        0 2005  2
        10014608        0        0 2008  2
        10014608        0        0 1992  2
        10014608        0        0 1995  2
        10014608        0        0 2004  2
        10014608        0        0 2003  2
        10016813        0        0 1991  1
        10016813        0        0 2000 -7
        10016813        0        0 2004 -7
        10016813        0        0 1997  1
        10016813        0        0 1993 -7
        10016813        0        0 1994  1
        10016813        0        0 1998 -7
        10016813        0        0 1999  .
        10016813        0        0 1992  1
        10016813        0        0 2002 -7
        10016848        0        0 1999  .
        10016848        0        0 2001  1
        10016848        0        0 1993 -7
        10016848        0        0 2004  1
        10016848        0        0 1994  1
        10016848        0        0 1998  1
        10016848        0        0 2000  1
        10016848        0        0 2003  1
        10016848        0        0 1991  1
        10016848        0        0 1992  1
        10016848        0        0 2005  1
        10016848        0        0 1997  1
        10016848        0        0 2007  1
        10016848        0        0 2008  1
        10016848        0        0 2002  1
        10016872        0 10016813 2007  2
        10016872        0 10016813 2001  2
        10016872        0        0 2008  2
        10016872        0 10016813 1998 -7
        10016872        0 10016813 2002  1
        10016872        0 10016813 1999  .
        10016872        0 10016813 2000 -7
        10016872        0 10016813 2004 -7
        10016872        0 10016813 2003  2
        10016872        0 10016813 2005  2
        10017933        0        0 1994  2
        10017933        0        0 1991  2
        10017933        0        0 1992  2
        10017933        0        0 1996  2
        10017933        0        0 1998  2
        10017933 90001451        0 2001  2
        10017933 90001451        0 2005  2
        10017933 90001451        0 1999  .
        10017933        0        0 1993  2
        10017933 90001451        0 2000  2
        10017933 90001451        0 2004  2
        10017933        0        0 2007  2
        10017933 90001451        0 2002  2
        10017933        0        0 1997  2
        10017933        0        0 2008  2
        10017933 90001451        0 2003  2
        10017933        0        0 1995  2
        10017968        0        0 1992  2
        10017968        0        0 1994  2
        10017968        0        0 1991  2
        10017968        0        0 1993  2
        10017968        0        0 1995 -7
        10017992 10017933        0 1997  2
        10017992 10017933        0 2001  2
        end
        label values mpid ampid
        label def ampid 0 "mother not in hh", modify
        label values fpid afpid
        label def afpid 0 "father not in hh", modify
        label values smoker asmoker
        label def asmoker 1 "yes", modify
        label def asmoker 2 "no", modify
        
        //    CREATE A SUBSET CONSISTING JUST OF THOSE WHO ARE
        //    SHOWN AS PARENTS IN A GIVEN YEAR
        isid pid year
        preserve
        drop if mpid == 0 & fpid == 0
        keep mpid fpid year
        gen long seq = _n
        reshape long @pid, i(seq year) j(parent) string
        drop if pid == 0
        drop seq
        by pid (parent), sort: assert parent[1] == parent[_N] // NOBODY IS BOTH MOTHER & FATHER
        duplicates drop
        tempfile parent_years
        save `parent_years'
        
        //    MERGE THAT BACK TO THE ORIGINAL DATA TO RETRIEVE
        //    THE CORRESPONDING RECORDS
        restore, preserve
        merge 1:1 pid year using `parent_years', keep(match) nogenerate
        save `"`parent_years'"', replace
        
        //    NOW CREATE DATA SET OF ALL THOSE WHO HAVE A PARENT IN THE DATA
        //    IN A GIVEN YEAR
        restore
        keep if mpid != 0 | fpid != 0
        //    AND APPEND THE PARENTS
        append using `parent_years'
        
        //    AND CREATE INDICATOR VARIABLES
        gen byte is_mother = (parent == "m")
        gen byte is_father = (parent == "f")
        By the way, I notice that your smoker variable sometimes takes the value -7, which is not covered by your value label, and that yes/no is coded 1/2. For most purposes in Stata, you would be better off with smoker coded 0/1. As for the -7, assuming it is some kind of non-response code, it is best recoded as a Stata missing value. Leaving it as an actual number is likely to get you into serious trouble in any analysis of these data. And if you end up needing a logistic regression with smoker as the outcome, coding yes and no as 1 and 2 will cause the analysis to fail. So you might want to do this:

        Code:
        label define boolean 0 "No" 1 "Yes"
        recode smoker (2 = 0) (-7 = .a) // 1 (YES) STAYS 1; 2 (NO) BECOMES 0; -7 BECOMES MISSING
        label values smoker boolean
        label values is_mother is_father boolean
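
        A quick check after the recode can confirm that nothing unexpected is left in smoker. This is a minimal sketch, assuming that -7 is the only non-response code in the full data. If other negative codes exist, the assertion will fail and point them out:

        Code:
        * SHOW EVERY REMAINING VALUE, INCLUDING . AND .a
        tabulate smoker, missing
        * FAILS IF ANY VALUE OTHER THAN 0, 1, OR MISSING REMAINS
        assert inlist(smoker, 0, 1) | missing(smoker)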

        • #19
          I get an error here


          . by pid (parent), sort: assert parent[1] == parent[_N] // NOBODY IS BOTH MOTHER & FATHER

          1 contradiction in 7600 by-groups
          assertion is false
          r(9);

          • #20
            OK, I put that in to make sure that the same person is not listed as both a mother and a father. Apparently somebody is, indeed, listed that way. Unless that person made a gender transition during the period of the survey, a possible but unlikely event, this is probably an error in your data. Either way, you can find this person by running the code in #18, replacing the line you show in #19 with this:

            Code:
            by pid (parent), sort: gen byte switch = parent[1] != parent[_N]
            list if switch // OR -browse if switch- IF YOU PREFER
            stop // STATA WILL COMPLAIN THAT stop IS AN UNRECOGNIZED COMMAND; DON'T WORRY ABOUT THAT
            If you determine that it's an error in your data, then fix the data error and retry the code in #18.

            If you determine that the person involved has actually made a gender transition, then replace the command cited in #19 by:
            Code:
            by pid year (parent), sort: assert parent[1] == parent[_N]
            This code verifies the more lenient assumption that no person is indicated as both a mother and a father in the same year. That seems a reasonable assumption even in the transgender situation. If that weaker assumption is still violated, post back.
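
            Should the weaker assertion fail as well, the same listing pattern used above for the stricter check could locate the offending person-years. This is only a sketch: the variable name both_roles is a placeholder of mine, not something from the thread, and the lines belong at the same point in the #18 code as the assertion they replace:

            Code:
            * FLAG ANYONE NAMED AS BOTH MOTHER AND FATHER WITHIN ONE YEAR
            by pid year (parent), sort: gen byte both_roles = parent[1] != parent[_N]
            list pid year parent if both_roles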

            • #21
              If I search for the individual I get pid -1 and year 1993. However there is no individual with a pid smaller than 1 if I try to find it.
              If I try to run the code again including the individual it says false assertion.
              After that I cannot run the code again because switch is already defined even though I clear.

              • #22
                If I search for the individual I get pid -1 and year 1993. However there is no individual with a pid smaller than 1 if I try to find it.
                Remember that at that point in the code, what you see as pid originally comes from either mpid or fpid. So if you are going back to the original data to fix things up, you have to look for an observation (or more than one) with mpid == -1 or fpid == -1 (and year == 1993). I'd probably do this by reloading the original data and then running -list if inlist(-1, mpid, fpid) & year == 1993-. That should pinpoint the offending observation(s).
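
                Written out as a block, that lookup might look like the sketch below. The file name original_data is a placeholder, not something given in the thread, and the final count is an extra check of mine on the assumption that other negative parent ids could cause the same problem:

                Code:
                * RELOAD THE UNMODIFIED DATA (PLACEHOLDER FILE NAME)
                use original_data, clear
                * OBSERVATIONS THAT NAME pid -1 AS A PARENT IN 1993
                list pid mpid fpid year if inlist(-1, mpid, fpid) & year == 1993
                * HOW MANY OBSERVATIONS HAVE ANY NEGATIVE PARENT ID?
                count if mpid < 0 | fpid < 0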

                If I try to run the code again including the individual it says false assertion.
                Of course it does. If you don't change the data, you will get the same results.

                After that I cannot run the code again because switch is already defined even though I clear.
                In order to rerun the code, you have to start from the very beginning. You have correctly observed that the code creates some variables, and you cannot then attempt to -gen- those variables again.
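
                One way to make reruns less fragile is to keep everything in a single do-file that reloads the data at the top. The following is a sketch under assumptions: original_data is a placeholder file name, and the -capture drop- line only matters if switch somehow ended up in the data being loaded:

                Code:
                * START EVERY RUN FROM A CLEAN SLATE
                clear all
                use original_data      // PLACEHOLDER FILE NAME, NOT FROM THE THREAD
                * DROPS switch IF IT EXISTS; DOES NOTHING (AND GIVES NO ERROR) IF IT DOES NOT
                capture drop switch
                * ...THEN THE CODE FROM #18, RUN FROM TOP TO BOTTOM...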

                • #23
                  Originally posted by Clyde Schechter
                  Well, if you want to do it with both mothers and fathers, you could do it separately for each and then combine them. But it would be simpler to do it in one fell swoop:

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input long(pid mpid fpid) int age byte smoker
                  10020233 10020209 10020179 19 1
                  10048243 10048219 10048189 21 2
                  10048278 10048219 10048189 19 2
                  10079599 10079556 10079521 18 2
                  10101977 10101942 10101918 33 2
                  end
                  label values mpid ampid
                  label values fpid afpid
                  label values age aage
                  label values smoker asmoker
                  label def asmoker 1 "yes", modify
                  label def asmoker 2 "no", modify
                  
                  // CREATE A DATA SET WITH JUST THE PARENTS (MOTHERS AND FATHERS)
                  preserve
                  keep pid mpid fpid
                  drop if missing(mpid) & missing(fpid)
                  rename pid key
                  reshape long @pid, i(key) j(parent) string
                  drop key
                  duplicates drop
                  tempfile parent_ids
                  save `parent_ids'
                  restore, preserve
                  merge 1:1 pid using `parent_ids', keep(match) nogenerate
                  tempfile parents
                  save `parents'
                  
                  // NOW ELIMINATE FROM ORIGINAL DATA
                  // THOSE OBSERVATIONS WITH NEITHER A MOTHER NOR A FATHER
                  restore
                  drop if missing(mpid) & missing(fpid)
                  gen parent = ""
                  
                  // NOW COMBINE THE PARENTS AND THEIR OFFSPRING
                  append using `parents'
                  gen byte mother = parent == "m"
                  gen byte father = parent == "f"
                  This code creates a single data set in which everybody is a parent or has a parent in the data set. It also provides a variable, parent, which contains m if the person is a mother, f if a father, and missing value if the person is not a parent. Finally, it includes variables mother and father which are 0/1 coded to indicate who is a mother and who a father, respectively. (In the case of your example data from #5, none of the mother or father id's appear as pid's, so the result is not very interesting--just the original data and an indication that nobody is a parent.)

                  Thank you for using -dataex-.
                  10020233 10101977 10079599 19 1
                  10048243 10101977 10079599 21 2
                  10048278 10048219 10048189 19 2
                  10079599 10079556 10079521 18 2
                  10101977 10101942 10101918 33 2

                  If I try to run the code using the above data (i.e. the first two individuals have the same mother and father), the code gives two sets of observations for the parents. How can I keep a single observation for the mother and father in this scenario?

                  • #24
                    Just add the following command to the end of the code:
                    Code:
                    collapse (firstnm) mpid fpid age smoker parent mother father, by(pid)

                    • #25
                      Thank you for the reply!

