  • PSID - Generating a household ID

    I am using 10 waves of the PSID data with the aim of analysing trends in household charitable giving over time. I want to create a panel with household ID as the Panel ID variable. The problem is that the household ID number is different for each household for each wave. The only identifying variable that is constant over time is at the individual level. Naturally, this wouldn't be a problem if the head of household was the same for each year but for many families the head of household changes at least once over the course of the 10 waves. I have the following variables that are of use:

    id - Unique individual identification number
    familyid - Unique family identification number (different every year)
    famchange - Family composition change number (0 if no change, 1 if change other than head, 2 partner left family, 3 partner now head of household etc.)
    relationtohead - Individual's relation to the head of household (20 if partner, 30 if son or daughter, 33 stepson etc.)

    In my dataset I have every individual that was head of household in at least 1 wave. All of my other variables are at the household level.

    How can I generate a household ID variable that is constant over time and accounts for changes in the head of household?

    I have tried to use dataex but my input statement exceeds linesize limit.

  • #2
    I have used the PSID in my work, as well as HILDA, the similar survey in Australia.

    Your problem is that there is no such thing as a "household" that continues across waves.

    Suppose Alice lives alone in the 2001 and 2003 waves, as the RP (reference person, formerly "head") in her one-person household. Suppose she then is married to Bob in the 2005 and 2007 waves, and perhaps Bob has become the RP and Alice is the SP (spouse/partner, formerly "wife") and they have a child. Suppose she then separates from Bob, but Bob has custody of their child, so in 2009 he and the child remain part of the PSID, as RP and child in their household, while Alice again lives alone as RP of her household. What exactly constitutes the household you want to track across waves?

    That's an easy example. There are much more complicated examples.

    In my work with PSID I have found it helpful to define an ongoing household as the RP and if present the SP, so in the above example there are four households:
    • Alice in 2001 and 2003
    • Alice and Bob in 2005 and 2007
    • Alice in 2009 - which is a new household for my purposes
    • Bob and their child in 2009
    I identify my household by the combination of the RP's and SP's 7-digit identifiers, taking care to put the lower value first since we don't want the household ID to change if the RP and SP exchange roles. For example, if Alice is 1357913 and Bob is 2468024, the household identifiers are
    • 1357913 in 2001 and 2003
    • 13579132468024 in 2005 and 2007
    • 1357913 in 2009
    • 2468024 in 2009
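The identifier rule above can be sketched as follows, in Python rather than Stata. This is only an illustration: the function name `household_key` is hypothetical, and it assumes every person ID has at most 7 digits.

```python
from typing import Optional


def household_key(rp_id: int, sp_id: Optional[int] = None) -> int:
    """Combine the RP's and SP's person IDs into one household key.

    Hypothetical helper; assumes each person ID has at most 7 digits.
    The lower ID always goes first, so the key does not change if the
    RP and SP exchange roles.
    """
    if sp_id is None:
        # One-adult household: the key is just the RP's own ID.
        return rp_id
    low, high = sorted((rp_id, sp_id))
    # Shift the lower ID left by 7 decimal digits, then add the higher ID.
    return low * 10**7 + high


alice, bob = 1357913, 2468024
print(household_key(alice))       # Alice alone
print(household_key(bob, alice))  # Bob as RP, Alice as SP
print(household_key(alice, bob))  # roles swapped, same key
```

In Stata, a 14-digit key like this would have to be stored as a double or a string, since float and long cannot hold it exactly.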



    • #3
      Hi William,

      Thanks for your response. I had thought about this and I think I would like to define continuity of a household in cases when famchange = 0, 1, 2, 3 and 4 and then generate a new household in all other cases. I think I can make your solution work for that, so thank you for sharing that.

      Do you happen to know of any literature that might inform me of the best way to approach the problem of defining households?
      Last edited by Ben Grace; 08 Apr 2021, 13:34.



      • #4
        Everything I know about the PSID I learned from the documentation at the PSID website https://psidonline.isr.umich.edu/ or from papers linked to from there.

        I'm not a big fan of videos myself, but others have found their video series useful, and in your case, the videos on Merging PSID Records may be helpful.



        • #5
          I'd like to support William's arguments in #2 -- well put. The bottom line is that it's not possible to consistently define a longitudinal concept of "the household". You can track individuals over time, and relate them to their household context though at each wave. You might want to argue that a household in base year t=0 is the same household in year t=1 if exactly the same adults (or maybe same adults and children) are co-resident in both years. You will quickly find that the number of such "households" is small and increasingly so as the number of panel waves increases -- which is what you appear to be finding. There is a lot of household formation and dissolution over time as people arrive (births, partnerships and other co-residences) and depart (death, divorce, separation, kids leaving home, etc.). NB In rotating panels like the US CPS interviewers return to the same address, not to the same "household".

          In short, I strongly recommend that you revisit your original assumption that you wish to look at "household charitable giving" over time if you wish to exploit the longitudinal nature of the PSID. Give up on your attempt to derive a "household id" that is the same over time.

          To repeat: there is a good reason that the only longitudinally consistent identifier in the PSID -- and all other "household panel" surveys I know quite well (BHPS, Understanding Society, HILDA, GSOEP) -- is for the individual.

          The only consistent "household identifier" that can be derived for individuals in wave t > 1 is the household ID of the household with which they can be associated in wave t = 1. If a couple household are together at waves 1 and 2 and 3 but split (e.g. divorce) between waves 3 and 4, so they are 2 households at wave 4, you can still give the adults at wave 4 their household ID at wave 1. If the couple had kids sometime after wave 1, you could attribute them with the wave 1 HH id that their currently co-resident parent had. But this sort of ID allocation exercise is most useful for attributing survey design features (base year PSUs and strata) to observations in later waves. It's not directly useful as an analytical concept in the way you appear to hope.
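The allocation exercise just described can be sketched as follows, in Python rather than Stata. The function name `origin_household` and the two lookup tables are hypothetical; in practice they would have to be built from the individual-level file.

```python
def origin_household(person, wave1_hh, coresident_parent):
    """Return the wave-1 household ID to attribute to a person.

    wave1_hh maps people present at wave 1 to their wave-1 household ID.
    coresident_parent maps later entrants (e.g. children born after
    wave 1) to the parent they currently live with.
    Returns None when no chain of parents leads back to wave 1.
    """
    seen = set()
    while person not in wave1_hh:
        if person in seen or person not in coresident_parent:
            return None
        seen.add(person)
        person = coresident_parent[person]
    return wave1_hh[person]


# A couple together at wave 1, later split; the child lives with B.
wave1_hh = {"A": 101, "B": 101}
coresident_parent = {"child": "B"}

print(origin_household("A", wave1_hh, coresident_parent))
print(origin_household("child", wave1_hh, coresident_parent))
print(origin_household("newcomer", wave1_hh, coresident_parent))
```

After the split, A and B (and the child) all carry household ID 101 even though they live in two separate households, which is why this ID serves survey design purposes rather than analytical ones.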

          [Related rant] Quite a few labor economists when they use the PSID (and I do mean labor economists, not labour economists -- who often use non-PSID panels) use estimation samples based on men who are household heads in all years of data used. This solves the household ID problem by assuming it away -- only a single person per household is being tracked over time. (This 'solution' is mighty convenient, but rather strange given that headship can change over time; and all women are dropped not only female household heads. The real world is a complicated place and models built on these sorts of samples abstract from that!)



          • #6
            Thanks Stephen, the problem that I am trying to overcome is that donations are reported at the household level (or family unit level). One study I have found seems to approach this issue by using a multidimensional panel but I am unfamiliar with this approach. Another study seems to go by household but the specificities of their model and sample are unclear.



            • #7
              So, the ID generated by using the formula [(ER30001*1000) + ER30002] is basically a unique ID for each individual (whether head or not), not for each household, right?



              • #8
                Originally posted by Ahmed Wasiful Alam:
                "So, the ID generated by using the formula [(ER30001*1000) + ER30002] is basically a unique ID for each individual (whether head or not), not for each household, right?"
                That is exactly correct.
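As a small illustration of that formula, in Python rather than Stata: ER30001 is the 1968 interview number and ER30002 is the person number within it. The function name `person_id` and the example values are made up.

```python
def person_id(er30001: int, er30002: int) -> int:
    """Build the individual-level ID as (ER30001 * 1000) + ER30002."""
    return er30001 * 1000 + er30002


# Two people from the same 1968 family get different IDs.
print(person_id(4, 1))
print(person_id(4, 2))
```

Because the 1968 interview number is shared by everyone descended from the same 1968 family, it is the person number that makes the result unique to the individual.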



                • #9
                  Thanks. So, when I keep only the heads (using the question "Relationship to head") from these unique individuals, I am basically tracking the heads of the households over the years but not the same households over the years, right?



                  • #10
                    You are not tracking "the same households" over the years, that is correct, as Stephen Jenkins discussed. The best you are doing is tracking individuals over the years.

                    But if indeed you are tracking individuals over the years, by keeping only observations of heads from each wave, you will do worse than that: you will track those individuals only over the years in which they were heads.

                    It is not unusual - especially in earlier years - for the nominal head to change: for example, when a single female head of household married, the new husband often replaced his wife as the head of the combined household. And if they later separate, the woman may reappear as a head.
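To make the gap concrete, here is a tiny Python sketch using the Alice example from earlier in the thread. The relationship code 10 for the head is an assumption (the thread itself only mentions 20 for partner), and the records are invented.

```python
HEAD = 10  # assumed "relationship to head" code for the head / reference person

# (wave, relationship to head) for one individual:
# head while single, partner while married, head again after separating.
alice_waves = [(2001, 10), (2003, 10), (2005, 20), (2007, 20), (2009, 10)]


def waves_as_head(records):
    """Return the waves that survive a keep-only-heads filter."""
    return [year for year, relation in records if relation == HEAD]


print(waves_as_head(alice_waves))
```

The 2005 and 2007 waves drop out for this individual, so a heads-only panel follows her with a hole in the middle.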



                    • #11
                      William, thank you so much for your response. In your example, it might be the case that the leaving husband (after the separation) may go and start a new family and become the head of that new family. However, if our goal is to track the same families over the years, we need to stick to the family where the original female head remains and becomes the head again after the separation, right?

                      Having said that, if we drop the "split-off" families, keep only the re-interviewed families, and then keep only the heads by using the "relationship to head" variable, then we are able to keep the heads of the same families (regardless of whether the head changed) over the years, right?

                      Please let me know what you think. In my research, I need to follow the same families (households) over the sample period regardless of whether the head changed at some point of time within the sample period. Many thanks!



                      • #12
                        I really said all I have to say about tracking households in post #2 above, and Stephen Jenkins said it much better, and, considering his academic history, with more assured authority than I, in post #5:

                        "You can track individuals over time, and relate them to their household context though at each wave."

                        That seems to be what you describe in post #11. If individual A marries individual B, then later abandons B and their 3 children and moves in with C, it is hard to argue that the A-C household is in any objective sense, other than the presence of A, the "same household" as the A household or the A-B household.



                        • #13
                          Thanks everyone for your contributions so far! This is my first time working with the PSID or really any similar longitudinal data. I believe I'm understanding what you're all saying about the merits of having a household id in the data.
                          However, I have downloaded a set of family-level variables across a span of 25 years, and the data are in wide form with multiple years of observations in each row (which years and how many depends on the row). Looking at the data, it seems like each row holds observations of a family or household that are somehow linked together across years, but I don't understand how they're being linked together or what variable this linking is based on. Originally, I thought I might be able to use the 1968 interview number since it's listed as a "manual cross year identifier", but I have too many rows with information for the same year that have the same 1968 interview number for this to be the case.

                          Currently my main goal is to reshape long, which I've been able to do by generating a unique id for each row in the data, but creating this kind of trivial id means that it won't be possible to merge it with more data in the future. Does anyone know how the PSID determines which observations to show in the same row in the wide form and how I could generate an id to reshape with that would be consistent with the current format of the data and across different subsets of the PSID family-level variables?



                          • #14
                            Your questions on PSID are best answered by reference to the PSID FAQ and by email to the PSID, as described on the PSID Getting Started page.

                            It is not clear what is used to connect the family data from different waves into a single row in the data file. My guess as a PSID user is that each row represents a single individual who was a household head in one or more of the waves in your data, and shows family-level data for their household during those waves in which the individual was a household head. But this is only a guess; the PSID documentation does not confirm this guess, at least not in my casual searching.

                            In other words, obtaining family-level data from the PSID does not solve your problem of constructing meaningful households over time, if that was your hope.

                            The yearly family identifiers included in your data, such as "2011 FAMILY INTERVIEW (ID) NUMBER", can be used to link to other family data from the same wave, or to members of that family in that wave. However, they are of no use as cross-wave identifiers linking families wave-to-wave, as the PSID FAQ tells us.

                            You can reshape long, using a meaningless sequence number for the i() option, and then discard that id after reshaping, because each observation will have the yearly family identifier and the wave or year, which can later be used to merge with other subsets of PSID family-level variables that have been similarly reshaped.
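The idea can be sketched in Python rather than Stata as follows. The column names (`famid2009`, `donation2009`, and so on) are hypothetical stand-ins for the real PSID variable names, and `reshape_long` is an illustrative helper, not a PSID or Stata routine.

```python
def reshape_long(wide_rows, stubs, years):
    """Turn wide rows into one record per (year, famid).

    Each wide row is a dict with keys such as "famid2009" and, for every
    stub in stubs, "<stub>2009". No row-level id is kept: the pair
    (year, famid) is what later merges would use.
    """
    long_rows = []
    for row in wide_rows:
        for year in years:
            famid = row.get(f"famid{year}")
            if famid is None:
                continue  # this row has no interview in that wave
            record = {"year": year, "famid": famid}
            for stub in stubs:
                record[stub] = row.get(f"{stub}{year}")
            long_rows.append(record)
    return long_rows


wide = [
    {"famid2009": 17, "donation2009": 250, "famid2011": 42, "donation2011": 300},
    {"famid2011": 8, "donation2011": 0},
]
for record in reshape_long(wide, ["donation"], [2009, 2011]):
    print(record)
```

Any other family-level extract reshaped the same way carries the same (year, famid) pair, so the two long files can be merged on that pair without reference to the original wide rows.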

