  • Panel Data: Only recognise first marital status change

    Hello!

    I have a data set as shown below. I would like to analyse only the change in behaviour between the year before a person was widowed and the year after, provided the person remains a widow in that year (i.e. has not remarried in the 2nd consecutive year; the person may remarry more than 2 years after becoming a widow, but those later years are not of interest to me).


    The id below was widowed for only one recorded time period and had remarried by the next recorded time period, so I will not be able to use an id like this.

    [Screenshot of the example data: Screenshot 2021-02-10 at 11.06.50 PM.png]


    Therefore, how do I only keep variables whose marital status changes from married to widowed and remains as widowed for at least 2 consecutive years?

    Thank you,
    Sarah.

  • #2
    Your question is not entirely clear. If you have somebody who has the married-widowed-widowed sequence that you are targeting, I cannot tell if you want to keep only those three observations or if you want to keep all of that id's data. Also, if you do not want to keep all of that id's data, what do you want to do if one person has more than a single married-widowed-widowed sequence? A further ambiguity arises because your example data shows that there can be gaps in the years. Do you require the three observations of married-widowed-widowed to be in consecutive years? Or is it enough that they just be the next available observations?

    Also, you have posted your data example in the least useful way possible: as a screenshot. There is no way to import screenshots of data into Stata to develop and test code. Moreover, while the size and resolution of your screenshot are good on my setup, screenshots are often unreadable. Screenshots also omit important metadata that can be crucial for code. For example, I can tell from the blue color that the marital variable is a value-labeled numeric variable. But without knowing either the name of the value label or its actual coding, there is no way I can refer to the values "married" and "widowed" in the code I write.

    The helpful way to show example data here is to use the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    All that said, to point you in the right direction, I will assume a) that in your data married is coded as 1 and widowed as 2, b) you want to keep all of the data for any id that has any married-widowed-widowed sequence, and c) the married-widowed-widowed sequence must involve consecutive years--if there is a gap in the years in that sequence, it doesn't count.

    Code:
    xtset id year
    by id (year), sort: egen byte keeper = max(marital == 1 & F1.marital == 2 ///
        & F2.marital == 2)
    keep if keeper
    Note: Untested because usable example data was not provided. Beware of typos.



    • #3
      Hi Clyde, thank you very much for your response. Attached is my dataset http://gss.norc.org/documents/stata/GSS_stata.zip .

      Yes, your assumptions (a) and (c) are correct. However, assumption (b) is incorrect: I am analysing whether a spousal death changes a person's political views, so I am looking at three time points: T(-1), the year right before a spouse died; T(0), the year a spouse died, i.e. the first year as a widow; and T(+1), the second year as a widow. I would therefore only need data on a person for these 3 consecutive years, and any data outside this time period is irrelevant.

      Thank you very much.



      • #4
        And if I may add, I would only like to look at the first time a person becomes a widow, not at a second widowhood after remarrying.



        • #5
          Like many others on this Forum, I don't download attachments from people I don't know due to the risk of malware. That's why in #2 I advised the use of -dataex- to show example data. Anyway, I haven't looked at your attachment. But continuing with assumptions a) and c), which you have verified, and changing assumption b) in accordance with what you wrote in #3 & #4, the code changes somewhat:

          Code:
          xtset id year
          by id (year), sort: gen int widowing_episode_num = sum(marital == 1 & F1.marital == 2 ///
              & F2.marital == 2)
          keep if widowing_episode_num == 1
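
          One thing to check before relying on the code above: the running sum becomes 1 at T(-1) of the first episode and stays at 1 on every later observation until a second episode begins, so the -keep- can retain more than the three target years when an id has later data. A minimal sketch of one way to restrict to exactly T(-1), T(0) and T(+1) follows, still under assumptions a) and c). The toy data and the variable names start and first_start are hypothetical and only there to make the example runnable.

          ```stata
          * Toy data (hypothetical): 1 = married, 2 = widowed, as assumed in #2
          clear
          input id year marital
          1 2001 1
          1 2002 1
          1 2003 2
          1 2004 2
          1 2005 2
          1 2006 1
          2 2001 1
          2 2002 2
          2 2003 1
          3 2001 1
          3 2002 1
          end

          xtset id year

          * Flag T(-1): married now, widowed in each of the next two consecutive years
          gen byte start = marital == 1 & F1.marital == 2 & F2.marital == 2

          * Year of the first flagged T(-1) within each id (missing if the id has none)
          by id (year), sort: egen first_start = min(cond(start, year, .))

          * Keep only T(-1), T(0) and T(+1) of that first episode.
          * The !missing() guard matters: inrange() treats missing bounds as
          * unbounded, so ids without any episode would otherwise be kept.
          keep if !missing(first_start) & inrange(year, first_start, first_start + 2)
          ```

          In the toy data only id 1 has a qualifying sequence, so what remains is its 2002, 2003 and 2004 observations. Here "first" means the first qualifying married-widowed-widowed sequence; an earlier widowhood that lasted only one year is not counted.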
