
  • #31
    Thank you Clyde Schechter. Seriously. I appreciate your help with the code. I note an issue that arises below (id 174 & 175). I refer to #25, where I tried to account for the case where one member of a couple states they are separated (marstat==3). In such cases, end should equal 1 in the preceding year (wave==6, not 7); see below:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id p_id) byte(wave marstat p_marstat begin end)
    174  175  1 1 1 1 0
    174  175  2 1 1 0 0
    174  175  3 1 1 0 0
    174  175  4 . 1 0 0
    174  175  5 . 1 0 0
    174  175  6 . 1 0 0
    174  175  7 . 3 1 0
    175  174  1 1 1 1 0
    175  174  2 1 1 0 0
    175  174  3 1 1 0 0
    175  174  4 1 . 0 0
    175  174  5 1 . 0 0
    175  174  6 1 . 0 0
    175  174  7 3 . 1 1
    175 1227 12 2 . 1 0
    175 1227 13 2 . 0 0
    175 1227 14 2 . 0 0
    175 1227 15 2 . 0 0
    175 1227 16 2 . 0 0
    175 1227 17 2 . 0 0
    175 1227 18 2 . 0 0
    end
    With respect to duration, would it be correct to count over the rare missing wave, as with wave 14 below, so that duration equals 18? If so, what would the rule be for this: a limit of 1-2 consecutive missing waves within, not at the start or end of, a panel? Your thoughts?
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(id p_id) byte(wave marstat p_marstat begin end) float duration
    128  129  1 1 1 1 0 17
    128  129  2 1 1 0 0 17
    128  129  3 1 1 0 0 17
    128  129  4 1 1 0 0 17
    128  129  5 1 1 0 0 17
    128  129  6 1 1 0 0 17
    128  129  7 1 1 0 0 17
    128  129  8 1 1 0 0 17
    128  129  9 1 1 0 0 17
    128  129 10 1 1 0 0 17
    128  129 11 1 1 0 0 17
    128  129 12 1 1 0 0 17
    128  129 13 1 1 0 0 17
    128  129 15 1 1 0 0 17
    128  129 16 1 1 0 0 17
    128  129 17 1 1 0 0 17
    128  129 18 1 1 0 0 17
    end
    For a meaningful analysis, I think I should only include a minimum number of continuous observations (e.g. 3), unless spells<3 start and end in that period. Does this sound reasonable?

    Last edited by Chris Boulis; 30 Mar 2020, 19:59.



    • #32
      Concerning your first question, I see the problem. The definition of end has to be modified to take into account the fact that a batch of observations on the same couple can consist of more than one spell, and each of those spells must be marked as ending (unless it is still in a relationship on the final wave--in which case the last one is censored.)

      Concerning the second question, you really have to make that decision yourself, based on your understanding of the way in which the data were gathered, the reasons for the missingness of some waves for some couples, and the implications for your analyses of misclassification errors in either direction. What I have done in the code below is to count the duration of each spell as extending from the first wave through the last--regardless of how many gaps there might be between them. When you decide how many missing waves are allowable, you can add to the code another command that replaces the duration with a missing value if your criterion is exceeded.

      Code:
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(id p_id) byte(wave marstat p_marstat)
      179  180  1 2 .
      179  180  3 2 .
      179  180  5 1 1
      179  180  6 1 1
      179  180  8 1 1
      179  180  9 1 1
      179  180 10 1 1
      179  180 11 1 2
      179  180 12 1 2
      180  179  1 . 2
      180  179  3 . 2
      180  179  5 1 1
      180  179  6 1 1
      180  179  8 1 1
      180  179  9 1 1
      180  179 10 1 1
      180  179 11 2 1
      180  179 12 2 1
      180  146 14 2 .
      186  864  8 2 .
      186  864  9 2 .
      186  864 10 2 .
      186  864 11 1 .
      186  864 12 1 .
      186  864 13 1 .
      186  864 14 1 .
      186  864 15 1 .
      186  864 16 1 .
      186  864 17 1 .
      186  864 18 1 .
      188  189  1 1 1
      188  189  2 1 1
      188  189  3 1 1
      188  189  4 1 1
      188  189  5 1 1
      188  189  6 1 1
      188  189  7 1 1
      188  189  8 1 1
      188  740 11 2 2
      188  740 12 2 2
      188  740 13 2 2
      188  131 17 2 2
      188  131 18 2 2
      189  188  1 1 1
      189  188  2 1 1
      189  188  3 1 1
      189  188  4 1 1
      189  188  5 1 1
      189  188  6 1 1
      189  188  7 1 1
      189  188  8 1 1
      116  279  2 2 2
      116  279  3 2 2
      116  279  4 2 2
      116  279  5 1 1
      116  279  6 1 1
      116  279  7 1 1
      116  279  8 1 1
      116  279  9 1 1
      116  279 10 1 1
      116 1888 18 2 2
      174  175  1 1 1
      174  175  2 1 1
      174  175  3 1 1
      174  175  4 . 1
      174  175  5 . 1
      174  175  6 . 1
      174  175  7 . 3
      175  174  1 1 1
      175  174  2 1 1
      175  174  3 1 1
      175  174  4 1 .
      175  174  5 1 .
      175  174  6 1 .
      175  174  7 3 .
      175 1227 12 2 .
      175 1227 13 2 .
      175 1227 14 2 .
      175 1227 15 2 .
      175 1227 16 2 .
      175 1227 17 2 .
      175 1227 18 2 .
      128  129  1 1 1
      128  129  2 1 1
      128  129  3 1 1
      128  129  4 1 1
      128  129  5 1 1
      128  129  6 1 1
      128  129  7 1 1
      128  129  8 1 1
      128  129  9 1 1
      128  129 10 1 1
      128  129 11 1 1
      128  129 12 1 1
      128  129 13 1 1
      128  129 15 1 1
      128  129 16 1 1
      128  129 17 1 1
      128  129 18 1 1
      end
      
      
      
      //  VERIFY THAT PARTNERS NEVER ACTIVELY DISAGREE ABOUT WHETHER
      //  THEY ARE IN A RELATIONSHIP
      assert inlist(marstat, 1, 2) == inlist(p_marstat, 1, 2) if !missing(marstat, p_marstat)
      
      //  IDENTIFY OBSERVATIONS WHERE PERSON IS MARRIED OR DEFACTO
      gen byte in_relationship = inlist(marstat, 1, 2) if !missing(marstat)
      replace in_relationship = inlist(p_marstat, 1, 2) if missing(in_relationship)
      
      //  IDENTIFY SPELLS OF THAT STATUS WITH SAME PARTNER
      by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
      replace spell_num = . if !in_relationship
      
      //  CALCULATE DURATION OF ALL SPELLS OF A PAIRING IN RELATIONSHIP
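      //  (wave) FIXES THE ORDER WITHIN EACH SPELL, SO wave[1] AND wave[_N] ARE ITS FIRST AND LAST WAVES
      by id p_id spell_num (wave), sort: gen spell_duration = wave[_N] - wave[1] + 1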
      by id p_id: egen total_duration_this_pair = total(cond(spell_num != spell_num[_n-1] & in_relationship, ///
          spell_duration, .))
      
      by id p_id spell_num (wave), sort: gen byte begin = (_n == 1)
      by id (wave), sort: gen byte end = ((p_id[_n+1] != p_id) ///
          | (spell_num[_n+1] != spell_num))& _n < _N
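As an illustration of the extra command mentioned above for handling missing waves, here is a minimal sketch. It assumes a hypothetical cutoff of 2 consecutive missing waves (the local macro max_gap) and uses toy data that is not from the thread; the variables gap and longest_gap are names introduced only for this example. The appropriate cutoff remains a judgment call about the data.

```stata
* Toy data (hypothetical): pair 1-2 skips wave 3; pair 3-4 skips waves 2-4
clear
input long(id p_id) byte(wave in_relationship)
1 2 1 1
1 2 2 1
1 2 4 1
3 4 1 1
3 4 5 1
3 4 6 1
end

* Spells and durations, as in the code above
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship
by id p_id spell_num (wave), sort: gen spell_duration = wave[_N] - wave[1] + 1

* Hypothetical criterion: at most 2 consecutive missing waves inside a spell
local max_gap 2

* Number of waves skipped between each observation and the previous one
by id p_id spell_num (wave): gen byte gap = wave - wave[_n-1] - 1 if _n > 1
by id p_id spell_num: egen longest_gap = max(gap)

* Set duration to missing where the criterion is exceeded
replace spell_duration = . if longest_gap > `max_gap' & !missing(longest_gap)

list, sepby(id p_id)
```

A rule based on the total number of missing waves, rather than the longest run, could instead compare spell_duration with the number of observations in the spell.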



      • #33
        Thank you a lot Clyde Schechter. I really am thankful and hope one day I will be able to help others as you do. Regarding one point. I notice you sometimes sort by spell and other times spell_num. I do not yet understand why. e.g. for 'begin' you previously used 'spell' but now 'spell_num'. Do you mind explaining the difference in what the code will give with the two? Kind regards, Chris



        • #34
          There is no difference, and I shouldn't have done that. The variable is created as spell_num, and I should have consistently called it that throughout. In those places where I just called it spell, I was being lazy. When Stata encounters the beginning of a variable name, it will automatically use the full name of the variable (provided there is only one possible variable name that begins with what was written--had there been two variables, say spell_num and spell_x, then using just spell would throw an error message.) I'm normally careful in my coding not to rely on this, especially here on the Forum where often the data examples shown do not include the full array of variables in the real data set. I don't know why I got sloppy in this thread. But I'm sorry. It really should be spell_num throughout.
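The abbreviation behaviour described above can be seen in a small sketch. The variable spell_x is hypothetical and exists only to create the ambiguity; -set varabbrev off- is one way to make Stata insist on full variable names.

```stata
clear
set obs 3
gen spell_num = _n

* Only one variable begins with "spell", so the abbreviation resolves to spell_num
summarize spell

* Add a second variable with the same prefix (hypothetical name)
gen spell_x = 0

* Now "spell" is ambiguous, and Stata reports an error instead of guessing
capture noisily summarize spell
display "return code: " _rc

* With abbreviation switched off, even an unambiguous prefix is rejected
set varabbrev off
capture noisily summarize spell_n
display "return code: " _rc

* Restore the default setting
set varabbrev on
```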



          • #35
            Hi Clyde Schechter. No worries, thanks for clarifying. That said, I should spend more time trying to understand your code (I appreciate your patience).

            Regarding the issue of duplication and your comment in #30, I should note that the dataset I'm working with is a subset of the full dataset (it should only include couples: marstat==1 (married) or 2 (de facto)). But to better deal with the issue of missing values and left-truncation that I see in my data, I think I should use the full dataset, which will thus include marstat==3 (separated), 4 (divorced), 5 (widowed) and 6 (single). I'll apply this spell code and your duplicates code to that and advise further re missings, etc.



            • #36
              Hi Clyde Schechter. Given I am now working from the full dataset, I don't believe we can "drop marstat" as per the code to address data duplication in #30. Do you have another suggestion? Or could we create two variables from "marstat" instead?
              [Image: tabulate_marstat.png]



              • #37
                I suppose it depends on what you need to do with the marstat variable. In all the code I have written for you, at least in the more recent posts in this thread, I have first created the in_relationship variable and used that. Once I have in_relationship, marstat is no longer needed for the purpose of creating spells, estimating durations, determining failure vs censorship, etc. It may well be that you still need the marstat variable for other purposes, and, if so, then retain it.



                • #38
                  Yes of course, silly me, thank you Clyde Schechter. I am having trouble understanding the following code. Could you kindly explain what's going on in each line please.
                  Code:
                  gen partner1 = min(xwaveid, hhpxid)
                  gen partner2 = max(xwaveid, hhpxid)
                  by partner1 partner2 wave (spell), sort: assert inlist(spell, spell[1], .)
                  by partner1 partner2 wave (spell): replace spell = spell[1]
                  This will then help me understand what it means that the assertion is false
                  Code:
                  8,647 contradictions in 364,427 observations
                  assertion is false



                  • #39
                    The idea behind generating partner1 and partner2 is to uniquely identify a pair, so that rather than having both 175 with 174 and 174 with 175, we would have only 174 with 175. The specific implementation is to have partner1 be the numerically smaller person number, and partner2 be the numerically larger person number.

                    After those commands have executed, we now have a pairing defined by partner1 and partner2, but it contains duplicate information gleaned from both versions of the two original pairs.

                    I do not know where you got the assert and replace commands you show from. In the code that I suggested in #30, the variable spell does not appear in those commands: it has in_relationship in every place where you show spell. The purpose of the commands is to verify that the relationship between the partners is consistently reported in both versions: they should either absolutely agree on the value of in_relationship, or perhaps one of them will have a missing value instead. Once that is verified, the -replace- command harmonizes in_relationship to the common value, or to the non-missing value if one is missing. After that the commands
                    Code:
                    keep partner1 partner2 wave in_relationship
                    duplicates drop
                    reduce the data to just one version of the information for each pair.

                    So remove the references to spell from those commands and replace them with in_relationship. (In fact, this block of code should be run before spell is even calculated.)
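Putting the description above together, the whole step might look like the following sketch. It is assembled from the explanation in this post rather than copied from #30, and it uses a few rows for ids 174 and 175 taken from the example data earlier in the thread. Storing partner1 and partner2 as long is an assumption made here so that large id numbers are held exactly.

```stata
* A few rows for the pair 174/175 from the example data in #32
clear
input long(id p_id) byte(wave marstat p_marstat)
174 175 3 1 1
174 175 4 . 1
174 175 7 . 3
175 174 3 1 1
175 174 4 1 .
175 174 7 3 .
end

gen byte in_relationship = inlist(marstat, 1, 2) if !missing(marstat)
replace in_relationship = inlist(p_marstat, 1, 2) if missing(in_relationship)

* One identifier per pair, whichever partner the row came from
gen long partner1 = min(id, p_id)
gen long partner2 = max(id, p_id)

* Both versions of a pair-wave must agree, or one of them must be missing.
* Sorting on in_relationship puts any missing value last within the group.
by partner1 partner2 wave (in_relationship), sort: ///
    assert inlist(in_relationship, in_relationship[1], .)

* Harmonize to the common (non-missing) value
by partner1 partner2 wave (in_relationship): ///
    replace in_relationship = in_relationship[1]

* Reduce to one observation per pair and wave
keep partner1 partner2 wave in_relationship
duplicates drop
isid partner1 partner2 wave
list
```

If the assert fails on real data, listing the offending pair-waves is a reasonable next step before harmonizing anything.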



                    • #40
                      Thank you so much Clyde Schechter. I'm seriously hopeless (I'd like to blame it on cabin fever, or the gremlins, but I know I can't ...). I'm using the code as provided in #30. Thank you for a very clear explanation of the code. I understand now. And yes as you advised in #30, I'm running it before the spell-related code.

                      I notice that the first line
                      Code:
                      gen byte in_relationship = inlist(marstat, 1, 2)
                      does not include
                      Code:
                      if !missing(marstat)
                      as you included for this line of code in #32. Is there a reason or does it not matter?



                      • #41
                        My error. It should include that.
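A small sketch of why the qualifier matters, using toy values that are not from the thread: inlist() returns 0 when marstat is missing, so without the qualifier a missing marstat is recorded as "not in a relationship", and the later replace that falls back on p_marstat never applies to that observation. The variable names without_if and with_if are used only for this illustration.

```stata
* Toy data (hypothetical)
clear
input byte(marstat p_marstat)
1 1
. 1
3 .
end

* Without the qualifier: a missing marstat yields 0
gen byte without_if = inlist(marstat, 1, 2)

* With the qualifier: a missing marstat yields missing
gen byte with_if = inlist(marstat, 1, 2) if !missing(marstat)

* Only the second version can still be filled in from the partner's report
replace with_if = inlist(p_marstat, 1, 2) if missing(with_if)

list
```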



                        • #42
                          In running this code before the spell code, it drops all other variables in the dataset
                          Code:
                          keep partner1 partner2 wave in_relationship
                          duplicates drop
                          Now to run the spell code, and to retain all other variables that I will need for my analysis, could/should I replace the keep line of code with
                          Code:
                          duplicates drop partner1 partner2 wave in_relationship, force
                          Also, do I need to change any of the spell code now we have the new vars: partner1 and partner2?



                          • #43
                            Yes, the spell code all has to be done in terms of partner1 and partner2 rather than id and p_id. If you need to retain the other variables, then yes, the -duplicates drop, force- command will do that. But be careful. In the example data we have been working with there have been no variables that are needed or relevant besides partner1, partner2, wave, and in_relationship. I don't know what these other variables are--and if they represent individual-level data about the person identified in id, then the corresponding information about the partner who was originally p_id will be lost. Without knowing more about these other variables, I can't really advise you. It may be necessary to take additional steps to preserve that information.
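As a rough sketch of what "in terms of partner1 and partner2" could look like, the spell and duration commands from #32 are rewritten below for pair-level data with one observation per pair and wave. The rows are derived from pairs 174/175 and 179/180 in the example data. The begin and end variables are left out, because marking an end depends on what happens to each person after the pairing, which pair-level data alone does not show.

```stata
* Pair-level data after deduplication, derived from the example data in #32
clear
input long(partner1 partner2) byte(wave in_relationship)
174 175  1 1
174 175  2 1
174 175  3 1
174 175  4 1
174 175  5 1
174 175  6 1
174 175  7 0
179 180  1 1
179 180  3 1
179 180  5 1
179 180  6 1
179 180  8 1
179 180  9 1
179 180 10 1
179 180 11 1
179 180 12 1
end

* Spells of being in a relationship, now identified by the pair
by partner1 partner2 (wave), sort: ///
    gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

* Duration from first to last wave of each spell, gaps included
by partner1 partner2 spell_num (wave), sort: ///
    gen spell_duration = wave[_N] - wave[1] + 1

list, sepby(partner1 partner2)
```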



                            • #44
                              I am working with a lot of individual-level data, specifically relating to individual and family characteristics, age, sex, education, labour, income, expenditure, and so use id and p_id in my code. I wonder if this issue has arisen from the code I used to merge the data: https://www.statalist.org/forums/for...tring-function.

                              Let me know if I should make this in a new post re: how to deal with the issue of duplicate data?



                              • #45
                                Hi Clyde Schechter. I was looking over code in your post in #32
                                Code:
                                gen byte in_relationship  = inlist(mrcurr, 1, 2) if !missing(mrcurr) 
                                replace in_relationship  = inlist(p_mrcurr, 1, 2) if missing(in_relationship)
                                and wondered whether the second line should be
                                Code:
                                replace in_relationship  = inlist(p_mrcurr, 1, 2) if missing(marstat)
                                and not
                                Code:
                                if missing(in_relationship)
                                I appreciate your clarification on this. Kind regards, Chris.
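One way to check this directly is sketched below with toy values that are not from the thread. It assumes the first line is run with the if !missing(mrcurr) qualifier, and that the condition is written against mrcurr, the same variable that first line uses. Under those assumptions in_relationship is missing exactly where mrcurr is missing at the moment the replace runs, so the two conditions would select the same observations.

```stata
* Toy data (hypothetical)
clear
input byte(mrcurr p_mrcurr)
1 1
. 2
3 .
. 4
end

gen byte in_relationship = inlist(mrcurr, 1, 2) if !missing(mrcurr)

* Before the replace, the two candidate conditions pick out the same rows
assert missing(in_relationship) == missing(mrcurr)

replace in_relationship = inlist(p_mrcurr, 1, 2) if missing(in_relationship)

list
```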
