  • Dropping Missing and changing values over time for fixed effects

    The race reported by some individuals changes across the 4 years of my survey. For example, some individuals who answered Black in the first survey answered Asian Black the year after.

    How can I delete all observations for individuals who give different answers in different years?
    I just want to remove every individual who answered two different things in the survey.
    • For example, the same person (PersonNum=1109) answered Race code 101 in T1, Race code 102 in T2, Race code 101 in T3.
    Or:
    • The same person (PersonNum=20013) answered Race code 101 in T1, Race=missing in T2, Race code 101 in T3.
    In both scenarios I need to drop them.

    The only scenario I want to keep is:
    • The same person (PersonNum=12333) answered Race code 101 in T1, Race code 101 in T2, Race code 101 in T3.
    So of these three persons in my survey, I just want to keep Person 12333, not the other two (Obs = 1).

    Can someone give me some quick guidance on the code? I'm new to Stata.

  • #2
    Code:
    by PersonNum (Race), sort: drop if Race[1] != Race[_N]
    will do what you ask. Whether this is a wise thing to do is another matter. My experience with this kind of data is that inconsistent reporting of race over time is very frequent. It arises from several sources, I believe. Some arises from self-reporting of race on some occasions and reporting of race by an observer on others. Some arises because there are growing numbers of people who are interracial, but the survey question offers no such category, and so they "randomly" choose one of their racial heritages to report. Some arises out of errors in recording the intended response. Be that as it may, you may be excluding a large number of people from your analysis, and to the extent that being interracial is driving this, they are a special group. If race is truly salient to the key variables in your research questions, you are likely introducing bias by handling it this way. If race is not relevant to the key variables in your research, then why bother: it doesn't matter if race is inconsistently recorded in that case. Think it over. You might be better off imputing race categories to the various patterns of inconsistent race reporting if you need a single race variable for analysis.
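
    Before dropping anyone, it may help to see how many people the rule in #2 would remove. The following is only a sketch on toy data built from the three examples in #1: it assumes Race is numeric, and the names year, inconsistent, and one_per_person are made up for illustration.

    ```stata
    * Toy data from the examples in #1 (year is an assumed variable name)
    clear
    input PersonNum year Race
    1109  1 101
    1109  2 102
    1109  3 101
    20013 1 101
    20013 2 .
    20013 3 101
    12333 1 101
    12333 2 101
    12333 3 101
    end

    * Flag persons whose lowest and highest sorted Race values differ.
    * Numeric missing sorts last, so a missing answer also sets the flag.
    bysort PersonNum (Race): generate byte inconsistent = Race[1] != Race[_N]

    * Mark one observation per person so that persons, not rows, are counted
    egen byte one_per_person = tag(PersonNum)
    tabulate inconsistent if one_per_person

    * Only after reviewing the counts:
    * drop if inconsistent
    ```

    Flagging first, rather than dropping immediately, leaves the data intact while you judge how large and how special the excluded group is.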

    • #3
      Thank you, Clyde Schechter, for your quick response.

      I will definitely tell my supervisor what you suggest. However, when I type
      • tab race year
      I get different numbers in each year.


      I expected the table to show the same number of Black, White, and so on in every year.


      Last edited by Roberto Villa; 14 Mar 2022, 11:45.

      • #4
        Well, most longitudinal studies suffer from some degree of attrition, and, depending on the design, late enrollment. So even if every person gives a constant response to the race question over the years, not everyone participates in every year, so the number of respondents of each race can vary from year to year. If you wish to restrict your analyses to people who participated in every year, that is possible, but probably also not a good idea.
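
        The point about attrition can be checked directly by counting how many years each person appears in. This is a rough sketch on made-up data: the year values, the variable name year, and n_years are assumptions, and it presumes one observation per person per year.

        ```stata
        * Toy data: person 12333 answers in all 4 years, person 20013 in only 2
        clear
        input PersonNum year Race
        12333 2018 101
        12333 2019 101
        12333 2020 101
        12333 2021 101
        20013 2018 101
        20013 2020 101
        end

        * Number of survey years in which each person appears
        bysort PersonNum: generate n_years = _N

        * Yearly counts differ even though nobody changes their answer
        tabulate Race year

        * Restricting to people present in all 4 years is possible,
        * but as noted above probably not a good idea:
        * keep if n_years == 4
        ```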

        • #5
          Perfect, Clyde Schechter. What if the same person ID responds that he is white (100) one year and (652) the next? Is it wise to drop him, and how is that coded?

          If someone fails to answer in 2019 but is white in the other 3 periods, I think it would be wise to keep him?

          Thank you,
          Roberto

          • #6
            Thank you, I'm very new to Stata.
            Last edited by Roberto Villa; 14 Mar 2022, 12:07.

            • #7
              What if the same person ID responds that he is white (100) one year and (652) the next?
              That really depends on the context of your study. If for the purposes at hand, a white person is equivalent to a native hawaiian/pacific islander, then you could just recode the race variable to mark all native hawaiian/pacific islander responses to white (or the other way around). But that equivalence is a substantive matter that depends on what specifically you are researching. So it is a question you will need to take up with content area experts. It's not a statistical question.
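
              If the content experts do decide that the two categories are equivalent for the study, the recoding could look roughly like this. It is a sketch only: the person ID 5001 and the names race_combined and inconsistent are invented for illustration, and the choice to fold 652 into 100 stands in for whatever the substantive decision turns out to be.

              ```stata
              * Toy data: one made-up person who answers 100 and then 652
              clear
              input PersonNum year Race
              5001 1 100
              5001 2 652
              end

              * Fold code 652 into code 100 in a new variable, leaving Race untouched
              recode Race (652 = 100), generate(race_combined)

              * Under the combined coding this person is no longer inconsistent
              bysort PersonNum (race_combined): generate byte inconsistent = race_combined[1] != race_combined[_N]
              list PersonNum year Race race_combined inconsistent
              ```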

              If someone fails to answer in 2019 but is white in the other 3 periods, I think it would be wise to keep him?
              While the answer to this question, too, depends on the research context, I will say that, in general, it is unwise to discard data unless you know it to be incorrect and there is no reasonable way to fix the error. This comment is in line with my response in #2, where I indicated my skeptical view of the entire approach of dropping people with inconsistent responses.
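
              For readers who still want to drop true inconsistencies while keeping people who merely skipped a year, one possible variation is to compare each person's minimum and maximum non-missing Race. This is a sketch, not the method given in #2: it assumes Race is numeric, and race_min and race_max are illustrative names.

              ```stata
              * Toy data from the examples in #1 (year is an assumed variable name)
              clear
              input PersonNum year Race
              1109  1 101
              1109  2 102
              1109  3 101
              20013 1 101
              20013 2 .
              20013 3 101
              12333 1 101
              12333 2 101
              12333 3 101
              end

              * egen's min() and max() ignore missing values within each person
              bysort PersonNum: egen race_min = min(Race)
              bysort PersonNum: egen race_max = max(Race)

              * Drop only persons with two different non-missing answers.
              * Person 20013 (one missing year) is kept; person 1109 is dropped.
              drop if race_min != race_max
              ```

              Whether to then fill the missing year with the person's constant value is a separate imputation decision that depends on the research context.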

              • #8
                Let me add that this question was posted yesterday and received a fair amount of discussion at

                https://www.statalist.org/forums/for...tudinal-survey

                As a matter of respect to those whose help you seek, that earlier discussion should have been mentioned in the initial post, along with an explanation of why it did not meet your needs.

                • #9
                  In a private exchange, I had encouraged Roberto to clarify his question and "repost" it publicly, by which I had in mind that he follow up in the same thread. I suspect that as a new member, he just took my unfortunate use of "repost" too literally.
