  • match (longitudinal) records from cases and controls

    Dear Statalisters,

    I'm working on a project where I have to match records from cases and controls using a longitudinal database. All individuals are HIV seroconverters, meaning that they enter the risk set at the moment of HIV infection. Cases are those who became infected with another virus during follow-up; controls remained HIV-monoinfected throughout follow-up.

    Now I need to match records from cases with records from controls at the moment the cases became infected with another virus, relative to HIV seroconversion; that is, the matching needs to take place at an equal time since HIV seroconversion. For example, if an individual got infected with another virus 2 years after HIV seroconversion, I need to find one or more controls that have records from 2 years after HIV seroconversion onwards.

    Is there a simple way to do this? Do you have any recommendations?

    So far I have generated a variable (called match) that identifies which individuals have records at the same time point, so that, for example, everyone with a record 2 years after HIV seroconversion shares one identifier number. But I don't know how to match cases with HIV-monoinfected controls now, because cases can share an identifier (the same match number) even though they can't be controls for each other.

    Any tips are welcome.

    Thank you in advance for your help.

    Best, Daniela.

  • #2
    There's a substantial discussion of implementing incidence density sampling, in which I participated, at
    http://www.statalist.org/forums/foru...-control-study
    The code I offered there might be adaptable to your situation. If you need help with that, I'd recommend per the FAQ that you post a concrete data example. There are also other discussions of matching for case-control studies in the Statalist archive, so a search is worthwhile.

    The basic strategy in my code is to create a file in which a case is matched to all controls that match on the covariates, and then to discard all those controls that don't fulfill the risk set condition. That being said, my understanding of your situation is that "time since HIV seroconversion" is simply another covariate, the trick being that it is a continuous, time-varying covariate.
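
A minimal sketch of that strategy, for illustration only (it is not the code from the linked thread). All variable names and values are hypothetical toy data: `hivmatch` stands in for the matching covariate(s), and `fu` is assumed to be years from HIV seroconversion to the second infection for cases, and total years of follow-up for controls.

```stata
* Hypothetical toy data: one row per person (all names and values are assumptions)
clear
input id case hivmatch fu
1 1 1 2.0
2 1 2 3.5
3 0 1 4.0
4 0 1 1.5
5 0 2 6.8
6 0 2 3.0
end

* File of potential controls: non-cases only
preserve
keep if case == 0
rename (id fu) (idctl fuctl)
drop case
tempfile filectl
save `filectl'
restore

* One record per case, paired with every control sharing the same hivmatch
keep if case == 1
rename (id fu) (idcase fucase)
drop case
joinby hivmatch using `filectl'

* Risk-set condition: the control must still be under follow-up
* at the case's event time
drop if fuctl < fucase

sort idcase idctl
list, sepby(idcase)
```

Here the pairing step deliberately produces every case-control combination first; the risk-set condition is applied afterwards as a simple `drop`, which keeps the two steps easy to check separately.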

    I'm sorry if my response here is not entirely responsive to what you describe in your last paragraph, since I don't quite follow your verbal description.

    Regards, Mike


    • #3
      Dear Mike,

      Thank you! I will take a look at the code and try to implement it. If I run into any problems, I might contact you, but only after searching through the forum discussions you mentioned.

      Best regards, Daniela.


      • #4
        Please note this FAQ: http://www.statalist.org/forums/help#private
        Steve Samuels
        Statistical Consulting

        Stata 14.2


        • #5
          Dear Mike,

          First I tried to use "sttocc", but with it I can only assign a fixed number of controls to each case, whereas I am interested in matching all possible controls to each case.

          Now, I tried your code up to "by idcase: gen byte first = (_n ==1)", as I do not need to delete any controls from the cases as in your example.

          In order to make the code work with my longitudinal dataset, I kept one record per case (for the 330 cases). Afterwards, I joined all possible records from the controls to the unique records of the cases. This seems to have worked. However, after joining the records, a new variable is created, namely "fuctl". I've tried to find out what it does or means, but I couldn't find it in the help file or by googling it.
          In some joined cases, "fuctl" is missing, so I don't know whether the joining process went well here.

          I was also wondering what the following piece of code does to the joining or matching process:
          gen rand = runiform()
          sort rand, stable

          Just in case, this is what my do-file looks like, with a few changes to your code. I have also attached an example dataset to the post. I hope you can help me figure this out, and I hope my description is clearer now; if not, please let me know.

          Thank you in advance.

          _______code______

          local matchvars = "hivmatch"

          by patient: gen fu=(maxvisit-hivserocon)/365.25

          tempfile filectl
          rename patient idctl
          rename fu fuctl // fu is my eventtime variable

          gen rand = runiform()
          sort rand, stable
          drop rand

          ** keeping control records. I'm keeping these records so that after joining I can have all the longitudinal data from each individual.
          preserve

          keep if case==0

          save "allrecords-controls", replace

          restore

          ** keeping case records
          preserve

          drop if case==0
          * keeping cases

          save "allcases-records", replace
          * later I'll append these records to the cases
          restore

          save `filectl' // file of controls

          rename idctl idcase
          rename fuctl fucase

          ** keeping only one record per case

          keep if event==1
          cou

          joinby `matchvars' using `filectl', update


          drop if (idcase == idctl) // self-pairs dropped
          qui count
          di r(N)

          bys idcase: gen recno=_n
          cou if recno==1
          ta recno
          * n=330

          by idcase: gen byte first = (_n ==1) // just to count cases
          qui count if first ==1
          di r(N)
          * n=330

          P.S.: match_case & match_ctrl are the same as "hivmatch", but match_case is for the cases and match_ctrl for the controls, to check whether the joining process went well.

          • #6
            Sorry, but I don't follow you here. What I'd find useful is a relatively small example of the data you want to start with, and what you want to end up with. And, having variable names in the example data that match the variable names in the code would be helpful. Explaining/showing exactly what about the code you have tried does not work would also be helpful.

            Some isolated comments about other things that confuse me:
            1) You said: "However, after joining records a new variable is created, namely "fuctl". I've tried to look what this does/mean but I couldn't find it in the help file or by googeling it."
            That's a variable your code creates. I don't understand why this surprises you.

            You ask: Why the -sort- command? I don't recall the specifics of the code right now, but I would presume that in the original context, its purpose was to randomly match cases and controls. As you want to match all cases to controls, that would not be relevant.
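
If the purpose of the random sort was indeed to pick controls at random, the idea could be sketched as follows. This is an illustration only, not the code from the linked thread; the toy case-control pairs, the seed, and the choice of 2 controls per case are all assumptions.

```stata
* Hypothetical toy data: each row is one candidate case-control pair
clear
input idcase idctl
1 3
1 4
1 5
2 6
2 7
2 8
end

set seed 12345                 // arbitrary seed, only for reproducibility
gen double rand = runiform()   // random number attached to each pair
sort idcase rand               // random order of controls within each case
by idcase: keep if _n <= 2     // keep 2 randomly ordered controls per case
drop rand

list, sepby(idcase)
```

When every eligible control is to be kept for every case, the `keep` step is simply omitted, and the random ordering then has no effect on which pairs remain.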

            Regards, Mike


            • #7

              Dear Mike,

              Sorry, I'm new to this forum; I hope this post is better!

              So I've attached an Excel file with a simple example of my starting file and of how I want my data to look after matching. I have also attached a sample starting .dta file. ( example dataset- start & final.xlsx example-startdta.dta )

              With respect to the variable "fuctl", it doesn't surprise me that a new variable is created, but I don't know what a value of, for example, fuctl=6.8 would mean. I could not find any description or explanation of this new variable in the help file. I'm also attaching a sample of the dataset that was created after using "joinby", where you can see that there are missing values in "fuctl". So mainly I don't understand why "fuctl" is missing in some cases. example-aftermatching.dta

              Now it is clear what "gen rand = runiform()" does. Thank you! At least I'm a bit closer to the matching now.

              Please let me know if I can clarify any question you have.

              Thank you again.

              Regards, Daniela.


              • #8
                Dear Mike,

                I figured out that "fuctl" was missing when the cases were being used as controls for other cases. So I deleted them as controls, and now the matching worked perfectly. However, I still don't get what "fuctl" means; if you have any description of this variable that is created, it would be much appreciated.

                Thank you for your help!!

                regards, Daniela.


                • #9
                  I find myself just as confused as before, particularly because your example file does not seem to contain anything about "time," which seemed important to your question. I actually now think I understood you better at your original post.

                  I think you have a fairly answerable question, but almost no one on this list will help you if you post Excel files. (They can contain malware, and many of the most able people on this list are not Excel users.) I'd suggest:

                  1) Take another look at the FAQ about how to post examples.

                  2) Reframe your question something like this:
                  "Here is an example of the data I have, along with documentation of the meaning of the variables. I want to end up with data like this. I have tried this code and here is how it does not do what I want." or "I have tried this code and it does part of what I need, but I still need it to do this other thing."

                  3) Get a colleague, who need not be knowledgeable about Stata, to look at your posting and offer advice about how to communicate your question. (Besides whatever comment a colleague can offer, I often find that in trying to explain my thinking to someone I know, my understanding of my problem is clarified.)

                  I'm suggesting essentially a new version of your original question, with your initial explanation *integrated* with example data and code.

                  I think that when I and others have a clear understanding of your problem, and when example files and code are offered in a format more standard for this forum, I or someone else here will be able to help you work out your question.

                  Regards, Mike


                  • #10
                    Note especially FAQ 12, which describes how to post code and results on the forum between CODE delimiters.
                    Steve Samuels
                    Statistical Consulting

                    Stata 14.2
