Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging m:1 creates repeated observations

    Good evening,

    I merged m:1 a BP dataset with BP and heart rate variables with a demographic dataset with age, gender, and education variables (code below). Afterward, in my new merged file, I have repeated observations of the age, gender, and education variables as shown in the data example below. I would like to know what I am doing wrong in this merge process so I can fix it such that I (1) do not have repeated observations of the demographic variables or (2) figure out a way to examine my data as in a Table 1 or analyze my data e.g. in mixed-level models such that I am not using the repeated observations of age and gender in my analysis. Should I have done it the other way round and merged 1:m with the BP dataset being the using file.

    Here is the code I used in merging:

    Code:
    use  "/Volumes/Datasets/BP Data/BP.dta" 
    merge m:1 id visit using "/Volumes/Datasets/Demographic Data/Demo.dta"
    drop _merge
    save "/Volumes/Datasets/Merged Datasets/Demo_BP_merged.dta", replace

    Here is the sample of the merged dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id byte age str6 gender float education int bp float map int hr double(time datetime) float visit
    1 61 "Female" 1 113    88  60          82800000      1.8498204e+12 0
    1 61 "Female" 1 129    91  78          32400000        1.84977e+12 0
    1 61 "Female" 1  84    61  69          68400000       1.849806e+12 0
    1 61 "Female" 1 131    95  73          70380000      1849807980000 0
    1 61 "Female" 1 104    76  59          12600000      1.8498366e+12 0
    1 61 "Female" 1 112    78  69          79200000      1.8498168e+12 0
    1 61 "Female" 1 136    99  65          27540000      1849765140000 0
    1 61 "Female" 1 117    81  76          54000000      1.8497916e+12 0
    1 61 "Female" 1 117    93  61          84600000      1.8498222e+12 0
    1 61 "Female" 1 110    79  85          52380000      1849789980000 0
    1 61 "Female" 1 123    80  62           7320000      1849831320000 0
    1 61 "Female" 1 119    86  81          55800000      1.8497934e+12 0
    1 61 "Female" 1  95    68  61          19800000      1.8498438e+12 0
    1 61 "Female" 1  99    71  57          18120000      1849842120000 0
    1 61 "Female" 1 117    78  73          41400000       1.849779e+12 0
    1 61 "Female" 1 125    92  74          66600000      1.8498042e+12 0
    1 61 "Female" 1 111    83  58           1800000      1.8498258e+12 0
    1 61 "Female" 1 123    86  70          72000000      1.8498096e+12 0
    1 61 "Female" 1 112    84  61          16200000      1.8498402e+12 0
    1 61 "Female" 1  98    73  60          81000000      1.8498186e+12 0
    1 61 "Female" 1 104    83  67          37800000      1.8497754e+12 0
    1 61 "Female" 1  98    68  80          45000000      1.8497826e+12 0
    1 61 "Female" 1 103    72  69          61200000      1.8497988e+12 0
    1 61 "Female" 1 112    50  54          43200000      1.8497808e+12 0
    1 61 "Female" 1 105    83 125            180000      1849824180000 0
    1 61 "Female" 1 122    94  77          64980000      1849802580000 0
    1 61 "Female" 1  93    57  74          73800000      1.8498114e+12 0
    1 61 "Female" 1 102    75  57           3600000      1.8498276e+12 0
    1 61 "Female" 1  88    65  74          75600000      1.8498132e+12 0
    1 61 "Female" 1  96    72  61          77400000       1.849815e+12 0
    1 61 "Female" 1  89    68  60           9000000       1.849833e+12 0
    1 61 "Female" 1  98    62  77          59400000       1.849797e+12 0
    1 61 "Female" 1 101    68  60          10800000      1.8498348e+12 0
    1 61 "Female" 1 112    79  63          14400000      1.8498384e+12 0
    1 61 "Female" 1 117    83  57           5400000      1.8498294e+12 0
    1 61 "Female" 1 110    81  74          57600000      1.8497952e+12 0
    1 61 "Female" 1  89    74  71          46980000      1849784580000 0
    1  . ""       . 115    86  66 11339999.99975586 1859684939999.9998 1
    1  . ""       . 112    86  75          83520000      1859670720000 1
    1  . ""       . 112    73  70          85140000      1859672340000 1
    1  . ""       . 125    83  76 76139999.99975586 1859663339999.9998 1
    1  . ""       . 111    83  65 6120000.000244141 1859679720000.0002 1
    1  . ""       . 101    73  64          63540000      1859650740000 1
    1  . ""       . 124    91  63          38760000      1859625960000 1
    1  . ""       .  81    54  66           4140000      1859677740000 1
    1  . ""       . 113    78  72           2340000      1859675940000 1
    1  . ""       . 121    93  78          58320000      1859645520000 1
    1  . ""       . 103    75  66 16739999.99975586 1859690339999.9998 1
    1  . ""       . 116    84  67          50940000      1859638140000 1
    1  . ""       . 105    78  71           7740000      1859681340000 1
    1  . ""       .  98    70  61 22139999.99975586 1859695739999.9998 1
    1  . ""       .  98    63  72 81539999.99975586 1859668739999.9998 1
    1  . ""       .  95    61  71          79740000      1859666940000 1
    1  . ""       . 121    93  79          74520000      1859661720000 1
    1  . ""       .  96    67  73          47340000      1859634540000 1
    1  . ""       . 107    70  68          52920000      1859640120000 1
    1  . ""       . 100    65  62          23940000      1859697540000 1
    1  . ""       . 101    75  95          67140000      1859654340000 1
    1  . ""       . 112    75  67          78120000      1859665320000 1
    1  . ""       . 128    93  60          38580000      1859625780000 1
    1  . ""       . 118    91  71           9540000      1859683140000 1
    1  . ""       . 107    72  72 49139999.99975586 1859636339999.9998 1
    1  . ""       . 101    68  66          45540000      1859632740000 1
    1  . ""       . 102    71  62          25740000      1859699340000 1
    1  . ""       . 113    81  60          68940000      1859656140000 1
    1  . ""       . 120    90  60          38700000      1.8596259e+12 1
    1  . ""       . 113    85  64          61740000      1859648940000 1
    1  . ""       .  99    69  58          20340000      1859693940000 1
    1  . ""       .  91    60  71 65339999.99975586 1859652539999.9998 1
    1  . ""       . 100    68  65 539999.9997558594 1859674139999.9998 1
    1  . ""       . 125    93  79 43739999.99975586 1859630939999.9998 1
    1  . ""       . 137    98  88          41940000      1859629140000 1
    1  . ""       . 100    79  91 70920000.00024414 1859658120000.0002 1
    1  . ""       . 111    78  69          13140000      1859686740000 1
    1  . ""       . 112    71  61          15120000      1859688720000 1
    1  . ""       . 118    86  75 59939999.99975586 1859647139999.9998 1
    2 55 "Female" 1 117 87.52  73          51360000      1877609760000 0
    2 55 "Female" 1  91 62.19  72           2760000      1877647560000 0
    2 55 "Female" 1 121 90.85  73          49800000      1.8776082e+12 0
    2 55 "Female" 1 118 87.85  71          31560000      1877676360000 0
    2 55 "Female" 1 121  94.2  74          33360000      1877678160000 0
    2 55 "Female" 1 120 85.16  69          35160000      1877679960000 0
    2 55 "Female" 1 110 85.21  83          65760000      1877624160000 0
    2 55 "Female" 1 113 84.86  74          53160000      1877611560000 0
    2 55 "Female" 1 113 89.55  80          29940000      1877674740000 0
    2 55 "Female" 1 104 77.87  72          85560000      1877643960000 0
    2 55 "Female" 1 139 97.46  71          40560000      1877685360000 0
    2 55 "Female" 1 125 93.51  72          42360000      1877600760000 0
    2 55 "Female" 1 103  76.2  75           8160000      1877652960000 0
    2 55 "Female" 1  99  72.2  60          18960000      1877663760000 0
    2 55 "Female" 1  90  69.9  72           9960000      1877654760000 0
    2 55 "Female" 1 106 81.88  74            960000      1877645760000 0
    2 55 "Female" 1 127 92.16  82          38940000      1877683740000 0
    2 55 "Female" 1  98 72.54  69          24360000      1877669160000 0
    2 55 "Female" 1 109 85.55  70          38760000      1877597160000 0
    2 55 "Female" 1 101 73.53  78          83940000      1877642340000 0
    2 55 "Female" 1 112 89.89  69          38940000      1877597340000 0
    2 55 "Female" 1 119 90.86  78          74760000      1877633160000 0
    2 55 "Female" 1 114 99.26  87          71340000      1877629740000 0
    2 55 "Female" 1 126 93.84  92          73140000      1877631540000 0
    end
    format %tc_HH:MM time
    format %tcNN-DD-CCYY_HH:MM:SS datetime
    label values id id
    label values education education
    label def education 1 "College", modify
    label values visit visit
    label def visit 0 "0: Baseline", modify
    label def visit 1 "1: Follow-up", modify
    Question 3, what if I have to merge the merged file produced above with a file containing at 3 observations for BP medication for each person/id. would that then be a 1:1 merge or a 1:m merge?

    Thank you

  • #2
    I think that your problem is not that you have duplicates for the patient characteristic variables (demographics), but rather that you have missing values for those variables for all but the zeroeth (I guess, baseline) visit. In order to fix that, merge only on the patient ID and do not include clinic visit.
    Code:
    merge m:1 id /* visit */ using "/Volumes/Datasets/Demographic Data/Demo", ///
        assert(match using) nogenerate
    I don't see a Table 1, but you don't have to worry about fitting mixed effects models with the invariant patient characteristic data carried forward through all clinic visits.

    For your third question, what you do in management will depend upon what you want to do with your medication information, but from the sound of it (medication regimen seems to be invariant within patient), you'd first merge the medication dataset and patient characteristic dataset before proceeding.
    Last edited by Joseph Coveney; 06 Oct 2020, 20:53.

    Comment


    • #3
      Thank you.

      I tried taking out the visit when merging, but unfortunately, that did not solve the issue of the age being repeated for each participant (please see data example). I was hoping that there would not be repetitions so that I would be able to find the average age of the persons in the dataset or the proportion of persons by gender. With the repetitions, if I had 20 people in my original demographic dataset because each person's age is repeated on average 40 times, now I have a mean age that is averaged over 800 "persons" (yes, the average age for these 800 persons would be the same as with the original 20 persons, but for my categorical variables e.g., gender and education (and etc.), if I want to display my Table 1 data in "n(%)" format (see sample Table 1 below), My n's would be artificially large. Additionally, I am afraid if it would affect my estimations/predictions if I were to use age or gender in analysis, for example when I adjust for age in regressions.

      I believe (and correct me if I am wrong) that it would be prudent to not have repetitions of the age observations, for example, to obtain accurate results.

      I feel a bit stumped about how to address the original issue and I am wondering if creating a variable that I could condition my analysis codes on would help address this issue or if I need to delete the repeated observations in each variable so that age, for example, shows up only once.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long id byte age str6 gender float education int(bp hr) float map double(time datetime) float visit
      1 61 "Female" 1 102  57    75           3600000      1.8498276e+12 0
      1 61 "Female" 1 104  59    76          12600000      1.8498366e+12 0
      1 61 "Female" 1  99  57    71          18120000      1849842120000 0
      1 61 "Female" 1 104  67    83          37800000      1.8497754e+12 0
      1 61 "Female" 1  88  74    65          75600000      1.8498132e+12 0
      1 61 "Female" 1 122  77    94          64980000      1849802580000 0
      1 61 "Female" 1 112  61    84          16200000      1.8498402e+12 0
      1 61 "Female" 1  98  80    68          45000000      1.8497826e+12 0
      1 61 "Female" 1 117  76    81          54000000      1.8497916e+12 0
      1 61 "Female" 1 125  74    92          66600000      1.8498042e+12 0
      1 61 "Female" 1  89  60    68           9000000       1.849833e+12 0
      1 61 "Female" 1 105 125    83            180000      1849824180000 0
      1 61 "Female" 1 112  63    79          14400000      1.8498384e+12 0
      1 61 "Female" 1 131  73    95          70380000      1849807980000 0
      1 61 "Female" 1 110  74    81          57600000      1.8497952e+12 0
      1 61 "Female" 1 117  73    78          41400000       1.849779e+12 0
      1 61 "Female" 1 101  60    68          10800000      1.8498348e+12 0
      1 61 "Female" 1  89  71    74          46980000      1849784580000 0
      1 61 "Female" 1 111  58    83           1800000      1.8498258e+12 0
      1 61 "Female" 1 129  78    91          32400000        1.84977e+12 0
      1 61 "Female" 1 123  62    80           7320000      1849831320000 0
      1 61 "Female" 1 112  54    50          43200000      1.8497808e+12 0
      1 61 "Female" 1 119  81    86          55800000      1.8497934e+12 0
      1 61 "Female" 1 103  69    72          61200000      1.8497988e+12 0
      1 61 "Female" 1 117  57    83           5400000      1.8498294e+12 0
      1 61 "Female" 1  84  69    61          68400000       1.849806e+12 0
      1 61 "Female" 1  98  77    62          59400000       1.849797e+12 0
      1 61 "Female" 1  96  61    72          77400000       1.849815e+12 0
      1 61 "Female" 1 117  61    93          84600000      1.8498222e+12 0
      1 61 "Female" 1  98  60    73          81000000      1.8498186e+12 0
      1 61 "Female" 1 123  70    86          72000000      1.8498096e+12 0
      1 61 "Female" 1  95  61    68          19800000      1.8498438e+12 0
      1 61 "Female" 1 136  65    99          27540000      1849765140000 0
      1 61 "Female" 1  93  74    57          73800000      1.8498114e+12 0
      1 61 "Female" 1 113  60    88          82800000      1.8498204e+12 0
      1 61 "Female" 1 112  69    78          79200000      1.8498168e+12 0
      1 61 "Female" 1 110  85    79          52380000      1849789980000 0
      1 61 "Female" 1  95  71    61          79740000      1859666940000 1
      1 61 "Female" 1 100  65    68 539999.9997558594 1859674139999.9998 1
      1 61 "Female" 1 128  60    93          38580000      1859625780000 1
      1 61 "Female" 1 111  65    83 6120000.000244141 1859679720000.0002 1
      1 61 "Female" 1 137  88    98          41940000      1859629140000 1
      1 61 "Female" 1 107  72    72 49139999.99975586 1859636339999.9998 1
      1 61 "Female" 1 121  78    93          58320000      1859645520000 1
      1 61 "Female" 1  98  61    70 22139999.99975586 1859695739999.9998 1
      1 61 "Female" 1 102  62    71          25740000      1859699340000 1
      1 61 "Female" 1 103  66    75 16739999.99975586 1859690339999.9998 1
      1 61 "Female" 1 101  66    68          45540000      1859632740000 1
      1 61 "Female" 1 112  61    71          15120000      1859688720000 1
      1 61 "Female" 1 107  68    70          52920000      1859640120000 1
      1 61 "Female" 1 113  72    78           2340000      1859675940000 1
      1 61 "Female" 1 113  60    81          68940000      1859656140000 1
      1 61 "Female" 1 111  69    78          13140000      1859686740000 1
      1 61 "Female" 1  99  58    69          20340000      1859693940000 1
      1 61 "Female" 1 101  95    75          67140000      1859654340000 1
      1 61 "Female" 1 112  70    73          85140000      1859672340000 1
      1 61 "Female" 1  91  71    60 65339999.99975586 1859652539999.9998 1
      1 61 "Female" 1 105  71    78           7740000      1859681340000 1
      1 61 "Female" 1 120  60    90          38700000      1.8596259e+12 1
      1 61 "Female" 1  98  72    63 81539999.99975586 1859668739999.9998 1
      1 61 "Female" 1 125  76    83 76139999.99975586 1859663339999.9998 1
      1 61 "Female" 1 113  64    85          61740000      1859648940000 1
      1 61 "Female" 1 100  91    79 70920000.00024414 1859658120000.0002 1
      1 61 "Female" 1 101  64    73          63540000      1859650740000 1
      1 61 "Female" 1 116  67    84          50940000      1859638140000 1
      1 61 "Female" 1 112  67    75          78120000      1859665320000 1
      1 61 "Female" 1 115  66    86 11339999.99975586 1859684939999.9998 1
      1 61 "Female" 1 100  62    65          23940000      1859697540000 1
      1 61 "Female" 1 118  71    91           9540000      1859683140000 1
      1 61 "Female" 1 121  79    93          74520000      1859661720000 1
      1 61 "Female" 1  96  73    67          47340000      1859634540000 1
      1 61 "Female" 1 112  75    86          83520000      1859670720000 1
      1 61 "Female" 1 125  79    93 43739999.99975586 1859630939999.9998 1
      1 61 "Female" 1 118  75    86 59939999.99975586 1859647139999.9998 1
      1 61 "Female" 1 124  63    91          38760000      1859625960000 1
      1 61 "Female" 1  81  66    54           4140000      1859677740000 1
      2 55 "Female" 1 113  74 84.86          53160000      1877611560000 0
      2 55 "Female" 1 105  73 80.88          81960000      1877640360000 0
      2 55 "Female" 1 125  72 93.51          42360000      1877600760000 0
      2 55 "Female" 1 112  81  91.9          63960000      1877622360000 0
      2 55 "Female" 1  96  66 72.55          13560000      1877658360000 0
      2 55 "Female" 1  98  69 72.54          24360000      1877669160000 0
      2 55 "Female" 1 139  71 97.46          40560000      1877685360000 0
      2 55 "Female" 1  97  72 74.89           6540000      1877651340000 0
      2 55 "Female" 1  92  68 75.25          22560000      1877667360000 0
      2 55 "Female" 1 121  74  94.2          33360000      1877678160000 0
      2 55 "Female" 1 106  74 81.88            960000      1877645760000 0
      2 55 "Female" 1  88  65 72.59          27960000      1877672760000 0
      2 55 "Female" 1  86  74 67.91          11760000      1877656560000 0
      2 55 "Female" 1 127  68 92.16          39060000      1877597460000 0
      2 55 "Female" 1 112  84 80.51          76560000      1877634960000 0
      2 55 "Female" 1 106  74 83.89          58560000      1877616960000 0
      2 55 "Female" 1  91  68 74.92          15360000      1877660160000 0
      2 55 "Female" 1 127  82 92.16          38940000      1877683740000 0
      2 55 "Female" 1 104  77 71.84          80160000      1877638560000 0
      2 55 "Female" 1  99  60  72.2          18960000      1877663760000 0
      2 55 "Female" 1  91  72 62.19           2760000      1877647560000 0
      2 55 "Female" 1 127  80 96.85          45960000      1877604360000 0
      2 55 "Female" 1 109  70 85.55          38760000      1877597160000 0
      2 55 "Female" 1 101  78 73.53          83940000      1877642340000 0
      end
      format %tc_HH:MM time
      format %tcNN-DD-CCYY_HH:MM:SS datetime
      label values id id
      label values education education
      label def education 1 "College", modify
      label values visit visit
      label def visit 0 "0: Baseline", modify
      label def visit 1 "1: Follow-up", modify

      Click image for larger version

Name:	Screen Shot 2020-10-07 at 2.45.16 AM.png
Views:	1
Size:	69.4 KB
ID:	1575961


      Thank you

      Comment


      • #4
        First of all, you create that table using the patient characteristics (demographics) dataset, and not on the merged dataset.

        Second, you're wrong if your regressions are over visits and you wish to have patient characteristics as predictors--you must have the patient characteristics present at each visit. If you delete the repeated observations of the patient characteristics in the merged dataset, then you will have only one visit.

        Comment


        • #5
          Thank you for clarifying. I did indeed create the demographics table (Table 1) from the baseline dataset. I guess I am just used to having one main dataset from which I do all my analysis.

          There were only two visits -one baseline and one follow up. visit during which data was collected. However, the dataset I have shows the demographic data being entered only for the baseline visit.

          I am wondering if this would be fixed if I duplicate the dataset, have an indicator variable for baseline in one dataset, then have an indicator for follow up in the duplicated dataset and then put them together such that the dataset has the same demographic variables for baseline and follow up ( I am thinking this is a terrible solution and perhaps a big no-no)

          Otherwise (preferred option), how does one copy a row of observations (for baseline) to another row for follow up?

          Comment


          • #6
            Originally posted by Ruth-Alma Turkson-Ocran View Post
            the dataset I have shows the demographic data being entered only for the baseline visit.
            Not to me. From what you show above in #3, patient characteristic data are present for all observations, including those for the follow-up visit. Am I missing something?

            I am wondering if this would be fixed if I duplicate the dataset, have an indicator variable for baseline in one dataset, then have an indicator for follow up in the duplicated dataset and then put them together such that the dataset has the same demographic variables for baseline and follow up. . . Otherwise (preferred option), how does one copy a row of observations (for baseline) to another row for follow up?
            Just as I showed: merge on patient ID alone and not on both patient ID and visit number. This results in the patient-characteristic data's being propagated throughout each patient's entire set of observations, including those for follow-up visit.

            I really don't understand what your problem is here. Looking at the brief listing that you posted in #3, everything that you seem to want is already present in the merged dataset.

            Comment

            Working...
            X