
  • Why does the psmatch2 using 2 nearest neighbors not give a 1:2 result?

    Hi,

    This is my first post to Statalist, so please bear with me.
    I am working with an observational dataset (approx. 5000 observations) and want to estimate the effect of the treatment tpa_alle on the binary outcome mrs3mnd_xlnt. I used the psmatch2 command for propensity score matching on treatment status and 10 other variables. I have done 1:1 matching, but I also want to do 1:2 matching to see whether it gives similar results.
    This is where my problem arises.
    For the matching with 2 nearest neighbors I used the following code:
    Code:
    psmatch2 tpa_alle c.patientage gender c.nihssinnkomst i.mrspre logreg_af logreg_prebtrx logreg_priormi tidltia livesalone i.hospital_volume_3_groups if  wus_kos_uos==1 & tidlhjerneslag==0, n(2) caliper(0.2) odds logit
    When I check how many cases are in each group after matching, it does not look like a 1:2 match.
    I tried
    Code:
     tab tpa_alle if _weight!=.
    which shows 804 cases in the untreated group and 526 treated.
    Is this because some controls (untreated cases) are used more than once? Are there other explanations, or other ways of displaying the number of cases in the matched groups?

    I also used the following code to identify the matched pairs and tried to use it to count the number of cases in each group:
    Code:
    gen pair1 = _id if _treated==0
    replace pair1 = _n1 if _treated==1
    gen pair2 = _id if _treated==0
    replace pair2 = _n2 if _treated==1
    bysort pair1: egen paircount1 = count(pair1)
    bysort pair2: egen paircount2 = count(pair2)
    egen byte paircount = anycount(paircount1 paircount2), values(2)
    Followed by:
    Code:
    tab tpa_alle if paircount!=0
    This, however, returned 389 treated and 687 untreated, which left me even more confused: the number of cases in the two groups differs depending on the approach, although I thought the two approaches were equivalent. Does anyone have any insight into which approach is correct, if either?

    Thankful for your suggestions!
    Cheers
    Mary-Helen

  • #2
    I'll show an example to explain this point. The example investigates the effect of a mother smoking during pregnancy (mbsmoke = 1 for smoking mothers, = 0 for non-smoking mothers). For each smoking mother, I'll find the 2 nearest non-smoking mothers.

    Code:
    webuse cattaneo2, clear
    psmatch2 mbsmoke mmarried c.mage##c.mage fbaby medu, n(2) logit odds caliper(0.2)
    Then I tabulate the distribution of _weight by mbsmoke.

    Code:
    tab _weight mbsmoke
    
     psmatch2: |
     weight of |
       matched |  1 if mother smoked
      controls | nonsmoker     smoker |     Total
    -----------+----------------------+----------
            .5 |       223          0 |       223
             1 |        90        864 |       954
           1.5 |        77          0 |        77
             2 |        57          0 |        57
           2.5 |        21          0 |        21
             3 |        15          0 |        15
           3.5 |        19          0 |        19
             4 |         8          0 |         8
           4.5 |         8          0 |         8
             5 |         6          0 |         6
             6 |         2          0 |         2
           6.5 |         8          0 |         8
             7 |         6          0 |         6
           7.5 |         4          0 |         4
             8 |         2          0 |         2
           9.5 |         2          0 |         2
    -----------+----------------------+----------
         Total |       548        864 |     1,412
    You can see that the weight of each smoking mother is 1 (864 smoking mothers in total), while the weight differs for the non-smoking mothers because a non-smoker may be used multiple times to match different smokers. If a non-smoker is used once for matching, then its weight is normalized to 0.5; if a non-smoker is used three times, then its weight is 0.5*3 = 1.5.
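
    To make the distinction between distinct controls and matches concrete, here is a minimal sketch. It assumes the psmatch2 call above has just been run, so that _weight and _treated refer to that model, and that every use of a control contributes 0.5 to its weight, as described above.

    ```stata
    * Sketch only: run directly after the psmatch2 call above.

    * Distinct non-smokers matched at least once (548 in the table above)
    count if _treated == 0 & !missing(_weight)

    * Control "slots" filled: each use adds 0.5 to _weight under n(2),
    * so doubling the summed weights counts uses rather than people
    quietly summarize _weight if _treated == 0
    display "control slots filled = " 2 * r(sum)

    * Matched smokers (864 in the table above)
    count if _treated == 1 & !missing(_weight)
    ```

    The first count is smaller than twice the number of smokers precisely because the same non-smoker can fill several slots.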

    So if you'd like to confirm the 1:2 ratio, the code can be

    Code:
    egen n_control = total(_weight*(mbsmoke==0)*2)
    
    sum n_control
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       n_control |      4,642        1728           0       1728       1728
    1728 : 864 = 2 : 1, so the match is 1:2 once reused controls are counted each time they are used.
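
    The same check can be sketched for the data in the original question. This assumes the psmatch2 call with tpa_alle has just been run, so that the underscore variables belong to that model. It also relies on my reading of the psmatch2 help file that _nn stores the number of matched controls for each treated case. If the caliper left some treated cases with fewer than two neighbours, _nn should show it.

    ```stata
    * Sketch only: run directly after the psmatch2 call for tpa_alle.

    * Distribution of weights by treatment status
    tab _weight tpa_alle

    * Number of matched neighbours found for each treated case
    * (assumption: _nn holds this count, per the psmatch2 help file)
    tab _nn if _treated == 1 & !missing(_weight)

    * Summed control weights versus number of matched treated cases
    quietly summarize _weight if _treated == 0
    display "summed control weights = " r(sum)
    count if _treated == 1 & !missing(_weight)
    ```

    With 526 matched treated cases, a full 1:2 match needs 2 * 526 = 1052 control slots, so 804 distinct untreated cases is what one would expect if some controls are used more than once.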
    Last edited by Fei Wang; 30 Jun 2022, 09:24.
