
  • psmatch2 (avoiding matching within the same team): how do I do propensity score matching with conditions?


    Hi,

    I have a question about avoiding "within" matching in psmatch2. There seem to be many posts about matching within groups, but none about avoiding it. (If there are, please let me know! It would be very helpful.)


    My data basically look like this:

    Team   Collaborators   Treatment   Team's collaborator pick   var3 ...
    A      Tom             1           1
    A      John            0           1
    A      Sam             0           0
    B      Tom             0           0
    B      John            1           1
    B      Sam             1           1

    As you can see, each team has multiple observations; let's call them dyads. Ideally, I would like to calculate a propensity score for each dyad and match across teams, not within them. So even if rows 1-3 are close in propensity score, they should not be matched with each other, because they are all dyads of team A. How do I avoid matching within a team?
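To make the question easier to reproduce, the example rows can be typed into Stata with input. This is a minimal sketch: the lowercase variable names team, collaborator, treatment and pick are my own shorthand (pick stands for "Team's collaborator pick"), and var3 and the further columns are left out because their values are not shown. encode creates a numeric team identifier, which is what an across-team restriction would be based on.

```stata
* Toy version of the data in the question (variable names are my own shorthand)
clear
input str1 team str4 collaborator byte treatment byte pick
"A" "Tom"  1 1
"A" "John" 0 1
"A" "Sam"  0 0
"B" "Tom"  0 0
"B" "John" 1 1
"B" "Sam"  1 1
end

* Numeric team identifier: the grouping within which matches should not be allowed
encode team, generate(team_id)
list, sepby(team)
```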

    Thank you for your help!

  • #2
    This is an interesting question. I am not sure if there is a quick or elegant way. I have attempted a somewhat crude solution using kmatch.

    Code:
    ssc install kmatch, replace
    ssc install moremata, replace   // kmatch requires moremata

    I find up to n nearest neighbors for each observation and then select the first match that is not in the same group. Here is a toy example in which industry is the grouping variable within which I want to avoid matches.

    Code:
    *** Prepare toy dataset ***
    clear all
    sysuse nlsw88
    keep if inlist(industry, 1, 2, 3, 5)
    keep union wage hours industry
    drop if missing(union, wage, hours)
    
    
    *** Create Unique ID for each observation ***
    sort industry
    gen ID = _n
    
    
    *** Find up to 5 matches in the dataset using kmatch ***
    global treatment union
    global controls wage hours
    kmatch ps $treatment $controls, nn(5) idgenerate
    
    
    *** For each group, create ID bounds (IDs are contiguous within industry because the data were sorted by industry before creating ID) ***
    bysort industry (ID): egen p_low = min(ID)
    bysort industry (ID): egen p_high = max(ID)
    
    
    *** Take the first of the 5 matches that falls outside the own group (finalmatch_ID stays missing if none does) ***
    gen finalmatch_ID = .
    forvalues i = 1/5 {
        replace finalmatch_ID = _ID_`i' if missing(finalmatch_ID) & !inrange(_ID_`i', p_low, p_high)
    }
    Best wishes

    (Stata 16.1 MP)
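If you want the single nearest out-of-group match without capping the search at nn(5), a brute-force loop is another option. This sketch is not what kmatch or psmatch2 does internally: it is a plain 1:1 nearest-neighbor search with replacement, with no caliper and no common-support check. It assumes a logit propensity score (I believe kmatch ps also uses logit by default, but please check), and the variable names ps, match_ID, match_dist and d are my own. The run time grows quadratically with the number of observations, which is fine for a toy dataset like this one but slow for large samples.

```stata
*** Brute-force nearest out-of-group match (sketch, not kmatch/psmatch2) ***
clear all
sysuse nlsw88
keep if inlist(industry, 1, 2, 3, 5)
keep union wage hours industry
drop if missing(union, wage, hours)
gen ID = _n

* Propensity score from a logit (assumption: logit, as in kmatch's default)
logit union wage hours
predict double ps, pr

gen long match_ID = .
gen double match_dist = .
gen double d = .

* For each observation i, compute |ps - ps[i]| for candidates with the
* opposite treatment status in a different industry; all others get missing
quietly forvalues i = 1/`=_N' {
    replace d = cond(union != union[`i'] & industry != industry[`i'], abs(ps - ps[`i']), .)
    summarize d, meanonly
    if r(N) > 0 {
        scalar dmin = r(min)
        replace match_dist = scalar(dmin) in `i'
        * Ties are broken by taking the smallest ID
        summarize ID if d == scalar(dmin), meanonly
        replace match_ID = r(min) in `i'
    }
}
drop d
list ID industry union ps match_ID match_dist in 1/10
```

With the data from the question, industry would be replaced by a numeric team identifier (for example one created with encode team) and union by the treatment variable.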



    • #3

      @Felix Bittmann Thank you for your input!

      I will try what you shared. I have a quick clarifying question; please let me know if I have understood this incorrectly.
      Your for loop finds a match that falls outside the observation's own group among the 5 matches, BUT it does not necessarily find the closest one (in terms of propensity score) out of those 5. Is this correct?




      • #4
        This is actually a very good question. You can also use the generate and dxgenerate options to inspect the individual PS values. I am not 100% sure about this, but I think the matches are ordered by PS difference, so match 1 has the smallest value, and so on. However, since we are interested in distances, I think the absolute values matter more for our goal. I have adapted the code to reorder the matches by the absolute PS distance between the two matched observations. Because the loop takes the first match that falls outside the own group, it should now select the closest such match, based on the PS.




        Code:
        *** Prepare toy dataset ***
        clear all
        sysuse nlsw88
        keep if inlist(industry, 1, 2, 3, 5)
        keep union wage hours industry
        drop if missing(union, wage, hours)
        
        
        *** Create Unique ID for each observation ***
        sort industry, stable
        gen ID = _n
        
        
        *** Find up to 5 matches in the dataset using kmatch ***
        global treatment union
        global controls wage hours
        kmatch ps $treatment $controls, nn(5) generate idgenerate dxgenerate
        
        
        *** For each group, create ID bounds ***
        bysort industry (ID): egen p_low = min(ID)
        bysort industry (ID): egen p_high = max(ID)
        
        
        *** Sort matches by absolute PS difference ***
        preserve
        keep ID _ID* _DX*
        reshape long _ID_ _DX_, i(ID) j(match)
        replace _DX_ = abs(_DX_)
        sort ID _DX_, stable
        bysort ID: gen newmatch = _n
        drop match
        reshape wide _ID_ _DX_, i(ID) j(newmatch)
        order ID _ID* _DX*
        tempfile sortedmatches
        save `sortedmatches', replace
        restore
        drop _ID_* _DX_*
        merge 1:1 ID using `sortedmatches', nogen
        
        
        *** Find matches that fall outside the own groups using these bounds ***
        gen finalmatch_ID = .
        forvalues i = 1/5 {
            replace finalmatch_ID = _ID_`i' if missing(finalmatch_ID) & !inrange(_ID_`i', p_low, p_high)
        }
        Best wishes

        (Stata 16.1 MP)
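The reordering step matters because the loop takes the first qualifying match in column order. The same selection can also be done directly in long format (right after the reshape long step, without reshaping back to wide): drop candidates from the own group, then keep the one with the smallest absolute distance. Below is a sketch on made-up candidate lists for two observations; cand_ID, own_ind, cand_ind and dx are hypothetical names, and in real data the candidate's industry would first have to be merged in from the matched observation. Sorting on the signed difference dx instead of its absolute value would pick candidate 11 for observation 1, even though candidate 12 is closer.

```stata
*** Hypothetical candidate matches for two observations (long format) ***
* own_ind = industry of the observation, cand_ind = industry of the candidate,
* dx = signed PS difference (all values made up for illustration)
clear
input ID cand_ID own_ind cand_ind dx
1 10 1 1  0.001
1 11 1 2 -0.004
1 12 1 3  0.002
2 20 2 2 -0.003
2 21 2 5  0.006
2 22 2 3 -0.005
end

* Drop candidates from the observation's own group
drop if cand_ind == own_ind

* Keep the candidate with the smallest absolute PS distance (ties: smaller cand_ID)
gen double absdx = abs(dx)
bysort ID (absdx cand_ID): keep if _n == 1
list
```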
