
  • Psmatch - low match even for one variable

    Hello,

    I am having trouble getting a decent match with psmatch2. My full sample has over 1 million observations, yet even when I limit the matching characteristics to sex alone (male and female), I get only 170 matches. The variable tabulates cleanly:

     'msgender' |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |    863,931       80.33       80.33
              1 |    211,500       19.67      100.00
    ------------+-----------------------------------
          Total |  1,075,431      100.00


    The same happens with age: when I cut the model down to just age and treatment, I get very few matches. So the issue is not with the variable... I think. My code for matching on sex alone is:



    psmatch2 treatment i.sex
    psgraph
    pstest i.sex

    sum i.sex if treatment ==1 [aw =_weight]
    sum i.sex if treatment ==0 [aw =_weight]


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte sex float(mar tax_year age treatment)
    0  1 2011 60 1
    1  1 2010 41 1
    1  1 2011 42 1
    0  1 2010 52 1
    0 .a 2011 53 1
    0  1 2010 57 1
    0  1 2011 58 1
    1  1 2011 34 1
    1  1 2010 38 1
    1  1 2011 39 1
    0  1 2010 34 1
    0  1 2011 35 1
    0  2 2010 66 1
    0  2 2011 67 1
    0  2 2010 62 1
    0  2 2011 63 1
    0  1 2010 36 1
    0  2 2010 21 1
    0  2 2010 64 1
    0  2 2011 65 1
    0  1 2010 61 0
    0  1 2011 62 0
    0  3 2010 48 0
    0  3 2011 49 0
    0  2 2010 30 0
    0  2 2011 31 0
    0  1 2010 34 0
    0  1 2011 35 0
    0  1 2010 45 0
    0  1 2011 46 0
    end
    label values mar marLbl
    label def marLbl 1 "Couple", modify
    label def marLbl 2 "Single", modify
    label def marLbl 3 "Wid_Div_Sep", modify
    label def marLbl .a "Missing/Invalid", modify


    Please help!

  • #2
    In your example data, sex == 1 always co-occurs with treatment == 1, so the attempt to build a propensity model of treatment based on sex fails: sex being a perfect predictor, it is omitted from the propensity model.

    If this is also true in your full data set, then it is, if not the sole cause of your problem, certainly a contributing factor: no propensity score can be calculated for the sex == 1 observations, so they are not candidates to match anything. If it is not true in your full data set, then you need to post a better data example, one that reflects the general distribution of these variables in the full data set and also reproduces the problem you are having.
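
    A quick way to check for this kind of perfect prediction, as a sketch rather than a definitive diagnosis. It assumes the -dataex- example above has been loaded, and it fits a probit because that is the propensity model psmatch2 estimates by default:

    ```stata
    * Cross-tabulate the covariate against treatment: an empty cell means
    * the covariate perfectly predicts treatment for that group.
    tabulate sex treatment

    * Fit the same kind of propensity model psmatch2 uses by default (probit).
    * In the example data, Stata should note that sex == 1 predicts
    * treatment perfectly and drop those observations from estimation.
    probit treatment i.sex
    ```

    Running the same two commands on the full data set shows whether the problem carries over from the example.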



    • #3
      Hello,

      I wish that was it. Here is the tabulation of sex across treatment:



                 |       treatment
      'msgender' |         0          1 |     Total
      -----------+----------------------+----------
               0 |   362,138    501,793 |   863,931
               1 |    91,484    120,016 |   211,500
      -----------+----------------------+----------
           Total |   453,622    621,809 | 1,075,431



      • #4
        So, for troubleshooting, you need to post a better example of your data: one that reflects the variables' distributions in the data set as a whole and reproduces the problem you get when you apply -psmatch2-.
