KMATCH Matching Diagnostics Tests Availability

Ondrej Dvoulety

Join Date: Jul 2017

Posts: 23
#1

20 Aug 2021, 00:55

I have started working with the KMATCH command for propensity score matching and I am struggling a bit with the post-matching diagnostics. I found that there are excellent graphical demonstrations such as kmatch summarize, density, cumul and box. However, I could not find any of the "established" diagnostics, such as the pseudo R-squared for the unmatched and matched samples, t-tests comparing the characteristics of both groups, and the mean and median bias (averaged across the characteristics). These are important for me to confirm that my matching is correct, and also for the reviewers. Could you please help me with that?
Tags: Checks, diagnostics, kmatch, propensity score matching, psm
Ondrej Dvoulety

Join Date: Jul 2017

Posts: 23
#2

20 Aug 2021, 01:28

As I continued with the code, I understood that I need a way to combine the KMATCH and PSTEST syntax. I figured out that I need to specify the "generate" option with KMATCH, which generates matching weights (_KM_mw) that can then be used in PSTEST:

kmatch ps TREATED Rok_vzniku dKraj_realizace1 dKraj_realizace2 dKraj_realizace3 dKraj_realizace4 dKraj_realizace5 dKraj_realizace6 dKraj_realizace7 dKraj_realizace8 dKraj_realizace9 dKraj_realizace10 dKraj_realizace11 dKraj_realizace12 dKraj_realizace13 dKraj_realizace14 FO SRO AS Other SMALL MEDIUM LARGE dsector1 dsector2 dsector3 dsector4 dsector5 dsector6 dsector7 dsector8 dsector9 dsector10 (ROA1415), att generate

pstest (Rok_vzniku dKraj_realizace1 dKraj_realizace2 dKraj_realizace3 dKraj_realizace4 dKraj_realizace5 dKraj_realizace6 dKraj_realizace7 dKraj_realizace8 dKraj_realizace9 dKraj_realizace10 dKraj_realizace11 dKraj_realizace12 dKraj_realizace13 dKraj_realizace14 FO SRO AS Other SMALL MEDIUM LARGE dsector1 dsector2 dsector3 dsector4 dsector5 dsector6 dsector7 dsector8 dsector9 dsector10), both treated(TREATED) mweight(_KM_mw) graph
Felix Bittmann

Join Date: Aug 2018

Posts: 663
#3

20 Aug 2021, 02:45

I have not used pstest in ages, but I think your strategy is probably a good choice, as pstest allows you to replicate the results with matching weights. As mentioned in my other post, if you need to, you can replicate all results on your own using these weights (and removing unused observations, if present).

Best wishes

(Stata 16.1 MP)
Ondrej Dvoulety

Join Date: Jul 2017

Posts: 23
#4

20 Aug 2021, 02:50

Thanks a lot, Felix. I double-checked that and it works perfectly. I guess I now have what I needed. I must admit that KMATCH is much faster than PSMATCH2. I do a lot of matching studies, so I will be using it frequently.
Ondrej Dvoulety

Join Date: Jul 2017

Posts: 23
#5

20 Aug 2021, 07:52

Dear Felix, I have one more question. Is there any chance I could first estimate a logit separately, save the score, and then use kmatch to match the samples? The issue is that I am working with panel data, and I would like to match my groups exactly on the outcome year (this is why I use ematch for a specific year, together with a condition on a particular year when looking at the outcomes), but I would like to do the matching on a different period, i.e., the pre-treatment period. Any suggestions would be great.

Ben Jann

Join Date: Sep 2014
Posts: 257
#6

20 Aug 2021, 08:06

Hi Ondrej,

I do not know what exactly the "established" diagnostics are, but here are some suggestions that might be helpful:

Code:

webuse cattaneo2, clear

// standardized mean differences before and after matching, i.e. mean
// difference of X divided by the (unweighted) average of the standard
// deviations of X in the control group and treatment group; the standard
// deviations are obtained from the original data
quietly kmatch ps mbsmoke mage fage prenatal1 mmarried fbaby (bweight), att
kmatch sum, meanonly

// pseudo R2 before and after matching, i.e. pseudo R2 of logit regressing treatment status
// on covariates in the unmatched and the matched data; could of course also use probit
quietly kmatch ps mbsmoke mage fage prenatal1 mmarried fbaby (bweight), att wgen(_W)
logit mbsmoke mage fage prenatal1 mmarried fbaby if e(sample) // before matching
logit mbsmoke mage fage prenatal1 mmarried fbaby [pw=_W]      // after matching
drop _W

// t-tests of mean differences before and after matching; the trick is to
// include the covariates as outcome variables; NATE will give you the pre
// matching test, ATT will give you the post matching test
kmatch ps mbsmoke mage fage prenatal1 mmarried fbaby (bweight) ///
    (mage fage prenatal1 mmarried fbaby), att nate

// mean and median bias across covariates; don't know how "bias" is defined in
// this context; I use the absolute standardized difference
quietly kmatch ps mbsmoke mage fage prenatal1 mmarried fbaby (bweight), att
kmatch sum, meanonly
mata: mean(abs(st_matrix("r(M)")[,3]))      // before matching (mean)
mata: mean(abs(st_matrix("r(M)")[,6]))      // after matching (mean)
mata: mm_median(abs(st_matrix("r(M)")[,3])) // before matching (median)
mata: mm_median(abs(st_matrix("r(M)")[,6])) // after matching (median)

// graph of standardized biases
// . ssc install coefplot
matrix M = r(M)
coefplot matrix(M[,3]) matrix(M[,6]), noci nooffset xline(0) nolabel ///
    rescale(100) plotlabels(Unmatched Matched) ///
    xti(Standardized % bias across covariates)

Note that I use option wgen() to generate the matching weights as this computes ready-to-use matching weights; the generate() option returns "raw" matching weights that will need further modification depending on application. See the option wgenerate() in the documentation for details. The bottom line is that wgen() will give you what you want.

Furthermore, it seems to me that some care is required when using pstest after kmatch. I did not look into this in detail, but it seemed to me that pstest does not update the statistics for the treatment group in the "matched" comparison if some observations from the treatment group have been lost in the matching due to lack of common support (i.e., for the treatment group pstest seems to use the same values for both the unmatched and the matched comparison even if the treatment group has been restricted). If there are no treatment observations that could not be matched, then the results from pstest for the matched and unmatched means and the standardized differences/biases should coincide with the results obtained as in the above example.
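A quick way to see whether this caveat applies is to count the treated observations that did not receive a positive matching weight. The following is only a minimal sketch on the same example data. It assumes that treated observations lost in the matching end up with a zero or missing value in the wgen() variable; the scalar names n_treated and n_lost are made up for illustration.

```stata
webuse cattaneo2, clear
quietly kmatch ps mbsmoke mage fage prenatal1 mmarried fbaby (bweight), att wgen(_W)

// treated observations in the estimation sample
count if mbsmoke == 1 & e(sample)
scalar n_treated = r(N)

// treated observations without a positive weight
// (assumption: unmatched treated observations have _W equal to zero or missing)
count if mbsmoke == 1 & e(sample) & (_W == 0 | missing(_W))
scalar n_lost = r(N)

display "treated: " n_treated "   treated without a match: " n_lost
```

If n_lost is zero, the pstest results for the treatment group should not be affected by the issue described above.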

Finally, note that standard errors reported by kmatch will typically tend to be somewhat conservative. This is not much of an issue in a sample as large as the data used in the example above, but the bias may be more relevant in smaller samples. It is on my list to improve on this, but I just have not gotten to it yet...

If you provide me with a list of what you consider established diagnostics (best including some references), I will see whether I can add at least some of it in a future version of kmatch.

ben


Ben Jann

Join Date: Sep 2014

Posts: 257
#7

20 Aug 2021, 08:10

Re #5: you can use option pscore() to provide a variable containing the propensity score.
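A minimal sketch of that workflow on the cattaneo2 example data, assuming the propensity score comes from a separately estimated logit; the variable name ps_hat is made up for illustration. No covariates are listed in the kmatch call because the score is supplied through pscore().

```stata
webuse cattaneo2, clear

// step 1: estimate the propensity score separately
logit mbsmoke mage fage prenatal1 mmarried fbaby
predict double ps_hat, pr

// step 2: match on the stored score instead of an internally estimated one
kmatch ps mbsmoke (bweight), pscore(ps_hat) att
```

With panel data, the logit could be estimated on the pre-treatment period and the stored score then carried over to the outcome-year observations before calling kmatch.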
Ondrej Dvoulety

Join Date: Jul 2017

Posts: 23
#8

20 Aug 2021, 08:21

Thank you very much, Ben, for the quick turnaround. I am working with a very large sample containing about 2.3 million firm-level observations, so it is excellent that I can get quick feedback from you. I already do not regret switching to kmatch. All the information seems very clear. Regarding the pstest application, I thought that the "generate" option would provide me with more suitable weights for "pstest". Nevertheless, your suggestion is rather to use the "wgenerate" weights, right? I am not an econometrician by background, but it would be great if kmatch could, by default, provide at least the change in R-squared, tests indicating whether the matched and unmatched samples differ in the included covariates, and the mean and median biases (I do not know how they were calculated in psmatch2)...
Ondrej Dvoulety

Join Date: Jul 2017

Posts: 23
#9

20 Aug 2021, 08:39

Originally posted by Ben Jann

Re #5: you can use option pscore() to provide a variable containing the propensity score.

Could you just check that it is OK like this? My outcome variables are ROA etc. (in parentheses), the if condition specifies the treatment time, and "ROK" is the exact year, allowing me to compare outcomes one year after receiving public aid. However, because I am working with a longitudinal sample, I need to make sure that my outcome variables are measured in the proper year.

kmatch ps TREATED (ROA AKTIVACELK Trzby_celkem) if TREATED==0 | TREATED==1 & Cas_Podpory == 1, pscore(pscore) ematch(ROK) att
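A side note on the if condition: in Stata, & is evaluated before |, so the condition above reads as TREATED==0 | (TREATED==1 & Cas_Podpory==1). The small check below uses the auto data, with foreign and rep78 as stand-ins for my own variables (this is only an illustration, not my actual data), and shows that adding explicit parentheses does not change the selection.

```stata
sysuse auto, clear

// condition written without parentheses, as in the kmatch call above
count if foreign == 0 | foreign == 1 & rep78 == 5
scalar n_implicit = r(N)

// same condition with the grouping made explicit
count if foreign == 0 | (foreign == 1 & rep78 == 5)
scalar n_explicit = r(N)

// both versions select the same observations
assert n_implicit == n_explicit
```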
Ben Jann

Join Date: Sep 2014

Posts: 257
#10

20 Aug 2021, 13:59

re #8: Yes, my general advice would be to use wgenerate() to store matching weights for use in further analyses. The matching weights stored by generate(), for example, are only defined for the control group and are missing in the treatment group (assuming option att has been specified; in case of ate things are more complicated). Strictly speaking, wgenerate() is not needed because all relevant information is also returned by generate(), but you would need to know how to handle this information. I added wgenerate() to make things easier for users.

re #9: looks ok to me. If you specify pscore() then this variable is used instead of an internally estimated propensity score.
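To illustrate how the wgenerate() weights can be used in a follow-up analysis, here is a minimal sketch on the cattaneo2 example data: a weighted regression of a covariate on the treatment indicator gives the difference in covariate means after matching. Treat the standard errors with caution, since this regression takes the weights as fixed and ignores that they were estimated.

```stata
webuse cattaneo2, clear
quietly kmatch ps mbsmoke mage fage prenatal1 mmarried fbaby (bweight), att wgen(_W)

// difference in mean mother's age between smokers and matched non-smokers;
// the coefficient on 1.mbsmoke is the post-matching mean difference
regress mage i.mbsmoke [pw=_W]
```

The same pattern works for any other covariate, or for the outcome if you want to cross-check the point estimate.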
Ondrej Dvoulety

Join Date: Jul 2017

Posts: 23
#11

20 Aug 2021, 14:36

Excellent, Ben, thanks a lot!
