I have a question about the propensity score matching I have done for my project. The covariate means for the control group reported by pstest do not match the covariate means for the control group that I calculate manually with summarize. As I understand it, the means from pstest and from summarize should agree, but they do not. Could you please help me understand why this is happening and where I might have gone wrong? I used the following steps to conduct the propensity score matching:
1) I ran the psmatch2 command for propensity score matching on the 2005 data. I did not specify a matching method such as nearest-neighbour (NN) or kernel matching, so I assume psmatch2 used its default matching method.
2) The command used only those observations with no missing values on any of the variables included, which is 80,299 observations. This count includes the 2005 observations that are 'on support' as well as those that are 'off support'. As I understand it, all of these observations are considered for matching and are then classified as 'on support' or 'off support'.
3) I then ran the pstest command to check the match quality. Here, the number of observations used to calculate the means for the 'treated' and 'control' groups after matching should be 80,299 - 1,658 (the observations off support) = 78,641. Is my understanding correct? All 1,658 off-support observations belong to the control group; every observation in the treatment group is retained. The pstest table shows that the means of the treatment and control groups are balanced after matching and that there is a substantial reduction in bias.
4) I then used the summarize command to calculate the covariate means for the treatment and control groups separately, after removing the 'off support' observations from the dataset, so the number of observations here is 78,641. However, the covariate means for the control group are almost the same as they were before matching (they differ only slightly in the decimals), rather than matching the post-matching means reported by pstest.
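For reference, the steps above correspond roughly to the sketch below. The variable names (treat, x1, x2, year) are placeholders rather than my actual variables, and the common option is an assumption about how the on/off support classification was requested. The variables _treated, _support and _weight are the ones psmatch2 creates. As far as I can tell, pstest computes the matched means using the matching weights in _weight, so the last line, a weighted summarize, may be the closer comparison to the pstest table than the unweighted one in step 4; I have not confirmed this.

```stata
* Sketch only: treat, x1, x2 and year are placeholder names.
* psmatch2 and pstest are user-written (ssc install psmatch2).
keep if year == 2005

* Step 1: psmatch2 with no matching method specified (default method);
* the common option is assumed here to obtain on/off support
psmatch2 treat x1 x2, common

* Step 3: balance table before and after matching
pstest x1 x2, both

* Step 4: unweighted means for each group, on-support observations only
summarize x1 x2 if _treated == 1 & _support == 1
summarize x1 x2 if _treated == 0 & _support == 1

* Possible closer comparison to pstest: control means weighted by the
* matching weights; controls never used as a match have missing _weight
* and drop out of this calculation
summarize x1 x2 [aweight = _weight] if _treated == 0 & _support == 1
```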
I am fairly new to Stata and to propensity score matching, so any help would be of immense value to me. Thank you so much!