  • pstest results interpretation

    I have a question about the propensity score matching I have done for my project. The covariate means for the control group reported by pstest do not match the control-group means I calculate manually with summarize. As I understand it, the means from pstest and from summarize should match, but they do not. Could you help me understand why this happens and where I might have gone wrong? I used the following steps to conduct the matching:

    1) I ran psmatch2 for propensity score matching on the 2005 data. I did not specify a matching method such as nearest-neighbor or kernel matching, so I assume the psmatch2 default was used.
    2) The command used the observations with no missing values in any of the variables considered: 80,299 observations. This includes the 2005 observations that are both 'on support' and 'off support'. As I understand it, all observations are considered for matching and then classified as 'on support' or 'off support'.
    3) I then ran pstest to check the match quality. Here, the number of observations used to calculate the means for the 'treated' and 'control' groups after matching should be 80,299 - 1,658 (observations off support) = 78,641. Is my understanding correct? The table shows that the means of the treatment and control groups are balanced after matching and that there is a significant bias reduction. The 1,658 off-support observations all belong to the control group; all observations from the treatment group are retained.
    4) I then used summarize to compute the covariate means of the treatment and control groups separately, after removing the 'off support' observations from the dataset. So the number of observations considered here is 78,641. However, the covariate means for the control group are almost the same (minute differences in the decimals) as those before matching.

    I am fairly new to Stata and propensity score matching. Any help would be of immense value to me. Thank you so much!

  • #2
    observations that are on support may not be matched. psmatch2 (SSC) creates a variable _weight that holds the weight given to the matched observations and is missing for the unmatched ones.
    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . psmatch2 foreign mpg
    
    Probit regression                                       Number of obs =     74
                                                            LR chi2(1)    =  11.55
                                                            Prob > chi2   = 0.0007
    Log likelihood = -39.258972                             Pseudo R2     = 0.1282
    
    ------------------------------------------------------------------------------
         foreign | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |   .0960601   .0301523     3.19   0.001     .0369627    .1551575
           _cons |  -2.635268   .6841462    -3.85   0.000     -3.97617   -1.294366
    ------------------------------------------------------------------------------
    
    . pstest, both
    
    ----------------------------------------------------------------------------------------
                    Unmatched |       Mean               %reduct |     t-test    |  V(T)/
    Variable          Matched | Treated Control    %bias  |bias| |    t    p>|t| |  V(C)
    --------------------------+----------------------------------+---------------+----------
    mpg                    U  | 24.773   19.827     86.0         |   3.63  0.001 |  1.94
                           M  | 24.773   24.182     10.3    88.1 |   0.32  0.753 |  1.32
                              |                                  |               |
    ----------------------------------------------------------------------------------------
    * if variance ratio outside [0.42; 2.41] for U and [0.42; 2.41] for M
    
    -----------------------------------------------------------------------------------
     Sample    | Ps R2   LR chi2   p>chi2   MeanBias   MedBias      B      R     %Var
    -----------+-----------------------------------------------------------------------
     Unmatched | 0.128     11.55    0.001     86.0      86.0      86.0*   1.94      0
     Matched   | 0.002      0.10    0.746     10.3      10.3       9.5    1.32      0
    -----------------------------------------------------------------------------------
    * if B>25%, R outside [0.5; 2]
    
    . tabstat mpg, by(foreign)
    
    Summary for variables: mpg
    Group variable: foreign (Car origin)
    
     foreign |      Mean
    ---------+----------
    Domestic |  19.82692
     Foreign |  24.77273
    ---------+----------
       Total |   21.2973
    --------------------
    
    . tabstat mpg [aw=_weight], by(foreign)
    
    Summary for variables: mpg
    Group variable: foreign (Car origin)
    
     foreign |      Mean
    ---------+----------
    Domestic |  24.18182
     Foreign |  24.77273
    ---------+----------
       Total |  24.47727
    --------------------
    Last edited by Øyvind Snilsberg; 30 Aug 2022, 03:42.



    • #3
      Thank you so much for your response. It is extremely helpful and helps me understand my case better. I have a few follow-up queries:
      1) As I did not specify the kind of matching in my psmatch2 command, which default kind of matching would have been done?
      2) I am unable to understand the difference between observations that are off support and those for which no value of the _weight variable was generated after matching.
      3) I aim to retain only the observations that are matched. In that case, should I first remove the observations that are off support and then remove the observations with a missing value for _weight?

      Any little input would be immensely helpful for me.



      • #4
        1. default is single nearest-neighbor.
        2. an untreated observation with missing _weight is not the nearest neighbor of any treated observation in terms of the pscore, whereas an untreated observation off support has a pscore higher than the maximum or lower than the minimum pscore of the treated observations.
        3. remove observations with missing _weight.
        Last edited by Øyvind Snilsberg; 30 Aug 2022, 11:15.
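
        A minimal sketch of points 2 and 3, reusing the auto data from #2 and assuming psmatch2 is installed from SSC. The helper variable matched is a hypothetical name introduced only for this illustration. Exact counts can depend on the sort order when propensity scores are tied, so no output is shown.

        ```stata
        sysuse auto, clear
        psmatch2 foreign mpg

        * hypothetical helper: 1 if the observation received a matching weight
        generate byte matched = !missing(_weight)

        * among untreated observations, support status and matched status
        * are separate classifications: on support does not imply matched
        tabulate _support matched if foreign == 0

        * point 3: keep only the matched sample
        keep if matched

        * control-group mean in the matched sample, weighted by _weight
        summarize mpg if foreign == 0 [aw=_weight]
        ```

        The cross-tabulation is there to show that some untreated observations can be on support and still have a missing _weight, because no treated observation has them as its nearest neighbor.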



        • #5
          Thank you so much for your response. I tried experimenting with the auto dataset. I have a few follow-up queries on your answer (3):

          1) I see that _nn=1 (indicating that there is one match for the observation) for the observations that are on support. However, _nn may or may not be 1 for observations that have a value for _weight, which would mean that observations with missing _weight have also found a match. How is this possible? Is my interpretation of _nn correct here?
          2) Additionally, does this mean that by dropping the observations where _weight is missing, I would be dropping observations that may have been matched?
          3) If I retain matched observations on the basis of non-missing _weight, I would be dropping some observations that may be on support and retaining observations that may be off support. How can I justify this in theory, given that I am including the common support graph in my paper?
          4) I dropped the observations with missing _weight values and then calculated the covariate means for the control group. I expected these to equal those in the pstest output table, but the values still do not match. So which observations does pstest use to calculate the covariate means for the treatment and control groups? Is it not all observations with a non-missing _weight value?

          Thank you once again for all your help



          • #6
            1. _nn = 1 for treated observations with 1 (the single nearest-neighbor) matched untreated observation, and 0 otherwise.
            2. no.
            3. it often makes sense to drop observations off support but doing so might result in poorer matches.
            4. you must weight the covariates using the variable _weight, i.e., -bysort foreign: summarize mpg [aw=_weight]-.
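
            To make point 4 concrete, here is a hedged sketch that computes the weighted control mean by hand as sum(_weight * x) / sum(_weight) and compares it with the aweight version. It assumes the auto data from #2 and psmatch2 from SSC. The names wx, num, and den are hypothetical, introduced only for this illustration.

            ```stata
            sysuse auto, clear
            psmatch2 foreign mpg

            * weight times covariate, for untreated observations only;
            * missing where _weight is missing (unmatched controls)
            generate double wx = _weight * mpg if foreign == 0

            * numerator: sum of weight * covariate over matched controls
            summarize wx, meanonly
            scalar num = r(sum)

            * denominator: sum of weights over matched controls
            summarize _weight if foreign == 0, meanonly
            scalar den = r(sum)

            display "weighted control mean by hand: " num/den

            * same quantity using analytic weights
            summarize mpg if foreign == 0 [aw=_weight]
            ```

            Both figures should agree with each other and with the Control mean in the M row of pstest from the same run. An unweighted summarize on the matched controls differs whenever a control serves as the match for more than one treated observation.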



            • #7
              Thank you so much for your responses. I have one follow up question:
              1) How does adding 'ate' to the psmatch2 command affect the matching exercise? I see that the numbers of observations on support and off support change if I remove 'ate' from my psmatch2 command. The regression results of psmatch2 are not relevant for my research. My only aim is to retain the observations that are matched on their propensity scores and to conduct other analyses using only the matched observations. In that case, should I include 'ate' in my psmatch2 command or not?



              • #8
                In reference to my question above, the observations on and off support differ when I remove the 'ate' option from the psmatch2 command, and the number of observations with missing _weight values also differs. I want to understand what difference 'ate' makes, and whether I should use the ate option in my case, given that I plan to drop observations with missing _weight values to retain the matched observations.



                • #9
                  with ate option, _support flags observations on support. without ate option, _support flags treated observations on support + all untreated observations.
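
                  One way to see this on a small example is to run the matching twice and compare the flags. This is only a sketch, assuming the auto data from #2 and psmatch2 from SSC. The outcome(price) choice is an illustrative assumption, not something taken from the posts above, and the data are reloaded between runs so that each run starts without leftover psmatch2 variables.

                  ```stata
                  * run 1: without ate
                  sysuse auto, clear
                  psmatch2 foreign mpg, outcome(price)
                  tabulate _support foreign
                  count if missing(_weight)

                  * run 2: with ate
                  sysuse auto, clear
                  psmatch2 foreign mpg, outcome(price) ate
                  tabulate _support foreign
                  count if missing(_weight)
                  ```

                  Comparing the two tabulations shows which untreated observations change support status, and the two counts show how many observations are left without a weight in each case.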



                  • #10
                    How does including the ate option affect the _weight variable?
                    On what basis should one decide to use the ate option with the psmatch2 command?

                    Thanks



                    • #11
                      Hello everyone
                      I have a problem with pstest: its results differ from a t-test in the matched sample. pstest gives equal means for the treated group, but different means are obtained for the control group every time I run the code. I have tried different kinds of variables, categorical and continuous, and I get the same thing every time: the control-group means from pstest differ from the control-group means from a t-test in the matched sample. I really do not understand how this works or whether I am doing something wrong.
                      Here is my code:
                      pscore ASA_yes_no_group Age Gender, pscore(mypscore) detail
                      psgraph, treated(ASA_yes_no_group) pscore(mypscore)
                      psmatch2 ASA_yes_no_group, out(AKI_KDIGO) pscore(mypscore) noreplacement neighbor(1) caliper(0.2) ate
                      pstest Age Gender, graph both
                      preserve
                      keep if _support==1
                      ttest Age, by(ASA_yes_no_group)

                      Any help would be much appreciated!!



                      • #12
                        Originally posted by Isha Mohanty
                        How does including the ate option affect the _weight variable?
                        On what basis should one decide to use the ate option with the psmatch2 command?

                        Thanks

                        Did you ever figure this out? Really useful post.



                        • #13
                          [Attached image: Screenshot .png]
                          Hello everyone. Could anyone briefly explain the missing values (".") that appear in some columns of the pstest output in Stata (post-PSM balance tests), mainly in the variance ratio V(T)/V(C) and %bias columns for dummy covariates?
                          Last edited by Amare Terefe Gashaye; 11 Dec 2023, 10:15.

