psmatch2

Faiza Zafar

Join Date: Jan 2022

Posts: 23
#1

18 Sep 2022, 05:44

Hi everyone!

I am trying to understand psmatch2 and wanted help with a few things.

The problem:

I am trying to match control firms based on a specific industry in a certain year (2019). From various posts on Statalist, I have realised that this can be done with the following steps:

Step 1: Obtain a propensity score based on industry and the specific year

Step 2: Use that pscore in the psmatch2 command

Step 3: Run the regression

I am having trouble with Step 1: how can I obtain the pscore based on industry for a specific year? Could you please specify which regression I should use?

I am following this link:

HTML Code:

https://www.statalist.org/forums/forum/general-stata-discussion/general/1669473-how-to-use-results-of-psmatch2-in-regression

The commands I am trying right now are:

Code:

logit treated INDUSTRY if year == 2019

Code:

predict double ps

Code:

psmatch2 treated if year == 2019, outcome(WACC) pscore(ps) neighbor(1) caliper(0.01)

The reference year in which I want to match my control firms is 2019.

Any help or comments would be greatly appreciated!
Tags: matching based on a year, matching on industry, panel data, psmatch2
Faiza Zafar

Join Date: Jan 2022

Posts: 23
#2

18 Sep 2022, 08:56

Øyvind Snilsberg Any suggestions please?
Comment
Øyvind Snilsberg

Join Date: Oct 2021

Posts: 591
#3

19 Sep 2022, 09:05

can you post a data example?
Comment

David Radwin

Join Date: Mar 2014
Posts: 368
#4

20 Sep 2022, 18:24

You don't need to calculate a propensity score in advance when using psmatch2 (Leuven and Sianesi, available from SSC), so you can skip the first step. You also don't need to run a separate regression (step 3). Here is a silly example that shows propensity score matching in one command:

Code:

. sysuse nlsw88
(NLSW, 1988 extract)

. psmatch2 married age grade south, outcome(union)

Probit regression                                       Number of obs =  1,876
                                                        LR chi2(3)    =   1.38
                                                        Prob > chi2   = 0.7114
Log likelihood = -1212.9288                             Pseudo R2     = 0.0006

------------------------------------------------------------------------------
     married | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |  -.0096615   .0098233    -0.98   0.325    -.0289148    .0095918
       grade |   .0011241   .0116451     0.10   0.923    -.0216998    .0239481
       south |  -.0371692   .0604129    -0.62   0.538    -.1555762    .0812378
       _cons |   .7678532    .423047     1.82   0.070    -.0613037     1.59701
------------------------------------------------------------------------------
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
           union  Unmatched | .228501229   .276335878  -.047834649   .020817873    -2.30
                        ATT | .228501229   .279279279  -.050778051   .050789401    -1.00
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |       655 |       655 
   Treated |     1,221 |     1,221 
-----------+-----------+----------
     Total |     1,876 |     1,876

It seems like you might need to learn more about propensity score matching in general, including setting a reasonable caliper. Please see https://www.statalist.org/forums/for...08#post1242208 and https://www.statalist.org/forums/for...64#post1661764.

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him

Comment

Faiza Zafar

Join Date: Jan 2022

Posts: 23
#5

24 Sep 2022, 05:11

Hi David,

Thanks for your comments! Please correct me if I am wrong, but I think I do need to run a preliminary logit regression estimating the propensity score for a specific industry and year, and then use that propensity score in the psmatch2 command to obtain a set of control firms matched on that industry and year?
Comment

Faiza Zafar

Join Date: Jan 2022
Posts: 23
#6

24 Sep 2022, 07:46

Originally posted by Øyvind Snilsberg View Post

can you post a data example?

I am unable to use dataex to post an example, but please find below the code I used and the results I get.

Code:

* exact matching on industry and year
egen industry_Year = group(year INDUSTRY)
logit treated i.year i.INDUSTRY ROA Size
predict double pscore if e(sample)
* shift the score by 1000 per industry-year cell so matches cannot cross cells
gen double pscore2 = industry_Year*1000 + pscore
rsort
psmatch2 treated, out(WACC) n(1) caliper(2) pscore(pscore2) noreplacement
pstest ROA Size

Comment

Moomal Khan

Join Date: Jul 2022
Posts: 19
#7

25 Sep 2022, 06:19

can you guide me about this... https://www.statalist.org/forums/for...82-ps-matching

Comment

Øyvind Snilsberg

Join Date: Oct 2021
Posts: 591
#8

25 Sep 2022, 10:58

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id year wacc treated size)
 1 2019    .851468 1 5
 1 2020   .9820066 1 8
 2 2019 .032479186 1 5
 2 2020   .9874847 1 4
 3 2019    .894106 1 6
 3 2020   .9684734 1 8
 4 2019  .23922028 1 8
 4 2020   .6927336 1 6
 5 2019   .4884359 1 6
 5 2020   .4376452 1 5
 6 2019   .5858005 1 7
 6 2020   .3787092 1 8
 7 2019   .6880603 1 7
 7 2020   .9794578 1 5
 8 2019   .6701937 1 5
 8 2020   .5948808 1 7
 9 2019   .7970893 1 7
 9 2020   .7835853 1 7
10 2019   .6546342 1 9
10 2020  .09688907 1 8
11 2019   .6885059 0 5
11 2020    .872496 0 7
12 2019  .52963525 0 7
12 2020   .8302209 0 6
13 2019   .9339853 0 6
13 2020   .1749891 0 4
14 2019   .5536171 0 7
14 2020   .5346152 0 6
15 2019   .7767794 0 9
15 2020   .1288747 0 4
16 2019  .27751842 0 8
16 2020   .4242016 0 7
17 2019  .13590056 0 6
17 2020   .3325624 0 7
18 2019   .4675523 0 6
18 2020  .51608807 0 8
19 2019  .06694305 0 8
19 2020  .07229638 0 8
20 2019   .6817465 0 8
20 2020  .08804953 0 5
end

*estimate propensity scores based on size in 2019
psmatch2 treated size if year==2019
bys id (year): gen ps = _pscore[1]

*estimate the effect of treatment in 2020 using propensity scores estimated based on size in 2019
psmatch2 treated if year==2020, outcome(wacc) pscore(ps)

Comment

David Radwin

Join Date: Mar 2014

Posts: 368
#9

28 Sep 2022, 11:48

Originally posted by Faiza Zafar View Post

Please correct me if i am wrong but I think i do need to run a pre- logit regression estimating propensity based on a specific (industry and year) and then use that propensity in the psmatch2 command to obtain a set of control firms based on specific industry and year?

No, you don't need to do this, though you could. For a different approach, see the heading "Matching within strata" in the psmatch2 help file. You might also try both approaches and compare the results. Currently there is no consensus on the "best" or "correct" method for matching in general, nor for most specific situations like yours, which seek to match exactly on some covariates (industry and year) and not on others.
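
A minimal sketch of the within-strata idea, using the nlsw88 example data rather than the firm data in this thread. It assumes psmatch2 is installed from SSC and treats south as the stratum (a stand-in for industry and year); the variable names stratum and att are made up for illustration.

```stata
* Sketch only: propensity score matching within strata (exact match on the stratum).
* Assumes psmatch2 is installed (ssc install psmatch2).
sysuse nlsw88, clear

* Stratum identifier; group() also accepts several variables
egen stratum = group(south)

gen double att = .
levelsof stratum, local(strata)
foreach s of local strata {
    * Estimate the score and match using only observations in stratum `s'
    quietly psmatch2 married age grade if stratum == `s', outcome(union)
    * Store the stratum-specific ATT returned by psmatch2
    quietly replace att = r(att) if stratum == `s'
}

* Summarize the stratum-level ATTs (implicitly weighted by stratum size)
summarize att
```

With several exact-match variables, group() can take them all, for example group(year INDUSTRY) as in the code posted above. Small strata may leave the probit with too few observations to estimate.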

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Comment
David Radwin

Join Date: Mar 2014

Posts: 368
#10

28 Sep 2022, 11:50

Originally posted by Moomal Khan View Post

can you guide me about this... https://www.statalist.org/forums/for...82-ps-matching

Please see this extra advice about bumping from the FAQ: https://www.statalist.org/forums/help#adviceextras

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Comment
Denise Vella

Join Date: Aug 2022

Posts: 187
#11

23 Jan 2023, 11:02

Originally posted by David Radwin View Post

No, you don't need to do this, though you could. For a different approach, see the heading "Matching within strata" in the psmatch2 help file. You also might try both approaches and compare the results. Currently there is no consensus on the "best" or "correct" methods for matching overall nor for most specific situations like yours that seeks to match exactly on some covariates (industry and year) and not on others.

Hi David Radwin, I have looked through all 20 pages of results for the search 'Propensity scores' to find an answer to my own post, which I didn't find, but anyway... (can't bump my post haha)

I wanted to ask you about this statement, made in reply to the Statalist user asking about running a logit regression before psmatch2.

All published articles indicate that one should first run a logit regression with treatment as the outcome/dependent variable and the rest of the covariates as explanatory variables (step one).
Why are you saying to skip this step and move straight to psmatch2, which uses probit regression but generates its own propensity scores?
I know there isn't much evidence of a difference between probit and logit, which isn't my question here.

But what is the evidence or reason for recommending that the logit step be skipped?
It would be interesting to get Melissa Garrido's point of view.
Comment
David Radwin

Join Date: Mar 2014

Posts: 368
#12

23 Jan 2023, 11:57

The only reason you don't need to calculate propensity scores prior to using psmatch2 is that the program already calculates propensity scores (using your choice of probit or logit) by default and then matches based on the propensity scores. So it's not really skipping a step, but rather combining both steps in one command.
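
To see that nothing is skipped, one could compare the score psmatch2 stores in _pscore with a probit fitted by hand. This is only a sketch on the nlsw88 example data, assuming psmatch2 is installed from SSC; ps_manual and ps_diff are hypothetical names, and the if !missing(union) condition is an assumption about how psmatch2 restricts its estimation sample when outcome() is given.

```stata
* Sketch only: one-command matching versus a hand-run first step.
sysuse nlsw88, clear

* One command: psmatch2 fits a probit (its default) and stores the score in _pscore
psmatch2 married age grade south, outcome(union)

* By hand: the same probit, then predicted probabilities
probit married age grade south if !missing(union)
predict double ps_manual if e(sample)

* Inspect the gap between the two sets of scores
gen double ps_diff = abs(_pscore - ps_manual)
summarize ps_diff
```

If the two estimation samples coincide, the differences should be zero up to rounding.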

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Comment
Denise Vella

Join Date: Aug 2022

Posts: 187
#13

23 Jan 2023, 15:01

Originally posted by David Radwin View Post

The only reason you don't need to calculate propensity scores prior to using psmatch2 is that the program already calculates propensity scores (using your choice of probit or logit) by default and then matches based on the propensity scores. So it's not really skipping a step, but rather combining both steps in one command.

OK, so perhaps an uncomfortable question…
Why didn’t M. Garrido, in the article published here, just recommend using psmatch2 rather than going through the hassle of running logit first? Was it because the article came out before psmatch2?

https://pubmed.ncbi.nlm.nih.gov/24779867/

Also, is this the right psmatch2 code for logit?

Code:

psmatch2 treatment covariate1 covariate2, pscore(name_your_ps) outcome(outcome) caliper(.1) common logit noreplacement neighbor(1)
Comment
David Radwin

Join Date: Mar 2014

Posts: 368
#14

23 Jan 2023, 16:54

You may need to read the article more closely. Among other things, it cites psmatch2 with a date of 2003.

As to your second question, you have to choose to either use existing propensity scores using the pscore() option or include the covariates to be used to create a new variable with propensity scores. Your example code does both and will yield an error message.
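
A sketch of the two valid alternatives, again on the nlsw88 example data and assuming psmatch2 is installed from SSC. The variable name ps and the caliper of 0.1 are illustrative only, not recommendations.

```stata
* Sketch only: two mutually exclusive ways to supply the propensity score.
sysuse nlsw88, clear

* Alternative A: list the covariates and let psmatch2 estimate the score,
* with the logit option replacing the default probit
psmatch2 married age grade south, outcome(union) logit neighbor(1) caliper(0.1) common

* Alternative B: estimate the score first, then pass it through pscore()
* and list no covariates after the treatment variable
logit married age grade south
predict double ps if e(sample)
psmatch2 married, outcome(union) pscore(ps) neighbor(1) caliper(0.1) common
```

The two runs need not give identical estimates, because the hand-run logit in B uses every observation with non-missing covariates, whereas A may restrict the sample further.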

David Radwin
Senior Researcher, California Competes
californiacompetes.org
Pronouns: He/Him
Comment
David Ray McCoy

Join Date: Dec 2016

Posts: 24
#15

24 Oct 2023, 11:10

For anyone who stumbles across this... I'm still looking for a clean solution for exact matching in psmatch2, but I developed a workaround. The idea is to use preserve... restore and iterate through and subset on the exact matching criteria. In my case, this worked with Mahalanobis distance matching and NOT anything using logit/probit, unless you have sufficiently large samples within exact match criteria. Mahalanobis can find matches in small samples. For more: https://www.statalist.org/forums/for...-with-mahapick
Comment
