  • factor variable notation in teffects

    Hi everyone,

    I am working on a project using propensity score matching, via the teffects command in Stata 15. My data come from a survey of respondents in 8 countries. I am using the following basic syntax for teffects:

    Code:
    teffects psmatch (Outcome) (Treatment $Demo_list_vars $Other_Demo_Vars $Perception_list_vars i.country, logit) if treatment!=1
    I restrict the sample because some people do not use the underlying technology at all (treatment = 1). I am interested in exposure to an event on the technology, where treatment = 2 means no and treatment = 3 means yes.

    The matching portion of my model includes a lot of dummy variables, not just the country indicators but also gender, smartphone ownership, etc. Herein lies my question: should I use factor variable notation (i.e., i.varname) on the variables in the matching portion of the model?

    This choice seems to have material implications for the estimation of the ATE.
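    For reference, the two specifications I am comparing are along these lines (a sketch using the globals defined above; i.() applies the factor operator to the whole list):

    Code:
    * binary/categorical covariates entered as-is (read as continuous)
    teffects psmatch (Outcome) (Treatment $Demo_list_vars $Other_Demo_Vars $Perception_list_vars i.country, logit) if treatment!=1

    * the same covariates wrapped in factor variable notation
    teffects psmatch (Outcome) (Treatment i.($Demo_list_vars $Other_Demo_Vars $Perception_list_vars) i.country, logit) if treatment!=1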

    For the model using factor notation, I get the following output:

    Code:
    Treatment-effects estimation                   Number of obs      =      6,650
    Estimator      : propensity-score matching     Matches: requested =          1
    Outcome model  : matching                                     min =          1
    Treatment model: logit                                        max =          2
    ------------------------------------------------------------------------------
                 |              AI Robust
         Outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ATE          |
     (yes vs No) |   .2606015   .0363096     7.18   0.000      .189436     .331767
    ------------------------------------------------------------------------------
    Doing the same without factor variable notation for the matching variables returns the following:

    Code:
    Treatment-effects estimation                   Number of obs      =      6,650
    Estimator      : propensity-score matching     Matches: requested =          1
    Outcome model  : matching                                     min =          1
    Treatment model: logit                                        max =          2
    ------------------------------------------------------------------------------
                 |              AI Robust
         Outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ATE          |
     (yes vs No) |   .2065414    .035904     5.75   0.000     .1361707     .276912
    ------------------------------------------------------------------------------
    The number of observations and matches remains the same, but there are pretty big differences in the ATE and its 95 percent CI across these two models. That suggests there is a correct way forward here, but it is not clear to me which one it is from the manual or the Stata YouTube channel videos I have watched.

    Thanks for your time, all.

    Sincerely,

    Eric J
    Last edited by Eric Jardine; 02 Oct 2019, 09:53. Reason: Forgot the tags.

  • #2
    So, I would still welcome replies if anyone has firm views here, but poking around the drop-down menu in the treatment-effects dialog, there is a box with three dots (...) indicating that you can populate your treatment model with factor variables. Given this functionality, I suspect the treatment model is more accurate with factor variable notation, and I will likely proceed on that basis, as I assume variables are read as continuous if not otherwise noted. Comments from others are still welcome, but this built-in functionality seems indicative of factor notation being advisable.



    • #3
      Hi Eric,

      If all your factor variables were binary, you should see the same results whether or not you use factor variable notation. However, seeing that you have i.country in the model, this does not seem to be the case. What happens when you do not use factor variable notation is that the variable is included as a linear predictor in the treatment-assignment model, which makes no sense for a nominal variable such as a country identifier.
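      To illustrate with a minimal sketch (variable names hypothetical): without the i. prefix, a country code such as 1, 2, ..., 8 enters the logit as a single slope, as if moving from country 1 to country 2 had the same effect as moving from country 7 to country 8; with i.country, each country gets its own indicator and no ordering is imposed.

      Code:
      * country treated as continuous: one slope over arbitrary codes
      logit treat age country

      * country treated as nominal: one indicator per country
      logit treat age i.country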

      I hope this helps,
      Joerg



      • #4
        Thanks for the reply, Joerg. To clarify one point, the results in my initial post that vary so widely include i.country in both cases. The coding difference producing the different results is my use of i. notation on the other binary demographic variables in the matching portion of the model. These are all coded zero and one. My understanding is that, mathematically, use of i. should not matter in these cases if, as you said, the factor variables are binary. But it seems to matter in my case. Happy to supply more info if that is helpful.



        • #5
          Yes, it should not make a difference whether you use binary variables with or without factor variable notation in this case. However, using a toy example, I cannot reproduce the behavior you describe:

          Code:
          . clear
          
          . set seed 123
          
          . set obs 100
          number of observations (_N) was 0, now 100
          
          . forval i = 1/10 {
            2.         gen x`i' = runiform() > 0.5
            3. }
          
          . gen D = runiform() > 0.5
          
          . gen y = rnormal()
          
          . 
          . teffects psmatch (y) (D x1-x10)
          
          Treatment-effects estimation                   Number of obs      =        100
          Estimator      : propensity-score matching     Matches: requested =          1
          Outcome model  : matching                                     min =          1
          Treatment model: logit                                        max =          2
          ------------------------------------------------------------------------------
                       |              AI Robust
                     y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          ATE          |
                     D |
             (1 vs 0)  |  -.0581461   .2929358    -0.20   0.843    -.6322896    .5159975
          ------------------------------------------------------------------------------
          
          . mat b0 = e(b)
          
          . 
          . teffects psmatch (y) (D i.(x1-x10))
          
          Treatment-effects estimation                   Number of obs      =        100
          Estimator      : propensity-score matching     Matches: requested =          1
          Outcome model  : matching                                     min =          1
          Treatment model: logit                                        max =          2
          ------------------------------------------------------------------------------
                       |              AI Robust
                     y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          ATE          |
                     D |
             (1 vs 0)  |  -.0581461   .2929358    -0.20   0.843    -.6322896    .5159975
          ------------------------------------------------------------------------------
          
          . mat b1 = e(b)
          
          . 
          . di mreldif(b0,b1)
          0
          We can see that the effect estimates are exactly the same. If you could post a reproducible example, I am happy to take a look at it. Alternatively, you could also send your data and code to [email protected] and we will take a look.

          Joerg



          • #6
            Joerg, thanks for your post. I managed to get the issue sorted. I was using global lists in my matching models, for the first time, and I failed to capitalize one of the global names in one of the models. Unlike when a variable is misspelled, the model ran without returning an error. Basically, I was matching on different sets of variables, so I was getting different results. Thanks for your patience and for using the simulation to show there had to be something amiss. I really appreciate it.
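            For anyone who runs into the same thing: global macro names are case sensitive, and a reference to an undefined global silently expands to nothing, so a capitalization typo simply drops those covariates from the model. A minimal sketch (names hypothetical):

            Code:
            global Demo_vars "age gender"

            * Typo: $demo_vars is undefined and expands to nothing,
            * so this silently omits the demographic covariates
            logit treat $demo_vars i.country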
