  • Difference-in-differences

    Hello everyone! I'm working on my thesis, and I have a big problem with how to apply the difference-in-differences methodology to the panel data I have.
    Our goal is to ascertain how the enactment of preregistration laws affects the political participation of young individuals and the distribution of public resources. We begin the analysis by empirically examining the effect of preregistration on young voter registration and turnout. To this end, we take advantage of the fact that preregistration reduces the cost of registering, and in turn the cost of voting, for the young relative to other age groups. Since the age of an individual is a dimension along which the treatment varies, along with time and space, we first split the set of individuals into two age groups: the young and the old. For each of them, we then use a difference-in-differences (hereafter DD) regression design, which compares electoral outcomes for individuals in states with preregistration and states without, before and after the voting reform is introduced.
    We operationalize the empirical strategy employing the following event study model based on a DD estimator:


    [Image: Model.png — the event study DD model equation]



    So, I created my dummy variables, as you can see below:
    [Image: Immagine 2021-04-12 202854.png — screenshot of the generated dummy variables]


    My problem now is figuring out how to implement the difference-in-differences I mentioned above, taking into consideration the two respective groups (young and old). Should I proceed through regression, or is it better to use other commands (for example the community-contributed -diff- command)? Moreover, in this context, with so much data, it is not clear to me how to identify the control and treatment groups.
    I would be grateful if any of you could help me; unfortunately I have only studied the simplest case of diff-in-diff, and I am also not very proficient with Stata.

  • #2
    To follow the equation you show, you need to set up some other variables. You need to create the variable called Ps in that equation--it is this one which will distinguish the treatment group from the control group.
    Code:
    by statefip, sort: egen ps = max(pre_reg)
    You also need a variable that gives the number of years since the first election following implementation in the state:
    Code:
    by statefip, sort: egen year_first_post_implement = min(cond(pre_reg, year, .))
    gen years_since_implement = year - year_first_post_implement
    replace years_since_implement = max(min(years_since_implement, 3), -5)
    Now, setting aside for the moment the distinction between the young and the older, your regression would then go like this:
    Code:
    xtset statefip
    xtlogit Y i.ps##i.years_since_implement i.year, fe
    where Y should be replaced by your actual 0/1 outcome variable (you don't show what it is called).

    To incorporate the possibility of separate impacts on young and old, you need to expand the interaction to a three-way one:
    Code:
    xtlogit Y i.age18_24##i.ps##i.years_since_implement i.year, fe
    Notes:
    1. I am giving the "bare bones" model here. There may be other variables that need to be taken into account. You may (or may not) need to use cluster-robust standard errors.
    2. None of this code is tested as usable example data was not provided. Therefore, beware of typos or other errors. In the future, when seeking help with coding, always show example data. And always use the -dataex- command to do that. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
    3. Interpreting the results of the model including young vs old is going to be complicated because the years_since_implement variable has 9 levels, and the effects may well be heterogeneous across those 9 time periods to start with, and the moderating effect of age on those may also be heterogeneous across those 9 time periods. Good luck! It's going to be a series of -lincom- commands for all of the three-way interaction terms, and perhaps for an omnibus test, a joint -test- of all 9.

    If you have enough other variables to include in the model so that, conditional on all of those, the delta_s and epsilon_i_s_t are reasonably considered independent, you can simplify your life by using a random effects model instead of fixed effects (-re- instead of -fe-). Because then you can use the -margins- command to get the marginal effects. But you can't really do that after a fixed-effects logistic regression.

    Comment


    • #3
      Hello Clyde! I was hoping for your answer; I have seen that you are very good and familiar with Stata, especially with this methodology. Is it possible to send the do-file via Statalist? I have all the necessary variables and the various do-files containing the results to be produced in the first instance; what I should do is create new interactions and see the difference in effects using other variables such as gender and race. My biggest problem with the files I have is understanding the "syntax" that is used, above all because the study I'm referring to uses the so-called event study approach...

      Comment


      • #4
        The way to show the contents of a do-file here on Statalist is to click on the # button in the toolbar at the top of the message window. (If there is no toolbar at the top of the message window, click on the A button: that will make the toolbar appear.) After clicking on the # button, "code delimiters" will appear in the message window. Copy the contents of the do-file (or the parts of it that you want to show), and paste it between the code delimiters. When you hit "Post Reply", the code will appear in a Code box in a fixed-width font, nicely aligned (well, as nicely aligned as the original code!) When you want to show Results that Stata gave you, you can do the same thing: just copy/paste from the Results window (or your log file) between code delimiters.

        Happy to help with specific questions about Stata syntax.

        Comment


        • #5
          Code:
          **** GENERATING VARIABLE ****
          *****************************
          
          /*GEN PREREGISTRATION*/
          gen pre_reg=1 if statefip==6 & year>= 2009 /*California*/
          replace pre_reg=1 if statefip==8 & year>=2013 /*Colorado*/
          replace pre_reg=1 if statefip==10 & year>=2010 /*Delaware*/ 
          replace pre_reg=1 if statefip==11 & year>=2009 /*DC*/
          replace pre_reg=1 if statefip==12 & year>=2007 /*Florida*/
          replace pre_reg=1 if statefip==22 & year>=2014 /*Louisiana*/ 
          replace pre_reg=1 if statefip==23 & year>=2011 /*Maine*/    
          replace pre_reg=1 if statefip==24 & year>=2010 /*Maryland*/
          replace pre_reg=1 if statefip==25 & year>=2014 /*Massachusetts*/
          replace pre_reg=1 if statefip==37 & year>=2009 & year<2013 /*North Carolina*/
          replace pre_reg=1 if statefip==41 & year>=2007 /*Oregon*/
          replace pre_reg=1 if statefip==44 & year>=2010 /*Rhode Island*/
          replace pre_reg=1 if statefip==49 & year>=2015 /*Utah*/
          replace pre_reg=1 if statefip==15 & year>=1993 /*Hawaii*/
          replace pre_reg=1 if statefip==34 & year>=2016 /*New Jersey*/
          replace pre_reg=0 if pre_reg==.
          
          /*GEN VOTED AND REGISTERED*/
          gen register=1 if voreg==2 
          replace register=0 if voreg==1
          replace register=1 if voted==2 & register==.
          gen vote=1 if voted==2
          replace vote=0 if voted==1
          
          /*GEN AGE DUMMIES*/
          gen age18_24 =1 if age>=18 & age<25   
          replace age18_24=0 if age>24 & age!=.
          
          gen pre18=pre_reg*age18_24
          gen online18=online*age18_24
          gen edr18=edr*age18_24
          
          /*CLEAN DATA*/
          replace sex=. if sex==9
          gen black=1 if race==200 
          replace black=0 if black==. & race!=.
          gen hispanic=1 if hispan>0 & hispan<900
          replace hispanic=0 if hispan==0
          replace labforce=. if labforce==0
          replace voteresp=. if voteresp==9
          replace faminc=. if faminc>843
          replace faminc=. if faminc==800
          recode faminc (100=0) (110=0) (120=0) (130=0) (140=0) (150=0) (210=1) (220=1) (231=1) (300=1) (430=2) (440=2) (460=2) (470=2) (500=3) (540=3) (550=3) (600=4) (700=5) (710=5) (720=5) (730=5) (740=5) (810=6) (820=6) (830=6) (840=7) (841=7) (842=7) (843=7),  g(faminc1)
          tab faminc faminc1
          replace educ=. if educ==1 | educ==999
          recode educ (2=0) (10=0) (11=0) (12=0) (13=0) (14=0) (20=0) (21=0) (22=0) (30=0) (31=0) (32=0) (40=0) (50=0) (60=0) (71=1) (72=1) (73=1) (80=2) (81=2) (90=2) (91=2) (92=2) (100=2) (110=3) (111=3) (121=3) (122=3) (123=3) (124=3) (125=3), g(educ1)
          tab educ educ1
          
          global controls i.sex i.black i.hispanic i.educ1 i.faminc1 i.labforce i.metro i.voteresp
          
          drop if hispanic==.
          drop if labforce==. 
          drop if voteresp==.
          drop if faminc1==.
          drop if age18_24==.
          drop if age>90
          
          /*GEN EVENTS ON AGE 18-24 PREREG*/
          
          so statefip
          
          *Generate Cohorts*
          gen treated_states = 0
          by statefip: egen max_pre=max(pre_reg)
          by statefip: replace treated_states=max_pre
          
          by statefip: egen pre_reg_y=min(year) if pre_reg==1
          by statefip: egen target=min(pre_reg_y)
          egen treated_year=group(pre_reg_y)
          by statefip: egen cohort=max(treated_year)
          replace cohort=0 if cohort==.
          
          *Generate Leads and Lags*
          forvalues kk = 0(1)5 {
          by statefip: gen F`kk'=target-2*`kk'
          by statefip: gen F`kk'_pre=0
          by statefip: replace F`kk'_pre=1 if age18_24==1 & year==F`kk'
          by statefip: gen Fold`kk'_pre=1 if age18_24==0 & year==F`kk'
          }
          
          forvalues kk = 1(1)3 {
          by statefip: gen L`kk'=target+2*`kk'
          by statefip: gen L`kk'_pre=0
          by statefip: replace L`kk'_pre=1 if age18_24==1 & year==L`kk'
          by statefip: gen Lold`kk'_pre=1 if age18_24==0 & year==L`kk'
          }
          
          by statefip: gen F5_last=0
          by statefip: replace F5_last=1 if age18_24==1 & year<=target-10 & target!=.
          by statefip: gen L3_last=0
          by statefip: replace L3_last=1 if age18_24==1 & year>=target+6 & target!=.
          
          *Generate event window*
          gen eventwindow = 0
          forvalues kk = 0(1)5 {
              replace eventwindow = 1 if F`kk'_pre == 1 | Fold`kk'_pre==1
              }
          forvalues kk = 1(1)2 {
              replace eventwindow = 1 if L`kk'_pre == 1 | Lold`kk'_pre==1
              }
          
          * Generate mean of omitted time
          gen year_omitted=.
          replace year_omitted=year if F1_pre==1
          by statefip: egen max_year_omitted=max(year_omitted)
          by statefip: egen register_young_m=mean(register) if (age18_24==1 & year==max_year_omitted)
          by statefip: egen register_old_m=mean(register) if (age18_24==0 & year==max_year_omitted)
          by statefip: egen max_register_young_m=max(register_young_m)
          by statefip: egen max_register_old_m=max(register_old_m)
          gen register_gap_m=max_register_old_m-max_register_young_m
          
          by statefip: egen vote_young_m=mean(vote) if (age18_24==1 & year==max_year_omitted)
          by statefip: egen vote_old_m=mean(vote) if (age18_24==0 & year==max_year_omitted)
          by statefip: egen max_vote_young_m=max(vote_young_m)
          by statefip: egen max_vote_old_m=max(vote_old_m)
          gen vote_gap_m=max_vote_old_m-max_vote_young_m
          
          by statefip: egen register_young_D=mean(register) if (age18_24==1 & pre_reg==0 & target!=.)
          by statefip: egen register_old_D=mean(register) if (age18_24==0 & pre_reg==0 & target!=.)
          by statefip: egen max_register_young_D=max(register_young_D)
          by statefip: egen max_register_old_D=max(register_old_D)
          gen register_gap_D=max_register_old_D-max_register_young_D
          
          by statefip: egen vote_young_D=mean(vote) if (age18_24==1 & pre_reg==0 & target!=.)
          by statefip: egen vote_old_D=mean(vote) if (age18_24==0 & pre_reg==0 & target!=.)
          by statefip: egen max_vote_young_D=max(vote_young_D)
          by statefip: egen max_vote_old_D=max(vote_old_D)
          gen vote_gap_D=max_vote_old_D-max_vote_young_D


          Well, this is my code, and, honestly, I am not confident with the leads and lags, nor with the commands used in the "Generate mean of omitted time" section.
          Below, you can see the part of the file that I should use to produce the tables; from here I was trying to get references to generate the DD model (a further step is a DDD model).

          Code:
          /*DEF GLOBAL VARIABLES*/
          
          global controls i.sex i.black i.hispanic i.educ1 i.faminc1 i.labforce i.metro i.voteresp
          
          *****************
          **** TABLE 1 ****
          *****************
          
          *baseline: Model 1*
          gen uno=0
           
          eststo: reg register F5_last F4_pre F3_pre F2_pre uno F0_pre L1_pre L2_pre L3_last i.year#i.age18_24 i.statefip#i.age18_24 i.statefip#i.year [pweight= wtfinl] , cluster(statefip)
          eststo reg_baseline
          sum register_gap_m, meanonly
          estadd scalar ymean_int = r(mean)
          
          *controls: Model 2*
          eststo: reg register F5_last F4_pre F3_pre F2_pre uno F0_pre L1_pre L2_pre L3_last $controls i.year#i.age18_24 i.statefip#i.age18_24 i.statefip#i.year [pweight= wtfinl] , cluster(statefip)
          eststo reg_controls
          sum register_gap_m, meanonly
          estadd scalar ymean_int = r(mean)
           
          *average: Model 3*
          eststo: reg register pre18 $controls i.year#i.age18_24 i.statefip#i.age18_24 i.statefip#i.year [pweight= wtfinl] , cluster(statefip)
          eststo reg_DDD_controls
          sum register_gap_D, meanonly
          estadd scalar ymean_int = r(mean)
          Do the variables F5_last, F4_pre, etc. correspond to those previously generated with the leads and lags?

          Comment


          • #6
            Well, you certainly put a lot of effort into writing all that code. Much of it, I'm afraid, is not necessary. There is no reason to have separate indicator variables for F5_last, F4_pre, etc. All of those are handled automatically by Stata when you take the approach I used in #2. In general, there is seldom any reason to create indicator ("dummy") variables or separate variables for lags and leads of anything in Stata: read -help fvvarlist- and -help tsvarlist- respectively. Using them saves you a lot of time, reduces the risk of errors, and makes the code much shorter and easier to read/understand.
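            A minimal, untested sketch of that advice (Y is a placeholder outcome; it assumes a state-by-year panel so that -xtset- with a time variable is legal):

```stata
* Hedged sketch, not run on real data. Assumes one observation per
* state-year; with biennial election data, delta(2) makes L1. mean
* "the previous election".
xtset statefip year, delta(2)

* i.varname expands to indicator ("dummy") variables automatically:
regress Y i.age18_24 i.year

* L1.pre_reg and F1.pre_reg are the lag and lead of pre_reg -- no
* hand-made lag/lead variables needed:
regress Y L1.pre_reg pre_reg F1.pre_reg i.year
```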

            Also, I see you frequently follow the pattern (I see this often here on Statalist and wonder where it comes from):
            Code:
            gen new_var = 0
            replace new_var = 1 if some_logical_condition
            That construction can be compressed to:
            Code:
            gen new_var = some_logical_condition
            which again saves time, reduces typing and errors, compactifies the code, and makes the code easier to read and understand.

            Also, you have some very long -recode- commands that can be compressed. The scheme you are using:
            Code:
            recode some_variable (10 = 0) (12 = 0) (15 = 0) (19 = 0) (22 = 1) (31 = 1) (46 = 1) (49 = 1) ...
            can be shortened to:
            Code:
            recode some_variable (10/19 = 0) (20/49 = 1) ...
            In addition to the benefits of time, typing, etc., this also does not require you to know every actual value of the variable being -recode-d. You just specify the ranges involved.

            It is your choice whether to use -xtreg, fe- after -xtset statefip-, or to use -regress ... i.statefip ...-. You will get equivalent results. But using -regress- with i.statefip will give you a list of coefficients for the state indicator variables which, in most settings, are not of interest and just clutter up the output. -xtreg, fe- will absorb those and not bother you with them.
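            A hedged sketch of that equivalence (untested; Y stands in for the outcome, and the coefficients on everything except the state indicators will match):

```stata
* Two equivalent fits of the same fixed-effects model (sketch only).
xtset statefip
xtreg Y i.ps##i.years_since_implement i.year, fe            // state effects absorbed

regress Y i.ps##i.years_since_implement i.year i.statefip   // state effects listed
```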

            All of the above are just different ways to do what you have done in your code. They don't constitute errors on your part, just doing things the hard way instead of the easy way. What I mention below constitute what I believe to be actual errors that need to be fixed in order to correctly fit the model you propose in #1. You can see in #2 how most of the code you have shown can be reduced to about a half-dozen lines.

            I see a modeling error in all of your -regress- commands. First, the term i.statefip#i.year does not correspond to anything in the equation you showed in #1. It is legal to have terms like this, but they make it a different model from the one you claim to be trying to follow. Moreover, in the absence of i.statefip and i.year by themselves, the model is just a mis-specification: whenever you specify an interaction, the constituent effects must also be included. That can be accomplished automatically by using ## rather than # to specify the interaction, or you can use #, but then you have to explicitly list the constituents. In this specific case, however, the model you mention in #1 does not include any term corresponding to i.statefip#i.year, so you should eliminate that and replace it with just i.statefip and i.year separately: no interaction between them.

            I also notice that you are using -pweight-s. If this is survey data, the use of pweights is important to get unbiased coefficient estimates. But, if the survey design included stratification or primary and higher-level sampling units, then the standard errors will not be correct unless you also account for those. You would have to refer to the documentation provided by the source of the data itself to learn both how the sampling was carried out and which variables in the data give you the strata and PSUs (and higher-level sampling units, if any). You would then have to incorporate that information, along with the pweight variable, into the -svyset- command, and you would need to use the -svy:- prefix on your -regress- command (and take out the [pweight = ...] from the -regress- command).
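            A hypothetical sketch only -- strata_var and psu_var below are invented placeholder names; the real ones must be taken from the CPS Voting and Registration Supplement documentation:

```stata
* HYPOTHETICAL: strata_var and psu_var are placeholders, not real
* variable names from the CPS extract.
svyset psu_var [pweight = wtfinl], strata(strata_var)
svy: regress register i.ps##i.years_since_implement i.statefip i.year
```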



            Comment


            • #7
              Hello Clyde! Data on voting and registration at the individual level are obtained from the Voting and Registration Supplement of the Current Population Survey (CPS), carried out biennially after each November election by the US Census Bureau; that's why my supervisor suggested that I use pweights. Another problem is that my supervisor is a theoretical, not an applied, economist; in fact, she doesn't help much in generating the correct model in Stata. Concerning commands such as i.statefip#i.year, I suppose they were introduced to denote year fixed effects (to control for time shocks) and state fixed effects (to account for unobserved state characteristics). Moreover, as you can see from the equation in #1, X_i,s,t is a vector of time-varying individual characteristics that I wrote as
              Code:
               
               global controls i.sex i.black i.hispanic i.educ1 i.faminc1 i.labforce i.metro i.voteresp
              that I need to introduce in my regression model (to do that, can I simply include them in my regression by using $controls?).
              Another aim of our research is to verify whether there are differences related to gender and race (as well as age), creating interaction variables that can capture the effects. Maybe, even in this case, I made the syntax more complicated than necessary; I'll show you how I did it:
              Code:
              /*GEN SEX AND RACE*/
              gen male= sex==1
              gen female= sex==2
              gen black= race==200
              gen hispanic= hispan>0 & hispan<900
              
              /*GEN INTERACTIONS*/
              gen pre18=pre_reg*age18_24
              gen pre18black=pre_reg*black
              gen pre18hispanic=pre_reg*hispanic
              gen pre18female=pre_reg*female
              gen pre18male=pre_reg*male
              gen pre18blackmale=pre_reg*black*male
              gen pre18blackfemale=pre_reg*black*female
              gen pre18hispanicmale=pre_reg*hispanic*male
              gen pre18hispanicfemale=pre_reg*hispanic*female
              Don't mind the variable names; I still have to tidy them up and create shorter ones!

              Comment


              • #8
                Concerning the commands - statefip#i.year and similar- I suppose they were introduced to denote year fixed effects ( to control for time shocks) and state fixed effects ( to account for unobserved state characteristics);
                Yes, I imagined that is what you were thinking, but that is not what it does. statefip#i.year introduces a separate fixed effect for each combination of state and year. That is many more fixed effects than just having fixed effects for states and fixed effects for years. The equation that you cited in #1 contains only fixed effects for states and fixed effects for years, not for their combinations. So you need i.statefip and i.year in the model, but not their interaction.
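                The contrast can be sketched like this (untested; weights and clustering as in your earlier commands):

```stata
* What the equation in #1 calls for: additive state and year fixed effects.
regress register i.ps##i.years_since_implement i.statefip i.year ///
    [pweight = wtfinl], cluster(statefip)

* By contrast, i.statefip#i.year would add one fixed effect per state-year
* cell -- a different model from the one in #1.
```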

                To introduce your covariates, yes you can just include $controls in the list of predictor variables in your regression command.

                The code you show for those interaction terms looks mostly correct, though you should not include both male and female, as these are mutually exclusive and exhaustive categories. Pick one. Also, this can be done more simply: there is no need to create these interaction variables at all. Instead, in your regression command you can just include i.pre18##i.(age18_24 black hispanic)##i.female and Stata will automatically include all of those variables and all of their two-way and three-way interactions in your model.

                Don't consider the variable name, I still have to fix it well and create shorter names!
                One of the problems with creating your own interaction variables is that you can end up with names that are too long to type (or even too long for legal Stata syntax), or names that are so abbreviated as to be unreadable. The best solution is not to create your own interaction variables.* Let factor-variable notation handle it and you completely avoid this problem.

                *There are occasional situations where an interaction really needs to be a variable in its own right, but these are relatively uncommon. For garden variety regression models with interaction terms, factor-variable notation is the better approach.
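                A hedged sketch of that factor-variable version (untested; controls, weights, and clustering as defined earlier in the thread):

```stata
* Sketch only: one ## expression replaces the hand-made pre18black,
* pre18hispanicfemale, etc. variables and adds every two- and three-way
* interaction automatically.
regress register i.pre18##i.(age18_24 black hispanic)##i.female ///
    $controls i.statefip i.year [pweight = wtfinl], cluster(statefip)
```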
                Last edited by Clyde Schechter; 13 Apr 2021, 12:08.

                Comment


                • #9
                  Thanks a lot, Clyde! You gave me a great starting point, more than great I would say. So, using the -diff- command and using the -xtreg- command are pretty much the same? Of course, later I will ask you for some other help, because the second step in this work is a triple-difference (DDD) model, and I'm sure you have already understood my less-than-basic level in Stata!

                  Comment


                  • #10
                    Most diff-in-diff analyses are carried out using the -xtreg- command. But sometimes they are done with other commands. When the data is serial cross-sections rather than panels, sometimes simple -regress- is good. Also -xtreg, fe- can always be emulated with -regress ... i.panelvar ...-. And some people prefer -areg- or -reghdfe- to -xtreg- for their fixed effects linear regressions. So the way I recommend you think about them is this: -xtreg- is one Stata command that carries out fixed-effects linear regression. There are other Stata commands that do that, too. Fixed-effects linear regression is applicable to many kinds of problems, of which diff-in-diff estimation of intervention effects is only one. There are many other applications of fixed-effects linear regression. Finally, don't think of diff-in-diff as a command. It's a study design, a strategy for identifying causal effects--it can involve statistical models other than fixed-effects linear regression, too.
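                    Those alternatives can be sketched side by side (untested; Y, x1, and x2 are placeholders, and -reghdfe- is a community-contributed command):

```stata
* Three commands that fit the same fixed-effects linear regression (sketch).
xtset statefip
xtreg   Y x1 x2 i.year, fe
areg    Y x1 x2 i.year, absorb(statefip)
reghdfe Y x1 x2 i.year, absorb(statefip)   // ssc install reghdfe
```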

                    Comment


                    • #11
                      Thank you, Clyde!
                      I tried to run the following regression
                      Code:
                      xtset statefip
                      xtlogit register i.ps##i.years_since_implement i.year, fe
                      and this is the outcome:
                      Code:
                      . xtlogit register i.ps##i.years_since_implement i.year, fe 
                      note: 3.years_since_implement omitted because of collinearity
                      note: 1.ps#3.years_since_implement omitted because of collinearity
                      note: multiple positive outcomes within groups encountered.
                      note: 1.ps omitted because of no within-group variance.
                      18,722 (group size) take 14,827 (# positives) combinations results in numeric overflow; computations cannot proceed
                      r(1400);
                      How to fix the problem?
                      Also, when I need to introduce the interactions from #8 in my regression, should I write each of them separately, or can I simply write them all together like:
                      Code:
                      xtset statefip
                      xtlogit register i.ps##i.years_since_implement i.year i.pre18##i.(age18_24 black hispanic)##i.female, fe

                      Comment


                      • #12
                        The lines that begin with -note:- are just informational: they are not problems and don't require you to do anything.

                        The final line about numeric overflow, however, is a fatal error. You have some panel (statefip) that has 18,722 observations, of which 14,827 have register = 1. That problem is simply too large for Stata to manage. It arises because calculating the likelihood for that panel requires calculating the number of ways you can select 14,827 observations out of 18,722. That is some hugely astronomical number that Stata simply cannot accommodate. You will either have to find a computer and software that can manage this kind of calculation, or select a substantially smaller random subset of your full data to do your analysis on. Another alternative might be using a linear probability model instead of logistic regression.
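                        A hedged sketch of that linear probability alternative (untested; note that factor variables may not take negative values, so if years_since_implement runs from -5 to 3 it must first be shifted to be nonnegative):

```stata
* Linear probability model: -xtreg, fe- on the 0/1 outcome avoids the
* combinatorial likelihood of -xtlogit, fe-. Sketch only.
gen ysi = years_since_implement + 5   // shift so factor levels are >= 0
xtset statefip
xtreg register i.ps##i.ysi i.year, fe vce(cluster statefip)
```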

                        Comment


                        • #13
                          Thank you Clyde!
                          I tried to run both a logit model and a simple regression; here are the results:
                          Code:
                           reg register i.ps##i.years_since_implement i.statefip i.year , cluster(statefip)
                          note: 3.years_since_implement omitted because of collinearity
                          note: 1.ps#3.years_since_implement omitted because of collinearity
                          note: 44.statefip omitted because of collinearity
                          
                          Linear regression                               Number of obs     =  1,350,537
                                                                          F(15, 50)         =          .
                                                                          Prob > F          =          .
                                                                          R-squared         =     0.0188
                                                                          Root MSE          =     .41783
                          
                                                                    (Std. Err. adjusted for 51 clusters in statefip)
                          ------------------------------------------------------------------------------------------
                                                   |               Robust
                                          register |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------------------+----------------------------------------------------------------
                                              1.ps |  -.0064737   .0008297    -7.80   0.000    -.0081402   -.0048072
                           3.years_since_implement |          0  (omitted)
                                                   |
                          ps#years_since_implement |
                                              1 3  |          0  (omitted)
                                                   |
                                          statefip |
                                           alaska  |   .0308412   .0001853   166.46   0.000      .030469    .0312133
                                          arizona  |  -.0928223    .000166  -559.19   0.000    -.0931557   -.0924889
                                         arkansas  |   -.086923   .0001292  -672.86   0.000    -.0871824   -.0866635
                                       california  |  -.0283233   .0007059   -40.13   0.000     -.029741   -.0269055
                                         colorado  |  -.0109156   .0002137   -51.08   0.000    -.0113447   -.0104864
                                      connecticut  |   .0005448   .0008953     0.61   0.546    -.0012535    .0023431
                                         delaware  |  -.0287721   .0001844  -156.04   0.000    -.0291424   -.0284017
                             district of columbia  |    .034705   .0001774   195.66   0.000     .0343487    .0350613
                                          florida  |  -.0329829   .0010115   -32.61   0.000    -.0350144   -.0309513
                                          georgia  |  -.0683078    .000455  -150.12   0.000    -.0692217   -.0673939
                                           hawaii  |  -.1165353   .0002912  -400.22   0.000    -.1171201   -.1159504
                                            idaho  |  -.0603099   .0001217  -495.41   0.000    -.0605544   -.0600654
                                         illinois  |   .0048569    .000483    10.06   0.000     .0038868     .005827
                                          indiana  |   -.061877   .0002409  -256.81   0.000     -.062361   -.0613931
                                             iowa  |   .0000373    .000417     0.09   0.929    -.0008003    .0008749
                                           kansas  |  -.0532566   .0002944  -180.89   0.000    -.0538479   -.0526653
                                         kentucky  |   -.044231   .0004095  -108.02   0.000    -.0450535   -.0434085
                                        louisiana  |   .0213509   .0008043    26.55   0.000     .0197355    .0229663
                                            maine  |   .0762839   .0000896   851.11   0.000     .0761039     .076464
                                         maryland  |  -.0090021   .0002205   -40.82   0.000    -.0094451   -.0085592
                                    massachusetts  |   .0277593   .0019392    14.31   0.000     .0238642    .0316543
                                         michigan  |   .0440295   .0007776    56.62   0.000     .0424676    .0455914
                                        minnesota  |   .0784553   .0007606   103.15   0.000     .0769276    .0799829
                                      mississippi  |   .0312339   .0003148    99.22   0.000     .0306016    .0318662
                                         missouri  |   .0057357   .0003441    16.67   0.000     .0050445    .0064269
                                          montana  |  -.0110675    .000201   -55.07   0.000    -.0114712   -.0106639
                                         nebraska  |  -.0271213   .0002542  -106.67   0.000    -.0276319   -.0266106
                                           nevada  |  -.1383628   .0006148  -225.06   0.000    -.1395976    -.137128
                                    new hampshire  |  -.0393803   .0013241   -29.74   0.000    -.0420398   -.0367208
                                       new jersey  |   -.008576   .0009686    -8.85   0.000    -.0105214   -.0066305
                                       new mexico  |  -.0721512   .0002759  -261.53   0.000    -.0727053    -.071597
                                         new york  |  -.0349675   .0006587   -53.08   0.000    -.0362905   -.0336444
                                   north carolina  |  -.0470388   .0015972   -29.45   0.000     -.050247   -.0438307
                                     north dakota  |   .1223859   .0001326   923.07   0.000     .1221196    .1226522
                                             ohio  |  -.0353636   .0005926   -59.68   0.000    -.0365538   -.0341733
                                         oklahoma  |  -.0589539   .0000995  -592.60   0.000    -.0591538   -.0587541
                                           oregon  |   .0281229   .0004335    64.88   0.000     .0272523    .0289936
                                     pennsylvania  |  -.0778033    .000534  -145.71   0.000    -.0788758   -.0767307
                                     rhode island  |          0  (omitted)
                                   south carolina  |  -.0806422    .000304  -265.28   0.000    -.0812528   -.0800316
                                     south dakota  |    .009464   .0001519    62.30   0.000     .0091589    .0097692
                                        tennessee  |  -.0660559   .0002094  -315.40   0.000    -.0664766   -.0656353
                                            texas  |  -.0567234   .0001418  -400.00   0.000    -.0570082   -.0564386
                                             utah  |  -.0566612   .0000786  -720.98   0.000    -.0568191   -.0565034
                                          vermont  |   .0080213   .0007878    10.18   0.000     .0064389    .0096037
                                         virginia  |  -.0467071   .0003025  -154.39   0.000    -.0473148   -.0460994
                                       washington  |  -.0177072   .0005339   -33.17   0.000    -.0187795   -.0166349
                                    west virginia  |  -.0853103   .0000926  -921.41   0.000    -.0854962   -.0851243
                                        wisconsin  |   .0490889   .0003743   131.14   0.000     .0483371    .0498407
                                          wyoming  |  -.0744315   .0005945  -125.21   0.000    -.0756256   -.0732375
                                                   |
                                              year |
                                             1984  |   .0468675   .0046819    10.01   0.000     .0374636    .0562713
                                             1986  |   .0060849   .0045582     1.33   0.188    -.0030705    .0152402
                                             1988  |   .0314823   .0059296     5.31   0.000     .0195724    .0433921
                                             1990  |   .0024185   .0058917     0.41   0.683    -.0094154    .0142524
                                             1992  |   .0636051   .0061186    10.40   0.000     .0513156    .0758946
                                             1994  |   .0049109   .0066092     0.74   0.461     -.008364    .0181858
                                             1996  |   .0525019   .0072986     7.19   0.000     .0378422    .0671616
                                             1998  |    .024599   .0078077     3.15   0.003     .0089167    .0402812
                                             2000  |   .0676506   .0077914     8.68   0.000     .0520011    .0833001
                                             2002  |   .0355685   .0087095     4.08   0.000     .0180749    .0530621
                                             2004  |   .0985044   .0079113    12.45   0.000      .082614    .1143947
                                             2006  |   .0616212   .0089254     6.90   0.000     .0436939    .0795484
                                             2008  |   .1101179   .0089903    12.25   0.000     .0920603    .1281754
                                             2010  |   .0660821   .0087477     7.55   0.000     .0485119    .0836524
                                             2012  |   .1066888   .0095837    11.13   0.000     .0874394    .1259382
                                             2014  |   .0623274   .0099264     6.28   0.000     .0423896    .0822651
                                                   |
                                             _cons |   .7456745   .0057499   129.68   0.000     .7341255    .7572235
                          ------------------------------------------------------------------------------------------
                          Honestly, I don't know whether the results are right or how to interpret the coefficients. Also, I wanted to ask if you could explain how you built both the variable -ps- and the event-time dummy. I am also puzzled about how their interaction tells me something about the treatment effect: how does it tell me whether the electoral participation of young people increased in the states that introduced the preregistration law compared to those that did not?



                          • #14
                            This model is not a good implementation of what is shown in the screenshot in #1. In fact, the problem arose with the code I wrote in #2, and you just copied it, so my apologies. The command should be:
                            Code:
                            reg register i.ps#i.years_since_implement i.statefip i.year , cluster(statefip)
                            Note the use of #, not ##. This will eliminate the extra collinearities, and it will leave you with an interpretable interaction coefficient.

                            When you have indicators for i.statefip and i.year, you do not want to also have i.ps and i.years_since_implement as separate terms in your model: that introduces collinear relationships, and something gets dropped. In your case, one of the things that got dropped was your most important term: the interaction. By replacing ## with #, those additional terms will not be generated, and the output will be more comprehensible.
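                            The collinearity at issue here can be seen directly with a toy design matrix. Below is a minimal sketch in Python with NumPy (the panel, the three-state/two-year layout, and all variable names are made up purely for illustration): a treatment-group dummy that is constant within state is an exact linear combination of the state dummies, so adding it as a separate main effect adds a column but no rank, and the software must drop something.

```python
import numpy as np

# Hypothetical toy panel: 3 states x 2 years; state 2 is the "treated" group.
states = np.repeat([0, 1, 2], 2)       # state id for each observation
years = np.tile([0, 1], 3)             # year id for each observation
group = (states == 2).astype(float)    # time-invariant treatment-group dummy

# Dummy columns for states and years (base levels dropped), plus a constant.
X_state = (states[:, None] == np.arange(3)).astype(float)
X_year = (years[:, None] == np.arange(2)).astype(float)
const = np.ones((6, 1))

# Design with state FE + year FE: 4 columns, full column rank.
X_fe = np.hstack([const, X_state[:, 1:], X_year[:, 1:]])
print(np.linalg.matrix_rank(X_fe))     # 4

# Adding the group dummy as a separate main effect adds a column but NO rank:
# group equals the state-2 dummy exactly, so the regression must omit a term.
X_bad = np.hstack([X_fe, group[:, None]])
print(np.linalg.matrix_rank(X_bad))    # still 4
```

The same logic applies to the event-time dummies and the year fixed effects, which is why only the interaction (#), and not the separate main effects (##), belongs in the model alongside the two sets of fixed effects.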



                            • #15
                              Ciao Clyde! Again, thank you!!!
                              I modified the code as you said above, and this is the result:
                              Code:
                              . reg register i.ps#i.years_since_implement i.statefip i.year, cluster(statefip)
                              note: 0b.ps#3.years_since_implement omitted because of collinearity
                              note: 1.ps#3.years_since_implement omitted because of collinearity
                              
                              Linear regression                               Number of obs     =  1,350,537
                                                                              F(15, 50)         =          .
                                                                              Prob > F          =          .
                                                                              R-squared         =     0.0188
                                                                              Root MSE          =     .41783
                              
                                                                        (Std. Err. adjusted for 51 clusters in statefip)
                              ------------------------------------------------------------------------------------------
                                                       |               Robust
                                              register |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                              -------------------------+----------------------------------------------------------------
                              ps#years_since_implement |
                                                  0 3  |          0  (omitted)
                                                  1 3  |          0  (omitted)
                                                       |
                                              statefip |
                                               alaska  |   .0308412   .0001853   166.46   0.000      .030469    .0312133
                                              arizona  |  -.0928223    .000166  -559.19   0.000    -.0931557   -.0924889
                                             arkansas  |   -.086923   .0001292  -672.86   0.000    -.0871824   -.0866635
                                           california  |   -.034797   .0001841  -188.96   0.000    -.0351668   -.0344271
                                             colorado  |  -.0173893   .0006333   -27.46   0.000    -.0186614   -.0161172
                                          connecticut  |   .0005448   .0008953     0.61   0.546    -.0012535    .0023431
                                             delaware  |  -.0352458   .0006805   -51.80   0.000    -.0366126    -.033879
                                 district of columbia  |   .0282313   .0007183    39.30   0.000     .0267886     .029674
                                              florida  |  -.0394566   .0002665  -148.08   0.000    -.0399918   -.0389214
                                              georgia  |  -.0683078    .000455  -150.12   0.000    -.0692217   -.0673939
                                               hawaii  |   -.123009    .000585  -210.28   0.000     -.124184    -.121834
                                                idaho  |  -.0603099   .0001217  -495.41   0.000    -.0605544   -.0600654
                                             illinois  |   .0048569    .000483    10.06   0.000     .0038868     .005827
                                              indiana  |   -.061877   .0002409  -256.81   0.000     -.062361   -.0613931
                                                 iowa  |   .0000373    .000417     0.09   0.929    -.0008003    .0008749
                                               kansas  |  -.0532566   .0002944  -180.89   0.000    -.0538479   -.0526653
                                             kentucky  |   -.044231   .0004095  -108.02   0.000    -.0450535   -.0434085
                                            louisiana  |   .0148772   .0001567    94.93   0.000     .0145624    .0151919
                                                maine  |   .0698102   .0007664    91.09   0.000     .0682708    .0713496
                                             maryland  |  -.0154759   .0008111   -19.08   0.000     -.017105   -.0138467
                                        massachusetts  |   .0212855   .0011464    18.57   0.000     .0189828    .0235882
                                             michigan  |   .0440295   .0007776    56.62   0.000     .0424676    .0455914
                                            minnesota  |   .0784553   .0007606   103.15   0.000     .0769276    .0799829
                                          mississippi  |   .0312339   .0003148    99.22   0.000     .0306016    .0318662
                                             missouri  |   .0057357   .0003441    16.67   0.000     .0050445    .0064269
                                              montana  |  -.0110675    .000201   -55.07   0.000    -.0114712   -.0106639
                                             nebraska  |  -.0271213   .0002542  -106.67   0.000    -.0276319   -.0266106
                                               nevada  |  -.1383628   .0006148  -225.06   0.000    -.1395976    -.137128
                                        new hampshire  |  -.0393803   .0013241   -29.74   0.000    -.0420398   -.0367208
                                           new jersey  |   -.008576   .0009686    -8.85   0.000    -.0105214   -.0066305
                                           new mexico  |  -.0721512   .0002759  -261.53   0.000    -.0727053    -.071597
                                             new york  |  -.0349675   .0006587   -53.08   0.000    -.0362905   -.0336444
                                       north carolina  |  -.0535125   .0008327   -64.27   0.000     -.055185   -.0518401
                                         north dakota  |   .1223859   .0001326   923.07   0.000     .1221196    .1226522
                                                 ohio  |  -.0353636   .0005926   -59.68   0.000    -.0365538   -.0341733
                                             oklahoma  |  -.0589539   .0000995  -592.60   0.000    -.0591538   -.0587541
                                               oregon  |   .0216492   .0003991    54.24   0.000     .0208475    .0224509
                                         pennsylvania  |  -.0778033    .000534  -145.71   0.000    -.0788758   -.0767307
                                         rhode island  |  -.0064737   .0008297    -7.80   0.000    -.0081402   -.0048072
                                       south carolina  |  -.0806422    .000304  -265.28   0.000    -.0812528   -.0800316
                                         south dakota  |    .009464   .0001519    62.30   0.000     .0091589    .0097692
                                            tennessee  |  -.0660559   .0002094  -315.40   0.000    -.0664766   -.0656353
                                                texas  |  -.0567234   .0001418  -400.00   0.000    -.0570082   -.0564386
                                                 utah  |  -.0566612   .0000786  -720.98   0.000    -.0568191   -.0565034
                                              vermont  |   .0080213   .0007878    10.18   0.000     .0064389    .0096037
                                             virginia  |  -.0467071   .0003025  -154.39   0.000    -.0473148   -.0460994
                                           washington  |  -.0177072   .0005339   -33.17   0.000    -.0187795   -.0166349
                                        west virginia  |  -.0853103   .0000926  -921.41   0.000    -.0854962   -.0851243
                                            wisconsin  |   .0490889   .0003743   131.14   0.000     .0483371    .0498407
                                              wyoming  |  -.0744315   .0005945  -125.21   0.000    -.0756256   -.0732375
                                                       |
                                                  year |
                                                 1984  |   .0468675   .0046819    10.01   0.000     .0374636    .0562713
                                                 1986  |   .0060849   .0045582     1.33   0.188    -.0030705    .0152402
                                                 1988  |   .0314823   .0059296     5.31   0.000     .0195724    .0433921
                                                 1990  |   .0024185   .0058917     0.41   0.683    -.0094154    .0142524
                                                 1992  |   .0636051   .0061186    10.40   0.000     .0513156    .0758946
                                                 1994  |   .0049109   .0066092     0.74   0.461     -.008364    .0181858
                                                 1996  |   .0525019   .0072986     7.19   0.000     .0378422    .0671616
                                                 1998  |    .024599   .0078077     3.15   0.003     .0089167    .0402812
                                                 2000  |   .0676506   .0077914     8.68   0.000     .0520011    .0833001
                                                 2002  |   .0355685   .0087095     4.08   0.000     .0180749    .0530621
                                                 2004  |   .0985044   .0079113    12.45   0.000      .082614    .1143947
                                                 2006  |   .0616212   .0089254     6.90   0.000     .0436939    .0795484
                                                 2008  |   .1101179   .0089903    12.25   0.000     .0920603    .1281754
                                                 2010  |   .0660821   .0087477     7.55   0.000     .0485119    .0836524
                                                 2012  |   .1066888   .0095837    11.13   0.000     .0874394    .1259382
                                                 2014  |   .0623274   .0099264     6.28   0.000     .0423896    .0822651
                                                       |
                                                 _cons |   .7456745   .0057499   129.68   0.000     .7341255    .7572235
                              ------------------------------------------------------------------------------------------
                              Is something still wrong? Do you think it is okay for the first few lines to come out as "omitted"? By removing one #, I no longer have the interaction, which is actually the term that allows me to grasp the treatment effect.
