Instrumenting on dummies and categorical

Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#1

27 Jun 2022, 10:47

Dear all,

I have panel data in which I am trying to instrument on a large number of dummies and categorical variables, among others.
My problem is:

if var1 is a 0/1 indicator, then var1*var1 = var1, because 0*0=0 and 1*1=1.

I use ivreghdfe for the estimation, with Driscoll-Kraay AR(1) errors. How do I solve this issue?
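Here is a minimal Stata sketch with simulated data showing why var1*var1 = var1 matters. The variable names d, x and y are hypothetical and do not come from Giorgio's dataset. The squared dummy is an exact copy of the dummy, so it is perfectly collinear with it and regress drops it. The practical remedy is simply to enter each dummy once.

```stata
* Simulated data: d, x and y are hypothetical variable names
clear
set obs 200
set seed 2022
generate byte d = runiform() < 0.5               // 0/1 dummy
generate double x = rnormal()
generate double y = 1 + 0.5*d + 0.3*x + rnormal()

* Squaring a 0/1 dummy returns the dummy itself (0*0 = 0, 1*1 = 1)
generate byte d_sq = d*d
assert d_sq == d

* c.d##c.d expands to d and d*d; the second term is perfectly collinear
* with the first, so regress reports it as omitted
regress y x c.d##c.d

* Enter the dummy once instead
regress y x i.d
```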

Last edited by Giorgio Di Stefano; 27 Jun 2022, 10:53.
Fei Wang

Join Date: Oct 2021

Posts: 726
#2

27 Jun 2022, 20:02

Giorgio, is var1 an independent variable or an instrumental variable? If it's an independent variable, then you neither can nor need to control for its squared form. If it's an instrumental variable for another endogenous continuous variable, and the squared form of the endogenous variable is also included in the main regression, then you may refer to Jeff Wooldridge's two-step control function method (#5 in https://www.statalist.org/forums/for...quadratic-term).
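The linked post is truncated here, so the following is only a generic sketch of a two-step control function, and it may differ in details from the linked thread. It assumes one endogenous continuous variable x that enters the main equation together with its square, and one excluded instrument z. All data and names are simulated and hypothetical.

```stata
* Simulated data: x is endogenous (correlated with u), z is an excluded
* instrument; all names are hypothetical
clear
set obs 1000
set seed 101
generate double z = rnormal()
generate double u = rnormal()
generate double x = 0.8*z + 0.5*u + rnormal()
generate double y = 1 + 0.5*x - 0.2*x^2 + u + rnormal()

* Step 1: regress the endogenous variable on the instrument (and any
* exogenous controls) and save the residual
regress x z
predict double vhat, residuals

* Step 2: add the first-stage residual to the main equation, which keeps
* both x and its square; a significant coefficient on vhat signals endogeneity
regress y c.x##c.x vhat
```

The second-step standard errors ignore that vhat is itself estimated. Bootstrapping both steps together is a common way to get valid inference.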
Giorgio Di Stefano

#3

28 Jun 2022, 02:26

In reply to Fei Wang (#2):

Thank you very much, Fei Wang!

Actually, they are endogenous predictors (not dependent variables), among other endogenous predictors, which I also use as instruments in a dynamic panel with an AR(1) process. That is why I use ivreghdfe, which now seems not to be the way to go, I guess?

I used this:

Code:

ivreghdfe growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1##c.indicator1 (lkse=outpit), absorb(idcode year) dkraay(1)

I understand that the interaction of a continuous variable with itself (the quadratic term in the code) defines two variables: the variable and its square. I note that multiplying a 0/1 indicator by itself results in the same variable. It should be the categorical variables, then, that result in many indicators. How do I solve this?
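Stata's fvexpand offers a quick way to check what factor-variable notation expands to. The sketch below uses simulated, hypothetical variables: a continuous indicator in percent and a three-level categorical variable.

```stata
* Simulated data with hypothetical names
clear
set obs 300
set seed 42
generate double indicator = 100*runiform()       // continuous, in percent
generate byte categorical1 = 1 + mod(_n, 3)        // takes the values 1, 2, 3

* c.indicator##c.indicator expands to two terms: the level and the square
fvexpand c.indicator##c.indicator
display "`r(varlist)'"
local n_quad : word count `r(varlist)'

* i.categorical1 expands to one indicator per level (the first is the base)
fvexpand i.categorical1
display "`r(varlist)'"
local n_cat : word count `r(varlist)'
display "terms: `n_quad' (quadratic), `n_cat' (categorical)"
```

For a 0/1 dummy, the same c.dummy##c.dummy expansion would add a square identical to the dummy, which is the issue raised in #1.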

I would like to obtain dkraay(1) standard errors, even indirectly, given an AR(1) process.

Last edited by Giorgio Di Stefano; 28 Jun 2022, 02:46.
Fei Wang

#4

28 Jun 2022, 02:34

Quoting Giorgio (#3): "Actually, they are endogenous predictors, not dependent variables, among other endogenous predictors. So ivreghdfe is not the way to go, I guess?"

Well, you haven't told me what var1 is: an endogenous predictor or an instrumental variable? And why are you worried that var1 = var1*var1?
Giorgio Di Stefano

#5

28 Jun 2022, 03:47

In reply to Fei Wang (#4):

Yes, they are on the RHS, and I am also using them and their first lags as instruments. I use an AR(1) process. I am mostly worried because I am not getting a Sargan or any other test in the displayed output, so there must be an issue.
I had started a thread here,
https://www.statalist.org/forums/for...-out-for-panel

where Andrew pointed out the problem.
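Here is one likely reason no Sargan or Hansen statistic appears. In #3 there is one endogenous regressor (lkse) and one excluded instrument (outpit), so the equation is exactly identified and there are no overidentifying restrictions to test. The sketch below runs Stata's built-in ivregress on simulated data with hypothetical names, not ivreghdfe. ivreg2-based commands such as ivreghdfe typically flag an exactly identified equation instead of reporting a usable test, though that is worth confirming in your own output.

```stata
* Simulated data; all names are hypothetical
clear
set obs 500
set seed 7
generate double z1 = rnormal()
generate double z2 = rnormal()
generate double v = rnormal()
generate double w = 0.6*z1 + 0.4*z2 + v + rnormal()   // endogenous: depends on v
generate double y = 1 + 0.5*w + v + rnormal()

* Exactly identified: one endogenous regressor, one excluded instrument.
* There is no overidentifying restriction to test.
ivregress 2sls y (w = z1)
capture noisily estat overid

* Overidentified: one endogenous regressor, two excluded instruments.
* Now estat overid reports the Sargan and Basmann tests.
ivregress 2sls y (w = z1 z2)
estat overid
```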
Fei Wang

#6

28 Jun 2022, 04:45

Sorry, Giorgio, I'm not able to understand your question. You said in #1 that your problem comes from the fact that var1*var1 = var1 if var1 is a binary variable. I simply don't understand why that is a problem.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#7

28 Jun 2022, 05:43

I'm also confused. Is the variable "indicator1" a dummy variable? If so, why are you interacting it with itself? It should just be there by itself. Did you mean to include c.indicator1#c.lkse (in addition to c.indicator1 and lkse)? Then you would need an instrument for c.indicator1#c.lkse, and c.indicator1#c.outpit should be the IV.

Code:

ivreghdfe growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1 (lkse c.indicator1#c.lkse = outpit c.indicator1#c.outpit), absorb(idcode year) dkraay(1)
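The logic of Jeff's suggestion can be checked on simulated data. An interaction with an endogenous variable is itself endogenous, so it needs its own instrument, built by interacting the dummy with the excluded instrument. This sketch uses Stata's built-in ivregress and builds the products by hand. d, w and z are hypothetical stand-ins for indicator1, lkse and outpit.

```stata
* Simulated data; d, w and z are hypothetical stand-ins
clear
set obs 1000
set seed 11
generate double z = rnormal()                   // excluded instrument (role of outpit)
generate byte d = runiform() < 0.4              // dummy (role of indicator1)
generate double v = rnormal()
generate double w = 0.7*z + v + rnormal()       // endogenous regressor (role of lkse)
generate double y = 1 + 0.5*w + 0.3*d + 0.4*d*w + v + rnormal()

* The interaction d*w is endogenous because w is, so it is instrumented
* by the matching interaction d*z
generate double dw = d*w
generate double dz = d*z
ivregress 2sls y d (w dw = z dz)
```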
Giorgio Di Stefano

#8

29 Jun 2022, 07:18

Dear Prof. Jeff Wooldridge and Fei Wang,

Thank you warmly for taking the time to answer my question.

Here are more details on what I am working on. I would appreciate it if you could spare a minute to help me further, since I have been struggling with this for a long time now.

I have a dynamic panel covering the post-war period up to 2021 (1944-2021) for about 50 countries, using macro data. I am studying the effects on the GDP growth rate. The dataset is composed of the core macro variables (GDP growth rate, CPI, REER, fiscal size, trade index, etc.), numerous dummy and categorical variables, and some custom continuous quality indicators expressed in percent. There are ten kinds of indicators in the data, each interacting only with itself. According to my model, the indicators definitely determine the RHS macro variables and the GDP growth rate.

My main model is given by:

Code:

$$\Delta Y = (a-1) + b_1 y_{i,t-1} + b_2 T y_{i,t-1} + b_3 T^2 y_{i,t-1} + Z_{i,t} + D_{i,t} + e_{i,t}$$

where Y is the dependent variable of interest, Z is a set of control variables, and D is a set of dummy and categorical variables; T is an operator for the indicators.

I am trying the following code, correcting and extending the code in #3. I am interested in Driscoll-Kraay AR(1) errors.

Code:

ivreghdfe growthgdp l.gdp cpi fiscal reer trade labor u dummy1 dummy2 dummy3 i.categorial1 i.categorical2 i.categorial3 c.indicator##c.indicator (lkse=output), absorb(idcode year) dkraay(1)

My dependent variable Y is

Code:

growthgdp

and my independent variables on the RHS are

Code:

l.gdp cpi reer tradeindx labor u dummy1 dummy2 dummy3 i.categorial1 i.categorical2 i.categorial3 c.indicator##c.indicator

All independent variables are endogenous. For obvious reasons of time and space, I have included only a few of them in this code. Dummy variables capture events, and categorical variables capture the classification or the duration of an event in years. Their total number is large.

In #3 I definitely used the wrong syntax, and I also misused lkse (factor prices) and output (output price value), thinking of them as instruments and setting them equal in the syntax:

Code:

( lkse=output)

I am not interested in the coefficients on lkse (factor prices) or output (output price value).

I would appreciate your comments on my model, and on whether I should use my endogenous variables and the indicators as instruments, perhaps at their first lag. As said above, the indicators interact only with themselves.

I would also appreciate it if you could kindly provide code showing how to proceed with the estimation using Driscoll-Kraay AR(1) errors.

On the initial question, I was told that, given also the large number of dummies and categorical variables, the fact that var1*var1 = var1 when var1 is a 0/1 indicator (because 0*0=0 and 1*1=1) is an issue to be solved before any estimation. I did not understand the reason, which is why I asked.

Nonetheless, I may seem a bit naive, so I would appreciate any comments on the code.

Thank you wholeheartedly,

Giorgio!
Fei Wang

#9

29 Jun 2022, 09:15

Giorgio, I have a few comments.

First, your model is not a standard dynamic panel-data model. A dynamic model for your case would control for the GDP growth rates in the past periods rather than previous levels of GDP.

Second, when you say some independent variables are endogenous, you need to give concrete reasons: what is in the error term, and why are some independent variables correlated with it? You won't be able to think of valid instruments unless you figure out the reasons for endogeneity and clearly state what you have assumed for your model. It's quite ambitious to say that all independent variables are endogenous, but everything about your model is ambiguous to me. I don't think you're clear about your model either because, for example, your econometric model does not include individual FEs but your code does.

Third, even though you think all independent variables are endogenous, what you've coded treats only lkse to be endogenous.

Last but not least, if an indicator in your model is a binary variable (0 or 1), then you shouldn't include its interaction with itself from the very beginning.

I suggest you carefully read the related literature to learn how the GDP growth rate is theoretically determined and how its determination is statistically modeled and estimated. I'm not familiar with this branch of literature, but I believe there are many studies to follow.
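The following is a sketch that illustrates only the first point above, on a simulated panel of 50 hypothetical countries observed 1944-2021. Growth follows an AR(1) process, and the dynamic term is lagged growth, L.growthgdp, rather than the lagged level of GDP. The variable x is a hypothetical regressor. xtreg, fe is used only for simplicity, and the sketch says nothing about which variables are endogenous.

```stata
* Simulated panel: 50 hypothetical countries, 78 years (1944-2021)
clear
set seed 5
set obs 50
generate int idcode = _n
expand 78
bysort idcode: generate int year = 1943 + _n
xtset idcode year

* Simulate growth as an AR(1) process within each country
generate double x = rnormal()
generate double e = rnormal()
generate double growthgdp = e
bysort idcode (year): replace growthgdp = 0.4*growthgdp[_n-1] + 0.5*x + e if _n > 1

* Dynamic specification in growth: control for lagged growth rather than
* the lagged level of GDP; country and year effects included
xtreg growthgdp L.growthgdp x i.year, fe
```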
Giorgio Di Stefano

#10

30 Jun 2022, 10:57

In reply to Fei Wang (#9):

@Fei Wang, thank you for your comments!

I typed the model, but it seems not to have appeared properly. I am typing it again, hoping it is clearer now.

$$y_{i,t} - y_{i,t-1} = (\alpha - 1)\, y_{i,t-1} + \beta_1 y_{i,t-1} + \beta_2 T y_{i,t-1} + \beta_3 T^2 y_{i,t-1} + y_{i,t} + \sum_{n=1}^{N} \beta_{4+n} y_{i,t} + \beta_5 Z_{i,t} + D_{i,t} + \mu_t + \eta_i + \varepsilon_{i,t}$$

I include the indicator measure directly in my estimation of the GDP growth rate, as a squared polynomial. Z is a vector of control variables that affect the level of GDP; D is a vector of event and other types of dummies; μ_t is the time-specific effect; η_i is the country-specific effect; and ε_{i,t} is the error term.
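Subtracting y_{i,t-1} from both sides of a levels equation shows where the (α − 1) coefficient comes from. Here is a minimal derivation, with x_{i,t}'β standing in for all other regressors (an assumption made only for illustration):

```latex
\begin{aligned}
y_{i,t} &= \alpha\, y_{i,t-1} + x_{i,t}'\beta + \mu_t + \eta_i + \varepsilon_{i,t} \\
\Longleftrightarrow\quad y_{i,t} - y_{i,t-1} &= (\alpha - 1)\, y_{i,t-1} + x_{i,t}'\beta + \mu_t + \eta_i + \varepsilon_{i,t}
\end{aligned}
```

In the equation as typed, (α − 1) and β₁ both multiply y_{i,t-1}, so a regression can estimate only their sum, (α − 1) + β₁, and not the two separately.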

The dependent variable is the GDP growth rate.

My indicators are continuous values expressed in percent, not binary (0/1) variables. The dummies in the model are binary (0/1), and some of the categorical variables take values from 1 up to 20.

The indicators interact only with themselves.

On your points in #9, I can see them clearly. I guess the model is dynamic, as it considers the first lag, and the variables are growth rates, not levels.

On points 2 and 3 of your remarks: I wrongly included or mixed up variables just because I did not understand the syntax. I have now watched some videos on YouTube on how it should be written properly. Please disregard any point beyond the present one.

Given that my indicators, as written above, are continuous values expressed in percent, if I wanted to check the effect on the dependent variable growthgdp of a control variable related to the indicators, for example labor in #6, should c.indicator##c.indicator be the IV?

Is this code correct?

Code:

ivreghdfe growthgdp l.gdp cpi reer tradeindx labor u dummy1 dummy2 dummy3 i.categorial1 i.categorical2 i.categorial3 c.indicator##c.indicator (labor=c.indicator##c.indicator), absorb(idcode year) dkraay(1)

I am fascinated by Driscoll-Kraay errors, which aim to capture the time variation as an AR(1) process. If there is another way to produce the same results, I would be happy to adopt it.
Fei Wang

#11

30 Jun 2022, 11:12

Technically, an IV must be excluded from the main regression. If you have controlled for c.indicator##c.indicator, then it cannot be an IV. Conceptually, even if you exclude c.indicator##c.indicator from the list of control variables, I don't know whether they can be IVs for labor. As I said in #9, you need to think about why labor is endogenous and why c.indicator##c.indicator are valid IVs (i.e., satisfy the IV assumptions). Sorry to say it, but the DK standard errors are the least important part of your research design, as you have not yet properly set up the fundamental structure of the model.
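Fei's technical point can be seen on simulated data with Stata's built-in ivregress, using the hypothetical names z1, z2, w and y. When the only instrument also appears as a regressor, there is no excluded variation left. The equation is therefore not identified, and ivregress stops with an error. The same logic applies to listing c.indicator##c.indicator both as regressors and as instruments.

```stata
* Simulated data; all names are hypothetical
clear
set obs 500
set seed 3
generate double z1 = rnormal()
generate double z2 = rnormal()
generate double v = rnormal()
generate double w = 0.5*z1 + 0.7*z2 + v + rnormal()      // endogenous regressor
generate double y = 1 + 0.5*w + 0.2*z1 + v + rnormal()   // z1 also enters y directly

* z1 is both an included regressor and the only instrument: it provides
* no excluded variation, so the equation is not identified
capture noisily ivregress 2sls y z1 (w = z1)
assert _rc != 0

* z2 is excluded from the equation for y, so it can serve as the instrument
ivregress 2sls y z1 (w = z2)
```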
Giorgio Di Stefano

#12

30 Jun 2022, 12:16

In reply to Fei Wang (#11):

I see that! Thank you, Fei!