Linear time trend in birth cohorts

Sandipa Bhattacharya

#1

12 Apr 2022, 09:11

I want to write Stata code for the following:

Pre-treatment state GDP growth rate interacted with linear birth cohort trend.

Say my treatment starts in some year X. For each state, I have the average GDP growth rate over the preceding 10 years (variable name: GDPgr) in my data. I also have variables for birth year and birth month.

I constructed the birth cohort as: gen birthcohort = ym(birthyear, birthmonth)

Now, to interact this linear birth cohort trend with the GDP growth rate in each state, how do I write the code?

Is it i.GDPgr#c.birthcohort?

If I use this, Stata says "factor variables may not contain noninteger values".

If I instead use c.GDPgr#c.birthcohort, does that correctly capture "Pre-treatment state GDP growth rate interacted with linear birth cohort trend"?

Or should it be c.GDPgr#i.birthcohort?

Can anyone help me understand this? And how to interpret this? What is the underlying meaning?
Clyde Schechter

#2

12 Apr 2022, 12:30

The correct specification depends on whether you want to treat GDPgr as having a continuous linear effect on your outcome variable, or whether each value of GDPgr has a discrete shock-like effect on the outcome variable. If the former, you would use c.GDPgr; if the latter, you would use i.GDPgr.
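As a rough illustration of the continuous case, here is a minimal sketch on simulated data. The outcome y, the state identifier, the sample size, and all coefficient values are made up for the example; only GDPgr, birthyear, birthmonth, and birthcohort come from the question.

```stata
* Simulated data: y, state, and all numeric values are hypothetical
clear
set seed 12345
set obs 2000
gen state = ceil(runiform() * 20)                // 20 made-up states
gen birthyear  = 1970 + floor(runiform() * 20)   // birth years 1970-1989
gen birthmonth = ceil(runiform() * 12)
gen birthcohort = ym(birthyear, birthmonth)      // months since 1960m1
format birthcohort %tm

* One noninteger pre-treatment growth value per state
bysort state: gen GDPgr = 2 + 3 * runiform() if _n == 1
bysort state: replace GDPgr = GDPgr[1]

* Hypothetical outcome with a cohort slope that depends on GDPgr
gen y = 0.01 * birthcohort + 0.002 * GDPgr * birthcohort + rnormal()

* Linear cohort trend whose slope is allowed to vary linearly with GDPgr
regress y c.GDPgr##c.birthcohort, vce(cluster state)

* Cohort slope (per month of birth) at a few GDPgr values
margins, dydx(birthcohort) at(GDPgr = (2 3 4 5))
```

In this specification the coefficient on c.GDPgr#c.birthcohort reads as the change in the per-month cohort slope associated with a one-unit higher GDPgr.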

Now, if your intent is for GDPgr to be a shock variable whose effect on the outcome is not related in any straightforward way to its magnitude, then you have to change the way you code it. Rather than using the value of GDPgr itself, you have to create a new variable that replaces the actual GDPgr values with non-negative integers, and then use that new variable with the i. prefix in your code. The simplest way to do that for a numeric variable is with the group() function of the -egen- command. (Please read -help egen- if you are not familiar with it.) Again, I want to emphasize that this approach is only sensible if you want to treat the effect of GDPgr on your outcome as a random shock unrelated to the value of GDPgr. I am not an economist, but, frankly, this sounds implausible to me, and, at least here on Statalist, I have never seen anybody propose to do that before. So if you are going to go this route, think it through carefully.
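For completeness, a sketch of the recoding step on simulated data. It uses the group() function of -egen-, which accepts numeric variables; the names GDPgr_cat, y, and state are hypothetical and the data are made up.

```stata
* Simulated data: y, state, and GDPgr_cat are hypothetical names
clear
set seed 2022
set obs 500
gen state = ceil(runiform() * 10)
bysort state: gen GDPgr = 2 + 3 * runiform() if _n == 1
bysort state: replace GDPgr = GDPgr[1]
gen birthcohort = ym(1970 + floor(runiform() * 20), ceil(runiform() * 12))
gen y = rnormal()

* i.GDPgr is rejected because GDPgr holds noninteger values;
* -capture noisily- shows the error and lets the do-file continue
capture noisily regress y i.GDPgr#c.birthcohort

* Map each distinct GDPgr value to an integer 1, 2, ...
egen GDPgr_cat = group(GDPgr)

* A separate linear cohort slope for each distinct GDPgr value
regress y i.GDPgr_cat#c.birthcohort
```

Because GDPgr is constant within a state, GDPgr_cat will usually just re-label the states (unless two states share exactly the same growth rate), so this specification amounts to state-specific linear cohort trends rather than anything specific to GDP growth.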

The same consideration applies to birth cohort. The difference is that, in this case, a shock rather than a linear trend is a plausible model and one that is commonly used. The reason is that a birth cohort variable can reasonably be thought of as a proxy for the many, many things that each birth cohort experiences up to the point in life where they are observed in the study. All of those things fluctuate over time in different ways and may have opposing effects on your study outcome. Consequently, for many kinds of outcome, there is no reason to expect that the relationship of birth cohort to the outcome will be continuous or linear: as a proxy for so many things, it can behave more like a series of discrete shocks. On the other hand, there are also many kinds of outcome for which birth cohort would be expected to exert a continuous linear effect, at least over moderate spans of time. For example, in most places over a period of a few decades, infant mortality rates will steadily rise or steadily fall with time.
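The two treatments of birth cohort can be put side by side as follows. This is only a sketch on simulated data: y and state are hypothetical, and the birth years are restricted to a short window so that the number of cohort indicators stays small.

```stata
* Simulated data: y and state are hypothetical
clear
set seed 777
set obs 3000
gen state = ceil(runiform() * 20)
bysort state: gen GDPgr = 2 + 3 * runiform() if _n == 1
bysort state: replace GDPgr = GDPgr[1]
gen birthyear  = 1980 + floor(runiform() * 5)   // 1980-1984
gen birthmonth = ceil(runiform() * 12)
* ym() counts months from 1960m1, so these values run from 240 to 299.
* Births before 1960 would give negative values, which i. does not accept.
gen birthcohort = ym(birthyear, birthmonth)
gen y = rnormal()

* (a) Birth cohort as a linear trend: a single slope
regress y c.birthcohort

* (b) Birth cohort as discrete shocks: one indicator per monthly cohort
regress y i.birthcohort
testparm i.birthcohort          // joint test of the cohort indicators

* (c) The GDPgr slope allowed to differ freely across cohorts
regress y i.birthcohort c.GDPgr#i.birthcohort
```

Specification (c) corresponds to c.GDPgr#i.birthcohort from the question; it estimates one GDPgr coefficient per cohort instead of a single interaction with a linear trend.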

So the bottom line is that it really all depends on your model of the real-world data generating process. As you have not even stated what your outcome variable is, no more specific advice can be given.

Finally, one technical point. It is safer to enter the interaction term in your regression models with the ## operator rather than the # operator. When you use ##, Stata will automatically include the "main" effects along with the interaction terms, and that makes for a better specified model. There are some circumstances (collinearity with other variables) where a main effect needs to be omitted, but Stata will automatically recognize those situations and do the omission. So you never go wrong using ##: it prevents you from making easy-to-make mistakes.
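The difference between the two operators can be seen in a short sketch on simulated data (y and state are hypothetical names). The last command uses -areg- with state absorbed purely as an assumed example of a setting where the GDPgr main effect is collinear with other terms, since GDPgr does not vary within a state.

```stata
* Simulated data: y and state are hypothetical
clear
set seed 4242
set obs 2000
gen state = ceil(runiform() * 20)
bysort state: gen GDPgr = 2 + 3 * runiform() if _n == 1
bysort state: replace GDPgr = GDPgr[1]
gen birthcohort = ym(1970 + floor(runiform() * 20), ceil(runiform() * 12))
gen y = rnormal()

* # alone: only the product term enters the model
regress y c.GDPgr#c.birthcohort

* ##: both main effects plus the product term
regress y c.GDPgr##c.birthcohort

* The same model as ##, written out by hand
regress y c.GDPgr c.birthcohort c.GDPgr#c.birthcohort

* With state effects absorbed, GDPgr is constant within state,
* so its main effect should be reported as omitted while the
* interaction remains estimable
areg y c.GDPgr##c.birthcohort, absorb(state)
```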