Interpret intercept in latent growth curve models

Rosemary Li

Join Date: May 2016

Posts: 68
#1

27 Sep 2016, 19:12

Hi Experts,

I ran latent growth curve models. Assume I have observations of n firms from year i to year j. For instance, my dependent variable is number of patents. If I understand correctly, the returned intercept means the average number of patents of n firms in year i (initial year).

My question is, if some firms are born after year i (before year j)--observations for these firms before they were born would be treated as missing, how can we interpret the returned intercept then? It just feels a bit weird to claim the intercept represents an average initial status of all firms when some firms had not yet been born in that initial year.

The same goes for the returned slope. If my panel data are not balanced due to birth and death of firms between year i and year j, some firms would have curves over a shorter (than j minus i) period.

Is there some special model that I should adopt to cope with such an unbalanced panel?

P.S. I ran the latent growth negative binomial model to deal with a count dependent variable.

I hope some of you can help. Thanks a lot.

Best,
Rosemary

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

27 Sep 2016, 19:49

if some firms are born after year i (before year j)--observations for these firms before they were born would be treated as missing, how can we interpret the returned intercept then?

The intercept, in this case, represents an extrapolation of your model to predict the number of patents that the firm would have had in year 0 if it had existed then. In that sense, it is no different from the intercept in any other regression model. If I ran

Code:

sysuse auto, clear
regress price mpg

the intercept term represents the predicted price for a car that gets 0 miles per gallon. Clearly no such car exists.

The possible solutions are the same in both situations. One is to just ignore it, because often we don't care about the intercepts anyway. Or, if we need to have meaningful intercepts, the predictor variables can be "centered" so that zero values of the predictors correspond to meaningful points. So, in your case, one might use years elapsed from entry into the data set as your time variable instead of absolute years. If you did that, your intercepts would represent the model fit to the number of patents when the firm entered the study. In the auto example, one might replace mpg with mpg - mean mpg, or something like that.
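For the auto example, a minimal sketch of that centering might look like the following. The variable name mpg_c is only illustrative; any name would do.

```stata
sysuse auto, clear

* Center mpg at its sample mean (mpg_c is a hypothetical variable name)
summarize mpg, meanonly
generate mpg_c = mpg - r(mean)

* The intercept is now the predicted price for a car with average mpg,
* rather than for a car that gets 0 miles per gallon
regress price mpg_c
```

The slope on mpg_c is the same as the slope on mpg in the uncentered regression; only the intercept changes.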

The same goes for the returned slope. If my panel data are not balanced due to birth and death of firms between year i and year j, some firms would have curves over a shorter (than j minus i) period.

Again, this is no different from any other slope in any other regression model. The more data points you have, the more precisely your slopes will be estimated. But, however much data you have, the model estimates the best slope it can find to fit that data. No special procedures are needed for unbalanced data.

Last edited by Clyde Schechter; 27 Sep 2016, 19:51.

Rosemary Li

Join Date: May 2016

Posts: 68
#3

27 Sep 2016, 20:43

Thanks, Clyde. Your answers are always to the point. I am self-taught and have not found a textbook that properly compares latent techniques with traditional techniques, so I am very happy to see the connection you made. It's a great help.

Originally posted by Clyde Schechter View Post

So, in your case, one might use years elapsed from entry into the data set as your time variable instead of absolute years. If you did that, your intercepts would represent the model fit to the number of patents when the firm entered the study.

I think I did exactly what you suggested. But this cannot help avoid the problem I raised. The returned intercept is still an "extrapolation" as you commented. It seems to be unavoidable, no matter how we set the time variable.

But I agree that what you suggested returns a meaningful intercept.

Last edited by Rosemary Li; 27 Sep 2016, 20:51.

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

27 Sep 2016, 21:11

I think I did exactly what you suggested. But this cannot help avoid the problem I raised. The returned intercept is still an "extrapolation" as you commented. It seems to be unavoidable, no matter how we set the time variable.

I think perhaps the term "years elapsed from entry into the data set" was ambiguous and you took it differently from what I meant. So let me be more explicit. Suppose each firm is identified in variable id. And suppose the variable year is the calendar year. What I propose you do is parameterize time separately for each firm so that time zero corresponds to the year that particular firm enters the data set:

Code:

by id (year), sort: gen time = year - year[1]

Then use time instead of year in your model. If you do that, the intercept for any firm will correspond to the modeled value of the outcome at time = 0, which corresponds to the first year that that firm appears in your data set. So it is not an extrapolation: it is a fit to the starting point of your data for that firm.

Rosemary Li

Join Date: May 2016

Posts: 68
#5

27 Sep 2016, 21:30

WOW! This is new and interesting. Did you have multilevel modeling in mind? If so, I need to review it to fully appreciate this attractive trick.

I know multilevel models can serve as latent growth curve models, but I am used to the latter (specifically, the sem or gsem command in Stata).

I thought you meant to set t==0 for year i (the initial year of my study) and then set t==1, 2, 3, ..., j-i for the years that followed. In sem or gsem, this setting applies to all firms, no matter when they were born and when they went out of business.

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#6

27 Sep 2016, 21:50

Yes, I did have multi-level modeling in mind. But I see no reason why the same approach would not work in -sem-/-gsem-

Rosemary Li

Join Date: May 2016

Posts: 68
#7

27 Sep 2016, 22:25

I haven't seen anyone do this in sem/gsem. Could you help look at the code below and see whether there is a way out? I am a little greedy here, ha ha. But, honestly, no one around me that I know of can help.

Below is a sample of the code I used. Before applying it, the data are reshaped into wide format. n0-n5 are observations of the dependent variable in year i (initial year), i+1, i+2, ....

Code:

gsem ///
(Intercept@1 Slope@0->n0) ///
(Intercept@1 Slope@1->n1) ///
(Intercept@1 Slope@2->n2) ///
(Intercept@1 Slope@3->n3) ///
(Intercept@1 Slope@4->n4) ///
(Intercept@1 Slope@5->n5), ///
noconstant means(Intercept Slope) family(nbinomial) link(log)

Rosemary Li

Join Date: May 2016

Posts: 68
#8

27 Sep 2016, 22:34

No worries if you do not use sem or gsem, given that you are very good at multilevel modeling.
Then let's just leave post #7 there and see whether someone else happens to have thought about it.
You have already helped more than my original post asked for!

Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#9

28 Sep 2016, 10:53

Well, I think in this context it's a matter of how you define n0 through n5. So, something like this:

Code:

by id (year), sort: gen time = year - year[1]
gen n = outcome // YOUR OUTCOME VARIABLE HERE
keep id n time
reshape wide n, i(id) j(time)

gsem ///
(Intercept@1 Slope@0->n0)  ///
(Intercept@1 Slope@1->n1)  ///
(Intercept@1 Slope@2->n2)  ///
(Intercept@1 Slope@3->n3)  ///
(Intercept@1 Slope@4->n4)  ///
(Intercept@1 Slope@5->n5),  ///
noconstant means(Intercept Slope) family(nbinomial) link(log)

Note: With this redefinition of time, you may end up with a different number of time points than you did before. So the -gsem- code will have to be adjusted accordingly.
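One way to check how many time points the redefined time variable produces might be the sketch below. It assumes the data are still in long format (that is, before the -reshape wide-) and that time has not been created yet, and it uses the same id and year variable names as above.

```stata
* Firm-specific time: 0 in the first year each firm appears in the data
by id (year), sort: gen time = year - year[1]

* How many firm-years fall at each value of time
tabulate time

* The largest value of time determines the last n# variable after the reshape
summarize time, meanonly
display "time runs from 0 to " r(max)
```

If the maximum is, say, 3 rather than 5, the -gsem- command would stop at n3 instead of n5.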


Rosemary Li

Join Date: May 2016

Posts: 68
#10

28 Sep 2016, 19:06

Cool! Very cool!
Now I fully understand what your definition of time is and I see the uniqueness and beauty of it.
I should not have made post #7. Thanks for bearing with me.