Hello all,
I'm hoping to get some help understanding something. I have what I believe is a panel dataset: many firms, each operating in many states, observed over multiple time periods. My analysis includes fixed effects for firm, state, and time, and I would prefer to use xtgee if possible in order to account for within-group correlation. My question is this. If I run:
xtset firm time
then I get the message that there are repeated time values within panel. I understand why: each firm has multiple observations in each time period (across the states it operates in), so firm and time together do not uniquely identify an observation. Is it therefore valid to run only:
xtset firm
and go on with my analysis?
e.g., xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) robust
or would this not be appropriate?
I considered instead creating a unique id for each firm-state combination, but my concern is that this would in essence be tricking Stata into accepting the panelvar and timevar in xtset. e.g.,
xtset firmstateid time
then:
xtgee y x1 x2 x3 i.firm i.state i.time, family(gaussian) link(identity) robust
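For concreteness, this is roughly how I would build the firm-state id before the xtset above. It is only a sketch: firmstateid is my own name for the new variable, and it assumes there is at most one observation per firm, state, and time period.

```stata
* Sketch only: firmstateid is a made-up name for the grouped id
* One integer id per distinct firm-state pair
egen firmstateid = group(firm state)

* Check the assumption that id and time uniquely identify observations;
* isid exits with an error if they do not
isid firmstateid time

xtset firmstateid time
```

If isid fails, there are still repeated time values within a firm-state pair, and xtset with a time variable would complain again.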
In this scenario, I don't think of the firm-state combination as a real higher-order unit in the way a 'firm' or a 'state' is. Instead, it is an artifact of the structure of the dataset that I am identifying for analysis purposes, right? However, the second approach does seem to adjust for within-group correlation of the firm-state observations over time (i.e., the idea that firm1's activity (the DV) in Texas at time 1 may be correlated with firm1's activity (the DV) in Texas at time 2), which I like and which sounds like it could be right.
The first approach, by contrast, adjusts equally for correlation among all observations within a firm, regardless of state and time. I believe, though, that firm1's activity in Texas at time 1 is less likely to be correlated with firm1's activity in Alaska at time 1.
Am I thinking about this correctly? What would be the valid approach to analyze the data?
Thank you all in advance for taking a look.