Hello,
I hope you don't mind another semi-Stata-related question. I'm not very experienced in working with panel data and would greatly appreciate your advice.
In my current research project, the data structure is as follows:
I compiled a list of publicly listed US firms that were part of a major stock index from 2010 to 2018, which is my sample period. Once I had identified this set of firms, I obtained sentiment data for their quarterly earnings conference calls. In essence, these data describe aspects such as the tone (positive, neutral, or negative) of the CEO's speech during these calls. Ideally, I would have 36 firm-quarter observations for each firm (9 years * 4 quarters). I am planning to run fixed-effects regressions using the sentiment measures as my dependent variables and some CEO characteristics as my independent variables.
Missing data for some of the firms leads to the following frequency distribution of firm-quarters.
Clearly, I am working with an unbalanced panel, and I would greatly appreciate it if you could point out any concerns about the underlying data structure that might prevent me from running a panel data regression with firm and quarter fixed effects. As far as I know, whether the panel is balanced or unbalanced does not affect the estimation of the coefficients, but I am not sure about other parts of the model estimation.
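To make the intended specification concrete, here is a rough sketch of the kind of regression I have in mind. It is only an illustration: CEO_Age and CEO_Tenure are hypothetical placeholders for the CEO characteristics (they are not in the data excerpt), and Net_Tone is just one possible way to turn the word counts into a tone measure.

```stata
* Sketch only. CEO_Age and CEO_Tenure are hypothetical placeholder covariates;
* Net_Tone is an illustrative tone measure built from the word counts.
gen double Net_Tone = (CEO_Pres_Speech_NoPosWords - CEO_Pres_Speech_NoNegWords) ///
    / CEO_Pres_Speech_NoWords

* Declare the panel: firm identifier and quarterly time variable.
xtset Firm_ID CC_Quarter

* Firm fixed effects come from the fe option, quarter fixed effects from
* i.CC_Quarter; standard errors are clustered by firm.
xtreg Net_Tone CEO_Age CEO_Tenure i.CC_Quarter, fe vce(cluster Firm_ID)
```

As far as I understand, xtreg with the fe option accepts unbalanced panels and gaps, so the command itself should run on these data; my question is about whether the results can be trusted.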
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int Conf_Call_Date double Firm_ID int(CEO_Pres_Speech_NoWords CC_Quarter) float(CEO_Pres_Speech_NoPosWords CEO_Pres_Speech_NoNegWords)
18392 4295899290  397 200  5  3
18469 4295899290  411 201  7  2
18570 4295899290 1172 202 19 29
19116 4295899290 1002 208 24  6
19207 4295899290  897 209 18  2
19298 4295899290 1036 210 12  8
19409 4295899290  728 211 14  8
19480 4295899290  987 212 12  8
19571 4295899290  901 213 11  4
19662 4295899290  926 214 11  8
19772 4295899290  620 215  8  6
19844 4295899290  655 216  6  4
19935 4295899290 1084 217 21  8
20026 4295899290  769 218 12  5
20137 4295899290  803 219 16  8
20208 4295899290  927 220  9  9
20299 4295899290  982 221 30 13
20390 4295899290  947 222  8 13
20502 4295899290  868 223 15 12
20579 4295899290 1007 224 16 14
20670 4295899290  747 225 10 10
20761 4295899290  988 226 15 13
20867 4295899290 1054 227 16  6
20943 4295899290 1154 228 14 11
21034 4295899290 1348 229 33 24
end
format %tdDD/NN/CCYY Conf_Call_Date
format %tq CC_Quarter
Code:
 No_Quarter |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          3        0.04        0.04
          2 |          4        0.05        0.09
          3 |          9        0.11        0.20
          4 |         20        0.25        0.45
          5 |          5        0.06        0.52
          8 |         32        0.40        0.92
          9 |          9        0.11        1.03
         11 |         22        0.28        1.31
         12 |         48        0.60        1.91
         13 |         13        0.16        2.07
         14 |         28        0.35        2.43
         15 |         30        0.38        2.80
         16 |         16        0.20        3.00
         17 |         17        0.21        3.22
         18 |         72        0.90        4.12
         19 |         38        0.48        4.60
         20 |         80        1.01        5.60
         21 |         63        0.79        6.40
         22 |         88        1.11        7.50
         23 |        230        2.89       10.39
         24 |        120        1.51       11.90
         25 |        225        2.83       14.73
         26 |        260        3.27       17.99
         27 |        459        5.77       23.76
         28 |        448        5.63       29.39
         29 |        551        6.92       36.32
         30 |      1,110       13.95       50.26
         31 |      1,581       19.87       70.13
         32 |        576        7.24       77.37
         33 |         66        0.83       78.20
         34 |        272        3.42       81.62
         35 |        455        5.72       87.33
         36 |      1,008       12.67      100.00
------------+-----------------------------------
      Total |      7,958      100.00

xtset Firm_ID CC_Quarter
xtdescribe

   Firm_ID:  4.296e+09, 4.296e+09, ..., 8.590e+09            n =        292
CC_Quarter:  2010q1, 2010q2, ..., 2018q4                     T =         36
             Delta(CC_Quarter) = 1 quarter
             Span(CC_Quarter)  = 36 periods
             (Firm_ID*CC_Quarter uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       8      26        30        31      36      36

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+--------------------------------------
       28      9.59    9.59 |  111111111111111111111111111111111111
        8      2.74   12.33 |  11111111111.....11111111111111111111
        7      2.40   14.73 |  ....11111111111111111111111111111111
        6      2.05   16.78 |  111.....1111111111111111111111111111
        6      2.05   18.84 |  111111111111111111......111111111111
        5      1.71   20.55 |  111111111111111.....1111111111111111
        5      1.71   22.26 |  11111111111111111111111111......1111
        4      1.37   23.63 |  1111111111111111111.........11111111
        4      1.37   25.00 |  11111111111111111111.....11111111111
      219     75.00  100.00 | (other patterns)
 ---------------------------+--------------------------------------
      292    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
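For reference, a count variable like No_Quarter can be built along the following lines. This is a minimal sketch that assumes one row per firm-quarter (which the xtdescribe output confirms); firm_tag is a helper name introduced here for illustration only.

```stata
* Sketch, assuming one row per firm-quarter.
* Number of quarters observed for each firm, repeated on every row of that firm.
bysort Firm_ID: gen No_Quarter = _N

* Tabulating all rows counts observations, as in the table above.
tabulate No_Quarter

* Tagging one row per firm (firm_tag is a hypothetical helper variable)
* gives the distribution over firms instead.
egen byte firm_tag = tag(Firm_ID)
tabulate No_Quarter if firm_tag
```

Note that the frequencies in the tabulation above count observations, not firms: the 1,008 observations with No_Quarter equal to 36 correspond to the 28 firms with a complete pattern in the xtdescribe output (28 * 36 = 1,008).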
Thank you!