Dear Stata-listers,
I hope you are all doing well.
Before I ask my question, let me provide a brief background about my research. I think this might be useful when someone (kindly) writes a reply. I am a junior researcher in accounting and finance who has been using panel data-sets for a while and implementing pooled OLS and/or panel models including several identification strategies (difference-in-differences, regression discontinuity design, instrumental variables, etc.).
Recently, I started working on a project with a colleague who comes from a management/strategy background. Our project is in his field of research, where multilevel modelling is quite common, unlike in my field (finance). Before we start the main analysis, my colleague and I are trying to replicate a seminal paper. The idea of the paper is simple: Chief Executive Officers (CEOs) have become increasingly important in determining firm performance over the years. The data-set used in this paper is a panel in which, in a given year, a CEO manages a firm that operates in some industry. With firm performance measured by return on assets (ROA) as the dependent variable, the paper finds that the percentage of the variance of ROA explained by CEOs has increased over time (the paper compares three intervals: 1950-1969, 1970-1989, and 1990-2009). I include below a sample of a similar data-set:
Firm_ID | Year | CEO | Industry | ROA |
1 | 2003 | Liang | K7 | 0.06019 |
1 | 2004 | Liang | K7 | 0.069624 |
1 | 2005 | Liang | K7 | 0.077258 |
1 | 2006 | Liang | K7 | 0.069463 |
1 | 2007 | Liang | K7 | 0.075686 |
1 | 2008 | Liang | K7 | 0.048303 |
1 | 2009 | Liang | K7 | 0.054536 |
1 | 2010 | Liang | K7 | 0.052903 |
1 | 2011 | Liang | K7 | 0.047317 |
1 | 2012 | Liang | K7 | 0.048673 |
1 | 2013 | Liang | K7 | 0.04473 |
1 | 2014 | Liang | K7 | 0.040357 |
1 | 2015 | Liang | K7 | 0.047204 |
1 | 2016 | Liang | K7 | 0.04153 |
1 | 2017 | Liang | K7 | 0.039362 |
2 | 2003 | Kexin | C27 | 0.046562 |
2 | 2004 | Kexin | C27 | -0.00105 |
2 | 2005 | Kexin | C27 | -0.08607 |
2 | 2006 | Kexin | C27 | 0.021265 |
2 | 2007 | Kexin | C27 | -0.04802 |
2 | 2008 | Lufeng | C27 | -0.06058 |
2 | 2009 | Lufeng | C27 | 0.027213 |
2 | 2010 | Xiao | C27 | 0.095465 |
The authors of the original paper mention the following: "We use multilevel modeling (MLM), which has the advantage of explicitly accounting for the nested structure of the data. For the MLM analysis, we specified a four-level nested model: years, within CEOs, within firms, within industries. We used the Stata command xtmixed for the MLM analysis."
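To make my reading of this description concrete, here is a sketch of the model I think it implies. This is my own assumption, not something stated in the paper: the subscripts (i for industry, f for firm, c for CEO, t for year), the random-intercept terms u, v, w, and the normality assumptions are all notation I am introducing for illustration. I am also assuming that each CEO-year appears only once, so that the year level coincides with the residual.

```latex
\begin{align}
\mathrm{ROA}_{ifct} &= \mathbf{x}_{ifct}'\boldsymbol{\beta} + u_i + v_{if} + w_{ifc} + \varepsilon_{ifct}, \\
u_i &\sim N(0,\sigma_u^2),\quad
v_{if} \sim N(0,\sigma_v^2),\quad
w_{ifc} \sim N(0,\sigma_w^2),\quad
\varepsilon_{ifct} \sim N(0,\sigma_\varepsilon^2), \\
\text{CEO share} &= \frac{\sigma_w^2}{\sigma_u^2+\sigma_v^2+\sigma_w^2+\sigma_\varepsilon^2}.
\end{align}
```

Under this reading, the "percentage of variance explained by CEOs" would be the CEO variance component divided by the total of all variance components, computed separately for each of the three time intervals. Please correct me if this is not what the authors mean.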
Before I wrote this post, I spent a couple of days searching and reading several resources. I got the general idea of the analysis and how it works (Stata's videos and blogs are very helpful). Yet I am not sure if the command I thought of does what the authors of the original paper described. My suggested code is included below:
Code:
xtset Firm_ID Year
mixed ROA control_variables || Industry: || Firm_ID: || CEO: || Year:, mle variance
estat icc // to get the percentage of variance explained by Industry, Firm_ID, CEO, and Year.
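Please let me know what you think. Any additional explanation about MLM or about coding is welcome.
Thank you all.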
Mostafa
(Stata 15.1 MP)