Multilevel Modelling - Variance Partitioning

Mostafa Harakeh

Join Date: Jul 2014
Posts: 91

Multilevel Modelling - Variance Partitioning

09 Aug 2019, 13:11

Dear Stata-listers,

I hope you are all doing well.

Before I ask my question, let me provide a brief background about my research. I think this might be useful when someone (kindly) writes a reply. I am a junior researcher in accounting and finance who has been using panel data-sets for a while and implementing pooled OLS and/or panel models including several identification strategies (difference-in-differences, regression discontinuity design, instrumental variables, etc.).

Recently, I started working on a project with a colleague that comes from a management/strategy background. Our project is in his field of research, where the use of multilevel modelling is quite common, unlike my field of research (finance). Before we start the main analysis, my colleague and I are trying to replicate a seminal paper. The idea of the paper is simple: Chief Executive Officers (CEOs) have gained an increased importance in determining the firm's performance over the years. The data-set used in this paper is a panel data where, in a given year, a CEO manages a firm that operates in some industry. Assuming that firm performance is measured using return on assets (ROA), i.e., the dependent variable, the paper finds that the percentage of the variance of ROA explained by CEOs has increased over time (the paper compares three intervals of time: 1950-1969, 1970-1989, and 1990-2009). I include below a sample of a similar data-set:

Firm_ID	Year	CEO	Industry	ROA
1	2003	Liang	K7	0.06019
1	2004	Liang	K7	0.069624
1	2005	Liang	K7	0.077258
1	2006	Liang	K7	0.069463
1	2007	Liang	K7	0.075686
1	2008	Liang	K7	0.048303
1	2009	Liang	K7	0.054536
1	2010	Liang	K7	0.052903
1	2011	Liang	K7	0.047317
1	2012	Liang	K7	0.048673
1	2013	Liang	K7	0.04473
1	2014	Liang	K7	0.040357
1	2015	Liang	K7	0.047204
1	2016	Liang	K7	0.04153
1	2017	Liang	K7	0.039362
2	2003	Kexin	C27	0.046562
2	2004	Kexin	C27	-0.00105
2	2005	Kexin	C27	-0.08607
2	2006	Kexin	C27	0.021265
2	2007	Kexin	C27	-0.04802
2	2008	Lufeng	C27	-0.06058
2	2009	Lufeng	C27	0.027213
2	2010	Xiao	C27	0.095465

The authors of the original paper mentions the following: "We use multilevel modeling (MLM), which has the advantage of explicitly accounting for the nested structure of the data. For the MLM analysis, we specified a four-level nested model: years, within CEOs, within firms, within industries. We used the Stata command xtmixed for the MLM analysis."

Before I wrote this post, I spent a couple of days searching and reading several resources. I got the general idea of the analysis and how it works (Stata's videos and blogs are very helpful). Yet I am not sure if the command I thought of does what the authors of the original paper described. My suggested code is included below:

Code:

xtset ID Year
mixed ROA control_variables || Industry: || Firm_ID: || CEO: || Year:, mle variance
estat ICC // to get the percentage of variance explained by Industry, Firm_ID, CEO, and Year.

Please let me know what you think. Any additional explanation about MLM or about coding is welcomed.

Thank you all.

Mostafa
(Stata 15.1 MP)

Last edited by Mostafa Harakeh; 09 Aug 2019, 13:17.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 29944
#2

09 Aug 2019, 15:34

Well, the code you show will partition the variance in ROE, adjusted for the "control" variables among industry, firm, ceo and year. But it will not enable you to discern whether the variance attributable to CEO is changing over the years. I have done a fair amount of multi-level modeling, and, to be honest, I can't think of a way to do that using -mixed-. Making the variance component attributable to CEOs vary with year is, I think, beyond the capability of Stata's multi-level modeling commands. I may be overlooking something, and if I am, I hope somebody will correct me, because I would like to learn.

The best I would be able to suggest is that you run the -mixed- command on three separate subsets of the data, each subset representing one of the time intervals you mention. Then you can compare the variance components for CEO across those three models.

As an aside, you can simplify the code a bit. You can eliminate the -xtset- command: it is not needed for this analysis. You can omit the -mle- and -variance- options in your -mixed- command because, unless you are using an ancient version of Stata, these are the defaults for -mixed-, and have been since at least version 13. Remember, too, that firm, industry, and ceo must all be numeric variables. If you currently have them as strings, -encode- will probably be the best way for you to create corresponding numeric variables.

I'm not sure what the role of -estat icc- is here. It will give you the proportions of variance attributable to industry, firm, ceo and year. But a change in the proportion of variance at the CEO level could just represent changes at the other levels of the model. Remember the proportions of variance must sum to 1.0. So if firm variance goes up and nothing else changes, the proportion of CEO variance will go down, even though CEO variance itself is unchanged. Perhaps in terms of your research goals that is OK: I don't work in finance, nor in management/strategy, so I'm not sure what's really relevant here. I just raise this as a caution about the difficulties of interpreting changes in proportions--they don't directly reflect changes in the components themselves.
2 likes
Comment

Mostafa Harakeh

Join Date: Jul 2014
Posts: 91

10 Aug 2019, 08:20

Hello Clyde,

Thank you for you comprehensive and detailed reply, as usual.

I should have explicitly mentioned that I will be running this code for each time interval (or period) separately. Then, I will have to compare the proportions of the ROA variance explained by the CEO across time intervals. This is what you suggested and this is what the authors of the seminal paper have done. Please see below a part of their results:

Dependent Variable: ROA				Test for diff. in CEO effect (Fisher's Z)
Period:	1: (1950-1969)	2: (1970-1989)	3: (1990-2009)	1-2	2-3	1-3
Year (%)	3.1	1.7	1.8
Industry (%)	38.7	10.2	3.7
Company (%)	41.8	34.6	33.7
CEO (%)	4.1	10.9	16.2	***	***	***
Unexplained (%)	12.3	42.6	44.7

As shown in the table, the percentage of variation in ROA explained by the CEO increases from 4.1% in period (1), to 10.9% in period (2), and finally to 16.2% in period (3). These increases are statistically significant as indicated by the Fisher's Z statistic.

Kindly, I have a couple of additional questions about this case in specific and about MLM in general:

1) In one of the basic examples on MLM, the data-set includes math scores (math_score) earned by students within schools within states. Thus, it is a three-level model. Yet, the Stata code used in the example is:

Code:

mixed math_score || state: || school:

To my understanding, this code omits one level, which is the student (i.e., the lowest level). I thought that the code should look like:

Code:

mixed math_score || state: || school: || student:

Am I missing on something here? If so, shall I exclude my lowest level from the model (i.e., Year)?

2) How can I test the difference in two proportions of variance explained by the same level? Just like the example I included above where the authors test the difference in CEO effect using the Fisher's Z statistic.

3) Assuming that the -estat icc- command gives me four percentages referring to the effect of Year, CEO, Firm_ID, and Industry in explaining the variation in ROA, can I simply compute the percentage of the unexplained variation in ROA as: 100% - (Year% + CEO% + Firm_ID% + Industry%)?

Thank you very much indeed.

Mostafa
(Stata 15.1 MP)

Last edited by Mostafa Harakeh; 10 Aug 2019, 08:24.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29944
#4

10 Aug 2019, 10:57

1. Well, I am not familiar with the math scores data set you refer to in your first question, but my guess is that in that data set there is only one observation per student, so that the student "level" is in fact the observation level. In that case, you cannot add || student: to the command, because the "student" level and the residual level are the same. If in your data set there is, similarly, only one observation for any given year (within industry, firm, and ceo), then you would also omit the year level. But if you have multiple observations within a given year (within industry firm and ceo) then year would indeed be a true level in the hierarchy and || year: should be included.

2. I don't know. I have never seen this done before and have no recollection of ever learning how it might be done. The Fisher z-transform is often used to test Pearson correlation coefficients. Whether intra-class correlation coefficients have a similar distribution (i.e. they are normalized by the Fisher z-transform, with variance N-3) I do not know.

3. Yes.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29944
#5

10 Aug 2019, 12:12

Regarding your second question in #3, about comparing the proportions of variance, on reflection, I think I would do this by bootstrapping.
Comment
Mostafa Harakeh

Join Date: Jul 2014

Posts: 91
#6

10 Aug 2019, 17:11

Hi Clyde,

Many thanks for your useful input.

Regarding the 1st point, I think I got the intuition behind keeping or omitting the lowest level. Actually, your guess that there is only one observation per student is true. So, to make sure that I got the concept correctly, if we trace the students over three years, then the lowest level will be years and thus the model will include
|| student: but not || year:. Right?

As for testing the difference in the proportions of variance, can you kindly provide me with a code that I can tailor to fit my data-set? What I know about the Bootstrapping approach is that is takes random subsamples for N times, each time a different subsample while allowing replacements, and then it estimates the standard errors based on all subsamples. However, I cannot think of how this can be done in my case.

I'm truly grateful for taking the time to explain thoroughly and even getting back in #5!

Mostafa
(Stata 15.1 MP)

Last edited by Mostafa Harakeh; 10 Aug 2019, 17:17.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29944
#7

10 Aug 2019, 17:26

Regarding the 1st point, I think I got the intuition behind keeping or omitting the lowest level. Actually, your guess that there is only one observation per student is true. So, to make sure that I got the concept correctly, if we trace the students over three years, then the lowest level will be years and thus the model will include
|| student: but not || year:. Right?

Right.

As for testing the difference in the proportions of variance, can you kindly provide me with a code that I can tailor to fit my data-set? What I know about the Bootstrapping approach is that is takes random subsamples for N times, each time a different subsample while allowing replacements, and then it estimates the standard errors based on all subsamples. However, I cannot think of how this can be done in my case.

This is not the kind of thing that can be done abstractly. I would need you to post back with an excerpt from your data set. The excerpt would need to contain at least 3 or 4 entities at each level above year, and the full range of years for those entities. Be sure to use the -dataex- command to create the excerpt. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment

Mostafa Harakeh

Join Date: Jul 2014
Posts: 91

12 Aug 2019, 02:13

Hi again Clyde,

I took some time to try a few things with the dataset before I post an example of it below.

Kindly note that I decided to omit the year level from the model. So, the code will look like (using the variables in the data example below):

Code:

mixed roe || industry: || firm_id: || ceo_id:

I have one more question, please: when I run the -estat icc- following the -mixed- command in one of the subsamples, I get the following output:

Level	ICC	Std. Err.	[95% Conf.	Interval]

industry	0.029846	0.012164	0.013324	0.065494
firm_id \| industry	0.21478	0.028782	0.163716	0.276506
ceo_id \| firm_id \| industry	0.499628	0.017997	0.464415	0.534844

My question is: does this mean that the ceo_id explains 49.96% of the variation in the dependent variable? OR it explains 0.499628-0.21478=28.4848% of the aforementioned variation? My question arises since I tried this analysis using different datasets and I had the impression that this ICC accumulates across levels.

Thank you very much for all the help you are providing.

Mostafa

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double roe long firm_id int year float ceo_id long industry
 .136144  2 2003  1 25
 .164964  2 2004  1 25
 .192591  2 2005  1 25
 .180437  2 2006  1 25
 .207015  2 2007  1 25
 .127577  2 2008  1 25
 .152683  2 2009  1 25
 .176802  2 2010  1 25
 .189507  2 2011  1 25
 .208875  2 2012  1 25
 .195093  2 2013  1 25
 .174285  2 2014  1 25
 .205782  2 2015  1 25
 .190279  2 2016  1 25
 .213626  2 2017  1 25
 -.16694  4 2005  2  9
 .028793  4 2006  2  9
-.097307  4 2007  2  9
-.125463  4 2008  3  9
 .051772  4 2009  3  9
  .18446  4 2010  4  9
 .001862  5 2004  6 28
-.232434  5 2005  6 28
 .080547  5 2006  6 28
 .106237  5 2007  5 28
 .013896  5 2008  5 28
-.067274  5 2009  5 28
-.006426  5 2010  5 28
 .017097  5 2011  5 28
 .032818  5 2012  5 28
-.064829  5 2013  5 28
 .066716  5 2014  5 28
-.057392  5 2015  5 28
 .091073  5 2016  5 28
 .016622  5 2017  5 28
-.247241  6 2003  8 25
 .044124  6 2004  8 25
 .092261  6 2005 11 25
 .175457  6 2006  9 25
 .152122  6 2007  9 25
 .086478  6 2008  9 25
 .173258  6 2009  9 25
 .209026  6 2010  9 25
 .159389  6 2011  9 25
 .195581  6 2012  9 25
  .18046  6 2013  7 25
 .121993  6 2014  7 25
 .100759  6 2015  7 25
 .167039  6 2016  7 25
 .151419  6 2017 10 25
-.471528  7 2004 12 25
-.471528  7 2004 12 25
 .101096  7 2005 12 25
-.471528  7 2006 12 25
 .064035  7 2011 13 25
-.471528  8 2003 14 15
 .353407  8 2004 14 15
-.079017  8 2005 14 15
-.025164  8 2006 14 15
 .023245  8 2007 14 15
 .010165  8 2008 14 15
 .004284  8 2009 14 15
 .012035  8 2010 14 15
-.007277  8 2011 14 15
 .074276  8 2012 14 15
 .036691  9 2004 15 31
 .070082  9 2005 15 31
  .08562  9 2006 15 31
 .150948  9 2007 15 31
 .111266  9 2008 15 31
 .149136  9 2009 15 31
  .13184  9 2010 15 31
 .097123  9 2011 15 31
 .069734  9 2012 15 31
 .091145  9 2013 15 31
 .089865  9 2014 15 31
 .160746  9 2015 15 31
 .057118  9 2016 15 31
 .035756  9 2017 15 31
 .115091 10 2003 19 20
 .123643 10 2009 16 20
 .045299 10 2010 18 20
 .093467 10 2011 18 20
 .037996 10 2012 17 20
 .136198 11 2005 20 25
-.075882 11 2006 20 25
-.047292 11 2007 20 25
 .017279 11 2008 21 25
 .101298 12 2003 25 11
 .158306 12 2004 25 11
 .146412 12 2005 25 11
 .146344 12 2006 25 11
 .156908 12 2007 25 11
 .109958 12 2008 25 11
  .17548 12 2009 23 11
 .258193 12 2010 23 11
 .190189 12 2011 25 11
 .050748 12 2012 25 11
 .206931 12 2013 22 11
 .100983 12 2014 22 11
 .067211 12 2015 22 11
 .101942 12 2016 24 11
 .098012 12 2017 26 11
 .120994 14 2003 27 25
 .099847 14 2004 27 25
 .096003 14 2005 27 25
 .078874 14 2006 29 25
 .004659 14 2007 29 25
 .108933 14 2008 29 25
 .181623 14 2009 29 25
 .148121 14 2010 29 25
 .111051 14 2011 29 25
 .053612 14 2012 29 25
 .054295 14 2013 28 25
 .090538 14 2014 28 25
 .071391 14 2015 28 25
 .037145 14 2016 28 25
 .013781 14 2017 28 25
  .04767 16 2004 32 17
 .005577 16 2005 32 17
  .03141 16 2006 32 17
 .057856 16 2007 32 17
 .071085 16 2008 32 17
   .0362 16 2009 33 17
 .024294 16 2010 33 17
 .005716 16 2011 33 17
 .010712 16 2012 33 17
 .014036 16 2013 31 17
 .014111 16 2014 31 17
 -.34609 16 2015 31 17
 .028832 16 2016 31 17
 .353407 16 2017 30 17
-.471528 17 2013 34 15
 .004936 18 2003 36 20
 .001276 18 2004 36 20
-.145624 18 2005 36 20
 .070263 18 2006 36 20
-.471528 18 2007 36 20
 -.17289 18 2008 35 20
 .085335 18 2009 35 20
 .016627 18 2010 35 20
-.134393 18 2011 35 20
-.002104 18 2012 35 20
 .064859 18 2013 35 20
 .017827 19 2005 37  4
  .13636 19 2006 39  4
 .135054 19 2007 39  4
  .04269 19 2008 38  4
-.026361 19 2009 38  4
 .024341 19 2010 38  4
 .015273 19 2011 40  4
 .078125 19 2012 40  4
 .044814 19 2013 40  4
 .014089 19 2014 40  4
-.044045 19 2015 40  4
 .089837 19 2016 40  4
-.060861 19 2017 40  4
-.174295 20 2004 43 17
 .028114 20 2005 42 17
-.084522 20 2006 41 17
 .095454 20 2007 44 17
 .030889 20 2008 44 17
 .087987 21 2004 46 17
 .113663 21 2005 46 17
 .114299 21 2006 46 17
 .204747 21 2007 46 17
 .091054 21 2008 46 17
 .068403 21 2009 46 17
  .08453 21 2010 45 17
 .050038 21 2011 45 17
 .001994 21 2012 45 17
 .034851 21 2013 45 17
 .032047 21 2014 45 17
   .0352 21 2015 45 17
 .042605 21 2016 45 17
 .102243 21 2017 45 17
 .269337 22 2003 51 22
 .353407 22 2004 50 22
  .32344 22 2005 50 22
 .310923 22 2006 50 22
 .319336 22 2007 50 22
 .291832 22 2008 50 22
  .17905 22 2009 50 22
  .22874 22 2010 50 22
 .168775 22 2011 48 22
 .143685 22 2012 49 22
 .139775 22 2013 49 22
 .110331 22 2014 49 22
 .129188 22 2015 49 22
 .123289 22 2016 49 22
 .109397 22 2017 47 22
-.094011 23 2005 55 11
 .127604 23 2006 55 11
 .049519 23 2007 52 11
 .072028 23 2008 52 11
  .07934 23 2009 52 11
 .041572 23 2010 53 11
 .027467 23 2011 53 11
 .013635 23 2012 54 11
  .02875 23 2013 54 11
end

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29944
#9

12 Aug 2019, 09:42

OK. For some reason I have not been able to get this to work using the -bootstrap- command. Probably I'm doing something wrong, but I don't have time to figure it out. So I have instead written a bootstrap routine directly into the code.

Code:

// FIX THE NEXT THREE LINES TO CORRECTLY DEFINE YOUR // THREE TIME PERIODS OF INTEREST gen byte era = 1 if year < 2007 replace era = 2 if inrange(year, 2008, 2012) replace era = 3 if year > 2012 & !missing(year) capture program drop icc_diff program define icc_diff, rclass summ forvalues e = 1/3 { mixed roe if era == `e' || industry: || firm_id: || ceo_id: estat icc scalar icc_era_`e' = r(icc2) return scalar icc_era_`e' = icc_era_`e' } return scalar diff_1_2 = icc_era_2 - icc_era_1 return scalar diff_2_3 = icc_era_3 - icc_era_2 return scalar diff_1_3 = icc_era_3 - icc_era_1 exit end icc_diff set seed 1234 // OR YOUR FAVORITE RANDOM NUMBER SEED tempfile results capture postutil clear postfile handle icc_era_1 icc_era_2 icc_era_3 diff_1_2 diff_2_3 diff_1_3 /// using `results' local nreps 3 // PICK A REASONABLE NUMBER OF REPS, NOT 3 forvalues i = 1/`nreps' { preserve bsample, cluster(industry firm_id ceo_id) idcluster(new_id) quietly icc_diff post handle (`r(icc_era_1)') (`r(icc_era_2)') (`r(icc_era_3)') /// (`r(diff_1_2)') (`r(diff_2_3)') (`r(diff_1_3)') restore } postclose handle use `results', clear ci means _all

You will have to mark this up in certain ways.

1. The code near the top that generates the variable era needs to be modified to reflect your actual 3 time periods of interest: the example data you gave did not extend over those years, so I had to use years that fell within the range of the example.

2. You can use 1234 as the random number seed if you like, or you can use any other non-negative integer you like.

3. Change the -local nreps 3- command to a sufficiently large number. I would suggest something on the order of 1,000, but it really depends on how much precision you need in your results. Evidently the larger the number of reps you take, the longer it will run. Note that I have put the -quietly- prefix in front of the icc_diff command in the loop to avoid having an excessive amount of output accumulating on your screen. But the downside is you will see no output at all and may wonder if something is going wrong. You can remove the quietly if you like. Or you can add a command to display a message every time through the loop (or every 10th, or whatever.)

Added: The output of -estat icc- is not cumulative. In the example you show, ceo_id is explaining 49.96% of the variance.

Last edited by Clyde Schechter; 12 Aug 2019, 10:07.
Comment
Mostafa Harakeh

Join Date: Jul 2014

Posts: 91
#10

12 Aug 2019, 10:14

Hi Clyde,

I am sorry for any inconvenience I might have caused as you had to manually write the bootstrap routine into the code.

I will take some time to understand and be able to use the code. I will get back to this post later on for sure.

I cannot thank you enough!

Mostafa
(Stata 15.1 MP)
Comment

Announcement

Multilevel Modelling - Variance Partitioning

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment