Fixed effects and sample data structure (cross sectional or panel)

Marc Pelow

Join Date: Jul 2021
Posts: 85

24 Feb 2022, 07:24

Hi,

I'm not sure what kind of dataset I'm looking at here and would appreciate some help - even if the question is only partly related to Stata.
I have read some papers that are based on a data structure that seems to be very similar to mine. For example, in a paper I am currently reading (Keck, Tang 2021, Working Paper) the data is described as follows:

Our sample consists of acquisitions carried out by publicly-traded American companies that were completed between 2003 and 2013. We obtained information on 1243 acquisitions [...]. In particular, we considered only non-international acquisitions with a value of more than $100 million, in which before the acquisition the acquiring firm controlled less than 50% of the shares of the target firm and after the acquisition the acquiring firm ended up with 100% of the shares of the acquired firm. We matched these observations with data from Execucomp on compensation and personal characteristics of CEOs and CFOs, and with data from Compustat on firm financial information.

The dataset I have is collected in the same way: Based on selected criteria, data on company acquisitions announced between 1996 and 2018 were collected.

The sampling criteria are:

Time of deal announcement between 1996 and 2018
Headquarters of the acquiring and acquired company: U.S.
The acquiring company has to be listed, while the acquired company may be listed or private.
Price of the acquisition: >= USD 1 million.
Certain subtypes of corporate acquisitions are excluded (Leveraged Buyouts, Share repurchases etc.)

Variables collected include:

Time of announcement of the acquisition
Name of the CEO that was in place while the Deal was announced
Name and other information on the acquiring and acquired company
Characteristics of the acquisition such as purchase price, method of payment, etc.

Is this type of data considered to be cross sectional or panel data?

I have always thought that this is cross-sectional data. I have read several papers with similar data structures where year and industry fixed effects were taken into account, which made me wonder, as I thought these could only be taken into account for panel data.

Here is an excerpt of my sample:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long Acq_ID str7 CFO_ID byte(CFO_Age CFO_Gender) float CFO_PaySlice byte CFO_No_Boardsitze int CFO_Tenure byte CFO_No_Deals int(CFO_Start CFO_End) str52 Acq_Name int Deal_Announced double Deal_Value float(Acq_Leverage Acq_TobinsQ)
 71 "207071"  66 1  .1676225 0 1680 0 14245 16952 "First Commonwealth Financial Corp"                15925   30784000    .91129 1.0611492
 71 "207071"  66 1  .1676225 0 1806 0 14245 16952 "First Commonwealth Financial Corp"                16051  186044000    .91129 1.0611492
 71 "621554"  59 1 .26270258 0  378 0 19841 22664 "First Commonwealth Financial Corp"                20219   14750000  .8874036 1.0203671
 71 "621554"  59 1 .24425076 0 1353 0 19841 22664 "First Commonwealth Financial Corp"                21194   58336000  .8784809 1.0694315
 71 "621554"  59 1  .2607087 1  889 0 19841 22664 "First Commonwealth Financial Corp"                20730  1.070e+08  .8904282 1.0132986
 71 "207071"  66 1 .19224176 0 2673 0 14245 16952 "First Commonwealth Financial Corp"                16918   56251000  .9135385 1.0645406
 73 "337555"  71 1 .22188364 1 3171 0 15950 19821 "PacWest Bancorp"                                  19121  234076000  .9011976  1.028899
 73 "456194"  58 1  .1902949 1  605 2 20310 22281 "PacWest Bancorp"                                  20915  7.186e+08  .7951942 1.0971042
 73 "337555"  71 1 .20212823 1 1393 0 15950 19821 "First Community Bancorp Inc,San Diego,California" 17343   35000000  .7896164 1.0685618
 73 "337555"  71 1 .25601473 1  328 0 19821 20301 "PacWest Bancorp"                                  20149  814529000   .784018 1.0724958
 73 "337555"  71 1 .24129276 1 3611 0 15950 19821 "PacWest Bancorp"                                  19561 2281831000  .8921746 1.0618262
112 "1139697" 42 1 .13385691 0  967 0 20058 21275 "First Financial Bancorp,Cincinnati,Ohio"          21025  988464000  .8974606 1.1064364
112 "512084"  65 1 .16962637 0  122 0 19590 20058 "First Financial Bancorp,Cincinnati,Ohio"          19712   36600000  .8906542 1.0212723
112 "512084"  65 1 .17966245 0  252 0 19590 20058 "First Financial Bancorp,Cincinnati,Ohio"          19842   13500000  .8936983 1.0499475
113 "207147"  66 1  .2163688 0 3614 0 15795 22050 "First Financial Bankshares Inc"                   19409   62665000  .8762857 1.1468942
113 "207147"  66 1   .226366 0 5309 0 15795 22050 "First Financial Bankshares Inc"                   21104   59400000  .8769613  1.312286
113 "207147"  66 1 .25062662 0 4384 0 15795 22050 "First Financial Bankshares Inc"                   20179   48519000  .8834621  1.208207
113 "207147"  66 1   .323378 0 2719 0 15795 22050 "First Financial Bankshares Inc"                   18514   22200000  .8732406 1.2149463
124 "206946"  65 1 .12749545 0 1536 0 13939 15706 "First Health Group Corp"                          15475   40000000 .56541026   3.73492
151 "499279"  68 1  .1902036 0 2610 0 17302 20824 "First Midwest Bancorp Inc,Chicago,Illinois"       19912   59992000  .8786632 1.0381118
151 "1345489" 58 1   .228476 0  701 0 20824 22664 "First Midwest Bancorp Inc,Chicago,Illinois"       21525  1.450e+08  .8675238 1.0427192
151 "499279"  68 1  .1901028 0 3332 0 17302 20824 "First Midwest Bancorp Inc,Chicago,Illinois"       20634  339327000  .8822248 1.0298363
151 "206069"  61 1  .1988393 0 1229 0 15553 17302 "First Midwest Bancorp Inc,Chicago,Illinois"       16782  3.070e+08  .9224817 1.1660495
151 "1345489" 58 1   .228476 0  518 0 20824 22664 "First Midwest Bancorp Inc,Chicago,Illinois"       21342   90844000  .8675238 1.0427192
151 "499279"  68 1 .19584246 0 3102 0 17302 20824 "First Midwest Bancorp Inc,Chicago,Illinois"       20404  105511000  .8834559 1.0242015
151 "206069"  61 1 .18505888 0  406 0 15553 17302 "First Midwest Bancorp Inc,Chicago,Illinois"       15959  1.294e+08  .9177409 1.1285703
152 "1338321" 59 1 .18265976 0   70 0 20698 21815 "Meta Financial Group Inc"                         20768   51158000  .8927404 1.0275264
152 "1338321" 59 1 .09827385 0  495 0 20698 21815 "Meta Financial Group Inc"                         21193  302051000  .9168959 1.0611949
152 "1338321" 59 1  .1415998 0  653 0 19632 20698 "Meta Financial Group Inc"                         20285   54098000  .9148981 1.0208138
190 "511742"  59 1  .1957591 0  949 0 17156 18721 "First Niagara Financial Group Inc"                18105  239874000  .8148972 1.0149858
end
format %tddd/nn/CCYY CFO_Start
format %tddd/nn/CCYY CFO_End
format %tdDD/NN/CCYY Deal_Announced
label values CFO_Gender Gender
label def Gender 1 "M", modify


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17851

24 Feb 2022, 07:48

Marc:
assuming that -CFO_ID- is your -panelvar-:

Code:

. encode CFO_ID, g(CFO_ID_num)

. xtset CFO_ID Deal_Announced
string variables not allowed in varlist;
CFO_ID is a string variable
r(109);

. xtset CFO_ID_num Deal_Announced

Panel variable: CFO_ID_num (unbalanced)
 Time variable: Deal_Announced, 15/05/2002 to 07/12/2018, but with gaps
         Delta: 1 day

.

I would say that this is a panel dataset.
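
A quick way to check this on the excerpt, sketched under the assumption that the -dataex- example from the first post is in memory (-isid- and -duplicates report- are standard Stata commands; the full dataset may behave differently from the excerpt):

Code:

* Exits with an error unless (CFO_ID, Deal_Announced) uniquely identifies each row
isid CFO_ID Deal_Announced

* Reports how many rows share each CFO_ID; copies > 1 means a CFO appears in several deals
duplicates report CFO_ID

If -isid- fails on the full data, at least one CFO announced two deals on the same day, and -xtset- with that time variable would complain about repeated time values.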

Kind regards,
Carlo
(Stata 19.0)


Marc Pelow

Join Date: Jul 2021

Posts: 85
#3

24 Feb 2022, 09:03

Thank you very much for your answer.
I came across a (non-scientific) blog entry on identifying the data category, which states:

If the identifier is a time data field then the data set belongs to time series. Whereas if the data records can be uniquely identified with time data filed and along with an identifier that is non-time related like employee id, student id, airline code, firm code, country code, etc then the data set is panel data. If the data records can be uniquely identified with the non-time identifier, then the data set is cross-sectional data.

Source: https://medium.com/geekculture/detai...a-fd973fa788ae

Each entry in my dataset represents one deal made by a US acquiror between 1996 and 2018 and includes unique identifiers for the acquiring company (Acq_ID), the acquiror's CFO in place when the deal was announced (CFO_ID), and the deal itself with all its characteristics (deal value, announcement date, target, etc.), namely Deal_No. I failed to include the latter in my data example, sorry for that. The last sentence of the blog entry states that "If the data records can be uniquely identified with the non-time identifier, then the data set is cross-sectional data." As I am able to identify each entry in my dataset by Deal_No, my dataset should be cross-sectional according to the blog entry. Would you agree with that?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#4

24 Feb 2022, 09:30

Marc:
to make things (hopefully) simpler:
1) in a cross-sectional dataset, we have many ids with one wave of data only (say, the back-hand strokes hit by all tennis players who won the Wimbledon men's singles from 1930 to 2021);
2) in a time series dataset, we have one id only with many waves of data (say, the back-hand strokes hit by Roger Federer from his first year as a tennis pro on);
3) in a panel dataset we have at least two ids and at least two waves of data (say, the back-hand strokes hit by Roger Federer and Andy Murray from 2010 to 2019).

Kind regards,
Carlo
(Stata 19.0)
Marc Pelow

Join Date: Jul 2021

Posts: 85
#5

24 Feb 2022, 09:54

Thanks Carlo.
But wouldn't definition #1, i.e. a cross-sectional dataset, apply in my case, as I have many ids (Deal_No) and one wave of data (from 1996 to 2018)?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#6

24 Feb 2022, 11:58

Marc:
technically speaking, a dataset with many ids (or -Deal_No-) and one wave of data is labelled cross-sectional; in your case, I would take one wave to be a single one of the years you mentioned (say, all the -Deal_No- closed during 1996).
For the sake of precision, the definition of a panel dataset implies that the same sample of -ids- (more or less, as attrition may play a role in that) is measured on the very same variables across a given time horizon (Roger Federer and Andy Murray, in my previous example).
If your -ids- change across the years, you might have a repeated cross-sectional dataset.
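
One way to see which case applies is to count in how many calendar years each acquiror shows up. A minimal sketch, assuming the variables from the -dataex- excerpt in the first post; Ann_Year, first_in_year, n_years and first_acq are names made up for this illustration:

Code:

* Calendar year of the announcement, taken from the daily date
generate int Ann_Year = year(Deal_Announced)

* Flag one row per acquiror-year, then count distinct years per acquiror
bysort Acq_ID Ann_Year: generate byte first_in_year = _n == 1
bysort Acq_ID: egen n_years = total(first_in_year)

* Tabulate the number of years covered, using one row per acquiror
bysort Acq_ID: generate byte first_acq = _n == 1
tabulate n_years if first_acq

If n_years equals 1 for almost every acquiror, the data behave like a repeated cross-section; many acquirors with n_years above 1 point towards an unbalanced panel.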

Kind regards,
Carlo
(Stata 19.0)
Marc Pelow

Join Date: Jul 2021

Posts: 85
#7

24 Feb 2022, 14:37

That makes sense to me, thanks.
The ids change in the sense that there isn’t a single company that announces a deal every single year. There are in fact many companies that announce multiple deals in the 1996-2018 period. Therefore I thought it is an unbalanced panel rather than cross-sectional data.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#8

25 Feb 2022, 00:26

Marc:
yes, I think so too.

Kind regards,
Carlo
(Stata 19.0)
Marc Pelow

Join Date: Jul 2021

Posts: 85
#9

25 Feb 2022, 00:41

Thank you very much for helping me out Carlo.
Marc Pelow

Join Date: Jul 2021

Posts: 85
#10

25 Feb 2022, 05:27

Hi Carlo,

there is another thing that came to my mind, and I would really like to know whether I got this right. You wrote:

the definition of a panel dataset implies that the same sample of -ids- (more or less, as attrition may play a role in that) is measured on the very same variables across a given time horizon.

Each row/entry in my dataset describes a single deal.
As mentioned in my original post, I have an ID that identifies each Deal in my sample, as well as one that identifies the Acquiror and the CFO in place at the time the deal was announced.

I am not quite sure how to define

Code:

xtset id time

Setting Deal_No as the id doesn't make sense to me, as it would contradict the definition that every id is measured on the same variables over time: Deal_No uniquely identifies every deal (and its characteristics like value, announcement date, etc.), so each id appears only once and is not measured across a given time horizon.

What I am left with is the Acquiror ID that identifies the company that purchases a target and the CFO ID that identifies the CFO in place on the acquiror side. Setting

Code:

xtset Acq_ID Year_of_Deal_Announcement

results in the error "repeated time values within panel", which makes perfect sense, as a number of companies have more than one deal per year. Keeping only the first deal of a company in a given year is possible but would reduce the sample size substantially. The same applies to setting

Code:

xtset CFO_ID Year_of_Deal_Announcement

as there are many CFOs that manage more than one deal per year.

Setting the specific day of the announcement as the time component rather than the year would result in a very unbalanced panel if I am not mistaken.

I would be very interested in hearing whether I understood everything correctly.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#11

25 Feb 2022, 08:32

Marc:
this issue crops up frequently when dealing with panel datasets.
The easiest work-around is to -xtset- your dataset with the -panelvar- only (I would go with -Acq_ID-):

Code:

xtset Acq_ID

Unfortunately, this fix comes at the cost of making time-series operators unavailable (as they need a -timevar- dimension to work).
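
If time-series operators are needed after all, one possible alternative (an assumption on my part, not something tested on the full data) is to use the order of each acquiror's deals as the time variable. The sketch below uses the variables from the -dataex- excerpt; Deal_Seq is a name made up for this illustration:

Code:

* Panel identifier only: -xtreg, fe- works, but L., F. and D. are unavailable
xtset Acq_ID

* Alternative: number each acquiror's deals by announcement date
bysort Acq_ID (Deal_Announced): generate int Deal_Seq = _n
xtset Acq_ID Deal_Seq

With Deal_Seq as the time variable, L.Deal_Value refers to the acquiror's previous deal rather than to the previous year, and the order of deals announced on the same day is arbitrary.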

Kind regards,
Carlo
(Stata 19.0)
Marc Pelow

Join Date: Jul 2021

Posts: 85
#12

28 Feb 2022, 07:25

Thanks Carlo. Many papers on my research topic include industry and time fixed effects. If I would like to include time fixed effects as well, I suppose I need to specify a -timevar- dimension, right?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17851

#13

28 Feb 2022, 07:32

Marc:
not necessarily so, if you do not plan to use time-series operators:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age i.year, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1162                                         min =          1
     Between = 0.1078                                         avg =        6.1
     Overall = 0.0932                                         max =         15

                                                F(16,23784)       =     195.45
corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
             |
 c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
             |
        year |
         69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
         70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
         71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
         72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
         73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
         75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
         77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
         78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
         80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
         82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
         83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
         85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
         87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
         88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
             |
       _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
-------------+----------------------------------------------------------------
     sigma_u |  .40275174
     sigma_e |  .30127563
         rho |  .64120306   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000

. xtset idcode

Panel variable: idcode (unbalanced)

. xtreg ln_wage c.age##c.age i.year, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1162                                         min =          1
     Between = 0.1078                                         avg =        6.1
     Overall = 0.0932                                         max =         15

                                                F(16,23784)       =     195.45
corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
             |
 c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
             |
        year |
         69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
         70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
         71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
         72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
         73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
         75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
         77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
         78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
         80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
         82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
         83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
         85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
         87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
         88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
             |
       _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
-------------+----------------------------------------------------------------
     sigma_u |  .40275174
     sigma_e |  .30127563
         rho |  .64120306   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000

.
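
Applied to a deal-level dataset like the one in the first post, the same idea might look as follows. This is only a sketch: Deal_Year is a new variable created here, and regressing Acq_TobinsQ on Acq_Leverage is a placeholder specification chosen purely because both variables appear in the excerpt, not a suggested model.

Code:

* Calendar year of the announcement, taken from the daily date
generate int Deal_Year = year(Deal_Announced)

* Acquiror fixed effects via -xtreg, fe-; year fixed effects via i.Deal_Year
xtset Acq_ID
xtreg Acq_TobinsQ Acq_Leverage i.Deal_Year, fe

* Year dummies also work without any panel setup, in a pooled regression
regress Acq_TobinsQ Acq_Leverage i.Deal_Year

* Industry effects could be added the same way with i. applied to an
* industry identifier, which is not part of the excerpt

Neither command needs a -timevar- in -xtset-, because the year dummies enter as ordinary factor-variable regressors.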

Kind regards,
Carlo
(Stata 19.0)
