  • Fixed effects and sample data structure (cross sectional or panel)

    Hi,

    I'm not sure what kind of dataset I'm looking at here and would appreciate some help - even if the question is only partly related to Stata.
    I have read some papers that are based on a data structure that seems to be very similar to mine. For example, in a paper I am currently reading (Keck, Tang 2021, Working Paper) the data is described as follows:
    Our sample consists of acquisitions carried out by publicly-traded American companies that were completed between 2003 and 2013. We obtained information on 1243 acquisitions [...]. In particular, we considered only non-international acquisitions with a value of more than $100 million, in which before the acquisition the acquiring firm controlled less than 50% of the shares of the target firm and after the acquisition the acquiring firm ended up with 100% of the shares of the acquired firm. We matched these observations with data from Execucomp on compensation and personal characteristics of CEOs and CFOs, and with data from Compustat on firm financial information.
    The dataset I have is collected in the same way: Based on selected criteria, data on company acquisitions announced between 1996 and 2018 were collected.

    The sampling criteria are:
    • Time of deal announcement between 1996 and 2018
    • Headquarters of the acquiring and acquired company: U.S.
    • The acquiring company has to be listed, while the acquired company may be listed or private.
    • Price of the acquisition: >= USD 1 million.
    • Certain subtypes of corporate acquisitions are excluded (Leveraged Buyouts, Share repurchases etc.)
    Variables collected include:
    • Time of announcement of the acquisition
    • Name of the CEO in place when the deal was announced
    • Name and other information on the acquiring and acquired company
    • Characteristics of the acquisition such as purchase price, method of payment, etc.
    Is this type of data considered to be cross sectional or panel data?

    I have always thought that this is cross-sectional data. I have read several papers with similar data structures where year and industry fixed effects were taken into account, which made me wonder, as I thought these could only be taken into account for panel data.

    Here is an excerpt of my sample:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long Acq_ID str7 CFO_ID byte(CFO_Age CFO_Gender) float CFO_PaySlice byte CFO_No_Boardsitze int CFO_Tenure byte CFO_No_Deals int(CFO_Start CFO_End) str52 Acq_Name int Deal_Announced double Deal_Value float(Acq_Leverage Acq_TobinsQ)
     71 "207071"  66 1  .1676225 0 1680 0 14245 16952 "First Commonwealth Financial Corp"                15925   30784000    .91129 1.0611492
     71 "207071"  66 1  .1676225 0 1806 0 14245 16952 "First Commonwealth Financial Corp"                16051  186044000    .91129 1.0611492
     71 "621554"  59 1 .26270258 0  378 0 19841 22664 "First Commonwealth Financial Corp"                20219   14750000  .8874036 1.0203671
     71 "621554"  59 1 .24425076 0 1353 0 19841 22664 "First Commonwealth Financial Corp"                21194   58336000  .8784809 1.0694315
     71 "621554"  59 1  .2607087 1  889 0 19841 22664 "First Commonwealth Financial Corp"                20730  1.070e+08  .8904282 1.0132986
     71 "207071"  66 1 .19224176 0 2673 0 14245 16952 "First Commonwealth Financial Corp"                16918   56251000  .9135385 1.0645406
     73 "337555"  71 1 .22188364 1 3171 0 15950 19821 "PacWest Bancorp"                                  19121  234076000  .9011976  1.028899
     73 "456194"  58 1  .1902949 1  605 2 20310 22281 "PacWest Bancorp"                                  20915  7.186e+08  .7951942 1.0971042
     73 "337555"  71 1 .20212823 1 1393 0 15950 19821 "First Community Bancorp Inc,San Diego,California" 17343   35000000  .7896164 1.0685618
     73 "337555"  71 1 .25601473 1  328 0 19821 20301 "PacWest Bancorp"                                  20149  814529000   .784018 1.0724958
     73 "337555"  71 1 .24129276 1 3611 0 15950 19821 "PacWest Bancorp"                                  19561 2281831000  .8921746 1.0618262
    112 "1139697" 42 1 .13385691 0  967 0 20058 21275 "First Financial Bancorp,Cincinnati,Ohio"          21025  988464000  .8974606 1.1064364
    112 "512084"  65 1 .16962637 0  122 0 19590 20058 "First Financial Bancorp,Cincinnati,Ohio"          19712   36600000  .8906542 1.0212723
    112 "512084"  65 1 .17966245 0  252 0 19590 20058 "First Financial Bancorp,Cincinnati,Ohio"          19842   13500000  .8936983 1.0499475
    113 "207147"  66 1  .2163688 0 3614 0 15795 22050 "First Financial Bankshares Inc"                   19409   62665000  .8762857 1.1468942
    113 "207147"  66 1   .226366 0 5309 0 15795 22050 "First Financial Bankshares Inc"                   21104   59400000  .8769613  1.312286
    113 "207147"  66 1 .25062662 0 4384 0 15795 22050 "First Financial Bankshares Inc"                   20179   48519000  .8834621  1.208207
    113 "207147"  66 1   .323378 0 2719 0 15795 22050 "First Financial Bankshares Inc"                   18514   22200000  .8732406 1.2149463
    124 "206946"  65 1 .12749545 0 1536 0 13939 15706 "First Health Group Corp"                          15475   40000000 .56541026   3.73492
    151 "499279"  68 1  .1902036 0 2610 0 17302 20824 "First Midwest Bancorp Inc,Chicago,Illinois"       19912   59992000  .8786632 1.0381118
    151 "1345489" 58 1   .228476 0  701 0 20824 22664 "First Midwest Bancorp Inc,Chicago,Illinois"       21525  1.450e+08  .8675238 1.0427192
    151 "499279"  68 1  .1901028 0 3332 0 17302 20824 "First Midwest Bancorp Inc,Chicago,Illinois"       20634  339327000  .8822248 1.0298363
    151 "206069"  61 1  .1988393 0 1229 0 15553 17302 "First Midwest Bancorp Inc,Chicago,Illinois"       16782  3.070e+08  .9224817 1.1660495
    151 "1345489" 58 1   .228476 0  518 0 20824 22664 "First Midwest Bancorp Inc,Chicago,Illinois"       21342   90844000  .8675238 1.0427192
    151 "499279"  68 1 .19584246 0 3102 0 17302 20824 "First Midwest Bancorp Inc,Chicago,Illinois"       20404  105511000  .8834559 1.0242015
    151 "206069"  61 1 .18505888 0  406 0 15553 17302 "First Midwest Bancorp Inc,Chicago,Illinois"       15959  1.294e+08  .9177409 1.1285703
    152 "1338321" 59 1 .18265976 0   70 0 20698 21815 "Meta Financial Group Inc"                         20768   51158000  .8927404 1.0275264
    152 "1338321" 59 1 .09827385 0  495 0 20698 21815 "Meta Financial Group Inc"                         21193  302051000  .9168959 1.0611949
    152 "1338321" 59 1  .1415998 0  653 0 19632 20698 "Meta Financial Group Inc"                         20285   54098000  .9148981 1.0208138
    190 "511742"  59 1  .1957591 0  949 0 17156 18721 "First Niagara Financial Group Inc"                18105  239874000  .8148972 1.0149858
    end
    format %tddd/nn/CCYY CFO_Start
    format %tddd/nn/CCYY CFO_End
    format %tdDD/NN/CCYY Deal_Announced
    label values CFO_Gender Gender
    label def Gender 1 "M", modify

  • #2
    Marc:
    assuming that -CFO_ID- is your -panelvar-:
    Code:
    . encode CFO_ID, g(CFO_ID_num)
    
    . xtset CFO_ID Deal_Announced
    string variables not allowed in varlist;
    CFO_ID is a string variable
    r(109);
    
    . xtset CFO_ID_num Deal_Announced
    
    Panel variable: CFO_ID_num (unbalanced)
     Time variable: Deal_Announced, 15/05/2002 to 07/12/2018, but with gaps
             Delta: 1 day
    
    .
    I would say that this is a panel dataset.
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Thank you very much for your answer.
      I came across a (non-scientific) blog entry on how to identify the category a dataset belongs to. It states:
      If the identifier is a time data field then the data set belongs to time series. Whereas if the data records can be uniquely identified with time data filed and along with an identifier that is non-time related like employee id, student id, airline code, firm code, country code, etc then the data set is panel data. If the data records can be uniquely identified with the non-time identifier, then the data set is cross-sectional data.
      Source: https://medium.com/geekculture/detai...a-fd973fa788ae

      Each entry in my dataset represents one deal made by a US acquiror between 1996 and 2018 and includes unique identifiers for the acquiring company (Acq_ID), the acquiror's CFO in place when the deal was announced (CFO_ID), and the deal itself with all its characteristics (deal value, announcement date, target, etc.), namely Deal_No. I forgot to include the latter in my data example, sorry for that. The last sentence of the blog entry states that "If the data records can be uniquely identified with the non-time identifier, then the data set is cross-sectional data." As I can identify each entry in my dataset by Deal_No, my dataset should be cross-sectional according to the blog entry. Would you agree with that?

      • #4
        Marc:
        to make things (hopefully) simpler:
        1) in a cross-sectional dataset, we have many ids with one wave of data only (say, the back-hand strokes hit by all tennis players who won the Wimbledon men's singles from 1930-2021);
        2) in a time series dataset, we have one id only with many waves of data (say, the back-hand strokes hit by Roger Federer from his first year as a tennis pro on);
        3) in a panel dataset, we have at least two ids and at least two waves of data (say, the back-hand strokes hit by Roger Federer and Andy Murray from 2010-2019).
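        A minimal sketch of how one might check which of the three cases applies to a deal-level file. It assumes a firm identifier Acq_ID and a year variable; the toy values and the names Year, n_obs, n_years, tag_id and tag_idyear are illustrative only:
        Code:
        * toy data: one row per deal (illustrative values)
        clear
        input long Acq_ID int Year
         71 2003
         71 2003
         71 2006
         73 2012
         73 2013
        112 2017
        end

        * observations (deals) per id
        bysort Acq_ID: generate int n_obs = _N

        * distinct years per id: tag one row per id-year pair, then sum the tags within id
        egen byte tag_idyear = tag(Acq_ID Year)
        bysort Acq_ID: egen int n_years = total(tag_idyear)

        * one row per id: how many ids are seen in one wave only, and how many in several?
        egen byte tag_id = tag(Acq_ID)
        tabulate n_years if tag_id
        If every id showed n_years equal to 1, the file would look cross-sectional (or repeated cross-sectional); ids with n_years above 1 are what gives the data a panel dimension.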
        Kind regards,
        Carlo
        (StataNow 18.5)

        • #5
          Thanks Carlo.
          But wouldn't definition #1 (cross-sectional dataset) apply in my case, as I have many ids (Deal_No) and one wave of data (from 1994-2018)?

          • #6
            Marc:
            technically speaking, a dataset with many ids (here, -Deal_No-) and one wave of data is labelled cross-sectional. In your case, I would consider a single one of the years you mentioned as one wave (say, all the -Deal_No- closed during 1994).
            For the sake of precision, the definition of a panel dataset implies that the same sample of -ids- (more or less, as attrition may play a role in that) is measured on the very same variables across a given time horizon (Roger Federer and Andy Murray, in my previous example).
            If your -ids- change across the years, you might have a repeated cross-sectional dataset.
            Kind regards,
            Carlo
            (StataNow 18.5)

            • #7
              That makes sense to me, thanks.
              The ids change in the sense that no single company announces a deal every single year. There are, in fact, many companies that announce multiple deals in the 1995-2018 period. Therefore I thought it is an unbalanced panel rather than cross-sectional data.

              • #8
                Marc:
                yes, I think so too.
                Kind regards,
                Carlo
                (StataNow 18.5)

                • #9
                  Thank you very much for helping me out Carlo.

                  • #10
                    Hi Carlo,

                    there is another thing that came to my mind, and I would really appreciate knowing whether I got this right. You wrote:
                    the definition of a panel dataset implies that the same sample of -ids- (more or less, as attrition may play a role in that) is measured on the very same variables across a given time horizon.
                    Each row/entry in my dataset describes a single deal.
                    As mentioned in my original post, I have an ID that identifies each Deal in my sample, as well as one that identifies the Acquiror and the CFO in place at the time the deal was announced.

                    I am not quite sure how to define
                    Code:
                    xtset id time
                    Setting Deal_No as the id doesn't make sense to me, as it would contradict the definition that every id is measured on the same variables over time: Deal_No uniquely identifies every deal (and its characteristics like value, announcement date, etc.), so each id appears only once and is not measured on the same variables across a given time horizon.

                    What I am left with is the Acquiror ID that identifies the company that purchases a target and the CFO ID that identifies the CFO in place on the acquiror side. Setting
                    Code:
                    xtset Acq_ID Year_of_Deal_Announcement
                    results in the error "repeated time values within panel", which makes perfect sense, as a number of companies have more than one deal per year. Keeping only the first deal of a company in a given year is possible but would reduce the sample size substantially. The same applies to setting
                    Code:
                    xtset CFO_ID Year_of_Deal_Announcement
                    as there are many CFOs that manage more than one deal per year.

                    Setting the specific day of the announcement as the time component rather than the year would result in a very unbalanced panel if I am not mistaken.
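                    A sketch of how the repeated-values problem can be inspected before deciding on a time variable. It uses five Acq_ID / Deal_Announced pairs taken from the excerpt in the original post; n_deals and xtset_rc are names introduced here for illustration:
                    Code:
                    clear
                    input long Acq_ID int Deal_Announced
                    71 15925
                    71 16051
                    71 16918
                    73 19121
                    73 19561
                    end
                    format %td Deal_Announced

                    * calendar year of the announcement, derived from the daily date
                    generate int Year_of_Deal_Announcement = year(Deal_Announced)

                    * how many firm-year pairs occur more than once?
                    duplicates report Acq_ID Year_of_Deal_Announcement

                    * xtset refuses a time variable that repeats within a panel;
                    * capture keeps the do-file running and stores the return code
                    capture noisily xtset Acq_ID Year_of_Deal_Announcement
                    scalar xtset_rc = _rc
                    display "xtset return code: " xtset_rc

                    * list the firm-years that hold more than one deal
                    bysort Acq_ID Year_of_Deal_Announcement: generate byte n_deals = _N
                    list Acq_ID Deal_Announced n_deals if n_deals > 1
                    In this small extract, the first two deals of Acq_ID 71 fall in the same calendar year, which is enough to make the yearly -xtset- fail.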

                    I would be very interested in hearing whether I understood everything correctly.

                    • #11
                      Marc:
                      this issue crops up frequently when dealing with panel datasets.
                      The easiest work-around is to -xtset- your dataset with -panelvar- only (I would go with -Acq_ID-, the acquiror identifier):
                      Code:
                      xtset Acq_ID
                      Unfortunately, this fix comes at the cost of making time-series operators unavailable (as they need a -timevar- dimension to work).
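                      To illustrate the trade-off, here is a small sketch on five rows from the excerpt in the original post. The names lag_value, prev_value and lag_rc are introduced here for illustration, and the hand-built lag is only one possible substitute (it means "previous deal of the same firm", not "previous period"):
                      Code:
                      clear
                      input long Acq_ID int Deal_Announced double Deal_Value
                      71 15925   30784000
                      71 16051  186044000
                      71 16918   56251000
                      73 19121  234076000
                      73 19561 2281831000
                      end
                      format %td Deal_Announced

                      * declaring only the panel variable works even with several deals per firm-year
                      xtset Acq_ID

                      * without a time variable, the lag operator is rejected
                      capture noisily generate double lag_value = L.Deal_Value
                      scalar lag_rc = _rc

                      * a lag in deal order can still be built by hand within firm
                      bysort Acq_ID (Deal_Announced): generate double prev_value = Deal_Value[_n-1]
                      list, sepby(Acq_ID)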
                      Kind regards,
                      Carlo
                      (StataNow 18.5)

                      • #12
                        Thanks Carlo. Many papers on my research topic include industry and time fixed effects. If I want to include time fixed effects as well, I suppose I need to specify a -timevar- dimension, right?

                        • #13
                          Marc:
                          not necessarily so, if you do not plan to use time-series operators:
                          Code:
                          . use "https://www.stata-press.com/data/r17/nlswork.dta"
                          (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                          
                          . xtreg ln_wage c.age##c.age i.year, fe
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-squared:                                      Obs per group:
                               Within  = 0.1162                                         min =          1
                               Between = 0.1078                                         avg =        6.1
                               Overall = 0.0932                                         max =         15
                          
                                                                          F(16,23784)       =     195.45
                          corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
                          
                          ------------------------------------------------------------------------------
                               ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                   age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
                                       |
                           c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
                                       |
                                  year |
                                   69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
                                   70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
                                   71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
                                   72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
                                   73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
                                   75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
                                   77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
                                   78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
                                   80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
                                   82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
                                   83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
                                   85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
                                   87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
                                   88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
                                       |
                                 _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
                          -------------+----------------------------------------------------------------
                               sigma_u |  .40275174
                               sigma_e |  .30127563
                                   rho |  .64120306   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000
                          
                          . xtset idcode
                          
                          Panel variable: idcode (unbalanced)
                          
                          . xtreg ln_wage c.age##c.age i.year, fe
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-squared:                                      Obs per group:
                               Within  = 0.1162                                         min =          1
                               Between = 0.1078                                         avg =        6.1
                               Overall = 0.0932                                         max =         15
                          
                                                                          F(16,23784)       =     195.45
                          corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
                          
                          ------------------------------------------------------------------------------
                               ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                   age |   .0728746   .0107894     6.75   0.000     .0517267    .0940224
                                       |
                           c.age#c.age |  -.0010113    .000061   -16.57   0.000    -.0011309   -.0008917
                                       |
                                  year |
                                   69  |   .0647054   .0158222     4.09   0.000     .0336928     .095718
                                   70  |   .0284423   .0234621     1.21   0.225     -.017545    .0744295
                                   71  |   .0579959   .0326524     1.78   0.076    -.0060048    .1219967
                                   72  |   .0510671   .0422995     1.21   0.227    -.0318426    .1339769
                                   73  |   .0424104    .052118     0.81   0.416    -.0597442    .1445651
                                   75  |   .0151376   .0717194     0.21   0.833    -.1254371    .1557123
                                   77  |   .0340933   .0918106     0.37   0.710    -.1458613    .2140478
                                   78  |   .0537334   .1023339     0.53   0.600    -.1468475    .2543143
                                   80  |   .0369475   .1221806     0.30   0.762    -.2025343    .2764293
                                   82  |   .0391687   .1423573     0.28   0.783    -.2398606     .318198
                                   83  |    .058766   .1523743     0.39   0.700    -.2398974    .3574294
                                   85  |   .1042758   .1726431     0.60   0.546    -.2341157    .4426673
                                   87  |   .1242272   .1930108     0.64   0.520    -.2540863    .5025406
                                   88  |   .1904977   .2068016     0.92   0.357    -.2148466     .595842
                                       |
                                 _cons |   .3937532   .2001741     1.97   0.049     .0013992    .7861072
                          -------------+----------------------------------------------------------------
                               sigma_u |  .40275174
                               sigma_e |  .30127563
                                   rho |  .64120306   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          F test that all u_i=0: F(4709, 23784) = 8.75                 Prob > F = 0.0000
                          
                          .
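                          The same idea carried over to a deal-level layout, as a hedged sketch on simulated data. All names (firm, industry, year, x, y) and the data-generating process are hypothetical and only serve to show that year and industry dummies enter as ordinary factor variables:
                          Code:
                          clear
                          set seed 12345
                          set obs 300

                          * hypothetical deal-level data: 30 firms with 10 deals each
                          generate int firm = ceil(_n/10)
                          generate byte industry = mod(firm, 5) + 1
                          generate int year = 2000 + floor(10*runiform())
                          generate double x = rnormal()
                          generate double y = 0.5*x + 0.1*industry + 0.05*(year - 2000) + rnormal()

                          * pooled OLS: year and industry dummies need no -xtset- at all
                          regress y x i.year i.industry, vce(cluster firm)

                          * firm fixed effects plus year dummies: only the panel variable is declared
                          xtset firm
                          xtreg y x i.year, fe vce(cluster firm)
                          Note that -i.industry- is left out of the -xtreg, fe- line on purpose: in this simulation industry does not vary within firm, so it would be omitted as collinear with the firm effects.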
                          Kind regards,
                          Carlo
                          (StataNow 18.5)
