Balancing an unbalanced panel

em lowthian

24 Jul 2017, 04:58

Hi all,

I'm looking to balance my panel data; any opinions or warnings on this would also be appreciated. I need both baseline (wave 4) and outcome (wave 6) measures for an analysis of adolescent substance use and self-esteem.
My panel is quite unbalanced (see the output below). I need each unique ID, youth_pidp, to be present only if that respondent has data at both wave 4 and wave 6.
** My panel is unbalanced mostly because of timing: the questionnaires are only given to those aged 10-15, so there will likely be a lot of missing values across waves. I'm unsure whether these are missing at random (MAR).

I've looked online but can't find any information I could apply.

Here is my xtset and xtdescribe output.

Code:

** setting it as panel just in case **
. xtset youth_pidp wave
       panel variable:  youth_pidp (unbalanced)
        time variable:  wave, 4 to 6, but with gaps
                delta:  1 unit

Code:

. xtdescribe
youth_pidp:  68014295, 68028583, ..., 1.639e+09              n =       5725
    wave:  4, 6, ..., 6                                      T =          2
           Delta(wave) = 1 unit
           Span(wave)  = 3 periods
           (youth_pidp*wave uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       1         1         2       2       2

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+----------
     2259     39.46   39.46 |  1.
     1790     31.27   70.72 |  11
     1676     29.28  100.00 |  .1
 ---------------------------+----------
     5725    100.00         |  XX
 --------------------------------------
 *Each column represents 2 periods.
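Reading the pattern table: since only waves 4 and 6 appear, the first column in effect stands for wave 4 and the second for wave 6. So 1,790 respondents are observed at both waves, 2,259 only at wave 4 and 1,676 only at wave 6. If a subset restricted to the `11` pattern is wanted anyway, a minimal sketch follows. The toy IDs are hypothetical, and it assumes `wave` only takes the values 4 and 6, as in the output above.

```stata
* Toy data with hypothetical IDs mimicking the three patterns above
clear
input long youth_pidp byte wave
101 4
101 6
102 4
103 6
end

* Flag each respondent by whether they appear at wave 4 and at wave 6
bysort youth_pidp: egen byte has_w4 = max(wave == 4)
bysort youth_pidp: egen byte has_w6 = max(wave == 6)

* Keep only respondents observed at both waves (the "11" pattern)
keep if has_w4 & has_w6
xtset youth_pidp wave
xtdescribe
```

On the real data this would discard the 3,935 respondents seen at only one wave (2,259 + 1,676), which is the loss of information the replies below warn about.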

I hope that's enough information. Thanks in advance.

Em

Last edited by em lowthian; 24 Jul 2017, 05:04. Reason: extra information

Carlo Lazzaro

#2

24 Jul 2017, 05:04

Em:
there's no need to balance your unbalanced panel, as Stata can handle both balanced and unbalanced panels with no problems.
Besides, trying to balance an unbalanced panel means, in all likelihood, ending up with a dataset that is quite different from the original one (with consequences for the subsequent inference).

Kind regards,
Carlo
(StataNow 18.5)
em lowthian

#3

24 Jul 2017, 05:11

Hi Carlo, thanks for your message.

Could you explain a bit more please?
When you say Stata can handle it, how would it handle it? Would it still be able to run xtprobit, xtlogit, etc. without the panel being balanced?
I am also concerned about the number of cross-sectional cases going into the model, and about the results being affected by it in turn.

Many thanks,

Em
Carlo Lazzaro

#4

24 Jul 2017, 05:21

Em:
Q1) Yes, Stata would still be able to run them on an unbalanced panel.
Q2) If I were you, my main concern would be the mechanism of the missingness affecting the dataset (i.e., is it ignorable or not?).
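To illustrate Q1, here is a sketch with simulated data (every variable name and value is hypothetical): a two-wave panel is made unbalanced by randomly dropping person-waves, and a random-effects `xtprobit` is then fitted directly, with no balancing step.

```stata
* Simulated two-wave panel; all names and values are hypothetical
clear
set seed 2017
set obs 500
generate long id = _n
generate double u = rnormal()              // individual-level random effect
expand 2
bysort id: generate byte wave = cond(_n == 1, 4, 6)
generate double x = rnormal()
generate byte y = (0.5*x + u + rnormal() > 0)

* Drop roughly 30% of person-waves at random to mimic an unbalanced panel
drop if runiform() < 0.3

* xtset and xtprobit accept the unbalanced panel as it is
xtset id wave
xtdescribe
xtprobit y x, re
```

In the random-effects model, panels observed only once still contribute to the likelihood. By contrast, `xtlogit, fe` drops panels with no variation in the outcome, which includes every singleton. Here the rows are dropped completely at random, which is the easy case; whether that holds in the real data is exactly the point of Q2.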

Kind regards,
Carlo
(StataNow 18.5)
em lowthian

#5

24 Jul 2017, 05:28

Carlo,

Thank you.
In terms of Q2, I would expect some missingness in the data because the questionnaire is only administered to those aged 10-15. Much of it is likely to come from the timing of data collection: e.g. 12+ year olds completing the questionnaire at wave 4 will not be in wave 6 because they age out of the time-frame, while new adolescents answer the questionnaire for the first time at wave 6. However, there is little information for this dataset on attrition, non-response and missing values, which makes me slightly concerned about cutting out a lot of cases.

On the other hand, I am looking at divorce as a predictor at wave 4 and the outcome (self-esteem) at wave 6, so it would not make sense to look at the immediate outcome, e.g. self-esteem at wave 4, when the divorce occurred. I am more interested in change over time, which is why I thought balanced data would make sense. But I do think that the level of missingness would bias the dataset. It's a difficult situation: do you know of any texts that cover these questions in any detail?
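The baseline/outcome design described above can be set up from the long panel with `reshape wide`, which gives one row per respondent. A sketch on simulated data follows. `divorce` and `selfesteem` are hypothetical variable names, and the baseline-adjusted regression is just one possible model, not a recommendation from this thread.

```stata
* Simulated long data, one row per person-wave; all names are hypothetical
clear
set seed 4
set obs 400
generate long youth_pidp = _n
generate byte divorce = runiform() < 0.2
generate double u = rnormal(0, 2)              // stable person-level component
expand 2
bysort youth_pidp: generate byte wave = cond(_n == 1, 4, 6)
generate double selfesteem = 20 + u - (wave == 6)*divorce + rnormal(0, 2)
replace divorce = . if wave == 6               // predictor used at baseline only
drop if runiform() < 0.3                       // make the panel unbalanced
drop u

* One row per respondent: suffix 4 = baseline, suffix 6 = outcome
reshape wide divorce selfesteem, i(youth_pidp) j(wave)

* Baseline-adjusted model for wave-6 self-esteem
regress selfesteem6 divorce4 selfesteem4
```

Respondents missing either wave drop out of `regress` through listwise deletion, so this amounts to the same complete-case restriction as balancing the panel.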

Best,

Em
Carlo Lazzaro

#6

24 Jul 2017, 08:16

Em:
as far as missing values are concerned, you may find very useful hints in:
Allison PD. Missing Data. Thousand Oaks, CA: SAGE Publications, 2001.
Little RJA, Rubin DB. Statistical Analysis with Missing Data, 2nd ed. Chichester: Wiley, 2002.
van Buuren S. Flexible Imputation of Missing Data. Boca Raton, FL: Chapman and Hall/CRC, 2012.

A cautious approach would not delete missing values, but would try to deal with them instead.
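One way of dealing with them is multiple imputation, which is covered in the books above. It can be run in Stata with the `mi` commands. A minimal sketch on simulated wide data follows. All names are hypothetical, `divorce4` is assumed fully observed, and the missingness is generated completely at random, so the example says nothing about whether MAR holds in the real data.

```stata
* Simulated wide data: all names and values are hypothetical
clear
set seed 8
set obs 400
generate long youth_pidp = _n
generate byte divorce4 = runiform() < 0.2      // assumed fully observed
generate double u = rnormal(0, 2)
generate double selfesteem4 = 20 + u + rnormal(0, 2)
generate double selfesteem6 = 20 + u - divorce4 + rnormal(0, 2)

* Mimic the thread's patterns: wave 4 only, both waves, wave 6 only
generate double r = runiform()
replace selfesteem6 = . if r < 0.4
replace selfesteem4 = . if r > 0.7
drop u r

* Impute both self-esteem measures by chained equations, 20 imputations
mi set wide
mi register imputed selfesteem4 selfesteem6
mi impute chained (regress) selfesteem4 selfesteem6 = divorce4, add(20) rseed(2017)

* Fit the model in each completed dataset and pool with Rubin's rules
mi estimate: regress selfesteem6 divorce4 selfesteem4
```

The imputations are only as good as the MAR assumption and the imputation model, so the question about the missingness mechanism still has to be answered first.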

Kind regards,
Carlo
(StataNow 18.5)
em lowthian

#7

25 Jul 2017, 08:40

Many thanks, Carlo.