  • Balancing an unbalanced panel

    Hi all,

    I'm looking to balance my panel data; any opinions or warnings on this would also be appreciated. Essentially, I need both baseline (wave 4) and outcome (wave 6) measures for an analysis of adolescent substance use and self-esteem.
    My panel is pretty unbalanced (see the output below). I need each unique ID, youth_pidp, to be kept only if it has data in both wave 4 and wave 6.
    ** My panel is mostly unbalanced because the questionnaire is only given to those aged 10-15 in each wave, so there will likely be a lot of missing values. I am unsure whether this is MAR.

    I've looked online but I can't find any information which I could apply.

    Here is my xtset and xtdescribe output.

    Code:
    ** setting it as panel just in case **
    . xtset youth_pidp wave
           panel variable:  youth_pidp (unbalanced)
            time variable:  wave, 4 to 6, but with gaps
                    delta:  1 unit

    Code:
    youth_pidp:  68014295, 68028583, ..., 1.639e+09              n =       5725
        wave:  4, 6, ..., 6                                      T =          2
               Delta(wave) = 1 unit
               Span(wave)  = 3 periods
               (youth_pidp*wave uniquely identifies each observation)
    
    Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                             1       1       1         1         2       2       2
    
         Freq.  Percent    Cum. |  Pattern*
     ---------------------------+----------
         2259     39.46   39.46 |  1.
         1790     31.27   70.72 |  11
         1676     29.28  100.00 |  .1
     ---------------------------+----------
         5725    100.00         |  XX
     --------------------------------------
     *Each column represents 2 periods.
    I hope that's enough information. Thanks in advance.

    Em


    Last edited by em lowthian; 24 Jul 2017, 05:04. Reason: extra information

  • #2
    Em:
    there's no need to balance your unbalanced panel, as Stata can handle both balanced and unbalanced panels with no problems.
    Besides, trying to balance an unbalanced panel means, in all likelihood, ending up with a dataset that is pretty different from the original one (with consequences for the subsequent inference).
    Kind regards,
    Carlo
    (StataNow 18.5)
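
    To see how much the dataset would change, one option is to flag the youths observed in both waves rather than drop the others straight away. The lines below are a minimal sketch, assuming the data are in long format with one row per youth_pidp and wave, as the xtdescribe output indicates. The names n_waves and in_both are made up for illustration.

    Code:
    * One row per youth and wave, so _N is the number of waves observed
    bysort youth_pidp: generate byte n_waves = _N

    * Flag youths present in both wave 4 and wave 6 (pattern "11" in xtdescribe)
    generate byte in_both = (n_waves == 2)
    tabulate in_both wave

    * Look at the balanced subsample without changing the data in memory
    preserve
    keep if in_both
    xtset youth_pidp wave
    xtdescribe
    restore

    Going by the xtdescribe output in #1, in_both should equal 1 for the 1790 youths with pattern 11. Keeping the flag instead of dropping rows makes it easy to compare the two groups later.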



    • #3
      Hi Carlo, thanks for your message.

      Could you explain a bit more please?
      When you say Stata can handle it, how would it handle it? Would it still be able to run xtprobit, xtlogit, etc. without the panel being balanced?
      I am concerned about the number of cross-sectional cases going into the model and the results being affected in turn.

      Many thanks,

      Em



      • #4
        Em:
        q1) yes, Stata would still be able to.
        q2) if I were you, my main concern would be the mechanism of the missingness affecting the dataset (i.e., is it ignorable or not?).
        Kind regards,
        Carlo
        (StataNow 18.5)
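
        Both points can be illustrated with a short sketch. It assumes hypothetical variable names that do not appear in the thread: substance_use (a 0/1 outcome), selfesteem and age. The first part fits a random-effects logit on the unbalanced panel as it stands. The second part is one rough way to start probing the missingness mechanism, by checking whether being followed up at wave 6 is related to wave 4 characteristics. It is not a formal test of ignorability.

        Code:
        * Hypothetical variables: substance_use (0/1), selfesteem, age
        xtset youth_pidp wave

        * q1) random-effects logit on the full, unbalanced panel
        xtlogit substance_use selfesteem i.wave, re

        * e(sample) marks the observations actually used in estimation
        count if e(sample)

        * q2) among wave 4 respondents, is follow-up related to baseline values?
        bysort youth_pidp: generate byte followed_up = (_N == 2)
        logit followed_up selfesteem age if wave == 4

        Youths observed in only one wave still contribute to the random-effects likelihood, so the first model uses them. If follow-up depends strongly on baseline covariates, that is a signal to think harder about the mechanism before discarding anyone.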



        • #5
          Carlo,

          Thank you.
          In terms of Q2, I would expect a level of missingness in the data because the questionnaire is administered only to those aged 10-15. Much of the missingness is therefore due to the timing of data collection: e.g. 12+ year olds completing the questionnaire in wave 4 will not be in wave 6 because they fall outside the age window, while new adolescents start answering the questionnaire at wave 6. However, the dataset has little information on attrition, non-response and missing values, which makes me slightly concerned about cutting out a lot of observations.

          I am looking at divorce as a predictor in wave 4 and the outcome (self-esteem) in wave 6, so it would not make sense to look at immediate outcomes, e.g. self-esteem in wave 4, when the divorce occurred. I am more interested in change over time, which is why I thought balanced data would make sense. But I do think that the level of missingness would bias the results. It is a difficult situation. Do you know of any texts that cover these questions in any detail?

          Best,

          Em
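
          The design described above (a wave 4 predictor and a wave 6 outcome) can be sketched by reshaping to wide format, so that each youth has one row. This is only an illustration under assumed variable names, divorce and selfesteem, which are not taken from the actual dataset. The choice of regress is also an assumption.

          Code:
          * Hypothetical variables: divorce, selfesteem
          preserve
          keep youth_pidp wave divorce selfesteem
          reshape wide divorce selfesteem, i(youth_pidp) j(wave)

          * Wave 6 self-esteem on wave 4 divorce, adjusting for wave 4 self-esteem
          regress selfesteem6 divorce4 selfesteem4

          * Only youths with all three variables non-missing enter the model
          count if e(sample)
          restore

          In wide format the model uses only youths with data in both waves, so the restriction happens at estimation time and no rows need to be deleted from the long file beforehand.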



          • #6
            Em:
            as far as missing values are concerned, you may find very useful hints in:
            Allison PD. Missing Data. Thousand Oaks, CA: SAGE Publications, 2001.
            Little RJA, Rubin DB. Statistical analysis with missing data, 2nd ed. Chichester: Wiley, 2002.
            van Buuren S. Flexible Imputation of Missing Data. Boca Raton, FL: Chapman and Hall/CRC, 2012.

            A cautious approach would not delete observations with missing values, but try to deal with them instead.

            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              Many thanks Carlo

