  • Why is my panel data unbalanced?

    Hi,

    I'm relatively new to Stata and could use some help with running a Hausman test on my panel data. I would really appreciate any advice.

    Basically, my panel has 160 observations: 10 firms times 16 firm-years (2000-2015). I gave the firms IDs from 1 to 10, so each firm-year of "firm 1" has ID 1, each firm-year of "firm 2" has ID 2, etc. Each firm has exactly 16 observations (2000-2015) and always the same ID (1-10) for these observations.

    Now when I type in "xtset ID YEAR", Stata tells me the following:


    . xtset ID YEAR
    panel variable: ID (unbalanced)
    time variable: YEAR, 2000 to 2015
    delta: 1 unit


    How can it be that my panel data is unbalanced when every firm has the same number of observations (16) with the same IDs and the same years (2000-2015)?

    I really appreciate any help!

    Best wishes
    Florian

  • #2
    Florian:
    are you sure that you do not have missing values in -ID- and/or -YEAR-?
    I would take a look at:
    Code:
    xttab YEAR
    Last edited by Carlo Lazzaro; 25 Jun 2016, 11:21.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Hey Carlo,

      thanks for the reply! I'm absolutely sure...

      I attached a screenshot of my output for xtsum.

      Last edited by Florian Mayer; 25 Jun 2016, 11:26.


      • #4
        Well, perhaps your data are not what you think they are. Look at the output of

        Code:
        table YEAR ID
        If the data are truly balanced, you will see the same counts in every cell of the table. If there are some gaps, you will see where they are.


        • #5
          I suspect Carlo is right, or else perhaps that id is miscoded in one or more years. But try running the xtdes command, e.g.

          Code:
          . webuse abdata, clear
          
          . xtdes
          
                id:  1, 2, ..., 140                                    n =        140
              year:  1976, 1977, ..., 1984                             T =          9
                     Delta(year) = 1 unit
                     Span(year)  = 9 periods
                     (id*year uniquely identifies each observation)
          
          Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                   7       7       7         7         8       9       9
          
               Freq.  Percent    Cum. |  Pattern
           ---------------------------+-----------
                 62     44.29   44.29 |  1111111..
                 39     27.86   72.14 |  .1111111.
                 19     13.57   85.71 |  .11111111
                 14     10.00   95.71 |  111111111
                  4      2.86   98.57 |  11111111.
                  2      1.43  100.00 |  ..1111111
           ---------------------------+-----------
                140    100.00         |  XXXXXXXXX
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam


          • #6
            I don't think the xtsum result rules out the possibility that there are extra records with missing values, e.g.

            Code:
            . use http://www3.nd.edu/~rwilliam/statafiles/wages, clear
            
            . xtset
                   panel variable:  id (strongly balanced)
                    time variable:  t, 1 to 7
                            delta:  1 unit
            
            . set obs 4166
            number of observations (_N) was 4,165, now 4,166
            
            . replace id = . in 4166
            (0 real changes made)
            
            . xtset
                   panel variable:  id (unbalanced)
                    time variable:  t, 1 to 7
                            delta:  1 unit
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology


            • #7
              Thank you for your replies.

              @Clyde: I did what you said and I got the table as attached to this message - all ones only...

              @Richard: I also tried that, but I'm not sure what to take from it. I have also attached the output of xtdes.

              Additionally, I now went through the underlying Excel file and checked all IDs (everything correctly formatted, 16 observations each) and YEARs (everything correctly formatted, same years for each firm).


              • #8
                That is very strange; it shouldn't happen. I would try a couple of things.

                1. Exit Stata, re-launch, and try again.

                2. If that doesn't work, run -update all, force-, and then try again.

                If neither of those things works, then I would send the data set along with your code and all of this output to Stata technical support.


                • #9
                  All right, I'll try that. Thanks a lot!
                  I might also try to import the data again...

                  One more question: will it affect my analyses in any way if I just proceed with the xtset data, which Stata calls "unbalanced"?


                  • #10
                    Just run plain old -des- and it will tell you how many observations you have. If it is greater than 160, you know you have some observations with missing data.

                    Code:
                    use http://www3.nd.edu/~rwilliam/statafiles/wages, clear
                    xtset
                    des
                    set obs 4166
                    replace id = . in 4166
                    xtset
                    des
                    Or, perhaps better, do

                    Code:
                    tab2 YEAR ID, missing
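
                    A minimal sketch of a quicker check, assuming the variables are named ID and YEAR as in this thread and that 160 observations are expected:

                    Code:
                    * total number of observations in memory; anything above 160 means extra records
                    count
                    * number of records with a missing panel or time identifier
                    count if missing(ID) | missing(YEAR)
                    * table of missing-value counts; reports only variables that have missing values
                    misstable summarize ID YEAR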
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology


                    • #11
                      Ok, I did the second code. My result is attached.

                      The other thing I should do is just let Stata tell me how many observations both ID and YEAR have? I did that with -summarize- and it's 160 for both.


                      • #12
                        Carlo was right! You have 56 records where both YEAR and ID are missing. My guess is an import error and that lines 161-216 are all missing data records. You could import again or just drop the missing cases. But make sure they really are empty records, i.e. that you don't have 56 records where ID and YEAR are missing when they shouldn't be.
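
                        One way to drop the missing cases, as a sketch that assumes the variables are named ID and YEAR as in this thread. List the records first to confirm they really are empty.

                        Code:
                        * show the records with a missing identifier before deleting anything
                        list if missing(ID) | missing(YEAR)
                        * drop only the records where both identifiers are missing
                        drop if missing(ID) & missing(YEAR)
                        * re-declare the panel so xtset reports the balance status again
                        xtset ID YEAR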
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology


                        • #13
                          Wow, great! Richard, Carlo, thank you very much! I cleared all the cells below my data in the Excel file and now everything is strongly balanced.

                          Great and fast help!
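
                          For anyone who would rather fix this at import time than edit the workbook, here is a sketch that restricts the import to the populated cells. The file name, sheet name, and cell range are hypothetical; A1:F161 assumes one header row, 160 data rows, and six columns.

                          Code:
                          * hypothetical file, sheet, and range; adjust these to your own workbook
                          import excel using "firms.xlsx", sheet("Sheet1") cellrange(A1:F161) firstrow clear
                          * declare the panel on the freshly imported data
                          xtset ID YEAR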


                          • #14
                            This is day 1 for me on my Stata odyssey and I thought you would like to know that I had exactly the same (unbalanced) problem as Florian, and I found these posts really helpful. It turns out the issue was also caused by cells in Excel underneath the data (cells that appeared blank). I copied and pasted only the populated cells into a fresh Excel file and re-imported. Problem solved, and I now have (strongly balanced).


                            • #15
                              Martine:
                              welcome to this forum.
                              Kind regards,
                              Carlo
                              (Stata 19.0)
