
  • Descriptive statistics in panel data: within, between and more

    Hello,

    I have a panel dataset of different regions, with the number of conflicts in each region from 1989 to 2020. Each observation is uniquely identified by region and year. An example is attached below:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long gid int year float wanted
     81037 1989 1
     81037 1990 1
     81037 1991 3
     81037 1992 2
     81037 1993 1
     81037 1994 2
     81037 1995 0
     81037 1996 0
     81037 1997 2
     81037 1998 0
     81037 1999 0
     81037 2000 0
     81037 2001 0
     81037 2002 1
     81037 2003 0
     81037 2004 0
     81037 2005 0
     81037 2006 0
     81037 2007 0
     81037 2008 0
     81037 2009 0
     81037 2010 0
     81037 2011 0
     81037 2012 0
     81037 2013 0
     81037 2014 0
     81037 2015 0
     81037 2016 0
     81037 2017 0
     81037 2018 0
     81037 2019 0
     81037 2020 0
    137902 1989 0
    137902 1990 0
    137902 1991 0
    137902 1992 0
    137902 1993 0
    137902 1994 0
    137902 1995 0
    137902 1996 0
    137902 1997 0
    137902 1998 0
    137902 1999 0
    137902 2000 0
    137902 2001 0
    137902 2002 0
    137902 2003 0
    137902 2004 0
    137902 2005 0
    137902 2006 0
    137902 2007 0
    137902 2008 0
    137902 2009 0
    137902 2010 0
    137902 2011 0
    137902 2012 0
    137902 2013 0
    137902 2014 0
    137902 2015 0
    137902 2016 0
    137902 2017 0
    137902 2018 0
    137902 2019 0
    137902 2020 1
    140096 1989 0
    140096 1990 0
    140096 1991 0
    140096 1992 0
    140096 1993 0
    140096 1994 0
    140096 1995 0
    140096 1996 0
    140096 1997 0
    140096 1998 0
    140096 1999 0
    140096 2000 0
    140096 2001 0
    140096 2002 0
    140096 2003 0
    140096 2004 0
    140096 2005 0
    140096 2006 0
    140096 2007 0
    140096 2008 0
    140096 2009 0
    140096 2010 0
    140096 2011 0
    140096 2012 1
    140096 2013 0
    140096 2014 0
    140096 2015 2
    140096 2016 0
    140096 2017 0
    140096 2018 2
    140096 2019 0
    140096 2020 0
    140097 1989 0
    140097 1990 0
    140097 1991 0
    140097 1992 0
    end

    I would like to know whether the variation in the variable "wanted" is due to variation within regions or between regions. To find out, I used the commands xtsum and xttab (on the full dataset). The results are below:
    [Image: Paneldata_Screen.JPG (xtsum and xttab output for the full dataset)]

    I would like to ask two things:

    1) I would like to know how large the variation is and whether it is due to differences between regions or within regions. I have read "help xtsum", but it is still not clear to me how to quantify the magnitude of the variation (whether it is large or not) and how much of it is within or between.
    2) I would also like to ask whether there are other commands in Stata (a graph or something similar) to visualize whether the variation is between or within, or to help me understand what is going on in the data in descriptive terms.

    If my questions are not clear or you need more information, please do not hesitate to ask.
    Last edited by Diego Malo; 04 Jan 2022, 02:06.

  • #2
    Diego:
    it would seem that your within std. dev. is higher than its between counterpart.
    However, mimicking what Stata does with -xtsum- behind the curtain, it is important to stress that these std. dev. are calculated in very different ways:
    Code:
    . xtset gid year
    
    Panel variable: gid (unbalanced)
     Time variable: year, 1989 to 2020
             Delta: 1 unit
    
    . xtsum wanted
    
    Variable         |      Mean   Std. dev.       Min        Max |    Observations
    -----------------+--------------------------------------------+----------------
    wanted   overall |       .19   .5630903          0          3 |     N =     100
             between |             .1846573          0     .40625 |     n =       4
             within  |             .5403539    -.21625    2.78375 | T-bar =      25
    
    *overall std. dev.
    . sum wanted
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
          wanted |        100         .19    .5630903          0          3
    
    *within std. dev.*
      
    . bysort gid: egen within=mean(wanted)
    
    . bysort gid: replace within=(wanted-within+.19)
    
    
    . sum within
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
          within |        100         .19    .5403539    -.21625    2.78375
    
    *between std. dev.*
    
    . bysort gid : egen between=mean(wanted)
    
    . bysort gid : replace between=. if _n>1
    
    
    . sum between
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
         between |          4    .1484375    .1846573          0     .40625
    
    .
    I'm not aware of any graphical aid for this issue.
    Kind regards,
    Carlo
    (StataNow 18.5)
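
    On the graph question in #1, a minimal sketch of one possible descriptive approach, not a recommendation from the posts above: plot each region's series over time to see the within variation, and the distribution of region means to see the between variation. It assumes the -dataex- example from #1 is in memory; the variable names mean_wanted and one_per_gid are made up for illustration.

    ```stata
    * Assumes the -dataex- example from #1 is already in memory
    xtset gid year

    * Within variation: one line per region over time
    * (with many regions, restrict to a subset of gid first)
    xtline wanted, overlay

    * Between variation: distribution of the region-level means
    egen double mean_wanted = mean(wanted), by(gid)   // hypothetical name
    egen byte   one_per_gid = tag(gid)                // flags one obs per region
    histogram mean_wanted if one_per_gid, frequency ///
        title("Region means of wanted")
    ```

    Lines that move a lot around their own level point to within variation; region means that are spread far apart point to between variation.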



    • #3
      Carlo Lazzaro Thank you for your answer. Based on it, is it possible to tell whether the variation is large?

      On the other hand, I do not quite understand what you mean when you say "standard deviations are calculated in very different ways".



      • #4
        Diego:
        1) it's difficult to say whether the variation is big or small if you do not have a standard. What you can conclude is that the within variation plays a more relevant role than its between counterpart. Therefore, I suspect that -fe- would be the way to go.
        2) if you take a look at the -xtsum- entry in the Stata .pdf manual (on which my example was based), you can see how the within and between std. dev. are calculated.
        Kind regards,
        Carlo
        (StataNow 18.5)
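
        To make "calculated in very different ways" concrete, the replication in #2 can be written out as below. This is a sketch read off that code rather than a quotation from the manual. The symbols are assumptions for illustration: x_{it} is wanted for region i in year t, T_i is the number of years observed for region i, n is the number of regions, and N is the total number of observations.

        ```latex
        \bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}, \qquad
        \bar{\bar{x}} = \frac{1}{N}\sum_{i=1}^{n}\sum_{t=1}^{T_i} x_{it}

        % between: standard deviation of the n region means
        s_B = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}
              \Bigl(\bar{x}_i - \frac{1}{n}\sum_{j=1}^{n}\bar{x}_j\Bigr)^{2}}

        % within: standard deviation of x_{it} - \bar{x}_i + \bar{\bar{x}} over all N observations
        s_W = \sqrt{\frac{1}{N-1}\sum_{i=1}^{n}\sum_{t=1}^{T_i}
              \bigl(x_{it} - \bar{x}_i\bigr)^{2}}
        ```

        In the example in #2, s_B is computed from n = 4 region means, while s_W is computed from N = 100 observations, so the two are not based on the same number of data points.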



        • #5
          Thank you for your answer, Carlo Lazzaro. Even if we consider just the overall variation, is it not possible to say whether the variation is large or not?



          • #6
            Diego:
            sorry, no, as small or big are qualitative and not quantitative terms.
            Kind regards,
            Carlo
            (StataNow 18.5)



            • #7
              "I would like to know whether the variation in the variable "wanted" is due to variation within regions or between regions. To find out, I used the commands xtsum and xttab (on the full dataset). The results are below:"
              You need to look at the intra-class correlation (ICC), which is interpreted like a correlation coefficient. The closer the ICC is to 1, the more similar the observations within the same cluster/subject are; the closer it is to 0, the more dissimilar they are. The ICC is calculated as between variance / (between variance + within variance). To calculate the ICC from the raw standard deviations reported by -xtsum-:

              Code:
              loc b ( 0.1262968 )^2
              loc w (  0.2486002)^2
              
              di "ICC:" `b' / (`b'+`w')
              
              ICC: .20514827
              Alternatively, -xtreg- produces it for you (reported as rho) from the residual variance after the fitted regression and the subject-specific variance. You can also get it from the postestimation command -estat icc- if you fit a mixed model using -mixed-:

              Code:
              //make sure data is xtset:
              xtset gid year // assuming gid is the panel variable and year the time variable
              
              //using -xtreg-
              
              xtreg wanted, mle
              
              
              //using -mixed-
              
              mixed wanted || gid:
              
              estat icc
              Roman
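
              As a worked illustration, the same formula can be applied to the 100-observation example, using the between and within standard deviations shown in the -xtsum- output in #2. This is only a sketch: the scalar names are made up, and a ratio built from raw -xtsum- standard deviations need not match the ICC estimated by -xtreg, mle- or -mixed-.

              ```stata
              * Std. devs. copied from the -xtsum- output in #2 (100-observation example)
              * After -xtsum wanted-, they should also be available as r(sd_b) and r(sd_w)
              scalar sd_b = .1846573    // between
              scalar sd_w = .5403539    // within

              * ICC = between variance / (between variance + within variance)
              scalar icc = sd_b^2 / (sd_b^2 + sd_w^2)
              display "ICC: " %6.4f icc
              ```

              For these numbers the ratio is about 0.10, so by this rough measure roughly a tenth of the variation in the example is between regions.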
