
  • Graphing a cumulative frequency count

    I'm working with a panel data set of 659 individuals whom I track over different periods between 2004 and 2012; i.e., some individuals are tracked for 9 years (2004-2012), others for 3 years (for example 2004-2006), and so on. I have generated a duration variable, which records how long each individual has been tracked and places this value in that individual's 2004 row. So my data set looks like this:

    Code:
    pid      syear   duration   
    111    2004    2
    111    2005    .
    222    2004    3
    222    2005    .
    222    2006    .
    In fact, tabulating duration in my real data gives this:

    Code:
    duration     Freq.    Percent      Cum.
    ---------------------------------------
           1        10       1.52      1.52
           2        53       8.04      9.56
           3        83      12.59     22.15
           4        78      11.84     33.99
           5        79      11.99     45.98
           6        92      13.96     59.94
           7        85      12.90     72.84
           8        92      13.96     86.80
           9        87      13.20    100.00
    ---------------------------------------
       Total       659     100.00
    And if I make a histogram out of that, then everything is fine. BUT I want the following:

    - a histogram (bar chart?) and a line chart with cumulative counts, so the first bar is 10, the second is 10+53, the third is 10+53+83, and so on

    - a line chart that goes the other way, showing how many individuals I still have remaining: the first value should be 659-10, the second 659-10-53, and so on.

    I understand that if I just had these numbers in 9 cells, then the matter would be trivial. But right now they're spread out over 659 individuals.
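For what it's worth, here is a minimal sketch of getting down to those 9 cells, assuming (as in the listing above) that duration is non-missing only in each individual's 2004 row. Since the real data aren't shown, the demo rebuilds 659 individuals from the frequency table; the variable names n, cumn and remaining are illustrative only.

```stata
* Demo only: rebuild 659 individuals from the frequency table above
clear
input duration freq
1 10
2 53
3 83
4 78
5 79
6 92
7 85
8 92
9 87
end
expand freq
drop freq

* With the real panel, keep one row per individual (the 2004 row)
keep if !missing(duration)

* Collapse to one row per duration, with the count of individuals in n
contract duration, freq(n)
sort duration

* Running total of individuals, and how many are left after each duration
gen cumn = sum(n)
gen remaining = cumn[_N] - cumn

* Cumulative counts as bars with a line on top; remaining as a line
twoway bar cumn duration, barwidth(0.8) || line cumn duration, legend(off)
twoway line remaining duration
```

With the real panel you would skip the input/expand part and start from keep if !missing(duration).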

  • #2
    Originally posted by Arik Beremzon
    BUT I want the following:
    - a histogram (bar chart?) and line chart with cumulative counts so the first bar is 10, the second is 10+53, the third is 10+53+83
    - A line chart that goes the other way, so something that shows how many individuals I still have remaining: so the first bar should be 659-10, the second 659-10-53 etc.
    I'll leave the bar chart to others (I'm especially challenged when it comes to creating bar charts in Stata), but for the cumulative line plots, why not use sts graph? Specifically, the following should give you what you want:
    Code:
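    stset duration                // each individual "fails" at duration; rows with missing duration are excluded
    sts graph, per(659)           // survivor function scaled to 659 individuals: number remaining
    sts graph, failure per(659)   // failure function scaled to 659 individuals: cumulative count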
    where you can modify the title/labels as desired.



    • #3
      For a cumulative bar chart, you could install distplot (from the Stata Journal) and go, e.g.,

      Code:
      sysuse auto, clear
      distplot mpg, recast(bar)
      The default bar width of 1 matches that example and yours. Note that bars will not appear for values not occurring in the data.



      • #4
        Hi Nick, I noticed an issue with the help file for distplot. I think I had the same problem with an .ado I wrote; I don't remember exactly, but some text element did not support overly long sentences. Oddly (as with my ado), the help file displays properly if you open the file itself, but not if you open it by typing help program_name in Stata.



        • #5
          distplot has been updated several times since its first publication in 1999. I looked at the latest public version, which is from 2019 and should be accessible to you:

          Code:
          type http://www.stata-journal.com/software/sj19-1/gr41_5/distplot.hlp
          I can see no issue, except that input examples should not include the dot prompt. That will be fixed in the next update.



          • #6
            I installed it yesterday for the first time. I have just uninstalled and reinstalled it, and the issue is gone; a mystery. Sorry for wasting your time, and thanks for the quick answer!



            • #7
              Hello everyone,

              I needed to create a cumulative frequency histogram and couldn't manage it with distplot, so here is the solution I came up with. I used twoway__histogram_gen to get the data behind the original histogram; it generates two new variables, one with the start point of each bar and another with the heights. Then I plotted these two variables using twoway.

              This is a simple solution, but it might have two issues:
              1. Twoway generates margins that aren't present in the original histogram. This can be fixed by adding something like plotregion(margin(t 0 b 0)).
              2. If the histogram has bars with a height of 0, these bars won't be captured by twoway__histogram_gen. This can be fixed manually by adding a new observation (start of the zero-height bar, 0) to the corresponding variables. Personally, I use cumulative histograms a lot, so I automated this by writing my own .ado file.
              Anyway, here are the original histogram provided by Stata and the cumulative histogram I managed to create with my method:

              [Image: gr1.jpg, the original histogram from Stata]
              [Image: gr2.jpg, the cumulative histogram]
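A rough sketch of the kind of approach described in #7 (not the poster's own code or .ado), using the auto data. twoway__histogram_gen is an undocumented helper behind histogram, so the option names used here (bin(), frequency, generate(h x)) are assumptions based on common usage, and h, x and cumh are hypothetical names. Whether x holds bar midpoints or start points (as #7 says) only affects where the bars sit, so listing the variables is a cheap check.

```stata
sysuse auto, clear

* Assumed syntax: bar heights (as frequencies) go into h, bar positions into x
twoway__histogram_gen mpg, bin(10) frequency generate(h x)

* Cumulate the heights; the generated values occupy the first rows, sorted by x
gen cumh = sum(h) if !missing(h)
list x h cumh if !missing(h)

* Bar width taken as the spacing between adjacent bar positions
local w = x[2] - x[1]
twoway bar cumh x, barwidth(`w')
```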



              • #8
                There are for me two loosely linked issues arising from #7.

                1. how far you can get with distplot (from the Stata Journal; see #5) in more or less your direction

                2. what you are trying to do anyway and how to get there.

                From your point of view too, the vertical axis in the plot just above shows probability or cumulative probability, and definitely not probability density.

                From my point of view there is little or no point to binning for cumulative displays. There is some pedagogic value in pointing out that you could draw a histogram and then cumulate the bars; but cumulative probability changes at each observed value, and you can get there directly without using a histogram or even generating histogram variables.
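As one illustration, the official cumul command gives the cumulative probability at each observed value directly, with no binning. A minimal sketch with the auto data used in this thread; F is an arbitrary name.

```stata
sysuse auto, clear

* Cumulative probability at each observed value of mpg
* (equal gives tied values the same cumulative value)
cumul mpg, generate(F) equal

* Draw it as a step function that changes only at observed values
line F mpg, sort connect(stairstep)
```

Adding the freq option to cumul gives cumulative counts instead, which is closer to what was asked in the first post.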

                The point made in #3, "Note that bars will not appear for values not occurring in the data", could rule out the result as too odd, and that's fine by me.

                I played around a little with distplot and added two modest challenges: (a) a comparison of two groups and (b) marginal rugs.

                Code:
                sysuse auto, clear

                * y positions (below zero) for the marginal rug of each group
                gen where0 = -0.12
                gen where1 = -0.06

                * cumulative distributions as shaded areas by group, rugs added via addplot()
                distplot mpg, over(foreign) recast(area) lcolor(stc1 stc2) fcolor(stc1%10 stc2%10) ///
                legend(pos(12) row(1) order(1 2)) addplot(scatter where0 mpg if !foreign, ms(|) msize(large) mc(stc1) ///
                || scatter where1 mpg if foreign, ms(|) msize(large) mc(stc2))
                The stair-step pattern seems out of reach here.

                [Image: distplot.png]


                In general, I much prefer a quantile plot here. For example, using qplot (from the Stata Journal):

                Code:
                qplot mpg, over(foreign) aspect(1) xla(0 "0" 1 "1" 0.25 "0.25" 0.5 "0.5" 0.75 "0.75") msize(large ..) legend(order(2 1) pos(11) ring(0))

                [Image: qplot.png]
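If qplot is not installed, a rough stand-in with official commands plots each group's sorted values against a plotting position. This sketch assumes the plotting position (rank - 0.5)/n, one common choice; qplot's own default and options may differ, and p is an illustrative name.

```stata
sysuse auto, clear

* Plotting position within each group: (rank - 0.5) / group size
bysort foreign (mpg): gen p = (_n - 0.5) / _N

* Sorted mpg against plotting position, one marker series per group
twoway scatter mpg p if foreign == 0 || scatter mpg p if foreign == 1, ///
    xlabel(0(0.25)1) aspect(1) legend(order(1 "Domestic" 2 "Foreign"))
```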

