  • Multiple ECDF on one graph from different date points

    I am trying to compare the cumulative distributions of the first and last years of a distribution, as in a KS test. I have panel data with, let's say, variables year and sales. I generated the cumulative distribution with the code
    Code:
    cumul sales , gen(cum)
    I am able to generate the CDF graph for the first year
    Code:
    line cum sales if year==2004, sort
    and the last year
    Code:
    line cum sales if year==2020, sort
    But I want both graphs on one plot so that I can compare them. How do I go about it?

  • #2
    If you start with a wide layout, i.e., sales2004 and sales2020, and generate the cumulative distributions, then the example below from the documentation shows you how to do it.

    To graph two cumulative distributions on the same graph:

    . sysuse citytemp, clear
    . cumul tempjan, gen(cjan)
    . cumul tempjuly, gen(cjuly)
    . stack cjan tempjan cjuly tempjuly, into(c temp) wide clear
    . line cjan cjuly temp, sort
    See

    Code:
    help cumul



    • #3
      #1 is incorrect insofar as neither graph shows a complete and correct cumulative distribution! Each graph shows only the cumulative probabilities for the entire dataset that happen to coincide with particular years.
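To make the point concrete, here is a minimal sketch. It assumes the grunfeld example data (also used later in this thread) as a stand-in for the original panel, and the new variable names cum_all, ecdf1935 and ecdf1954 are hypothetical. Running cumul over the whole panel gives probabilities that need not reach 1 within any single year, whereas running cumul restricted with if to one year gives a proper ECDF for that year.

```stata
* Sketch only: grunfeld data stand in for the poster's panel (assumption)
webuse grunfeld, clear

* One cumulative distribution over the whole panel (the approach in #1)
cumul mvalue, generate(cum_all)

* A separate ECDF within each year of interest
cumul mvalue if year == 1935, generate(ecdf1935)
cumul mvalue if year == 1954, generate(ecdf1954)

* Within 1935, cum_all need not reach 1, but ecdf1935 does
summarize cum_all ecdf1935 if year == 1935

* Overlay the two per-year ECDFs on one graph
line ecdf1935 ecdf1954 mvalue, sort
```

twoway line ignores missing values by default (cmissing(y)), so each curve is drawn only from the observations of its own year.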

      You could use distplot from the Stata Journal as documented:

      Code:
      . search distplot, sj historical
      
      Search of official help files, FAQs, Examples, and Stata Journals
      
      SJ-19-1 gr41_5  . . . . . . . . . . . . . . . . . Software update for distplot
              (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q1/19   SJ 19(1):260
              changes include better handling of the by() option calls;
              simpler default y-axis titles; more detailed discussion of
              exactly what is plotted; and more information on ridits
      
      SJ-10-1 gr41_4  . . . . . . . . . . . . . . . . . Software update for distplot
              (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q1/10   SJ 10(1):164
              new reverse(ge) option specifies plotting probabilities or
              frequencies greater than or equal to any data value
      
      SJ-5-3  gr0018  . . . . . . . . . .  Speaking Stata: The protean quantile plot
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              Q3/05   SJ 5(3):442--460           (see gr41_3 and gr42_3 for commands)
              discusses quantile and distribution plots as used in
              the analysis of species abundance data in ecology
      
      SJ-5-3  gr41_3  . . . . . . . . . . . . . . . . . Software update for distplot
              (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q3/05   SJ 5(3):471
              simplified syntax; both by() and over() are now allowed
      
      SJ-4-2  gr0004  .  Speaking Stata: Graphing categorical and compositional data
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              Q2/04   SJ 4(2):190--215                                 (no commands)
              discusses graphical possibilities for categorical and
              compositional data
      
      SJ-4-1  gr0003  . . . . . . . . . . . . Speaking Stata: Graphing distributions
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              Q1/04   SJ 4(1):66--88                                   (no commands)
              a review of official and user-written commands for
              graphing univariate distributions; includes tricks
              beyond what is obviously and readily available
      
      SJ-3-4  gr41_2  . . . . . . . . . . . . . . . . . Software update for distplot
              (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q4/03   SJ 3(4):449
              option tscale() renamed as trscale()
      
      SJ-3-2  gr41_1  . . . . . . . . . . . . . . . . . Software update for distplot
              (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              Q2/03   SJ 3(2):211
              enhanced to use Stata 8 graphics and provides new options
      
      STB-51  gr41  . . . . . . . . . . . . . . . . . .  Distribution function plots
              (help distplot if installed)  . . . . . . . . . . . . . . .  N. J. Cox
              9/99    pp.12--16; STB Reprints Vol 9, pp.108--112
              plots the cumulative distribution function or survival function
              and allows multiple variables
      The original 1999 article makes some points not made in later papers and can be accessed freely (and for free) at https://www.stata.com/products/stb/journals/stb51.pdf

      Otherwise use the latest version as revealed by
      Code:
      search
      in an up-to-date Stata. (The code is unlikely to change much but the version of the help file on my machine continues to grow with expository and historical material and may be released in an Update in the near future.)

      This example shows the technique I think is needed here.

      Code:
      webuse grunfeld
      distplot mvalue if inlist(year, 1935, 1954), over(year)
      That said, I prefer quantile plots for comparison: the twin command qplot from the Stata Journal is one such tool.



      • #4
        Thank you, Nick and Andrew. But I have this issue with the graph from Nick's suggestion.

        When I used Andrew's suggestion with the code
        Code:
        line cum1 cum2 lnTFP , sort
        I got this graph:
        [Figure: graph1.tif]
        When I used the displot command as Nick suggested
        Code:
        distplot cum if inlist(year, 2004, 2020), over(year)
        I had this graph:
        [Figure: Graph2.png]
        The displot command used the ECDF for both the x- and y-axes. Why is that? I am used to Andrew's graph, but displot makes it easier, without collapsing the data. Is there a way to make the displot command give output similar to the earlier graph?



        • #5
          distplot (not displot) used what you gave it, but it expects original data, not cumulative probabilities.

          Andrew Musau's suggestions and mine are completely distinct.

          The difference is that cumul calculates the cumulative probabilities, and then line can be used to plot them.

          But distplot does both tasks. Another way to see this is that the example in #3 makes no use of cumul whatsoever. Nor do any of several examples in the help for distplot.

          You seem to have switched your focus from sales to lnTFP.

          Code:
          distplot lnTFP if inlist(year, 2004, 2020), over(year) 
          is the equivalent of

          Code:
          cumul lnTFP if year == 2004, gen(ECDF1)
          cumul lnTFP if year == 2020, gen(ECDF2)
          line ECDF? lnTFP, sort
          Last edited by Nick Cox; 09 Oct 2023, 14:27.



          • #6
            Thank you, Nick Cox, for your explanation of the difference between the two. Your code
            Code:
            distplot lnTFP if inlist(year, 2004, 2020), over(year)
            worked fine, as did Andrew Musau's suggestion. Sorry for changing the variables and for misspelling distplot in my last post.
            Last edited by Michael Kwadwo; 09 Oct 2023, 15:00.



            • #7
              Hi Nick Cox, I have one other concern, which is a separate question. I am doing a KS test comparing lnTFP in 2004 with lnTFP in 2020. I tried something like this, but it did not work.
              Code:
              ksmirnov lnTFP, by(inlist(year, 2004, 2020))
              Could you help me with this? Thank you.



              • #8
                Code:
                 
                 ksmirnov lnTFP if inlist(year, 2004, 2020), by(year)
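The attempt in #7 fails because by() in ksmirnov expects the name of a grouping variable, not an expression such as inlist(...). The if qualifier restricts the sample so that year itself takes exactly the two values the two-sample test needs. A sketch follows, assuming the grunfeld data as a stand-in for lnTFP; it also shows the stored results r(D) (the combined statistic) and r(p) (its p-value).

```stata
* Sketch only: grunfeld's mvalue stands in for lnTFP (assumption)
webuse grunfeld, clear

* by() takes a variable name; the if qualifier leaves exactly two years
ksmirnov mvalue if inlist(year, 1935, 1954), by(year)

* Combined two-sample statistic and its p-value
display "D = " r(D) "   p = " r(p)
```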



                • #9
                  Thank you, Nick Cox; I am grateful. It worked perfectly.

