  • Two-way line graph with two y-axis variables

    To graph two variables, one against the other, the code I found is
    Code:
    twoway (line polarization year, sort)
    However, I would like to graph three variables, two of them on the y axis (polarization and gini2). But I don't know how to fit "sort" into the command to make sure the graph is sorted on the x axis. Without sort, this is the command:
    Code:
    twoway (line polarization year, c(l) yaxis(1)) (line gini2 year, c(l) yaxis(2))
    But the data comes out a mess because it is not sorted on the x axis. How do I sort it please?

  • #2
    Or, do I need to use the command "xtline" since it's panel data? And if so, how?

    • #3
      1. Have you xtset your data to tell Stata that it is panel data?
      2. If yes, the following should work:
      Code:
       graph twoway (line polarization year, yaxis(1)) (line gini2 year, yaxis(2)), by(state, yrescale) ytitle("")
      Replace "state" by your "panel variable".
      It may be too complicated but it should do the job.
      Last edited by Eric de Souza; 08 Apr 2018, 11:19. Reason: Edited to add line beginning "Replace"
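
On the original question of where "sort" fits: it is an option of each line call, so it can sit alongside yaxis(). A minimal sketch follows, assuming the Grunfeld data (used later in this thread) as a stand-in, with invest and kstock in place of polarization and gini2 and company in place of the panel variable. The graph name twoaxis is just an illustrative label.

```stata
* Stand-in data: 10 companies observed yearly, 1935-1954
webuse grunfeld, clear

* sort goes inside each line call, next to yaxis();
* by() keeps each panel in its own small graph so series are not joined across panels
twoway (line invest year, sort yaxis(1))                ///
       (line kstock year, sort yaxis(2)),               ///
       by(company, yrescale) ytitle("") name(twoaxis, replace)
```

Without by(), sort alone would order all observations by year and join points from different panels, so the graph would still look a mess for panel data.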


      • #4
        Following Eric's helpful post here are some suggestions on various levels.

        Plotting multiple time series is one of the most difficult problems in statistical graphics. In principle, it is trivial: time automatically determines one axis and so you should just plot the data versus time. In practice, some kind of mess with several intertwining series is encountered with many datasets. It is intriguing that people in quite different fields seem to have come up with the same term, spaghetti, for this difficulty. There is a very small prize, namely my indefinite gratitude, for any early literature references mentioning "spaghetti" for graphics like this in formal journals.

        It's hard to give really good advice without quantification of the problem. In modest generality there could be K variables x T time points x N panels and those sizes constrain what works well. Here K is explicit at 2 but nothing else is concrete.

        Thinking more or less logarithmically, with say 3 panels it is worth devising a careful colour and/or line pattern scheme to label each panel distinctively whereas with even 10 just explaining that scheme in a legend or key takes up a large fraction of the available space and is often over the top. As for 30, 100, 300, ... panels! Similar comments apply to variables.

        Even with these numbers (K variables x T time points x N panels) defined, much also depends on how the variables behave! Do they tend to behave similarly despite small differences or differently despite minor similarity? Are they smooth or noisy?

        Concretely, here are some practicalities from my experiences.

        1. Should you juxtapose or superpose? You can plot each panel separately or each variable separately. Plotting all panels and several variables on one graph is unlikely to work well. Many people have learned the idea of small multiples for clarity, so that for example the default default [not a typo] for xtline is a separate graph for each panel. That is clear in the sense that what is being shown can be read off. I have my doubts on how far readers can and do make any kind of mental synthesis unless the number of panels is very small. Conversely, I have a bias that it's best not to put different variables on the same graph unless they are measured in the same units. Thus systolic and diastolic blood pressures are fine, as are minimum, mean and maximum of some outcome.

        2. A logarithmic scale may be a good idea for time series, either because it matches patterns of change or because it spreads out a skewed distribution. You have to work a bit usually to get nice axis labels, but on that see https://www.statalist.org/forums/for...lable-from-ssc
        niceloglabels from SSC or the Stata Journal is used in the code example below.

        3. The legend is at best a necessary evil and at worst a space-filling distraction. I have a slogan "Lose the legend" ("Kill the key") with immediate corollary "if you can". This is not a problem when identifiers are not informative, but it can be a problem when they are. The code example below shows an underused device, trailing text labels. For the Grunfeld data, that is easy, but often two- or three-letter abbreviations can be used. So, for many readers CA, NY, UT, WY, OH and so forth have immediate meaning. (Three-letter abbreviations are, naturally enough, TLAs.)

        4. If you use colours, don't mix red and green, as telling them apart is an all too common difficulty for many readers. Orange and blue often work well together. Grey is useful for less important contextual series. For a strategy of using all the other series as context, see e.g.

        https://www.statalist.org/forums/for...ailable-on-ssc

        https://stats.stackexchange.com/ques...es-in-one-plot

        5. A small irritant but a very easy way to gain some space: remove unnecessary time axis titles such as "Date" and "Year" and especially variable names that are convenient in Stata such as mdate or qdate but that will just look odd to people reading your paper or hearing your presentation.

        6. With many panels you may need to do something else instead such as

        * show a random sample (the pattern and variability of 30 panels may be easier to understand than that of 300 or 3000 or ....)

        * show panels that correspond to selected order statistics e.g. minimum, maximum, etc. on some criterion.

        * show summary statistics instead

        * show principal components or independent components
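
As one illustration of the "show summary statistics instead" option in point 6, here is a minimal sketch using the Grunfeld data: the median and quartiles of invest across companies, year by year. The variable names med, loq, upq and tag and the graph name summary are made up for this sketch, and the choice of median and quartiles is only one possibility.

```stata
webuse grunfeld, clear

* Summaries across panels, separately for each year
egen med = median(invest), by(year)
egen loq = pctile(invest), p(25) by(year)
egen upq = pctile(invest), p(75) by(year)

* One observation per year is enough for plotting
egen tag = tag(year)

* Grey band between the quartiles, with the median as a line on top
twoway rarea loq upq year if tag, sort color(gs13)             ///
    || line med year if tag, sort lc(blue)                     ///
       ysc(log) yla(, ang(h)) legend(off) xtitle("")           ///
       ytitle("invest") name(summary, replace)
```

This reduces N panels to three series, at the cost of hiding individual panels entirely, so it may suit large N better than the 10 companies here.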

        I just played with the Grunfeld data to show some of these possibilities. (Remark to programmers: if a method doesn't work well with the Grunfeld data, it's unlikely to work much better for larger and/or more complicated datasets.)

        The example here is naturally just indicative. A publishable graph would need better axis titles, including units of measurement.

        For yet other approaches, see sparkline (SSC) as exemplified in https://www.statalist.org/forums/for...le-time-series

        or multiline (SSC) as announced in https://www.statalist.org/forums/for...ailable-on-ssc

        (multiline supersedes the multitsline in 1355560.)

        Code:
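        * niceloglabels must be installed first: ssc install niceloglabels
        webuse grunfeld, clear
        set scheme s1color 
        
        * mvalue on a log scale; c(L) joins points only while year is ascending,
        * so different companies are not connected to each other
        niceloglabels mvalue, style(125) local(mvla) 
        line mvalue year, ysc(log) yla(`mvla', ang(h)) scheme(s1color) c(L) ///
        || scatter mvalue year if year == 1954, ysc(log) ms(none) mla(company) name(G1, replace) ///
        legend(off) xtitle("") 
        
        * same design for kstock; the invisible scatter at the last year (1954)
        * supplies the trailing company labels that replace a legend
        niceloglabels kstock, style(125) local(ksla) 
        line kstock year, ysc(log) yla(`ksla', ang(h)) scheme(s1color) c(L) ///
        || scatter kstock year if year == 1954, ysc(log) ms(none) mla(company) name(G2, replace) ///
        legend(off) xtitle("") 
        
        graph combine G1 G2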


        [Figure: grunfeld.png]
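
The colour advice in point 4 (grey for context, orange or blue for emphasis) can be sketched in the same spirit. This is only an illustration, assuming the Grunfeld data again: all companies are drawn in grey and one, arbitrarily company 1, is picked out in orange. The graph name context is a made-up label.

```stata
webuse grunfeld, clear
sort company year

* Every company in light grey as context; c(L) breaks the line
* whenever year stops ascending, i.e. between companies
line invest year, c(L) lc(gs12) ysc(log) yla(, ang(h))          ///
    || line invest year if company == 1, lc(orange) lw(thick)   ///
       legend(off) xtitle("") ytitle("invest") name(context, replace)
```

Looping over companies and combining the results would give one such graph per panel, which is roughly the strategy of using all the other series as context that the links under point 4 describe.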



        • #5
          Thanks a lot for this


          • #6
            Hi Nick, is it possible to add a legend to this combined graph? I don't want individual graph legends, nor do I want to use grc1leg, since it doesn't allow me to control graph sizes.


            • #7
              My answer to #6 is on two levels.

              1. Yes; omitting legend(off) in the syntax will restore a legend. The point of legend(off) is to suppress the legend. But the legend will be of the two variables plotted, for example kstock as line and kstock as scatter, which wouldn't be useful.

              2. The question misses an important point behind the graph: to add direct labels of company identifiers to make a legend unnecessary anyway. In this example many orthodox commands will show a legend, as with

              Code:
              webuse grunfeld, clear
              xtset company year
              xtline invest, ysc(log) overlay
              but then there are in my view two awkward -- in other examples very awkward -- consequences of adding a legend

              * The legend itself takes up much of the available real estate. In this example there are 10 companies; in other panel examples there are often many more panels and the legend is correspondingly large.

              * A legend is at best a necessary evil, and obliges mental back and forth -- which panel is this curve, and so on -- which is better avoided.

              So that is not something I want to do here.

              grc1leg (from http://www.stata.com/users/vwiggins, as you are asked to explain) isn't used in my solution at all, and I am not advising its use or using it. Any impression that it is needed must have arisen elsewhere.
