
  • Visualizing flow data in Stata

    Hi,

    I am looking to visualize flow data. My preferred approach would be to create a Sankey diagram (see below). I have found -gchart- (http://www.belenchavez.com/data-blog...ducing-gcharts) but have been unable to find the package anywhere. Does anyone know how to make Sankey diagrams in Stata, or have an alternative approach to visualizing flow data?

    FYI, the two variables I want to plot both have 10+ categories, which makes it quite difficult to find a good way of visualizing the data.
    [Image: example Sankey diagram]

  • #2
    10 x 10 or so I would present in table or matrix form.
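    To illustrate the table/matrix form, here is a minimal Python sketch that counts (origin, destination) pairs into a two-way table with row and column totals. In Stata itself, `tabulate var1 var2` produces this kind of table directly; the function names and the sample flows below are hypothetical and only for illustration.

```python
from collections import Counter


def crosstab(pairs):
    """Count (origin, destination) pairs into a nested dict matrix.

    Rows are origin categories and columns are destination categories;
    combinations that never occur are filled with 0 so every row has
    the same columns.
    """
    pairs = list(pairs)
    counts = Counter(pairs)
    rows = sorted({o for o, _ in pairs})
    cols = sorted({d for _, d in pairs})
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


def format_table(matrix):
    """Render a non-empty matrix as tab-separated text with totals."""
    cols = list(next(iter(matrix.values())).keys())
    lines = ["\t".join(["from/to"] + [str(c) for c in cols] + ["Total"])]
    for r, row in matrix.items():
        cells = [row[c] for c in cols]
        lines.append("\t".join([str(r)] + [str(v) for v in cells] + [str(sum(cells))]))
    col_totals = [sum(row[c] for row in matrix.values()) for c in cols]
    lines.append("\t".join(["Total"] + [str(v) for v in col_totals] + [str(sum(col_totals))]))
    return "\n".join(lines)


# Hypothetical flows: each tuple is one observation's (origin, destination).
flows = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "X"), ("C", "Y")]
print(format_table(crosstab(flows)))
```

    With 10+ categories on each side, the same layout simply grows to 10+ rows and columns, which is still easy to scan.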



    • #3
      Originally posted by Nick Cox
      10 x 10 or so I would present in table or matrix form.
      What about more complex data? Say, 4 x 5 x 5 x 5, where the last three 5s are within the same domain.



      • #4
        That seems to mean 500 flows in 4 dimensions. I don't know how to visualize that well except indirectly. But I don't have experience there either.



        • #5
          Though I agree with Nick that a table might be the most straightforward presentation, if you are set on such a diagram see:

          https://developers.google.com/chart/...gallery/sankey

          It looks simple to add categories and change the weights to produce the diagram you want. Or, if you are enterprising, you could write a Stata program to write the html file.



          • #6
            Originally posted by Scott Merryman
            Though I agree with Nick that a table might be the most straightforward presentation, if you are set on such a diagram see:

            https://developers.google.com/chart/...gallery/sankey

            It looks simple to add categories and change the weights to produce the diagram you want. Or, if you are enterprising, you could write a Stata program to write the html file.
            Thank you for your answer. Unfortunately, I cannot export data to web applications for confidentiality reasons, and I can only use Stata.



            • #7
              Short of developing such a graph, I think you are at an impasse. I find Sankey diagrams a waste of ink for more than a few categories, and even then they can be confusing to read.



              • #8
                Have you considered parallel coordinate plots? There is probably a way to code something using twoway with multiple layers and appropriate alpha transparency, but I imagine it would be fairly challenging to put together something that runs relatively efficiently. It wouldn’t be as clean a mapping as a Sankey diagram, but you might find spineplot (on SSC from Nick Cox) at least somewhat helpful. The only caution I would add is that humans are pretty bad at comparing relative areas, so you’d need to design things extremely carefully to ensure users interpret the information consistently with what you are trying to illustrate.



                • #9
                  Those who want to create a Sankey diagram should know that the SankeyMATIC website offers an online solution.
                  The author of this open-source tool, Steve Bogart, notes that it "builds on the open source tool D3.js and its Sankey library".
                  Of course, it has no relation to Stata, except that it should be possible to prepare your data in the format SankeyMATIC requires as input.
                  http://publicationslist.org/eric.melse
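                  A minimal Python sketch of that data-preparation step, assuming SankeyMATIC's input is one flow per line in the form `Source [Amount] Target` (please check the site's own manual before relying on this). The function name and the counts below are hypothetical. Note that SankeyMATIC presumably identifies nodes by their names, so an origin and a destination with the same label would be merged into one node.

```python
def to_sankeymatic(flows):
    """Convert {(source, target): amount} into SankeyMATIC-style text.

    Assumes the 'Source [Amount] Target' line format. Flows with a zero
    or negative amount are skipped because they would draw nothing.
    Lines are sorted by (source, target) for reproducible output.
    """
    lines = []
    for (source, target), amount in sorted(flows.items()):
        if amount > 0:
            lines.append(f"{source} [{amount}] {target}")
    return "\n".join(lines)


# Hypothetical counts, e.g. exported from a Stata cross-tabulation.
flows = {
    ("Male", "1st class"): 10,
    ("Female", "1st class"): 12,
    ("Male", "Crew"): 0,
}
print(to_sankeymatic(flows))
```

                  The resulting text could then be pasted into the SankeyMATIC input box, which of course only applies where sending the aggregated counts to a website is acceptable.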



                  • #10

                    Is "gchart" available?



                    • #11
                      Originally posted by Andrés Lahur Talavera Cuya
                      Is "gchart" available?
                      I second that question; I was unable to find where to download it.



                      • #12
                        As of Stata 16, you can create Sankey diagrams through Python. See https://www.mjcrowther.co.uk/software/sankey/



                        • #13
                          It seems that -sankey- requires startvar and stopvar to have different values. For example, suppose startvar takes the values 1 (male) and 2 (female), and stopvar takes the values 1 (1st class), 2 (2nd class), 3 (3rd class) and 4 (crew). If you then run -sankey startvar stopvar freqvar-, it will give you a misleading and weird plot.
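                          One possible workaround, sketched in Python under the assumption that the problem arises because equal codes in startvar and stopvar are drawn as the same node: shift the destination codes so the two sets can never overlap, and move the value labels along with them (in Stata this would amount to adding a constant to stopvar and redefining its value labels). The function names here are hypothetical.

```python
def make_codes_disjoint(start, stop):
    """Shift destination codes so none coincides with an origin code.

    start, stop: equal-length, non-empty lists of integer codes (one per
    observation). The offset is chosen so the smallest shifted stop code
    is max(start) + 1, which guarantees the two code sets are disjoint.
    Returns (shifted_stop, offset).
    """
    if len(start) != len(stop):
        raise ValueError("start and stop must have the same length")
    offset = max(start) - min(stop) + 1
    return [c + offset for c in stop], offset


def relabel(labels, offset):
    """Move a {code: label} mapping to the shifted codes."""
    return {code + offset: name for code, name in labels.items()}


start = [1, 2, 1, 2]  # 1 = male, 2 = female
stop = [1, 3, 4, 2]   # 1-3 = 1st-3rd class, 4 = crew
new_stop, offset = make_codes_disjoint(start, stop)
stop_labels = relabel({1: "1st class", 2: "2nd class", 3: "3rd class", 4: "crew"}, offset)
print(new_stop, offset, stop_labels)
```

                          Shifting by a computed offset rather than a fixed constant keeps the sketch correct even if the origin variable later gains more categories.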



                          • #14
                            See https://www.stata.com/meeting/uk21/s...UK21_Naqvi.pdf for the slides from a very interesting talk by Asjad Naqvi -- with a Sankey diagram on p.39.

                            A Sankey diagram is exactly what I would want to monitor flows through various stages, e.g. how many students failed, left for other reasons or proceeded to the next year of a degree programme. Asjad's example appears to have a similar flavour.

                            I find their use for two-way tables bemusing, and I have to suspect that sometimes the aim is Wow! and the appeal is that of smooth curves, which people tend to like for other reasons.

                            A trope briefly popular in graphics circles contrasted Aha! as the reaction you should want with Wow! as second best. I added Huh? as what you least want. Aha! could be "This helps me to see structure in my data" or (often as helpful or even more so) "This shows important or interesting detail I hadn't spotted before". Wow! could be "How did you do that?" (gratifying for the presenter, but secondary) or just "That's pretty". Huh? could be "What am I supposed to see here?" (as with hairball representations of many networks).



                            • #15
                              Yes, nowadays people tend to be attracted to "beautiful" (colorful, complicated, fabulous) but useless plots, if they can even be counted as statistical plots.
