
  • Graph line with values for only a few groups over time

    Hi,

    I want to graph how the number of houses of the type "allmännyttan" (publicly owned housing) has developed over time in Swedish municipalities. I have municipality-level panel data for 1975-2018, though not for every year. My main variable of interest is log_allmannyttan (the log of the number of publicly owned houses in each municipality and year), and there are 290 municipalities. I now want to graph the time trends, including the development of the mean and of some specific regions.

    I have created a population-weighted mean of the number of public houses in each year (the logged variable is log_popwm_allmannyttan) that I want to plot over time, but I also want to include lines that show the time trend in certain municipalities, such as those where the number of houses increased most or decreased most, to show the spread. (I can't include all of them because 290 lines would be too cluttered.)

    When I try this simple code, it yields a graph with only two lines, leaving out one of the regions I specified. I have tried including more regions as well, and usually one or two are left out of the graph; which ones are included seems arbitrary to me. Could anyone help me with suitable code for creating this graph?

    The code I tried:
    Code:
    line log_popwm_allmannyttan year, sort || line log_allmannyttan year if region_code == 360, sort || line log_allmannyttan year if region_code == 536, sort
    Data example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(year region_code) str22 region_name long allmannyttan_owner float(log_allmannyttan popwm_allmannyttan log_popwm_allmannyttan)
    1975  136 "Haninge"        9300   9.13777 15727.272 9.663152
    1975 1487 "Vänersborg"    2416  7.789868 15727.272 9.663152
    1975 2061 "Smedjebacken"   1577   7.36328 15727.272 9.663152
    1975 1907 "Surahammar"      650  6.476973 15727.272 9.663152
    1975 2183 "Bollnäs"       3177  8.063693 15727.272 9.663152
    1975 2326 "Berg"            166  5.111988 15727.272 9.663152
    1975 1490 "Borås"         9649  9.174609 15727.272 9.663152
    1975 1465 "Svenljunga"      324  5.780744 15727.272 9.663152
    1975 1489 "Alingsås"      2138  7.667626 15727.272 9.663152
    1975  682 "Nässjö"       2473  7.813187 15727.272 9.663152
    1975 2313 "Strömsund"      937  6.842683 15727.272 9.663152
    1975 1401 "Härryda"       1256  7.135687 15727.272 9.663152
    1975  685 "Vetlanda"        662  6.495265 15727.272 9.663152
    1975  880 "Kalmar"         2622  7.871693 15727.272 9.663152
    1975  763 "Tingsryd"        678  6.519147 15727.272 9.663152
    1975 2417 "Norsjö"          95  4.553877 15727.272 9.663152
    1975  821 "Högsby"         392  5.971262 15727.272 9.663152
    1975 2421 "Storuman"         92 4.5217886 15727.272 9.663152
    1975 1466 "Herrljunga"      350  5.857933 15727.272 9.663152
    1975 1494 "Lidköping"     1343  7.202661 15727.272 9.663152
    1975  187 "Vaxholm"         629  6.444131 15727.272 9.663152
    1975  512 "Ydre"             98 4.5849676 15727.272 9.663152
    1975  662 "Gislaved"       1603  7.379632 15727.272 9.663152
    1975  182 "Nacka"          6759   8.81863 15727.272 9.663152
    1975 1485 "Uddevalla"      4864  8.489616 15727.272 9.663152
    1975  160 "Täby"           690  6.536692 15727.272 9.663152
    1975 2321 "Åre"            163   5.09375 15727.272 9.663152
    1975 1081 "Ronneby"        2176  7.685244 15727.272 9.663152
    1975  580 "Linköping"    12177  9.407304 15727.272 9.663152
    1975 1214 "Svalöv"         756  6.628041 15727.272 9.663152
    1975 1498 "Tidaholm"        304  5.717028 15727.272 9.663152
    1975 1284 "Höganäs"      1239   7.12206 15727.272 9.663152
    1975 1760 "Storfors"        698  6.548219 15727.272 9.663152
    1975 1290 "Kristianstad"   4680  8.451054 15727.272 9.663152
    1975 1781 "Kristinehamn"   2223  7.706613 15727.272 9.663152
    1975 2085 "Ludvika"        3424  8.138565 15727.272 9.663152
    1975 1270 "Tomelilla"       216  5.375278 15727.272 9.663152
    1975 1265 "Sjöbo"           63 4.1431346 15727.272 9.663152
    1975 1883 "Karlskoga"      4033  8.302266 15727.272 9.663152
    1975 1492 "Åmål"          104  4.644391 15727.272 9.663152
    1975  184 "Solna"          4619  8.437934 15727.272 9.663152
    1975 1083 "Sölvesborg"     918  6.822197 15727.272 9.663152
    1975  330 "Knivsta"           .         . 15727.272 9.663152
    1975  125 "Ekerö"          337  5.820083 15727.272 9.663152
    1975  643 "Habo"            247  5.509388 15727.272 9.663152
    1975 2463 "Åsele"          168  5.123964 15727.272 9.663152
    1975  486 "Strängnäs"    1101  7.003974 15727.272 9.663152
    1975 2180 "Gävle"        12737  9.452267 15727.272 9.663152
    1975 1275 "Perstorp"        629  6.444131 15727.272 9.663152
    1975 1445 "Essunga"         223  5.407172 15727.272 9.663152
    1975 2583 "Haparanda"         0         . 15727.272 9.663152
    1975 1961 "Hallstahammar"  3185  8.066208 15727.272 9.663152
    1975  163 "Sollentuna"     5317  8.578665 15727.272 9.663152
    1975 1440 "Ale"            1311  7.178545 15727.272 9.663152
    1975  781 "Ljungby"         805  6.690842 15727.272 9.663152
    1975 2560 "Älvsbyn"        479    6.1717 15727.272 9.663152
    1975  123 "Järfälla"     4311  8.368925 15727.272 9.663152
    1975 2082 "Säter"          736   6.60123 15727.272 9.663152
    1975  683 "Värnamo"       1764  7.475339 15727.272 9.663152
    1975  586 "Mjölby"        2109  7.653969 15727.272 9.663152
    1975  428 "Vingåker"       879  6.778785 15727.272 9.663152
    1975 2034 "Orsa"            517  6.248043 15727.272 9.663152
    1975 1446 "Karlsborg"       385  5.953243 15727.272 9.663152
    1975 2132 "Nordanstig"      282  5.641907 15727.272 9.663152
    1975  882 "Oskarshamn"     2569  7.851272 15727.272 9.663152
    1975 1880 "Örebro"       19444  9.875294 15727.272 9.663152
    1975  562 "Finspång"      2418  7.790696 15727.272 9.663152
    1975  686 "Eksjö"         1239   7.12206 15727.272 9.663152
    1975  126 "Huddinge"      12468  9.430921 15727.272 9.663152
    1975 1780 "Karlstad"       8567  9.055673 15727.272 9.663152
    1975  461 "Gnesta"            0         . 15727.272 9.663152
    1975 2481 "Lycksele"        748  6.617403 15727.272 9.663152
    1975 1282 "Landskrona"     4283  8.362409 15727.272 9.663152
    1975 1415 "Stenungsund"    1313   7.18007 15727.272 9.663152
    1975 2418 "Malå"           177   5.17615 15727.272 9.663152
    1975  860 "Hultsfred"      1684  7.428927 15727.272 9.663152
    1975 1864 "Ljusnarsberg"    346  5.846439 15727.272 9.663152
    1975 1060 "Olofström"     2154  7.675082 15727.272 9.663152
    1975 2021 "Vansbro"         450  6.109248 15727.272 9.663152
    1975 2404 "Vindeln"           0         . 15727.272 9.663152
    1975 1499 "Falköping"     1612  7.385231 15727.272 9.663152
    1975 1493 "Mariestad"      1432  7.266828 15727.272 9.663152
    1975 1419 "Tjörn"          107 4.6728287 15727.272 9.663152
    1975 1443 "Bollebygd"         .         . 15727.272 9.663152
    1975 1984 "Arboga"            2  .6931472 15727.272 9.663152
    1975 1283 "Helsingborg"    7919   8.97702 15727.272 9.663152
    1975 1461 "Mellerud"        261   5.56452 15727.272 9.663152
    1975 1463 "Mark"           1950  7.575585 15727.272 9.663152
    1975 2580 "Luleå"         8318  9.026177 15727.272 9.663152
    1975 1291 "Simrishamn"      743  6.610696 15727.272 9.663152
    1975 1315 "Hylte"           359  5.883322 15727.272 9.663152
    1975 2462 "Vilhelmina"      507  6.228511 15727.272 9.663152
    1975 2023 "Malung"          601  6.398595 15727.272 9.663152
    1975 2031 "Rättvik"        236  5.463832 15727.272 9.663152
    1975  767 "Markaryd"        893  6.794587 15727.272 9.663152
    1975 2581 "Piteå"         2159  7.677401 15727.272 9.663152
    1975 2521 "Pajala"            0         . 15727.272 9.663152
    1975 1480 "Göteborg"     74253 11.215234 15727.272 9.663152
    1975 1272 "Bromölla"       370  5.913503 15727.272 9.663152
    1975 1230 "Staffanstorp"    588  6.376727 15727.272 9.663152
    end
    I deeply appreciate any help!

    Best,
    Mikaela

  • #2
    I want to plot over time, but I also want to include lines that show the time trend in certain municipalities where the number of houses increased most, decreased most, etc, to show the spread. (I can't include all because 290 lines will be too cluttered.)
    Your last statement above is not necessarily true in Stata today as you can set the opacity of a subset of lines. If you search for spaghetti plots in the forum, you will find plenty of examples of how to include all the series, highlighting only a few. Your data example is not useful as it contains only a single year, but here is an example using the Grunfeld dataset and xtline.

    Code:
    webuse grunfeld, clear
    xtset company year
    keep if company<6
    gen cname= "Company"+" "+string(company) if inlist(company, 1, 5)
    bys cname (year): replace cname="" if _n!=_N
    xtline invest, overlay plot2opts(lcolor(gs8%30)) ///
    plot3opts(lcolor(gs8%30)) plot4opts(lcolor(gs8%30)) ///
    addplot(scatter invest year, mcolor(none) mlab(cname)) ///
    leg(off) xsc(r(1957)) scheme(s1color)
    [Graph: xtline overlay of invest for companies 1 to 5, with companies 1 and 5 labeled and the other series in faded gray]



    Unfortunately, I do not know of a way to apply the same color to a subset of lines with a single option, but given the identifiers you can build the options in a loop. For example, to do this for companies 2, 3, 7 and 9:

    Code:
    local opts ""
    foreach i of numlist 2 3 7 9{
    local opts "`opts' plot`i'opts(lcolor(gs8%30))"
    }
    di "`opts'"
    which results in

    Code:
    . di "`opts'"
     plot2opts(lcolor(gs8%30)) plot3opts(lcolor(gs8%30)) plot7opts(lcolor(gs8%30)) plot9opts(lcolor(gs8%30))
    In the graph code, I would then use

    Code:
    xtline invest, overlay `opts'




    • #3
      That looks very neat, I will try that. Thank you very much, Andrew!



      • #4
        Actually no, Andrew, I have been trying to get this to work for a while now and I can't. One reason is that I'm not a very advanced Stata user: I don't understand all of your code and therefore can't adapt it to my situation, which is different.

        I have about 15,000 rows in my dataset. My panel variable region includes 290 regions with data from 52 years. Therefore I don't think that I can give you a better data example - but please let me know if I can.

        If I use the code you proposed, I need to know: how do I make the lines for all the other groups grey, i.e. the 285 that I don't want to highlight?

        Would you have an example that's easier or better suited for a dataset with many subgroups? I just need to make a simple graph where chosen regions (i.e. certain values of "region_code" in the data) are plotted over time. I have read the xtline help file but didn't find anything.

        Does anyone else have an idea on how to make this work?

        Best,
        Mikaela



        • #5
          I think it is possible to work with your data in #1. First, I would create a new identifier. I would then recommend plotting all series in gray with 30% opacity and specifying the exceptions individually, since you say there are only a few of them.

          Code:
          *CREATE NEW IDENTIFIER
          encode region_name, gen(region)
          *XTSET USING THIS IDENTIFIER
          xtset region year
          *CREATE LOCAL WITH OPTIONS
          levelsof region, local(rcodes)
          local opts ""
          foreach i of numlist `rcodes'{
                  local opts "`opts' plot`i'opts(lcolor(gs8%30))"
          }
          *CHECK THE IDENTIFIERS OF EXCLUDED REGIONS
          lab list region
          *ASSUME WE WANT TO HIGHLIGHT REGIONS 1, 5, 17, 24, 29
          gen rname= region_name if inlist(region, 1, 5, 17, 24, 29)
          bys rname (year): replace rname="" if _n!=_N
          *GRAPH
          xtline log_allmannyttan, overlay `opts'  plot1opts(lcolor(red)) plot5opts(lcolor(red)) ///
          plot17opts(lcolor(red)) plot24opts(lcolor(red)) plot29opts(lcolor(red)) ///
          addplot(scatter log_allmannyttan year, mcolor(none) mlab(rname)) ///
          leg(off) xsc(r(2021)) scheme(s1color)
          Above, I have highlighted the selected series in red, but you can specify different colors. The # in plot#opts corresponds directly to a region's identifier (given by the variable region in my code). I put -xsc(r(2021))- to allow the full label to be displayed, but increase or reduce this number if the space is either too small or too large. Finally, to determine which region is at the top (has the highest value of "log_allmannyttan") or bottom (lowest value), use the summarize command.

          Code:
          sum log_allmannyttan
          list region log_allmannyttan if log_allmannyttan==r(min)
          list region log_allmannyttan if log_allmannyttan==r(max)
          Last edited by Andrew Musau; 26 Mar 2020, 11:56.



          • #6
            Hi,

            Thank you very much for this answer, I truly appreciate it. However it still doesn't work, probably because there are too many regions (290).

            I adapted the code to fit the regions I want to highlight. (I have changed the transformation of the variable from log to IHS, the inverse hyperbolic sine, which is why the variable name has changed.) But the graph still doesn't look anything like your example: it is just a clutter of lines in different colors with no labels.

            When I clear and reopen the data and run all the code together from the start, it gives the error message "too many options".


            [P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 1003
            too many options
            The number of options specified exceeded 256. You cannot
            exceed this maximum.

            So I guess that is the problem. Is there any way that I could remove the regions that I don't want to highlight? As a substitute I could use addplot to add a line for another variable, in which I have calculated the population-weighted mean of the regions for each year.

            Here is the code I used:

            Code:
            *CREATE NEW IDENTIFIER
            encode region_name, gen(region)
            *XTSET USING THIS IDENTIFIER
            xtset region year
            *CREATE LOCAL WITH OPTIONS
            levelsof region, local(rcodes)
            local opts ""
            foreach i of numlist `rcodes'{
                    local opts "`opts' plot`i'opts(lcolor(gs8%30))"
            }
            *CHECK THE IDENTIFIERS OF EXCLUDED REGIONS
            lab list region
            
            *WE WANT TO HIGHLIGHT REGIONS 
            
            * NACKA = 152
            * BORGHOLM = 18
            * ENKÖPING = 35
            * GÖTEBORG = 58
            * TIERP = 223
            * ANEBY = 4
            * STOCKHOLM = 200
            
            gen rname= region_name if inlist(region, 4, 18, 35, 58, 152, 200, 223)
            bys rname (year): replace rname="" if _n!=_N
            
            
            *GRAPH
            xtline IHS_allmannyttan_owner, overlay `opts'  plot4opts(lcolor(red)) plot18opts(lcolor(red)) ///
            plot35opts(lcolor(red)) plot58opts(lcolor(red)) plot152opts(lcolor(red)) /// 
            plot200opts(lcolor(red)) plot223opts(lcolor(red)) ///
            addplot(scatter IHS_allmannyttan_owner year, mcolor(none) mlab(rname)) ///
            leg(off) xsc(r(2021)) scheme(s1color)


            /Mikaela



            • #7

              [P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 1003
              too many options
              The number of options specified exceeded 256. You cannot
              exceed this maximum.
              OK, so there is a limit on how many options you can specify in a single graph command (256, according to the error message). You will therefore need to plot only a subset of the series (e.g., 240). So that the plot numbers still correspond to your identifiers, I suggest that you first preserve the dataset, drop some regions, create the identifier, graph, and then restore.

              as a substitute I could use addplot and add a graph for another variable, in which I have calculated the population weighted mean of the regions for each year.
              If you do this, you need to make sure that the range of values of the added variable is not far from that of your series. Otherwise, it does not make sense to combine both in one graph. You also can no longer include the region labels using the marker labels as I show in my code. You will have to use the -text()- option or a legend.

              Code:
              *PRESERVE DATASET
              preserve
              *DROP SOME REGIONS (MAYBE ABOUT 50)
              drop if ...
              
              *CREATE NEW IDENTIFIER
              encode region_name, gen(region)
              *XTSET USING THIS IDENTIFIER
              xtset region year
              *CREATE LOCAL WITH OPTIONS
              levelsof region, local(rcodes)
              local opts ""
              foreach i of numlist `rcodes'{
                      local opts "`opts' plot`i'opts(lcolor(gs8%30))"
              }
              *CHECK THE IDENTIFIERS OF EXCLUDED REGIONS
              lab list region
              *THE HIGHLIGHTED IDENTIFIERS BELOW ARE THOSE FROM #6; RE-CHECK THEM WITH -lab list- AFTER DROPPING REGIONS
              
              *GRAPH
              xtline IHS_allmannyttan_owner, overlay `opts'  plot4opts(lcolor(red)) plot18opts(lcolor(red)) ///
              plot35opts(lcolor(red)) plot58opts(lcolor(red)) plot152opts(lcolor(red)) ///
              plot200opts(lcolor(red)) plot223opts(lcolor(red)) ///
              addplot(line IHS_allmannyttan_owner year, lcolor(blue)) ///
              leg(order(1 "Region 1" 5 "Region 5")) xsc(r(2021)) scheme(s1color)
              
              *RESTORE DATA
              restore
              I think you want a line plot for the added series and not a scatter plot. You can now add all the selected regions in the legend, but now you probably need different colors. Note that the region identifiers now change once you drop some regions.
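
Because the identifiers change once regions are dropped, one option is to look them up by name rather than hard-coding the numbers. The following is only a minimal sketch on a made-up four-region dataset. It assumes the value label created by encode has the same name as the new variable (region), and the local macro names hi and k are purely illustrative:

```stata
* Toy data: four region names only (an assumption for illustration)
clear
input str12 region_name
"Aneby"
"Borgholm"
"Nacka"
"Tierp"
end

* encode numbers the names in alphabetical order and stores them
* in a value label called region
encode region_name, gen(region)

* Build the highlight options from names instead of numbers:
* "name":region returns the numeric code attached to that label
local hi ""
foreach nm in "Nacka" "Tierp" {
    local k = "`nm'":region
    local hi "`hi' plot`k'opts(lcolor(red))"
}
display "`hi'"
```

The local hi could then be placed after `opts' in the xtline call in place of the hand-typed plot#opts() options, so the highlighting follows the names even when the numbering shifts.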



              • #8
                Just try

                Code:
                line IHS year,  c(L) lc(gs8%30)
                and then superimpose selected regions.
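
A minimal sketch of what this could look like in full. The toy panel, the variable y and the two highlighted region codes are assumptions for illustration only; with the real data one would use IHS_allmannyttan_owner and actual values of region_code. connect(L) joins consecutive points only while year is increasing, so the data should be sorted by region and year for the line to break between regions:

```stata
* Toy panel: 20 regions observed 2009-2018 (made-up data)
clear
set seed 12345
set obs 20
gen region_code = _n
expand 10
bysort region_code: gen year = 2008 + _n
gen y = region_code/4 + 0.1*(year - 2009) + rnormal(0, 0.2)

* Sort so that each region's years are contiguous and ascending
sort region_code year

* One faded gray plot for all regions, then selected regions on top
twoway (line y year, connect(L) lcolor(gs8%30))            ///
       (line y year if region_code == 3, lcolor(red))      ///
       (line y year if region_code == 17, lcolor(blue)),   ///
       legend(order(2 "Region 3" 3 "Region 17"))
```

Since all gray lines come from a single plot with one set of options, this approach should not run into the 256-option limit reported in #6, however many regions there are.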
