  • Format markers and x-label on scatterplot

    Hi. The code for my twoway scatterplot generates the graph below. I want to make the following changes to the graph:
    1. change the markers to white with a black outline
    2. have the x-axis labels show the years (2022, 2023, 2024), with smaller tick marks for the months in between
    3. remove the legend
    Any help will be much appreciated.


    Code:
    set scheme s1color
     
    twoway (scatter one time), title("Teleconsults trend") subtitle("(2022-2024)") ///
    ytitle("Number of teleconsults") yscale(range(0 .)) ylabel(#6, labsize(small) ///
    angle(horizontal) nogrid) ///
     xtitle(Year) graphregion(fcolor(white)) || lfit one time
    
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(year month) long one float time
    2022  1 2233  1
    2022  2 2224  2
    2022  3 2326  3
    2022  4 2052  4
    2022  5 2059  5
    2022  6 1656  6
    2022  7 1574  7
    2022  8 1386  8
    2022  9 1361  9
    2022 10 1071 10
    2022 11 1154 11
    2022 12 1109 12
    2023  1 1082 13
    2023  2 1435 14
    2023  3 1447 15
    2023  4 1172 16
    2023  5 1176 17
    2023  6 1209 18
    2023  7 1289 19
    2023  8 1759 20
    2023  9 1480 21
    2023 10 1311 22
    2023 11 1186 23
    2023 12 1208 24
    2024  1 1345 25
    2024  2 1277 26
    2024  3 1185 27
    2024  4 1257 28
    2024  5 1209 29
    2024  6 1405 30
    2024  7 1491
    end

  • #2
    1 and 3 are easy details well documented.
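
    For 1 and 3, a minimal sketch, using the auto data shipped with Stata purely as a stand-in for your data: mfcolor() and mlcolor() set the marker fill and outline colours separately, and legend(off) suppresses the legend.

    ```stata
    * Stand-in data: the auto dataset shipped with Stata
    sysuse auto, clear

    * White-filled circles with a black outline, plus a linear fit, no legend
    twoway scatter mpg weight, msymbol(O) mfcolor(white) mlcolor(black) ///
        || lfit mpg weight, legend(off)
    ```

    Hollow circles, ms(Oh) mc(black), are an alternative that looks much the same on a white background.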

    2 is more challenging. I've not done quite what I think you're asking, but you can adapt this. For the main idea, see

    https://www.stata-journal.com/articl...article=gr0030

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(year month) long one float time
    2022  1 2233  1
    2022  2 2224  2
    2022  3 2326  3
    2022  4 2052  4
    2022  5 2059  5
    2022  6 1656  6
    2022  7 1574  7
    2022  8 1386  8
    2022  9 1361  9
    2022 10 1071 10
    2022 11 1154 11
    2022 12 1109 12
    2023  1 1082 13
    2023  2 1435 14
    2023  3 1447 15
    2023  4 1172 16
    2023  5 1176 17
    2023  6 1209 18
    2023  7 1289 19
    2023  8 1759 20
    2023  9 1480 21
    2023 10 1311 22
    2023 11 1186 23
    2023 12 1208 24
    2024  1 1345 25
    2024  2 1277 26
    2024  3 1185 27
    2024  4 1257 28
    2024  5 1209 29
    2024  6 1405 30
    2024  7 1491 31
    end 
    
    gen mdate = ym(year, month)
    
    set scheme s1color
    
    scatter one mdate, title("Teleconsults trend") subtitle("(2022-2024)") ///
    ytitle("Number of teleconsults") yscale(range(0 .)) ylabel(#6, labsize(small) ///
    angle(horizontal) nogrid) ms(Oh) mc(black) ///
    xtick(`=ym(2021, 12) + 0.5' `=ym(2022,12) + 0.5' `=ym(2023, 12) + 0.5',  tlength(*3)) ///
    xla(`=ym(2022, 6) + 0.5' "2022"  `=ym(2023, 6) + 0.5' "2023" `=ym(2024, 4)' "2024", noticks) ///
     xtitle("") graphregion(fcolor(white)) || lfit one mdate, legend(off) xli(`=ym(2022,12) + 0.5' `=ym(2023, 12) + 0.5', lc(gs12))
    [Image: rick.png]



    • #3
      Nick Cox: Thanks a lot. That worked very well. If I may, could you please explain this bit?
      Code:
      xtick(`=ym(2021, 12) + 0.5' `=ym(2022,12) + 0.5' `=ym(2023, 12) + 0.5',  tlength(*3)) ///
      xla(`=ym(2022, 6) + 0.5' "2022"  `=ym(2023, 6) + 0.5' "2023" `=ym(2024, 4)' "2024", noticks) ///
      I'm adding more data: 6 months for 2020 and 12 months for 2021. If I use the same code I get the graph below. I'm trying to figure out how to modify the code to allow for the additional data, and understanding it would be very helpful. Thanks again!

      Updated data:

      Code:
      input float(year month) long one
      2020  7 1585
      2020  8 2460
      2020  9 2778
      2020 10 3284
      2020 11 3280
      2020 12 3262
      2021  1 2688
      2021  2 2849
      2021  3 2806
      2021  4 2517
      2021  5 4012
      2021  6 2665
      2021  7 2556
      2021  8 2618
      2021  9 2693
      2021 10 2247
      2021 11 2028
      2021 12 2217
      2022  1 2233
      2022  2 2224
      2022  3 2326
      2022  4 2052
      2022  5 2059
      2022  6 1656
      2022  7 1574
      2022  8 1386
      2022  9 1361
      2022 10 1071
      2022 11 1154
      2022 12 1109
      2023  1 1082
      2023  2 1435
      2023  3 1447
      2023  4 1172
      2023  5 1176
      2023  6 1209
      2023  7 1289
      2023  8 1759
      2023  9 1480
      2023 10 1311
      2023 11 1186
      2023 12 1208
      2024  1 1344
      2024  2 1277
      2024  3 1185
      2024  4 1257
      2024  5 1209
      2024  6 1405
      2024  7 1490
      end
      [Image: Graph.png]



      • #4
        There is nothing subtle here -- just fiddly calculations to work out where to put the labels and the ticks. Did you read the linked paper in #2?

        The year labels go in the middle of each year's observations, at a particular month if the number of observations is odd or between two months otherwise.

        The ticks go at the end of each year's observations -- so halfway between the December and the January markers -- and I add lines at the same positions.


        Code:
        clear 
        
        input float(year month) long one
        2020  7 1585
        2020  8 2460
        2020  9 2778
        2020 10 3284
        2020 11 3280
        2020 12 3262
        2021  1 2688
        2021  2 2849
        2021  3 2806
        2021  4 2517
        2021  5 4012
        2021  6 2665
        2021  7 2556
        2021  8 2618
        2021  9 2693
        2021 10 2247
        2021 11 2028
        2021 12 2217
        2022  1 2233
        2022  2 2224
        2022  3 2326
        2022  4 2052
        2022  5 2059
        2022  6 1656
        2022  7 1574
        2022  8 1386
        2022  9 1361
        2022 10 1071
        2022 11 1154
        2022 12 1109
        2023  1 1082
        2023  2 1435
        2023  3 1447
        2023  4 1172
        2023  5 1176
        2023  6 1209
        2023  7 1289
        2023  8 1759
        2023  9 1480
        2023 10 1311
        2023 11 1186
        2023 12 1208
        2024  1 1344
        2024  2 1277
        2024  3 1185
        2024  4 1257
        2024  5 1209
        2024  6 1405
        2024  7 1490
        end 
        
        
        gen mdate = ym(year, month)
        
        set scheme s1color
        
        * end year ticks 
        forval y = 2020/2023 {
            local tickpos = ym(`y', 12) + 0.5 
            local tickcall  `tickcall' `tickpos'
        }
        
        * year labels 
        local labelpos = ym(2020, 9) + 0.5 
        local labelcall `labelcall' `labelpos' "2020"
        
        forval y = 2021/2023 { 
            local labelpos = ym(`y', 6) + 0.5 
            local labelcall `labelcall' `labelpos' "`y'"
        }
        
        local labelpos = ym(2024, 4)
        local labelcall `labelcall' `labelpos' "2024"
        
        mac li 
        
        scatter one mdate, title("Teleconsults trend") subtitle("(2020-2024)") ///
        ytitle("Number of teleconsults") yscale(range(0 .)) ylabel(#6, labsize(small) ///
        angle(horizontal) nogrid) ms(Oh) mc(black) ///
        xtick(`tickcall',  tlength(*3)) ///
        xla(`labelcall', noticks) ///
         xtitle("") graphregion(fcolor(white)) || lfit one mdate, legend(off) xli(`tickcall', lc(gs12))

        Here are the relevant local macros:


        Code:
        _labelcall:     728.5 "2020" 737.5 "2021" 749.5 "2022" 761.5 "2023" 771 "2024"
        
        _tickcall:      731.5 743.5 755.5 767.5
        Sometimes it's easier just to work them out by hand and with mental arithmetic.
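
        For the arithmetic by hand: a monthly date counts months from January 1960, so ym(y, m) is (y - 1960)*12 + (m - 1). A small sketch, to be run from a do-file, checking that against the positions listed above:

        ```stata
        * Monthly dates count months from January 1960 (= 0)
        display ym(1960, 1)                      // 0
        display (2020 - 1960)*12 + (12 - 1)      // 731, same as ym(2020, 12)

        * Tick after December 2020: halfway to January 2021
        display ym(2020, 12) + 0.5               // 731.5

        * Label for July-December 2020: midpoint of first and last month
        display (ym(2020, 7) + ym(2020, 12))/2   // 728.5

        * Label for January-July 2024: midpoint falls on April
        display (ym(2024, 1) + ym(2024, 7))/2    // 771
        ```

        The midpoint of the first and last month gives a whole month when the number of months is odd and a half-month position when it is even, which is the rule stated earlier in this post.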

        PS The trend doesn't look very linear to me.

        [Image: rick2.png]


        • #5
          My impression is that many Stata users underestimate the scope for the display command to do little calculations on the fly. So, I would start with

          * What's halfway between June and July 2021?

          Code:
          . di ym(2021, 6) + 0.5
          737.5
          * 7 months' data for 2024, so plot at April 2024?

          Code:
          . di ym(2024, 4)
          771
          Here's a way to do it otherwise: Loop over the years from first to last, and find the mean monthly date for each year and the last monthly date for each year except the last.

          Code:
          clear 
          
          input float(year month) long one
          2020  7 1585
          2020  8 2460
          2020  9 2778
          2020 10 3284
          2020 11 3280
          2020 12 3262
          2021  1 2688
          2021  2 2849
          2021  3 2806
          2021  4 2517
          2021  5 4012
          2021  6 2665
          2021  7 2556
          2021  8 2618
          2021  9 2693
          2021 10 2247
          2021 11 2028
          2021 12 2217
          2022  1 2233
          2022  2 2224
          2022  3 2326
          2022  4 2052
          2022  5 2059
          2022  6 1656
          2022  7 1574
          2022  8 1386
          2022  9 1361
          2022 10 1071
          2022 11 1154
          2022 12 1109
          2023  1 1082
          2023  2 1435
          2023  3 1447
          2023  4 1172
          2023  5 1176
          2023  6 1209
          2023  7 1289
          2023  8 1759
          2023  9 1480
          2023 10 1311
          2023 11 1186
          2023 12 1208
          2024  1 1344
          2024  2 1277
          2024  3 1185
          2024  4 1257
          2024  5 1209
          2024  6 1405
          2024  7 1490
          end 
          
          
          gen mdate = ym(year, month)
          
          set scheme s1color
          
          su year, meanonly 
          local first = r(min)
          local last = r(max)
          
          forval y = `first'/`last' { 
              su mdate if year == `y', meanonly 
              local labelcall `labelcall' `r(mean)' "`y'"
              local tickpos = r(max) + 0.5 
              if `y' < `last' local tickcall `tickcall' `tickpos'
          }
          
          mac li 
          
          scatter one mdate, title("Teleconsults trend") subtitle("(2020-2024)") ///
          ytitle("Number of teleconsults") yscale(range(0 .)) ylabel(#6, labsize(small) ///
          angle(horizontal) nogrid) ms(Oh) mc(black) ///
          xtick(`tickcall',  tlength(*3)) ///
          xla(`labelcall', noticks) ///
           xtitle("") graphregion(fcolor(white)) || lfit one mdate, legend(off) xli(`tickcall', lc(gs12))



          • #6
            Thanks a lot Nick Cox. You are right, it didn't strike me to use the display command!



            • #7
              Meanwhile, I pushed your data through localp from SSC, which is just a wrapper for lpoly.

              https://www.statalist.org/forums/for...ial-regression

              The code is the same as my last except for

              Code:
              localp one mdate, title("Teleconsults trend") subtitle("(2020-2024)") ///
              ytitle("Number of teleconsults") yscale(range(0 .)) ylabel(#6, labsize(small) ///
              angle(horizontal) nogrid) ms(Oh) mc(black) ///
              xtick(`tickcall',  tlength(*3)) ///
              xla(`labelcall', noticks) ///
               xtitle("") graphregion(fcolor(white)) xli(`tickcall', lc(gs12))
              There is always scope for different taste and judgment, but this result supports any impression from eyeballing the data that the curve is levelling off.

              [Image: rick3.png]



              • #8
                Nick Cox: Thanks, Nick, this is great! One quick question: I ran your code without the title, since I don't want it. I now get the graph below with the R2 and RMSE in place of the title. Is there any way for me to remove that, as well as the kernel/degree details at the bottom of the graph?


                [Image: trendline.png]



                • #9
                  localp is just a wrapper for lpoly. In addition to a title() it is making use of subtitle() and note().

                  My main point as programmer is that at best what localp shows is one smooth, specified by particular choices. Even if you or your readers don't care how that was done, it's a retrograde step just to present it as a smooth curve.
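
                  A sketch of how the extra text in #8 might be removed, assuming (as #7 and #8 suggest for title()) that options you supply replace the text localp adds by default. The auto data shipped with Stata stand in for the teleconsult data here.

                  ```stata
                  * Assumes localp is installed: ssc install localp
                  sysuse auto, clear

                  * Empty strings blank out the text localp would otherwise add
                  localp mpg weight, title("") subtitle("") note("")
                  ```

                  Whether you then report the kernel, degree and bandwidth somewhere else, say in a caption, is up to you.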
