Plot a graph

Paris Rira

Join Date: Dec 2022
Posts: 383

27 May 2024, 23:28

Dear Profs and colleagues,

I would like to plot the graph shown below. Occupation skill is measured by sk_rat_quartile, which has 4 groups. Could you please help me plot it?

[Attached image: graph.png]

Code:

 tab sk_rat_quartile

4 quantiles |
of sk_ratio |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         80       25.00       25.00
          2 |         80       25.00       50.00
          3 |         80       25.00       75.00
          4 |         80       25.00      100.00
------------+-----------------------------------
      Total |        320      100.00



* Example generated by -dataex-. For more info, type help dataex
clear
input double(year immigrant) float native byte sk_rat_quartile
2013 1059 23060 1
2012 1104 23482 1
2014  999 24646 1
2015 1066 26193 1
2016 1276 28540 1
2011 1701 29488 1
2017 1458 32441 1
2010 2170 32992 1
2019 3209 32668 1
2018 2292 34049 1
2013 1120 15231 2
2012 1150 15218 2
2014 1247 16758 2
2011 1484 17692 2
2015 1501 17852 2
2010 1630 18247 2
2016 1562 19657 2
2017 1913 21684 2
2018 2432 22921 2
2019 3298 23724 2
2013  915 16680 3
2012 1053 17182 3
2014  933 18030 3
2015 1025 19935 3
2011 1296 20118 3
2010 1387 20767 3
2016 1168 21380 3
2017 1396 24060 3
2018 1941 25975 3
2019 2716 27139 3
2013  399 13818 4
2012  346 14010 4
2014  535 15494 4
2011  394 16119 4
2015  579 15992 4
2010  348 16577 4
2016  914 16872 4
2017 1117 17963 4
2018 1400 19486 4
2019 1646 20789 4
2014 2162 39776 1
2015 2257 39816 1
2013 2252 40300 1
2016 2855 41260 1
2012 2490 42716 1
2017 3361 43908 1
2018 4373 45524 1
2019 6277 44585 1
2011 3415 51246 1
2010 4081 56330 1
2013 2167 26351 2
2014 2371 26881 2
2012 2331 27278 2
2015 2669 27261 2
2016 3041 28429 2
2017 3490 29720 2
2011 2901 30423 2
2010 2805 30697 2
2018 4407 30405 2
2019 5884 30817 2
2014 1692 28236 3
2015 1834 28479 3
2013 1693 28745 3
2016 2046 29216 3
2012 1699 30589 3
2017 2307 30450 3
2018 2877 31326 3
2010 2132 33492 3
2019 3886 31799 3
2011 2301 34093 3
2015 1008 26800 4
2016 1287 26656 4
2017 1598 27396 4
2014  922 28380 4
2013  780 28775 4
2018 2011 28459 4
2012  767 31001 4
2019 2594 29419 4
2010  774 35558 4
2011  878 35588 4
2016 3331 51943 1
2015 3021 52589 1
2017 3745 52073 1
2018 4949 51578 1
2019 7033 49765 1
2014 2980 53987 1
2013 3189 54651 1
2012 3572 56585 1
2011 5107 64083 1
2010 6489 67491 1
2015 2655 32248 2
2016 3025 32363 2
2014 2525 32911 2
2013 2519 33037 2
2017 3610 32591 2
2012 2657 33707 2
2018 4554 32700 2
2019 5910 32603 2
2010 3375 35988 2
2011 3332 36250 2
end

Cheers,
Paris


ericmelse

Join Date: May 2014

Posts: 420
#2

31 May 2024, 12:17

Dear Paris,

Most likely, the user-contributed package dstat, by Ben Jann, will be of use to you. To install it on your system, use:

Code:

ssc install dstat, replace
which dstat    // to check the installed version
h dstat        // to consult the help file

However, comparing your example data with your example graph, I am rather uncertain what result you want.
So I just created 'something' from your example data to show the method of drawing distributions over a group (i.e. a binary or categorical variable), like:

Code:

dstat density immigrant, over(sk_rat_quartile) total unconditional
dstat graph , merge p1(lc(styellow) ciopts(color(styellow%30))) ///
    p2(lc(stblue%50) ciopts(color(stblue%20))) ///
    p3(lc(stred%40) ciopts(color(stred%20))) ///
    p4(lc(lavender%90) ciopts(color(lavender%40))) legend(span col(3) symx(8pt) region(lc(none)))

which results in (many options to control how Stata draws such a plot/graph are available but not used here):

I expect that the above is not what you need but I do hope it will get you going.
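
If installing dstat is not an option, a similar overlay can be sketched with official Stata's twoway kdensity. This is only a sketch under assumptions: it uses twelve rows copied from the example data in #1 (three per quartile), the legend labels and axis titles are made up for illustration, and it draws no confidence bands, so it is not equivalent to the dstat output above.

```stata
* Twelve rows copied from the dataex example in #1 (three per quartile)
clear
input double(year immigrant) float native byte sk_rat_quartile
2013 1059 23060 1
2012 1104 23482 1
2014  999 24646 1
2013 1120 15231 2
2012 1150 15218 2
2014 1247 16758 2
2013  915 16680 3
2012 1053 17182 3
2014  933 18030 3
2013  399 13818 4
2012  346 14010 4
2014  535 15494 4
end

* One kernel density curve per skill quartile, overlaid in a single graph
twoway (kdensity immigrant if sk_rat_quartile == 1) ///
       (kdensity immigrant if sk_rat_quartile == 2) ///
       (kdensity immigrant if sk_rat_quartile == 3) ///
       (kdensity immigrant if sk_rat_quartile == 4), ///
       legend(order(1 "Quartile 1" 2 "Quartile 2" 3 "Quartile 3" 4 "Quartile 4")) ///
       xtitle("immigrant") ytitle("Density")
```

With the full data in memory, the input block can be dropped and only the twoway command run.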

http://publicationslist.org/eric.melse
1 like

Paris Rira

Join Date: Dec 2022
Posts: 383

31 May 2024, 12:37

Dear Prof Eric,

Thank you so much for getting back to me and providing such practical commands. I updated moremata, but dstat still reports an error.

Code:

. ssc install moremata,replace
checking moremata consistency and verifying not already installed...
all files already exist and are up to date.

. do "C:\Users\35193\AppData\Local\Temp\STD4f80_000000.tmp"

.            dstat density immigrant, over(sk_rat_quartile) total unconditional 
moremata version 2.0.1 or newer is required; type ssc install moremata, replace
(error occurred while loading dstat.ado)
r(499);

end of do-file


ericmelse

Join Date: May 2014

Posts: 420
#4

31 May 2024, 22:19

Yes, indeed, moremata (also by Ben Jann) is required.
I am not sure why the ssc install fails for you here, but, as per the instructions on his GitHub page, try a direct installation using this code:

Code:

. net install moremata, replace from(https://raw.githubusercontent.com/benjann/moremata/master/)

http://publicationslist.org/eric.melse

Paris Rira

Join Date: Dec 2022
Posts: 383

01 Jun 2024, 17:40

[Attached image: ciof.png]

Thank you Prof Eric, the link just worked.
What I don't understand is how to interpret the figures, although I studied "dstat: A new command for the analysis of distributions" by Ben Jann. According to the data, pct_immigrant always has small values, while the figure shows that its density is high at the start.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(pct_immigrant pct_native) byte sk_rat
2.2070253 97.79298 1
2.2156487 97.78435 1
2.6842105 97.31579 1
4.0485873 95.95142 1
 2.407159 97.59284 1
 5.497675 94.50233 1
2.4140754 97.58592 1
 2.879492 97.12051 1
 1.357658 98.64234 1
1.2121212 98.78788 1
 6.233178 93.76682 1
 .7451391 99.25486 1
 16.45048 83.54952 1
 2.730811 97.26919 1
 3.512245 96.48775 1
 3.519114 96.48089 1
 3.743141 96.25686 1
 21.20336 78.79664 1
  3.09888 96.90112 1
 2.416295  97.5837 1
 9.131918 90.86808 1
3.3759124 96.62408 1
 7.980505 92.01949 1
4.5368295 95.46317 1
 7.231876 92.76812 1
 4.908836 95.09116 1
2.2998571 97.70014 1
 2.249422 97.75058 1
1.9372234 98.06277 1
 4.612546 95.38745 1
 8.208304 91.79169 1
13.745486 86.25452 1
1.4689105 98.53109 1
 5.678827 94.32117 1
 9.230382 90.76962 1
 40.93327 59.06673 1
 5.935473 94.06453 1
 4.070888 95.92912 1
 3.376372 96.62363 1
 3.310672 96.68933 1
 7.140886 92.85912 1
 8.357906 91.64209 1
2.3848684 97.61514 1
 .7202426 99.27975 1
 9.241383 90.75861 1
 6.329555 93.67045 1
 2.973108 97.02689 1
 3.730149 96.26985 1
 2.912067 97.08794 1
  2.67957 97.32043 1
 11.67745 88.32255 1
  5.09839 94.90161 1
2.2716646 97.72833 1
19.806076 80.19392 1
1.4549234 98.54507 1
 7.785244 92.21475 1
 1.856308 98.14369 1
 3.369673 96.63033 1
 2.685674 97.31432 1
 1.974059 98.02594 1
2.9416275 97.05837 1
3.8885214 96.11148 1
  4.69465 95.30535 1
16.736898  83.2631 1
 .8475581 99.15244 1
  .563024 99.43697 1
 4.672007   95.328 1
 2.774783 97.22522 1
2.9001815 97.09982 1
 4.073027 95.92697 1
  2.85219 97.14781 1
 6.608521 93.39148 1
 3.612582 96.38742 1
 3.492021 96.50798 1
 3.543169 96.45683 1
 3.347238 96.65276 1
4.3177094 95.68229 1
 8.046893 91.95311 1
 1.254689 98.74531 1
 20.19231 79.80769 1
 .7648485 99.23515 1
 7.684825 92.31518 2
 8.174962 91.82504 2
 25.76885 74.23115 2
 .9093238 99.09068 2
 1.659346 98.34065 2
 5.596814 94.40318 2
 4.640693 95.35931 2
  8.64712 91.35288 2
 2.900886 97.09911 2
 1.567398  98.4326 2
13.213993 86.78601 2
1.4918625 98.50814 2
 .6221637 99.37784 2
 4.900548 95.09945 2
 12.75556 87.24444 2
1.2009196 98.79908 2
1.6459594 98.35404 2
1.4625944 98.53741 2
11.348684 88.65131 2
end

Moreover, pct_immigrant and pct_native are in percent:

Code:

gen total = immigrant + native
gen pct_immigrant = (immigrant / total)*100
gen pct_native = (native / total)*100

So the two should sum to 100 in each row. I can't see how that shows up in the figures.
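
As a quick check of the row sum, here is a minimal sketch using three rows copied from the example data in #1. The tolerance of 1e-4 is an assumption chosen to allow for float rounding, since gen stores floats by default.

```stata
* Three rows copied from the dataex example in #1
clear
input double(immigrant native)
1059 23060
1104 23482
 999 24646
end

gen total = immigrant + native
gen pct_immigrant = (immigrant / total) * 100
gen pct_native    = (native / total) * 100

* The two shares should sum to 100 in every row, up to float rounding
assert abs(pct_immigrant + pct_native - 100) < 1e-4
summarize pct_immigrant pct_native
```

The assert line stops with an error if any row fails the check, so a silent pass means the shares add up as expected.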


ericmelse

Join Date: May 2014

Posts: 420
#6

03 Jun 2024, 08:36

Dear Paris,

To explain how to calculate the 'area under the distribution curve', we first have to install 'yet more matrix commands' that come with the user-community contributed package dm79 from Nicholas J. Cox, University of Durham, UK.
In your do file or in the command window, use this code:

Code:

. net describe dm79, from(http://www.stata.com/stb/stb56)

and select the option: (click here to install).
Should you also want to save the ANCILLARY FILES by using: (click here to get), you had better first set your working folder to a location where you can save documentation and the do-file, like this (or any other location on your system):

Code:

. cd "C:/D_Stata18/Tutorial Matrix commands"

So, we continue with a very simple example using Ben Jann's dstat (instead of Stata's alternative kdensity):

Code:

. dstat density pct_immigrant, graph
. graph export "Case_plot_a_graph.png", width(1200) as(png) replace    // Change path as required

which results in:

The x and y values that are used to create the above plot (graph) are the result values of the distribution analysis and saved by dstat in the estimates matrix e(b).
We use this to create a matrix to work with by this code:

Code:

. mat DS = e(b)'    // create a matrix in memory by transposing the result estimates matrix
. mat list DS       // inspect the results

DS[99,1]
                  y1
-2.059781  .00022767
-1.594313  .00111041
-1.128845  .00418432
-.6633768  .01231275
-.1979089  .02871256
  .267559  .05416803
 .7330269  .08507804
 1.198495  .11546605
...
* etc.

As you can see in the Stata result window, what is a little problematic now is that the row labels of this matrix (used for the x-axis labels) are the values that we need for our x variable.
To 'grab' these labels as values of our new variable we use the command svmat2 from the package dm79.
With that we create two new variables (which I call dens_y & dens_x) so we can calculate the 'area under the distribution' with the Stata command integ (dydx and integ calculate derivatives and integrals of numeric "functions"):

Code:

svmat2 double DS, name(dens_y) r(dens_x)    // create the variables
destring dens_x , replace                   // destring the values
integ dens_y dens_x                         // calculate the numeric integral

As you should see in the Stata result window, the result sum value is:

Code:

number of points = 99
integral         = .99991055

which is pretty close to 1.
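
The same kind of check can be sketched with official Stata only, using kdensity in place of dstat. This is a sketch under assumptions: it uses the first twenty pct_immigrant values from the example data in #5, the variable names kd_x and kd_d are made up for illustration, and the integral need not match the dstat result because the evaluation grid and bandwidth differ.

```stata
* First twenty pct_immigrant values copied from the dataex example in #5
clear
input float pct_immigrant
2.2070253
2.2156487
2.6842105
4.0485873
2.407159
5.497675
2.4140754
2.879492
1.357658
1.2121212
6.233178
.7451391
16.45048
2.730811
3.512245
3.519114
3.743141
21.20336
3.09888
2.416295
end

* Store the grid points (kd_x) and the density estimates (kd_d) as variables
kdensity pct_immigrant, generate(kd_x kd_d) nograph

* Numerically integrate the density over the grid
integ kd_d kd_x
display "area under the estimated density = " r(integral)
```

Because the density is a curve whose area is about 1, its height can be large wherever many observations cluster, even when the values themselves are small percentages.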

I suppose that with the above you should be able to extend it to calculating the same for groups within a certain distribution.
As such it does not matter if you inspect the distribution of measurements on any scale or their transformed values, like percentages (but be careful with the wording of your interpretation).

There is a voluminous literature on this subject. Have a look at the References in the dstat help file.
One book that you could consider borrowing from a library (or acquire) is: Handcock, M. S., & Morris, M. (1999). Relative distribution methods in the social sciences. Springer Science & Business Media.
A brief entry on the Relative Distribution Method by Mark S. Handcock can be downloaded from the UCLA Department of Statistics.
Mark Handcock is at UCLA, and his website also has more on this and related subjects.
Also note the website of his co-author Martina Morris.
Mark Handcock wrote a paper with Eric Mark Aldrich about the implementation of his methods in R: Applying Relative Distribution Methods in R (December 2002). University of Washington Working Paper No. 27. Available at SSRN: https://ssrn.com/abstract=1515775 or http://dx.doi.org/10.2139/ssrn.1515775
All materials and code related to his book are available from his GitHub website.
This just as a source of inspiration!

http://publicationslist.org/eric.melse
1 like
Paris Rira

Join Date: Dec 2022

Posts: 383
#7

03 Jun 2024, 18:09

Thank you so much, Prof Eric, for the crystal clear explanation. Really appreciated it.