  • Histogram in stata

    Hi.

    I have a question:
    How do I make a histogram with a variable on the x-axis and a variable on the y-axis?
    When I type
    histogram mean_ci gestationsdage
    Stata says "too many variables specified".

    When I make a scatterplot instead, there is no problem.
    I hope someone can help!

    Regards

  • #2
    What would that look like?

    A perspective view of a two-dimensional array of bars?

    A heat map shaded according to frequencies or on some other scale?

    Histograms along axes and something else in the main plot region?

    It's probably best to give us an example of your data, and then we can suggest graphics.

    Please do read and act on https://www.statalist.org/forums/help#stata

    Last edited by Nick Cox; 16 Apr 2018, 04:00.



    • #3
      I would like to illustrate the distribution of some cardiac measurements at different gestational ages. On the x-axis I want gestational age, and on the y-axis I want the cardiac measurements. Both variables are continuous.
      Thanks.



      • #4
        Sorry, but "illustrate the distribution" is too vague for me to offer precise advice. Or rather I could add lots of precise advice but most of it would miss your target.

        Do you mean that gestational age could be 12.345678 weeks?

        As before, please give an example of data. If your data are confidential, fake data with the same form are fine. Just add some random numbers to your real data.



        • #5
          Just a side note, after Nick's advice.

          I would like to illustrate the distribution of some cardiac measurements at different gestational ages. On the x-axis I want gestational age, and on the y-axis I want the cardiac measurements. Both variables are continuous.
          Please check the examples in the Stata manual, or type - help histogram -. That said, if you have two continuous variables and you wish to "illustrate" the relationship between them, I fear histograms are not the best approach. What is more, the command - histogram - does not allow more than one variable at a time, which is why Stata gives the error message "too many variables specified".
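
          As a minimal sketch of one way to stay within - histogram -: keep a single variable in the command and move gestational age into by(). The variable agebin and the 14-day band width are assumptions for illustration only, and gestationsdage is assumed to be measured in days.

          Code:
          * band gestational age into 14-day bins, labelled by their lower limits
          generate agebin = 14 * floor(gestationsdage / 14)
          * one histogram of the cardiac measurement per age band, stacked in one column
          histogram mean_ci, by(agebin, cols(1))
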
          Best regards,

          Marcos



          • #6
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input int gestationsdage float(mean_svr mean_co)
            280 1009.38431 7.0147291
            280 1113.97426 6.4385723
            280 1290.25242 5.9834751
            271 923.85481 6.9160694
            271 998.0857 7.2956475
            285 891.11355 7.0701717
            285 1159.87458 5.9019181
            285 1285.2488 5.5834116
            274 619.90545 9.316148
            274 894.50551 9.1746449
            274 872.97065 9.2963243
            284 930.08871 7.542786
            287 1112.14076 5.4396524
            287 1312.0278 5.4251245
            287 1197.26883 5.6842894
            292 1061.6838 6.6624285
            292 1158.70486 6.0765824
            292 1274.70582 5.7765065
            292 1407.1181 5.342857
            292 1127.68824 6.1843597
            292 1042.2873 7.1423317
            292 1165.33867 6.364958
            284 1011.60871 7.4328362
            284 1052.18565 6.7220922
            284 1148.29635 6.6285944
            284 1261.326767 6.3857242
            278 1026.86268 6.4684452
            278 1097.15367 6.284096
            278 1101.57865 6.498024
            273 1124.61856 6.9270183
            273 1028.93602 7.5955082
            273 1182.246656 6.5343572
            287 1006.91069 7.3063867
            287 993.23862 7.378155
            287 973.72361 7.57230334
            277 908.3795 6.7880173
            277 1022.31063 6.5468634
            277 1031.52064 6.622535
            269 940.43641 6.4184656
            269 1080.17698 6.6051416
            269 1152.09645 6.0557436
            286 961.32604 7.0660438
            286 991.79545 7.3831628
            286 1121.92235 6.8863119
            286 1109.81932 7.4125111
            end



            • #7

              Thanks Marcos. Unfortunately, I have been asked to illustrate it with a histogram...



              • #8
                OK, so your gestational age is in days, presumably, but what you're showing here are already means of some kind. Sorry, yet more questions, but you aren't giving much away:

                1. What's the total size of the dataset?

                2. Do you want distributions for each distinct age (#days), or are you willing to bin?

                3. What is the range of gestational age in your complete dataset?

                4. What are your means over? Replicate measurements for each individual baby? Or something else?

                I am not a clinician or a medical statistician, but I am assuming that this is all about distributions conditional on gestational age, so that is the focus rather than the joint distribution.



                • #9
                  Yes, gestational age is in days.
                  1: 346 persons are included in the dataset.
                  2: willing to bin
                  3: 238 days to 316 days
                  4: The mean_svr and mean_co are the mean over replicate heart function measurements of the mother.



                  • #10
                    OK. Thanks. So you have about 8 times more data than you show us. I am inclined to regard this as a matter of smoothly changing distribution rather than binning.

                    Here is an adaptation of an example in the help of rangestat (SSC). The idea is to show moving quantiles and the data too.

                    It is adaptable in that you can tune which quantiles and which window width you choose. The example shows 7 day windows.

                    It is not smart about weighting within windows (meaning, there is none).

                    In broad terms, the scatter here is necessarily quite large given other uncontrolled variables. No one should believe in the dip from 273 to 274 (and that age is an estimate in most cases, presumably??).

                    With more data, the results should be less irregular, but you may still need to smooth more.

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input int gestationsdage float(mean_svr mean_co)
                    280 1009.38431 7.0147291
                    280 1113.97426 6.4385723
                    280 1290.25242 5.9834751
                    271 923.85481 6.9160694
                    271 998.0857 7.2956475
                    285 891.11355 7.0701717
                    285 1159.87458 5.9019181
                    285 1285.2488 5.5834116
                    274 619.90545 9.316148
                    274 894.50551 9.1746449
                    274 872.97065 9.2963243
                    284 930.08871 7.542786
                    287 1112.14076 5.4396524
                    287 1312.0278 5.4251245
                    287 1197.26883 5.6842894
                    292 1061.6838 6.6624285
                    292 1158.70486 6.0765824
                    292 1274.70582 5.7765065
                    292 1407.1181 5.342857
                    292 1127.68824 6.1843597
                    292 1042.2873 7.1423317
                    292 1165.33867 6.364958
                    284 1011.60871 7.4328362
                    284 1052.18565 6.7220922
                    284 1148.29635 6.6285944
                    284 1261.326767 6.3857242
                    278 1026.86268 6.4684452
                    278 1097.15367 6.284096
                    278 1101.57865 6.498024
                    273 1124.61856 6.9270183
                    273 1028.93602 7.5955082
                    273 1182.246656 6.5343572
                    287 1006.91069 7.3063867
                    287 993.23862 7.378155
                    287 973.72361 7.57230334
                    277 908.3795 6.7880173
                    277 1022.31063 6.5468634
                    277 1031.52064 6.622535
                    269 940.43641 6.4184656
                    269 1080.17698 6.6051416
                    269 1152.09645 6.0557436
                    286 961.32604 7.0660438
                    286 991.79545 7.3831628
                    286 1121.92235 6.8863119
                    286 1109.81932 7.4125111
                    end
                    
                    * -mm_quantile()- needs moremata (ssc install moremata)
                    * -rangestat- is also from SSC (ssc install rangestat)
                    mata:  
                    mata clear
                    real rowvector myquantile(real colvector X) {
                         if (rows(X) < 3) return(.) 
                         return(mm_quantile(X, 1, (0.75, 0.5, 0.25)))
                    }
                    end 
                    
                    rangestat (myquantile) mean_svr, interval(gest -3 3) 
                    
                    label var myquantile3 "p25"
                    label var myquantile2 "p50"
                    label var myquantile1 "p75"
                    
                    set scheme s1color 
                    scatter mean_svr gest, ms(oh) mc(gs8) || ///
                    line myquantile? gest, sort legend(order(2 3 4) col(1) pos(5) ring(0)) ///
                                ytitle("mean_svr") yla(, ang(h))
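
                    A minimal sketch of the tuning mentioned above, assuming the Mata function myquantile() defined in this code is still in memory: drop the earlier results and widen the window to 15 days. The half-width of 7 days is an arbitrary choice for illustration.

                    Code:
                    * remove the results from the 7-day window
                    drop myquantile?
                    * recompute the moving quartiles over gest - 7 to gest + 7
                    rangestat (myquantile) mean_svr, interval(gest -7 7)

                    The variable labels are lost with the drop, so the three label var commands need to be repeated before redrawing the graph.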

                    [Figure: gestation.png]



                    • #11
                      Thank you very much Nick for your help!

                      Comment


                      • #12
                        Here's another way to approach it with stripplot (SSC). I chose a bin width of 4 days (lower limits shown) for no particular reason.

                        Code:
                        . gen bin = 4 * floor(gest/4)
                        
                        . tab bin
                        
                                bin |      Freq.     Percent        Cum.
                        ------------+-----------------------------------
                                268 |          5       11.11       11.11
                                272 |          6       13.33       24.44
                                276 |          6       13.33       37.78
                                280 |          3        6.67       44.44
                                284 |         18       40.00       84.44
                                292 |          7       15.56      100.00
                        ------------+-----------------------------------
                              Total |         45      100.00
                        
                        . stripplot mean_svr , over(bin) stack vertical height(1) width(20) ms(Sh)

                        [Figure: gestation2.png]


                        Last edited by Nick Cox; 16 Apr 2018, 05:51.



                        • #13
                          Thanks again. I like this graph!



                          • #14
                            dotplot as an official command should be mentioned.
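
                            A minimal sketch, assuming the variable bin created in #12 is still in memory; the choice of the center and median options is only illustrative.

                            Code:
                            * official dotplot of mean_svr for each 4-day bin, with the dots
                            * centred in each column and each group's median marked
                            dotplot mean_svr, over(bin) center median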



                            • #15
                              I have a question:
                              How do I make a histogram with a variable on the x-axis and a variable on the y-axis?
                              [...] I have been asked to illustrate it with a histogram...
                              I agree the graphics provided in #10 and #12 are rather insightful solutions. That said, at least for the sake of debate, I wonder whether a "histogram" with two continuous variables, one on each axis, as requested, isn't conceptually an impossible mission, given that a histogram by definition shows the distribution of a single continuous variable.
                              Best regards,

                              Marcos
