Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Crop yaxis when using sts graph?

    Dear all,

    Does anyone know how to crop (or not show in some way) the highest part of the yaxis in a sts graph? Let me explain my data and problem:
    I am using sts graph, hazard to estimate risk of a type of cancer (cancer==1) in participants who are exposed (exposed==1) or unexposed (exposed==0) to a certain risk factor. I have a large dataset with about 3,330,000 participants, and I since the exposure is time-dependent, I have split the file (using stsplit), so there are 6,622,000 observations in the dataset.
    As the data is sensitive, I have to use Stata on a secure server where I cannot copy anything or use the internet, so I am unfortunately not able to use dataex. In addition, I wanted to attach the graphs I have made, but I am not allowed to export graphic files (only pdfs) and when I took a picture of the graph with my phone, saved it as a .png-file and tried to upload it, I got ; "Error uploading image"). I hope someone is still able to help me.

    I want to compare two graphs, one using stset with birthyear as origin, and one using study entry as origin.
    stset date_eof_cancer, failure(cancer==1) entry(study_entry) origin(birth_year) sc(365.25) id(id)
    and
    stset date_eof_cancer, failure(cancer==1) entry(study_entry) origin(study_entry) sc(365.25) id(id)

    My code for the graph is:

    sts graph, by(exposed) hazard ci ///
    ylabel(0 "0" 0.0001 "10" 0.0002 "20" 0.0003 "30" 0.0004 "40" 0.0005 "50") /// //and in addition 0.001 "100" 0.0015 "150" for the one with birth_year as origin
    xlabel(0(5)35) // and some other options for esthetics which I suppose is not relevant to the question

    The problem is that when I use the birth_year as origin, the confidence interval in the start of the curve for the exposed group is very high (150 cases per 100 000 person-years) as compared to the rest of the two graphs, where the maximum values at the yaxis is about 50 cases per 100 000 person-years.
    As I would like to compare the two graphs and use them in a publication, I would prefer if I could have the same scale and labels on the yaxis. The relevant information would still be included in the graphs, and I would write a foot-note to the table describing the maximum value of the confidence interval (unless there is a way to show it in the graph without showing the whole curve of the confidence interval). As you probably know, it does not help to change the ylabel.

    I have searched the Statlist forum and other websites, but not found the right way to do this. Is it even possible?
    In one previous post from the Statalist (https://www.stata.com/statalist/arch.../msg00224.html), David Harrison writes "If you want to restrict the range so that some of the curve is not visible, you will need to use -sts gen- to generate the full curve and then plot only the bit you want using a standard twoway graph." Maybe that is what I need to do? However, I have not managed to make the graph this way (I can give more information about how I have tried doing this, if this is the best way to solve my problem).

    I am thankful for any help,

    Best regards,
    Marie S. Sandvei
    Resident physician and researcher

  • #2
    Here is my code in the right format. I am also trying to make an example dataset and use dataex, but I am not sure I will be allowed to copy and paste it here.
    Code:
    * stset codes
    stset date_eof_cancer, failure(cancer==1) entry(study_entry) origin(birth_year) sc(365.25) id(id)
    *and
    stset date_eof_cancer, failure(cancer==1) entry(study_entry) origin(study_entry) sc(365.25) id(id)
    Code:
     * Graph code:
    sts graph, by(exposed) hazard ci ///
    ylabel(0 "0" 0.0001 "10" 0.0002 "20" 0.0003 "30" 0.0004 "40" 0.0005 "50") ///
    //and in addition 0.001 "100" 0.0015 "150" for the one with birth_year as origin
    xlabel(0(5)35) // and some other options for esthetics which I suppose is not relevant to the question

    Comment


    • #3
      There are several Stata data sets illustrating how to create this kind of graph. Here is an example:

      Code:
      webuse drug2, clear
      sts graph, hazard ci by(drug) scheme(s1color)
      Click image for larger version

Name:	Graph.png
Views:	1
Size:	34.6 KB
ID:	1495062



      Is your question whether you can, for example, restrict the y axis to 0.3? If you have the inputs to the graph, the answer is yes via twoway. As far as I can see, this involves some sort of smoothing.

      Comment


      • #4
        Thank you for your answer and the tip about the Stata data set for illustration! And yes, that is exactly what I want.
        I have now tried using sts gen and twoway. It doesn´t say in the PDF manual how to make confidence intervals (lower and upper bounds) for the hazard (h). I think the graph using this parameter looks not so different from the output from the sts graph above, exept in the ends of the lines, which may be due to the smooting of the sts graph?
        Is this the right parameter to use, or should I use other output from the sts gen to make the graph? And is there a way to have a confidence interval for the hazard? I am not a statistician and find this difficult.
        Code:
        webuse drug2, clear
        
        sts gen haz_drug = h, by(drug) // couldn´t find a way to make confidence interval for the hazard
        
        * Graph of the hazard
        twoway line haz_drug _t if drug==0, sort ///
        ||  ///
        line haz_drug _t if drug==1, sort ///
        scheme(s1color)  name(haz_plot, replace) ///
        title("Hazard plot, no max")
        
        * Graph of the hazard, max 0.6
        twoway line haz_drug _t if drug==0&haz_drug<0.6, sort ///
        ||  ///
        line haz_drug _t if drug==1, sort ///
        scheme(s1color)  name(haz_plot_6, replace) ///
        title("Hazard plot, max 0.6")
        Click image for larger version

Name:	haz_plot_nomax.png
Views:	1
Size:	25.6 KB
ID:	1495163 Click image for larger version

Name:	haz_plot_max6.png
Views:	1
Size:	27.3 KB
ID:	1495164

        When I put if haz_drug<0.6 the curve is cut when the hazard is above this value, so this could possibly work, if I could do this for the confidence interval of my graph. It would though be better if it was possible to crop only the plot at the yaxis and keep the rest of the plot? I would also have to use some kind of smoothing, do you have any suggestion?

        Thank you again
        for all and any help!
        Best, Marie


        Addition:
        I also tried using the Nelson-Aalen estimate of the cumulative hazards function, where I could get confidence intervals (lower or upper bonds), but as I understand, this does not correspond to the
        sts graph, failure , right? Just in case this is appropriate, I copied my code for this:
        Code:
        sts gen cumhaz_drug = na, by(drug)
        sts gen lb_drug = lb(na), by(drug)
        sts gen ub_drug = ub(na), by(drug)
        
        * Graph of the cumulative hazard
        twoway rarea lb_drug ub_drug _t if drug==0, sort bcolor(pink)  ///
            yscale(range(0 4)) ///
        ||  ///
        rarea lb_drug ub_drug _t if drug==1, sort bcolor(green)  ///
        ||  ///
        line cumhaz_drug _t if drug==0, sort lcolor(black) ///
        ||  ///
        line cumhaz_drug _t if drug==1, sort lcolor(black) ///
        name(cumhaz_plot, replace)

        Comment


        • #5
          Code:
          webuse drug2
          sts graph, hazard ci by(drug)  outfile(inputs)
          use inputs
          tw (line hazard _t if !drug) (line hazard _t if drug)
          ADDED IN EDIT: The variance term is given in the output. I will post a complete solution.
          Last edited by Andrew Musau; 26 Apr 2019, 06:13.

          Comment


          • #6
            Code:
            webuse drug2, clear
            sts graph, hazard ci by(drug) outfile(inputs, replace)
            use inputs
            
            *USING THE NORMAL DISTRIBUTION
            *1.96 = approx. value of the 97.5 percentile point
            gen ub = hazard + (1.96*(Vhazard^0.5))
            gen lb = hazard - (1.96*(Vhazard^0.5))
             
            twoway (rcap lb ub _t if drug, sort pstyle(ci) color(red%25)) ///
            (rcap lb ub _t  if  !drug, sort pstyle(ci) color(green%25)) ///
            (line hazard _t if drug, lcolor(red)) (line hazard _t if !drug, ///
            lcolor(green) scheme(s1color) title(Smoothed hazard estimates)  ///
            xtitle("analysis time") ytitle("") leg(order(1 "95% CI" 2 "95% CI" ///
            3 "placebo" 4 "drug" )))

            Comment

            Working...
            X