  • Plotting a line graph of age in our study

    I find Stata graphing to be a bit challenging sometimes.

    I have a data set of study participants, and would like a more granular view of their age at the time they enrolled in the study. I would like to show the output of this command:

    histogram age, freq

    as a line graph showing the number of persons who are 18 years old, 19 years old, 20 years old, etc., represented not as a series of bars but as a line connecting the dots represented by the frequency of each age in my dataset. Then I would like to show two lines, one for persons in study arm A, one for persons in study arm B. I got close with this command:

    twoway (kdensity age if study_arm==0) (kdensity age if study_arm==1)

    but it's not exactly showing me the frequencies of persons at each age, but rather kernel-density plots. Maybe this is close enough, if I'm able to relabel axes?

    twoway seems to want me to graph age by another variable. Same with the line command.

    I feel like this should be a lot easier than it is, but I'm struggling here.

  • #2
    Use -bysort- to count the number of observations within each age.

    Code:
    bysort age: gen freq = _N
    line freq age

    • #3
      It's not especially easy because you want what by most modern statistical or Stata standards is a non-standard graph, historically often called a frequency polygon, and that requires some tricks.

      I have to say that if I were your advisor I would recommend sticking to a histogram.

      Kernel density is not at all close to what you want.

      #2 starts in the right direction, but the strategy is flawed. Suppose there are people who are 36 and 38 but no-one aged 37. You should want 0 to be shown for 37, but as there are no such observations in the data, no such point will be shown, and the line will run straight from 36 to 38.

      Code:
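      sysuse auto, clear

      * stand-in data: treat foreign as study_arm and mpg as age
      rename (foreign mpg) (study_arm age)

      * you start here
      preserve

      * one observation per arm and age, with its count in _freq
      contract study_arm age, zero

      * declare arm as panel and age as time so that tsfill can add ages missing within each arm
      tsset study_arm age
      tsfill

      * the ages just added have no participants
      replace _freq = 0 if _freq == .

      * one frequency variable per arm: _freq0 and _freq1
      separate _freq, by(study_arm)

      line _freq? age, xtitle(Age) ytitle(Frequency) legend(order(1 "Study arm A" 2 "Study arm B"))

      restore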

      • #4
        The more I try to wrap my head around this, the more I agree with you, Nick. I think I'll stick with the histogram.

        Thanks

        • #5
          It's not difficult to understand what's happening if you pepper the code with list instructions or keep an eye on the data editor.
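
          As a rough sketch of that advice, the -list- calls below show the data after -contract- and again after -tsfill-. It reuses the auto data stand-ins from #3 (foreign as study_arm, mpg as age); the choice of which observations to list is only an illustration.

          ```stata
          sysuse auto, clear
          rename (foreign mpg) (study_arm age)   // stand-ins, as in #3

          preserve

          contract study_arm age, zero
          * each observation is now one arm-age pair with its count
          list in 1/10, sepby(study_arm)

          tsset study_arm age
          tsfill
          * ages added by tsfill arrive with missing _freq
          list if missing(_freq), sepby(study_arm)

          restore
          ```

          The second listing shows exactly the ages that would otherwise be skipped by the line.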

          In any case, although histograms are where you started, the best way to compare two distributions that should (or could) be similar is in my view a quantile-quantile plot or side-by-side quantile plot.
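
          A minimal sketch of the quantile-quantile comparison, again assuming the auto data stand-ins from #3: official -qqplot- compares two variables, so age is first split by arm with -separate-.

          ```stata
          sysuse auto, clear
          rename (foreign mpg) (study_arm age)   // stand-ins, as in #3

          * qqplot needs two variables, so make one age variable per arm
          separate age, by(study_arm)            // creates age0 and age1

          * points near the diagonal reference line indicate similar distributions
          qqplot age0 age1
          ```

          If the two arms have similar age distributions, the points should lie close to the line of equality; systematic departures show where the distributions differ.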

          • #6
            I ended up going with:

            twoway (histogram age if study_arm==0, freq) (histogram age if study_arm==1, freq color(red%30)) , legend(subtitle("Age") order(1 "Study group" 2 "Comparison")) xtitle("Age") ytitle("Frequency")

            And it gets at what I'm after. A qqplot does show differences too, but I think the histogram will work best for my purposes. Cheers!
