
  • Question regarding incidence (per 100,000)

    Hello everyone.

    First of all, I am a young cancer researcher, and in less than 2 days I am presenting my data on a rare skin cancer (MCC) to doctors involved in treating this cancer.
    I have taken a short Stata course, and I got Stata this evening.
    I now have all the data I need except for TWO things.
    I have tried searching YouTube etc., but I couldn't solve it.
    As I am new to Stata and have a tight deadline, I hope you more experienced users can help me solve this last piece.

    I have uploaded a screen shot.

    [Attached image: Stata Question2.jpg]



    1)
    I am looking at patients diagnosed with a certain skin cancer from 1996 to 2015 (a 20-year period). You can see the date-of-diagnosis variable in the screenshot.
    I would like to calculate an incidence rate (xx per 100,000 people), based on the Danish population size, for all patients in this period.
    How do I do this? I simply cannot figure it out.

    2)
    Furthermore, I would like to subdivide the patients into 5-year periods:
    Group 1: 1996 - 2000
    Group 2: 2001 - 2005
    Group 3: 2006 - 2010
    Group 4: 2011 - 2015

    I would like to 1) calculate the incidence for each group, so that I can see how the incidence changes over time, and 2) get a graph of this data.
    If there is a smarter way of doing this, I am open to suggestions.

    As I said, I am very new and I do not know if I am using the forum correctly.
    I hope I am, otherwise please let me know.
    Because of time pressure and my lack of knowledge on Stata, I do hope there is someone kind who can help me with this problem.

    Many regards
    Simon


  • #2
    I would love to help you out. Unfortunately, as is so often the case, your screen shots are unreadable, at least on my computer. Please read the Forum FAQ, especially #12 for advice on how to get and use the -dataex- command to show example data. Using -dataex- assures that your results are clearly communicated and that those who want to help you can create a complete and faithful replica of your data in Stata using a simple copy/paste operation. (Note that even if your screen shots were readable, if it proves necessary to work with the data to develop and test code, there is no way to import it from an image.)



    • #3
      Dear Clyde.
      Thank you very much for your reply.
      I have tried to attach the picture differently, perhaps you can see it now on your computer?
      [Attached image: Stata Question3.png]



      • #4
        No. Still unreadable. Run -ssc install dataex-. Then read -help dataex- for instructions. It will only take you a few minutes, and -dataex- is very easy to use. There is NO satisfactory substitute for it. And as I said before, even if I could read your screenshot, if I need to actually test code on the data, you can't do that with an image. -dataex-, please.



        • #5
          I was hoping not to post the actual data itself, as I am not fully aware of the rules regarding this, which are very strict here in Denmark.
          So if advice is possible from seeing the picture (if you can see it now?) and the variables, that would be much appreciated.



          • #6
            The exact values of the numbers in the data are not important. If you want to add some random noise to them, that's fine. Or make up fake data with the same variables, storage types, etc. Also, you don't have to post everything, just the relevant variables.

            Added: When I say make up fake data with the same variables, storage types, etc., what I mean is take your data set (make sure you save it first) and then in Stata's data editor just overwrite the real data with some made-up values for the same variables, then post that.
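
            For the random-noise option, here is a rough sketch only. It assumes a string date variable named DateofDiagnosis in YYYY-MM-DD form and an identifier named RecordID (both names are assumptions; substitute your own), and the new names real_date and fake_date are made up for illustration. Whether shifting dates by up to 30 days is enough to satisfy your data rules is something I cannot judge.

            Code:
            //    WORK INSIDE PRESERVE/RESTORE SO THE REAL DATA ARE LEFT UNTOUCHED
            preserve
            set seed 12345
            
            //    CONVERT THE STRING DATE TO A STATA DAILY DATE
            gen long real_date = daily(DateofDiagnosis, "YMD")
            
            //    SHIFT EACH DATE BY A RANDOM WHOLE NUMBER OF DAYS BETWEEN -30 AND +30
            replace real_date = real_date + floor(runiform()*61) - 30
            
            //    WRITE THE SHIFTED DATE BACK OUT IN THE ORIGINAL STRING LAYOUT
            gen str10 fake_date = string(real_date, "%tdCCYY-NN-DD")
            
            //    POST ONLY THE IDENTIFIER AND THE SHIFTED DATE
            dataex RecordID fake_date in 1/20
            restore
            Note that dates near the edge of a period can move into the neighboring year, so counts taken from the noisy copy will not match the real ones exactly. That does not matter for developing code.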
            Last edited by Clyde Schechter; 25 Sep 2017, 17:09.



            • #7
              God damn, it sucks being a newbie.
              I have now run -ssc install dataex-, and I see a folder being generated which contains a file, stata.trk, and another folder titled 'd', which contains 2 files: 'dataex.ado' and 'dataex'.
              I have read -help dataex- but am unsure what the next step is.

              I will try to delete information that is not relevant, to avoid posting confidential data.



              • #8
                Wait, I think I did it now. I typed -dataex [name of variable one] [name of variable two]-.
                Now I just have to figure out how to save it and send it to you.

                It seems to contain the first 100 observations in my data.



                • #9
                  So decide which variables you want to post, and which observations you want to show. (The default is the first 100 observations, but probably you don't need to show that many. Probably 20 will be sufficient.) So run

                  Code:
                  dataex list_here_the_variables_to_show in 1/20
                  Stata will respond with something that looks like:

                  Code:
                  ----------------------- copy starting from the next line -----------------------
                  [C O D E]
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input int(price mpg) str18 make float headroom
                   4099 22 "AMC Concord"       2.5
                   4749 17 "AMC Pacer"           3
                   3799 22 "AMC Spirit"          3
                   4816 20 "Buick Century"     4.5
                   7827 15 "Buick Electra"       4
                   5788 18 "Buick LeSabre"       4
                   4453 26 "Buick Opel"          3
                   5189 20 "Buick Regal"         2
                  10372 16 "Buick Riviera"     3.5
                   4082 19 "Buick Skylark"     3.5
                  11385 14 "Cad. Deville"        4
                  14500 14 "Cad. Eldorado"     3.5
                  15906 21 "Cad. Seville"        3
                   3299 29 "Chev. Chevette"    2.5
                   5705 16 "Chev. Impala"        4
                   4504 22 "Chev. Malibu"      3.5
                   5104 22 "Chev. Monte Carlo"   2
                   3667 24 "Chev. Monza"         2
                   3955 19 "Chev. Nova"        3.5
                   3984 30 "Dodge Colt"          2
                  end
                  [/C O D E]
                  ------------------ copy up to and including the previous line ------------------
                  Copy everything between the two dashed lines, but not the dashed lines themselves. Be sure to include the [C O D E] at the beginning and the [/C O D E] at the end. Then paste all of that into the Forum editor.



                  • #10
                    I have no idea if I am doing this correctly, but the way I understand it, I have to post it here?
                    Just to make it clear, each value in the field RecordID (numbers such as 1, 2, 3, 4, 5, 6, 7, 13, etc.) represents one patient.
                    So the number 1 is the first patient,
                    the number 2 is the second patient, etc., and the number 13 is the eighth patient. I hope this makes sense.


                    . dataex RecordID DateofDiagnosis

                    ----------------------- copy starting from the next line -----------------------
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str3 RecordID str10 DateofDiagnosis
                    "1"   "2014-11-17"
                    "2"   "2015-11-12"
                    "3"   "2015-12-14"
                    "4"   "2015-08-12"
                    "5"   "2015-03-31"
                    "6"   "2009-06-02"
                    "7"   "2003-07-11"
                    "13"  "2015-02-25"
                    "14"  "2015-02-26"
                    "15"  "2014-03-26"
                    "16"  "2014-03-06"
                    "17"  "2014-02-13"
                    "18"  "2013-12-19"
                    "19"  "2012-11-30"
                    "20"  "2013-05-28"
                    "21"  "2013-06-14"
                    "22"  "2013-05-22"
                    "23"  "2012-05-18"
                    "24"  "2010-12-09"
                    "25"  "2010-07-15"
                    "26"  "2010-08-25"
                    "27"  "2010-06-18"
                    "28"  "2010-02-24"
                    "29"  "2009-03-23"
                    "30"  "2009-08-04"
                    "31"  "2009-03-10"
                    "32"  "2008-10-28"
                    "33"  "2008-02-28"
                    "34"  "2007-08-17"
                    "35"  "2007-04-16"
                    "36"  "2007-04-02"
                    "37"  "2007-02-16"
                    "38"  "2006-10-25"
                    "39"  "2006-03-17"
                    "40"  "2005-04-08"
                    "41"  "2004-12-17"
                    "42"  "2004-11-17"
                    "43"  "2003-02-14"
                    "44"  "2002-11-04"
                    "45"  "2001-12-28"
                    "46"  "2001-01-08"
                    "47"  "2001-01-04"
                    "48"  "2000-12-04"
                    "49"  "2000-11-29"
                    "50"  "1999-05-20"
                    "51"  "2000-06-30"
                    "52"  "2000-05-04"
                    "53"  "2000-01-04"
                    "54"  "2000-01-12"
                    "55"  "1999-12-16"
                    "56"  "1998-12-17"
                    "57"  "1998-07-16"
                    "58"  "1998-06-12"
                    "59"  "1998-02-26"
                    "60"  "1997-05-13"
                    "61"  "1996-12-23"
                    "62"  "1996-07-15"
                    "63"  "1996-12-10"
                    "64"  "2014-08-05"
                    "65"  "1999-05-12"
                    "66"  "2002-09-19"
                    "68"  "2002-03-12"
                    "69"  "2014-11-06"
                    "70"  "2006-03-10"
                    "71"  "2014-10-03"
                    "72"  "2014-12-03"
                    "73"  "2001-01-24"
                    "74"  "2013-01-18"
                    "75"  "2012-10-19"
                    "76"  "2002-05-21"
                    "77"  "2011-05-03"
                    "78"  "2007-07-04"
                    "79"  "2014-09-24"
                    "81"  "1998-06-09"
                    "82"  "2014-07-03"
                    "83"  "2015-04-23"
                    "84"  "2009-11-16"
                    "85"  "2010-12-15"
                    "86"  "1997-09-16"
                    "87"  "2010-12-22"
                    "88"  "1999-08-12"
                    "89"  "2005-04-15"
                    "90"  "2014-10-01"
                    "91"  "1996-01-08"
                    "93"  "2009-05-01"
                    "94"  "1997-04-23"
                    "95"  "2003-02-24"
                    "96"  "2014-10-10"
                    "97"  "1998-06-19"
                    "98"  "2012-07-17"
                    "99"  "2013-03-13"
                    "100" "2006-11-15"
                    "101" "2012-10-18"
                    "102" "1996-03-21"
                    "103" "2013-02-04"
                    "104" "2012-01-25"
                    "105" "2013-01-07"
                    "106" "1999-01-11"
                    "108" "2007-03-02"
                    "109" "1996-09-17"
                    end
                    ------------------ copy up to and including the previous line ------------------

                    Listed 100 out of 374 observations
                    Use the count() option to list more

                    . ssc install dataex
                    checking dataex consistency and verifying not already installed...
                    all files already exist and are up to date.



                    • #11
                      The -dataex- worked fine. Thank you.

                      So I can get you part of the way there. Every incidence rate has a numerator and a denominator. The numerator is the count of newly diagnosed cases from the relevant time period and population. The denominator is the total population at the middle of that time period. The information you have shown is sufficient to calculate the numerators. But there is no information about the population of Denmark in any of these years, so I can't help you with the denominators. Anyway, this code will get you the numerators:

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input str3 RecordID str10 DateofDiagnosis
                      "1"   "2014-11-17"
                      "2"   "2015-11-12"
                      "3"   "2015-12-14"
                      "4"   "2015-08-12"
                      "5"   "2015-03-31"
                      "6"   "2009-06-02"
                      "7"   "2003-07-11"
                      "13"  "2015-02-25"
                      "14"  "2015-02-26"
                      "15"  "2014-03-26"
                      "16"  "2014-03-06"
                      "17"  "2014-02-13"
                      "18"  "2013-12-19"
                      "19"  "2012-11-30"
                      "20"  "2013-05-28"
                      "21"  "2013-06-14"
                      "22"  "2013-05-22"
                      "23"  "2012-05-18"
                      "24"  "2010-12-09"
                      "25"  "2010-07-15"
                      "26"  "2010-08-25"
                      "27"  "2010-06-18"
                      "28"  "2010-02-24"
                      "29"  "2009-03-23"
                      "30"  "2009-08-04"
                      "31"  "2009-03-10"
                      "32"  "2008-10-28"
                      "33"  "2008-02-28"
                      "34"  "2007-08-17"
                      "35"  "2007-04-16"
                      "36"  "2007-04-02"
                      "37"  "2007-02-16"
                      "38"  "2006-10-25"
                      "39"  "2006-03-17"
                      "40"  "2005-04-08"
                      "41"  "2004-12-17"
                      "42"  "2004-11-17"
                      "43"  "2003-02-14"
                      "44"  "2002-11-04"
                      "45"  "2001-12-28"
                      "46"  "2001-01-08"
                      "47"  "2001-01-04"
                      "48"  "2000-12-04"
                      "49"  "2000-11-29"
                      "50"  "1999-05-20"
                      "51"  "2000-06-30"
                      "52"  "2000-05-04"
                      "53"  "2000-01-04"
                      "54"  "2000-01-12"
                      "55"  "1999-12-16"
                      "56"  "1998-12-17"
                      "57"  "1998-07-16"
                      "58"  "1998-06-12"
                      "59"  "1998-02-26"
                      "60"  "1997-05-13"
                      "61"  "1996-12-23"
                      "62"  "1996-07-15"
                      "63"  "1996-12-10"
                      "64"  "2014-08-05"
                      "65"  "1999-05-12"
                      "66"  "2002-09-19"
                      "68"  "2002-03-12"
                      "69"  "2014-11-06"
                      "70"  "2006-03-10"
                      "71"  "2014-10-03"
                      "72"  "2014-12-03"
                      "73"  "2001-01-24"
                      "74"  "2013-01-18"
                      "75"  "2012-10-19"
                      "76"  "2002-05-21"
                      "77"  "2011-05-03"
                      "78"  "2007-07-04"
                      "79"  "2014-09-24"
                      "81"  "1998-06-09"
                      "82"  "2014-07-03"
                      "83"  "2015-04-23"
                      "84"  "2009-11-16"
                      "85"  "2010-12-15"
                      "86"  "1997-09-16"
                      "87"  "2010-12-22"
                      "88"  "1999-08-12"
                      "89"  "2005-04-15"
                      "90"  "2014-10-01"
                      "91"  "1996-01-08"
                      "93"  "2009-05-01"
                      "94"  "1997-04-23"
                      "95"  "2003-02-24"
                      "96"  "2014-10-10"
                      "97"  "1998-06-19"
                      "98"  "2012-07-17"
                      "99"  "2013-03-13"
                      "100" "2006-11-15"
                      "101" "2012-10-18"
                      "102" "1996-03-21"
                      "103" "2013-02-04"
                      "104" "2012-01-25"
                      "105" "2013-01-07"
                      "106" "1999-01-11"
                      "108" "2007-03-02"
                      "109" "1996-09-17"
                      end
                      
                      //    FIRST CREATE A REAL STATA INTERNAL DATE VARIABLE
                      gen diagnosis_date = daily(DateofDiagnosis, "YMD")
                      format diagnosis_date %td
                      
                      //    AND GET THE YEAR
                      gen year = yofd(diagnosis_date)
                      
                      //    NOW DEFINE THE YEAR GROUPS
                      gen byte era = ceil((year-1995)/5) if inrange(year, 1996, 2015)
                      label define era    1    "1996-2000"    ///
                                          2    "2001-2005"    ///
                                          3    "2006-2010"    ///
                                          4    "2011-2015"
                      label values era era
                      
                      //    NOW CALCULATE NUMERATORS FOR THE INCIDENCE RATE
                      by era, sort: egen era_case_count = count(diagnosis_date)
                      
                      //    CALCULATE NUMERATOR FOR THE ENTIRE 1996-2015 PERIOD
                      count if inrange(year, 1996, 2015) & !missing(diagnosis_date)
                      local total_case_count `r(N)'
                      sort era year
                      To finish the job you will have to find (in your data set, or from some other source) the population of Denmark at the midpoint of the overall 1996-2015 period (that is, around the end of 2005), and in 1998, 2003, 2008, and 2013 for the four groups. Then you just multiply each numerator by 100,000 and divide by the corresponding denominator.
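
                      As a minimal sketch of that last step, assuming the code above has just been run in the same session or do-file: the population values below are placeholders, NOT real Danish population figures, and must be replaced with actual mid-period counts before the results mean anything. The names pop_overall, pop1 through pop4, cases, population, and rate_per_100k are made up for illustration.

                      Code:
                      //    PLACEHOLDER POPULATIONS -- NOT REAL FIGURES, REPLACE BEFORE USE
                      local pop_overall    5000000    // midpoint of 1996-2015
                      local pop1           5000000    // midpoint of 1996-2000
                      local pop2           5000000    // midpoint of 2001-2005
                      local pop3           5000000    // midpoint of 2006-2010
                      local pop4           5000000    // midpoint of 2011-2015
                      
                      //    OVERALL RATE PER 100,000 FOR THE WHOLE PERIOD
                      count if inrange(year, 1996, 2015) & !missing(diagnosis_date)
                      display "Overall rate per 100,000: " %6.2f r(N)*100000/`pop_overall'
                      
                      //    RATES PER 100,000 FOR EACH 5-YEAR GROUP
                      preserve
                      keep if !missing(era)
                      collapse (count) cases = diagnosis_date, by(era)
                      gen double population = .
                      forvalues i = 1/4 {
                          replace population = `pop`i'' if era == `i'
                      }
                      gen double rate_per_100k = cases*100000/population
                      list era cases population rate_per_100k, noobs
                      restore
                      The -preserve- and -restore- pair means the -collapse- only temporarily reduces the data to one row per group; your patient-level data come back afterwards.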
                      Last edited by Clyde Schechter; 25 Sep 2017, 17:37.



                      • #12
                        Dear Clyde. It is 01:39 at night here.
                        I will go to bed as I have an early shift tomorrow, but thank you very very very much for your help. I sincerely appreciate it and will now be able to present the data.
                        I will post tomorrow if I figure it out using your example.
                        Thank you very much again.



                        • #13
                          OK. By the way, just to be clear, the method of multiplying the numerator by 100,000 and dividing by the denominator gives you the full period incidence rate. If you want annualized incidence rates over the period then you have to divide the result by 5 for each of the groups, or by 20 for the whole thing.
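
                          To illustrate the annualizing step and the graph that was asked for, here is a self-contained sketch. The case counts and populations in it are invented purely so the code runs; they are not results from your data and not real Danish population figures. The variable names period_rate and annual_rate are likewise just illustrative.

                          Code:
                          //    MADE-UP COUNTS AND PLACEHOLDER POPULATIONS, FOR ILLUSTRATION ONLY
                          clear
                          input byte era int cases double population
                          1 10 5000000
                          2 12 5000000
                          3 20 5000000
                          4 30 5000000
                          end
                          label define era    1    "1996-2000"    ///
                                              2    "2001-2005"    ///
                                              3    "2006-2010"    ///
                                              4    "2011-2015"
                          label values era era
                          
                          //    FULL-PERIOD RATE PER 100,000, THEN ANNUALIZED OVER THE 5 YEARS
                          gen double period_rate = cases*100000/population
                          gen double annual_rate = period_rate/5
                          
                          //    ONE BAR PER 5-YEAR GROUP
                          graph bar annual_rate, over(era)    ///
                              ytitle("Cases per 100,000 per year")
                          With one row per group, -graph bar- simply plots each group's value. For the whole 1996-2015 period you would divide the full-period rate by 20 instead of 5.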
                          Last edited by Clyde Schechter; 25 Sep 2017, 18:21. Reason: Correct error.

