Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Respahe to wide?

    Dear StataList-ers!

    I have the following dataset, focused on sexually-transmitted disease in USA (state level) by year and gender. Instead of a single variable "Disease", I would like to deal with the numerous types of disease as separate variables (e.g. "Gonorrhea rate"; "Primary Syphilis rate", "Total Syphilis rate", etc.). I read the help files for reshape and collapse, and watched some tutorials. This seems like a very basic question but I am still not sure how to manage this Please, help!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str20 State byte StateCode str30 Disease int DiseaseCode str6 Gender int Year long Count double Rate
    "Alabama" 1 "Secondary Syphilis"             312 "Male"   1984   211  11.13
    "Alabama" 1 "Primary and Secondary Syphilis" 310 "Male"   1984   446  23.52
    "Alabama" 1 "Secondary Syphilis"             312 "Female" 1984   186   9.05
    "Alabama" 1 "Primary Syphilis"               311 "Male"   1984   235  12.39
    "Alabama" 1 "Early Latent Syphilis"          313 "Male"   1984   309  16.29
    "Alabama" 1 "Early Latent Syphilis"          313 "Female" 1984   181    8.8
    "Alabama" 1 "Late and Late Latent Syphilis"  314 "Male"   1984   103   5.43
    "Alabama" 1 "Total Syphilis"                 320 "Female" 1984   509  24.76
    "Alabama" 1 "Total Syphilis"                 320 "Male"   1984   860  45.35
    "Alabama" 1 "Primary Syphilis"               311 "Female" 1984    41   1.99
    "Alabama" 1 "Primary and Secondary Syphilis" 310 "Female" 1984   227  11.04
    "Alabama" 1 "Late and Late Latent Syphilis"  314 "Female" 1984   101   4.91
    "Alabama" 1 "Gonorrhea"                      280 "Male"   1984 11021 581.18
    "Alabama" 1 "Gonorrhea"                      280 "Female" 1984 11917 579.64
    "Alabama" 1 "Early Latent Syphilis"          313 "Male"   1985   290  15.21
    "Alabama" 1 "Gonorrhea"                      280 "Female" 1985 12153 587.83
    "Alabama" 1 "Late and Late Latent Syphilis"  314 "Male"   1985   131   6.87
    "Alabama" 1 "Gonorrhea"                      280 "Male"   1985 11025 578.42
    "Alabama" 1 "Total Syphilis"                 320 "Female" 1985   502  24.28
    "Alabama" 1 "Secondary Syphilis"             312 "Male"   1985   185   9.71
    end
    Thanks heaps! Much appreciated!

    Vania

  • #2
    This works. As you don't spell out what you want to do with all of your variables, I am not clear that it is exactly what you want.

    Code:
    drop Disease 
    reshape wide Count Rate, i(State Gender Year) j(DiseaseCode) 
    list, sepby(State Year)

    Comment


    • #3
      The code that Nick proposes in #2 will create variables with names like Rate312, etc, which incorporate the disease code. I believe the original request is for variable names that incorporate the disease name. Taken literally, this cannot be done. Variable names cannot include blanks. And even if the blanks were removed or replaced with underscore (_) characters, some of the resulting names would exceed the 32 character limit. But if you are willing to do some transformations on the names, then your wish can be accommodated. For example:

      Code:
      replace Disease = proper(Disease)
      replace Disease = subinstr(Disease, "Primary", "1ary", .)
      replace Disease = subinstr(Disease, "Secondary", "2ary", .)
      replace Disease = subinstr(Disease, "Late Latent", "LL", .)
      replace Disease = subinstr(Disease, " ", "", .)
      
      drop DiseaseCode
      reshape wide Count Rate, i(State Gender Year) j(Disease) string
      These variable names might be easier to work with, as you don't have to have the disease code table memorized to understand them.

      Comment


      • #4
        This also works pretty well.

        Code:
        replace Disease = strtoname(subinstr(Disease, " and ", "&", .)) 
        drop DiseaseCode 
        reshape wide Count Rate, i(State Gender Year) j(Disease) string 
        
        foreach v of var * {
            local varlabel : var label `v'
            local varlabel : subinstr local varlabel "_" " ", all
            label var `v' "`varlabel'"
        }
        describe

        The main detail of note compared with #4 is the use of strtoname().

        Comment

        Working...
        X