Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reformatting a data set

    Hello, I have the data set shown below, I want to transform the dataset in the following way, I want to make all the categories of the metric variable into individual variables with their respective counts, so it would look like the format below the ***



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str24 metric str9 month int(year count)
    "Continuing Services - MH" "January"   2020  3
    "Continuing Services - MH" "February"  2020  0
    "Continuing Services - MH" "March"     2020  1
    "Continuing Services - MH" "April"     2020  0
    "Continuing Services - MH" "May"       2020  0
    "Continuing Services - MH" "June"      2020  0
    "Continuing Services - MH" "July"      2020  0
    "Continuing Services - MH" "August"    2020  0
    "Continuing Services - MH" "September" 2020  0
    "Continuing Services - MH" "October"   2020  0
    "Continuing Services - MH" "November"  2020  0
    "Continuing Services - MH" "December"  2020  1
    "Emergency Detention"      "January"   2020  9
    "Emergency Detention"      "February"  2020 14
    "Emergency Detention"      "March"     2020 14
    "Emergency Detention"      "April"     2020  8
    "Emergency Detention"      "May"       2020 15
    "Emergency Detention"      "June"      2020  3
    "Emergency Detention"      "July"      2020  3
    "Emergency Detention"      "August"    2020  7
    "Emergency Detention"      "September" 2020 14
    "Emergency Detention"      "October"   2020 11
    "Emergency Detention"      "November"  2020 15
    "Emergency Detention"      "December"  2020  7
    "Jail Court - MH"          "September" 2020  0
    "Jail Court - MH"          "October"   2020  0
    "Jail Court - MH"          "November"  2020  1
    "Jail Court - MH"          "December"  2020  0
    "Release to Center - MH"   "January"   2020 52
    "Release to Center - MH"   "February"  2020 35
    "Release to Center - MH"   "March"     2020 26
    "Release to Center - MH"   "April"     2020  9
    "Release to Center - MH"   "May"       2020 18
    "Release to Center - MH"   "June"      2020 12
    "Release to Center - MH"   "July"      2020 20
    "Release to Center - MH"   "August"    2020 15
    "Release to Center - MH"   "September" 2020 16
    "Release to Center - MH"   "October"   2020 15
    "Release to Center - MH"   "November"  2020 11
    "Release to Center - MH"   "December"  2020 12
    "Release to Center - SA"   "January"   2020 11
    "Release to Center - SA"   "February"  2020  8
    "Release to Center - SA"   "March"     2020 13
    "Release to Center - SA"   "April"     2020  4
    "Release to Center - SA"   "May"       2020  3
    "Release to Center - SA"   "June"      2020  3
    "Release to Center - SA"   "July"      2020  0
    "Release to Center - SA"   "August"    2020  0
    "Release to Center - SA"   "September" 2020  1
    "Release to Center - SA"   "October"   2020  1
    "Release to Center - SA"   "November"  2020  2
    "Release to Center - SA"   "December"  2020  4
    "Continuing Services - MH" "January"   2021  8
    "Continuing Services - MH" "February"  2021  7
    "Continuing Services - MH" "March"     2021  7
    "Continuing Services - MH" "April"     2021  7
    "Continuing Services - MH" "May"       2021 18
    "Continuing Services - MH" "June"      2021 10
    "Continuing Services - MH" "July"      2021 14
    "Continuing Services - MH" "August"    2021 18
    "Continuing Services - MH" "September" 2021 30
    "Continuing Services - MH" "October"   2021 12
    "Continuing Services - MH" "November"  2021 14
    "Continuing Services - MH" "December"  2021 13
    "Continuing Services - SA" "January"   2021  0
    "Continuing Services - SA" "February"  2021  0
    "Continuing Services - SA" "March"     2021  0
    "Continuing Services - SA" "April"     2021  0
    "Continuing Services - SA" "May"       2021  0
    "Continuing Services - SA" "June"      2021  0
    "Continuing Services - SA" "July"      2021  1
    "Continuing Services - SA" "August"    2021  3
    "Continuing Services - SA" "September" 2021  0
    "Continuing Services - SA" "October"   2021  0
    "Continuing Services - SA" "November"  2021  2
    "Continuing Services - SA" "December"  2021  0
    "Emergency Detention"      "January"   2021  9
    "Emergency Detention"      "February"  2021 12
    "Emergency Detention"      "March"     2021  7
    "Emergency Detention"      "April"     2021  7
    "Emergency Detention"      "May"       2021 14
    "Emergency Detention"      "June"      2021  9
    "Emergency Detention"      "July"      2021  9
    "Emergency Detention"      "August"    2021 17
    "Emergency Detention"      "September" 2021 20
    "Emergency Detention"      "October"   2021 11
    "Emergency Detention"      "November"  2021 13
    "Emergency Detention"      "December"  2021 10
    "Jail Court - MH"          "January"   2021  0
    "Jail Court - MH"          "February"  2021  0
    "Jail Court - MH"          "March"     2021  0
    "Jail Court - MH"          "April"     2021  0
    "Jail Court - MH"          "May"       2021  0
    "Jail Court - MH"          "June"      2021  1
    "Jail Court - MH"          "July"      2021  0
    "Jail Court - MH"          "August"    2021  0
    "Jail Court - MH"          "September" 2021  0
    "Jail Court - MH"          "October"   2021  0
    "Jail Court - MH"          "November"  2021  0
    "Jail Court - MH"          "December"  2021  0
    end
    **** example of the format I want****

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year str10 month float(ContinuingServices_MH EmergencyDetention JailCourt_MH)
    2020 "January"  3  9 0
    2020 "February" 0 14 0
    end
    Last edited by Luis Mijares Castaneda; 30 Dec 2024, 16:28.

  • #2
    What you ask for can be done with:
    Code:
    replace metric = strtoname(metric)
    replace metric = subinstr(metric, "___", "_", .)
    rename count _
    reshape wide _, i(month year) j(metric) string
    rename _* *
    In addition, once that is done, I strongly recommend making a true Stata internal format monthly date variable out of your variables month and year. Separately you will find them awkward to use for most purposes.
    Code:
    egen temp = concat(month year)
    gen mdate = monthly(temp, "MY")
    assert !missing(mdate)
    format mdate %tm
    drop temp
    drop month year

    Comment


    • #3
      I agree strongly with Clyde Schechter's advice, as usual.

      As a detail,

      Code:
      gen mdate = monthly(month + strofreal(year))
      is another way here to get a monthly date variable.

      Sometimes a separate month variable is useful, say for describing or graphing or modelling seasonality. But for most purposes it needs to be numeric. Then again value labels are often helpful.

      You could just define value labels directly and then apply encode, but here is some small trickery that may be of use or interest. The key detail is that Stata holds the month names internally.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str9 month
      "January"  
      "February" 
      "March"    
      "April"    
      "May"      
      "June"     
      "July"     
      "August"   
      "September"
      "October"  
      "November" 
      "December" 
      end
      
      tokenize "`c(Months)'" 
      
      forval m = 1/12 { 
          label def nmonth `m' "``m''", add
      }
      
      encode month, gen(nmonth) label(nmonth)
      
      list if nmonth > 12
      The check for values > 12 is important, as misspellings will get encoded too.

      Naturally, you could pull a month variable out of a monthly date variable too: e.g. month(dofm(mdate))

      Code:
      creturn list
      shows what Stata holds. Even very experienced users can be surprised by how much there is.

      https://journals.sagepub.com/doi/pdf...867X0400400213 gives some headline news.

      Comment


      • #4
        I tried doing something similar but I used the command reshape wide count, but with this method the count values where becoming variables, how did renaming count to _ fix this problem?

        Comment


        • #5
          Well, as you don't show what the code for "something similar" was, I can't know what went wrong with it.

          The renaming of count to _ served only one purpose (and, depending on the full data set it may have served no purpose at all): variable names are limited to 32 characters in Stata. -reshape wide- creates new variables whose names are the original variable named after -wide- concatenated with the values in the variable designated in the -j()- option. Now count takes up 5 characters. Some of the values of the variable metric, which appears in the -j()- option, are well over 20 characters long. If any of them happen to be 32+1-5 = 28 characters long, or longer, then the resulting variable name will be illegal, and the -reshsape- command will fail. By changing the name from count to the single character _, the code makes it possible to proceed even if some value of variable metric is as long as 31 characters. None of the metric values in the example shown are 28 characters or longer, but I wanted the code to still work if the full data set did not follow this constraint. If, in fact, there were no 28+ character values of metric in the full data set, then the renaming accomplished nothing.

          Comment

          Working...
          X