
  • Repeated Time Values and Line Graphs

    Hello Statalisters,

    I have a dataset of court filings, some of which share the same filing date (i.e., cases were filed in different states on the same day). I want to draw a line graph, but when I try to xtset the data I run into problems with repeated time values in the sample. I've attached a data snippet. First, I would like a line graph showing the total number of filings on each day in the dataset. Second, I would like a graph of the number of cases by subcategory (SubCatCode) over time. Any assistance is greatly appreciated.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id int FilingDate str18 Category str114 Subcategory byte SubCatCode long state2
     1 21944 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1 46
     2 21986 "Labor & Employment" "Leaves of Absense (FMLA, FFCRA, State Law)"                                                                          4  6
     3 21986 "Labor & Employment" "Leaves of Absense (FMLA, FFCRA, State Law)"                                                                          4  6
     4 21991 "Labor & Employment" "Other"                                                                                                              10  2
     5 21994 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1 24
     6 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
     7 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
     8 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
     9 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
    10 21998 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1  2
    end
    format %tdnn/dd/CCYY FilingDate
    label values state2 state2
    label def state2 2 "Alaska", modify
    label def state2 6 "California", modify
    label def state2 24 "Michigan", modify
    label def state2 46 "Texas", modify

  • #2
    You have to collapse the data to the level you want before you can xtset it.

    Code:
    generate nfilings = 1
    collapse (sum) nfilings, by(FilingDate state2)
    xtset state2 FilingDate
    xtline nfilings
    For the second graph you also need a collapse, but egen's tag() function might come in useful.

    Code:
    egen ncases=tag(Subcategory FilingDate)
    collapse (sum) ncases, by(FilingDate Subcategory)
    encode Subcategory, gen(subcategory_code)
    xtset subcategory_code FilingDate
    xtline ncases
    Instead of egen tag(), this can also be done with
    Code:
    duplicates drop
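
    Note that tag() marks just one observation in each Subcategory-FilingDate group, so the summed tag is 1 for every group. If the aim is instead to count every filing in each subcategory on each day, something along these lines may be closer. This is only a sketch under assumptions: it uses the variable names from the data example in #1, and the helper variable one is a hypothetical name introduced here.

    ```stata
    preserve
    * hypothetical helper variable: contributes 1 for each filing
    generate byte one = 1
    * one row per subcategory-day, holding the number of filings
    collapse (sum) ncases = one, by(FilingDate SubCatCode)
    * after the collapse, dates are unique within each subcategory
    xtset SubCatCode FilingDate
    xtline ncases, overlay
    restore
    ```

    The preserve and restore pair simply puts the original data back after the graph is drawn.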



    • #3
      Following the advice in #2, you can collapse the data in a different frame. For the second graph, you can show the frequencies using a bar graph. You will need to sort out the labels for the subcategories.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float id int FilingDate str18 Category str114 Subcategory byte SubCatCode long state2
       1 21944 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1 46
       2 21986 "Labor & Employment" "Leaves of Absense (FMLA, FFCRA, State Law)"                                                                          4  6
       3 21986 "Labor & Employment" "Leaves of Absense (FMLA, FFCRA, State Law)"                                                                          4  6
       4 21991 "Labor & Employment" "Other"                                                                                                              10  2
       5 21994 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1 24
       6 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
       7 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
       8 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
       9 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
      10 21998 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1  2
      end
      format %tdnn/dd/CCYY FilingDate
      label values state2 state2
      label def state2 2 "Alaska", modify
      label def state2 6 "California", modify
      label def state2 24 "Michigan", modify
      label def state2 46 "Texas", modify
      
      *#1
      frame put *, into(total)
      frame total{
          * count filings per day; (sum) would add up the id values instead
          collapse (count) freq=id, by(FilingDate)
          tsset FilingDate
          tsline freq, ytitle(Frequency) xtitle("") xlab(`=td(30jan2020)' (15) `=td(30mar2020)') xsc(r(. `=td(2apr2020)'))
      }
      frame drop total
      
      *#2
      frame put *, into(total)
      frame total{
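          * count filings per day and subcategory; (sum) would add up the id values instead
          collapse (count) freq=id, by(FilingDate SubCat)
          * total the daily counts within each subcategory (graph hbar defaults to the mean)
          graph hbar (sum) freq, over(SubCat, sort(1)) ytitle(Frequency)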
      }
      frame drop total
      [Graphs attached to the original post: Graph.png, Graph2.png]

      Last edited by Andrew Musau; 18 Mar 2024, 16:08.



      • #4
        Thank you both. This is exactly what I needed. If I wanted a line graph with the subcategory codes displayed, would I need to collapse again, or would the new nfilings variable capture that if I included the SubCatCode variable?



        • #5
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float id int FilingDate str18 Category str114 Subcategory byte SubCatCode long state2
           1 21944 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1 46
           2 21986 "Labor & Employment" "Leaves of Absense (FMLA, FFCRA, State Law)"                                                                          4  6
           3 21986 "Labor & Employment" "Leaves of Absense (FMLA, FFCRA, State Law)"                                                                          4  6
           4 21991 "Labor & Employment" "Other"                                                                                                              10  2
           5 21994 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1 24
           6 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
           7 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
           8 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
           9 21997 "Labor & Employment" "Under-, over-, and non-payment issues/wage issues (FLSA, state, local law)"                                          6 46
          10 21998 "Labor & Employment" "Conditions of employment (including lack of PPE, exposure to COVID-19 at work, wrongful death and personal injury)"  1  2
          end
          format %tdnn/dd/CCYY FilingDate
          label values state2 state2
          label def state2 2 "Alaska", modify
          label def state2 6 "California", modify
          label def state2 24 "Michigan", modify
          label def state2 46 "Texas", modify
          
          cap frame drop total
          *#3
          
          frame put *, into(total)
          frame total{
              * count filings per day and subcategory; (sum) would add up the id values instead
              collapse (count) freq=id, by(FilingDate SubCat)
              xtset SubCat FilingDate
              bys SubCat (Filing): gen lastob=_n==_N & _N>1
              xtline freq, overlay ytitle(Frequency) xtitle("") xlab(`=td(30jan2020)' (15) `=td(30mar2020)') xsc(r(. `=td(2apr2020)')) ///
                  addplot(scatter freq FilingDate if lastob, msy(none) mlab(SubCat) mlabpos(3) mlabsize(4)) leg(off) 
          }
          frame drop total
          [Graph attached to the original post: Graph.png]



          • #6
            I'll speak up to comment that contract is a complement to collapse. Indeed

            Code:
            bysort FilingDate : gen freq = _N 
            by FilingDate : gen tag = _n == 1
            line freq FilingDate if tag
            gets you started without either.
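
            For comparison, here is a minimal sketch of the contract route, assuming the variable names from the data example in #1. The name nper_day for the frequency variable is a hypothetical choice made here, picked so that it does not clash with a freq variable created earlier.

            ```stata
            preserve
            * contract leaves one row per distinct FilingDate, with the count in nper_day
            contract FilingDate, freq(nper_day)
            line nper_day FilingDate
            restore
            ```

            Because contract replaces the data in memory, preserve and restore are used to get the original observations back afterwards.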


            Also, do you want days with zero cases shown as such? Yet further, day of the week may be a natural and not necessarily trivial complication.
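
            If zeros are wanted, one possible approach is to fill in the missing days after reducing the data to one row per day. The following is a sketch only, again assuming the variable names from #1; nper_day and weekday are hypothetical names introduced here.

            ```stata
            preserve
            contract FilingDate, freq(nper_day)
            tsset FilingDate
            * tsfill adds a row for each day missing between the first and last filing dates
            tsfill
            replace nper_day = 0 if missing(nper_day)
            * dow() returns 0 for Sunday through 6 for Saturday
            generate byte weekday = dow(FilingDate)
            tsline nper_day
            restore
            ```

            The weekday variable is not used in the graph itself; it is there so that counts can be compared across days of the week, for example to see whether zeros fall mostly on weekends.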
