  • Non mutually exclusive categorical variable?

    Hi all!

    I am working with some data describing how schools operated during the pandemic, specifically looking at the type of curriculum provided to virtual students. Districts could provide online, electronic, or physical curriculum, or any combination of the above. My existing format variable is a categorical variable with all possible combinations, and I have already created dummies for each of the three curriculum types.

    I am not currently interested in the combination of curriculum options, only whether or not each was provided, and I was wondering if anyone had any ideas on how to create a categorical variable (or something similar) that would accomplish this task? Ideally it would take on some value if a district provided online curriculum, regardless of whether they provided curriculum only online or in conjunction with another format. Similarly, I would want to know whether or not a district ever provided electronic or physical curriculum. Eventually I want to be able to make summary tables (recognizing that my totals would be greater than 100%) that compare the formats to different characteristics, and I am finding that very difficult with the dummy variables.

    Below is a snippet of my code. Any help, suggestions, or confirmation that this isn't possible would be greatly appreciated!

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long format float(online electronic physical)
    1 1 1 1
    4 0 0 0
    5 1 0 0
    6 1 1 0
    1 1 1 1
    1 1 1 1
    5 1 0 0
    1 1 1 1
    6 1 1 0
    5 1 0 0
    4 0 0 0
    1 1 1 1
    5 1 0 0
    6 1 1 0
    6 1 1 0
    end
    label values format format
    label def format 1 "all three formats", modify
    label def format 4 "no information", modify
    label def format 5 "online", modify
    label def format 6 "online and electronic", modify
    label values online dummy
    label values electronic dummy
    label values physical dummy
    label def dummy 0 "No", modify
    label def dummy 1 "Yes", modify

  • #2
    Originally posted by Lizzy Padhi
    I am not currently interested in the combination of curriculum options, only whether or not each was provided, and I was wondering if anyone had any ideas on how to create a categorical variable (or something similar) that would accomplish this task?
    It appears that you already have a categorical variable that gives you the combinations, although your example may not include all of them. For overlapping options, you create indicators, as you have done. I am assuming that each observation in your dataset is a school district. With three formats there are 2^3 - 1 = 7 non-empty combinations, so your original categorical variable should have 7 levels, not including a "no information" category. tuples (from SSC) lists them:

    Code:
    tuples online electronic physical
    macro list
    Result:

    Code:
    _ntuples:       7
    _tuple7:        online electronic physical
    _tuple6:        online electronic
    _tuple5:        online physical
    _tuple4:        electronic physical
    _tuple3:        online
    _tuple2:        electronic
    _tuple1:        physical
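
    If tuples is not installed, the count of 7 can be cross-checked with built-in commands. The following is only a sketch on toy data that enumerates every on/off pattern (it is not the poster's dataset), and the variable name combo is made up for illustration. It uses egen's group() function to number the distinct non-empty combinations:

```stata
* Sketch only: toy data covering all 2^3 = 8 on/off patterns (not the poster's data)
clear
set obs 8
gen byte online     = mod(floor((_n - 1)/4), 2)
gen byte electronic = mod(floor((_n - 1)/2), 2)
gen byte physical   = mod(_n - 1, 2)

* Drop the all-zero pattern, which corresponds to "no information"
drop if online + electronic + physical == 0

* One group number per distinct combination of the three indicators
egen combo = group(online electronic physical)

* The largest group number is the number of non-empty combinations
quietly summarize combo
display r(max)
```

    On the toy data this should display 7, matching _ntuples above.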


    Eventually I want to be able to make summary tables (recognizing that my totals would be greater than 100%) that compare the formats to different characteristics, and I am finding that very difficult with the dummy variables.
    You should go directly to your objective and provide a data example including a specific characteristic, explaining what you expect. Making some assumptions, the following may be useful:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long format float(online electronic physical)
    1 1 1 1
    4 0 0 0
    5 1 0 0
    6 1 1 0
    1 1 1 1
    1 1 1 1
    5 1 0 0
    1 1 1 1
    6 1 1 0
    5 1 0 0
    4 0 0 0
    1 1 1 1
    5 1 0 0
    6 1 1 0
    6 1 1 0
    end
    label values format format
    label def format 1 "all three formats", modify
    label def format 4 "no information", modify
    label def format 5 "online", modify
    label def format 6 "online and electronic", modify
    label values online dummy
    label values electronic dummy
    label values physical dummy
    label def dummy 0 "No", modify
    label def dummy 1 "Yes", modify
    
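    * District identifier: one observation per district
    gen district=_n, before(format)
    * Simulated outcome, for illustration only (not real data)
    set seed 03312022
    gen female_o= runiformint(10, 30) if online
    gen female_e= runiformint(30, 50) if electronic
    gen female_p= runiformint(70, 90) if physical
    egen female= rowmax(female_?)
    drop female_?
    * One row per district-format pair; var is 1 if the district offered that format
    rename (online electronic physical) var=
    reshape long var, i(district) j(cat) string
    * A district offering several formats contributes to several bars
    gr bar female if var, over(cat) scheme(s1mono) bar(1, fcolor(black*0.4) lcolor(black)) ytitle("Female (%)")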
    [Figure: Graph.png, bar chart of Female (%) by curriculum format]
    Last edited by Andrew Musau; 31 Mar 2022, 01:40.
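
    If a plain table is enough, the indicators can also be summarized directly, without reshaping. This is only a sketch on a four-row toy subset, and grp is a hypothetical grouping variable standing in for whatever characteristic is of interest. The mean of a 0/1 indicator is the share of districts offering that format, and the sum is the number of districts, so the columns are not expected to add up to 100%:

```stata
* Sketch only: toy indicators plus a hypothetical grouping variable grp
clear
input float(online electronic physical) byte grp
1 1 1 0
0 0 0 1
1 0 0 0
1 1 0 1
end

* mean = share of districts offering the format; sum = number of districts.
* Formats overlap, so shares across columns can exceed 100% in total.
tabstat online electronic physical, by(grp) statistics(mean sum)
```

    The reshape approach above remains the more flexible route for graphs, since graph bar can then use the format as an over() category.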



    • #3
      Hi Andrew Musau ,

      Thank you so much for responding!

      Quick question about the tuples: did you include them only because my format variable didn't show all of the expected values? I do have all 7 coded in my full dataset, but the sample I randomly pulled through dataex didn't include an observation for every combination. Just wanted to make sure I didn't miss anything regarding that code.

      I was able to use your structure to create a table and graph indicating the values by internet quartile and included the code below. The only thing I struggled with was how to include the true total of schools in the column of each table, so 8, 4, 1, 2, respectively. If you have any insight on how I could make this work, I would love to know. If not, I'm not too worried about it right now.

      Again, thank you so much!!

      Code:
      clear
      input long format float(online electronic physical q_internet)
      1 1 1 1 0
      4 0 0 0 1
      5 1 0 0 2
      6 1 1 0 0
      1 1 1 1 1
      1 1 1 1 0
      5 1 0 0 3
      1 1 1 1 1
      6 1 1 0 0
      5 1 0 0 0
      4 0 0 0 0
      1 1 1 1 1
      5 1 0 0 0
      6 1 1 0 3
      6 1 1 0 0
      end
      label values format format
      label def format 1 "all three formats", modify
      label def format 4 "no information", modify
      label def format 5 "online", modify
      label def format 6 "online and electronic", modify
      label values online dummy
      label values electronic dummy
      label values physical dummy
      label def dummy 0 "No", modify
      label def dummy 1 "Yes", modify
      label values q_internet quartiles
      label def quartiles 0 "Bottom", modify
      label def quartiles 1 "BottomMid", modify
      label def quartiles 2 "TopMid", modify
      label def quartiles 3 "Top", modify
      
      gen district=_n, before(format)
      
      gen internet_o = (online==1)
      gen internet_e = (electronic==1) 
      gen internet_p = (physical==1) 
      
      egen internet=rowmax(internet_?)
      rename (online electronic physical) var=
      reshape long var, i(district) j(cat) string
      gr hbar (count) internet if var, over(cat) over(q_internet) scheme(s1mono) bar(1, fcolor(black*0.4) lcolor(black)) 
      
      
      tabout cat q_internet if var==1 using internet_ex.txt, cell(N internet) ptotal(none) sum replace








      • #4
        Originally posted by Lizzy Padhi

        Quick question about the tuples: did you include them only because my format variable didn't show all of the expected values?
        Correct


        The only thing I struggled with was how to include the true total of schools in the column of each table, so 8, 4, 1, 2, respectively. If you have any insight on how I could make this work, I would love to know. If not, I'm not too worried about it right now.
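        You can create a "total" category; it is just a constant equal to 1 for every district, so after the reshape each district is counted once under "total" within its quartile.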

        Code:
        clear
        input long format float(online electronic physical q_internet)
        1 1 1 1 0
        4 0 0 0 1
        5 1 0 0 2
        6 1 1 0 0
        1 1 1 1 1
        1 1 1 1 0
        5 1 0 0 3
        1 1 1 1 1
        6 1 1 0 0
        5 1 0 0 0
        4 0 0 0 0
        1 1 1 1 1
        5 1 0 0 0
        6 1 1 0 3
        6 1 1 0 0
        end
        label values format format
        label def format 1 "all three formats", modify
        label def format 4 "no information", modify
        label def format 5 "online", modify
        label def format 6 "online and electronic", modify
        label values online dummy
        label values electronic dummy
        label values physical dummy
        label def dummy 0 "No", modify
        label def dummy 1 "Yes", modify
        label values q_internet quartiles
        label def quartiles 0 "Bottom", modify
        label def quartiles 1 "BottomMid", modify
        label def quartiles 2 "TopMid", modify
        label def quartiles 3 "Top", modify
        gen district=_n, before(format)
        gen total=1
        gen internet_o = (online==1)
        gen internet_e = (electronic==1)
        gen internet_p = (physical==1)
        egen internet=rowmax(internet_?)
        rename (online electronic physical total) var=
        reshape long var, i(district) j(cat) string
        gr hbar (count) internet if var, over(cat) over(q_internet) scheme(s1mono) bar(1, fcolor(black*0.4) lcolor(black))
        tabout cat q_internet if var==1 using internet_ex.txt, cell(N internet) ptotal(none) sum replace
        Added note: tabout is from SSC (FAQ Advice #12).
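
        As a quick check that the constant behaves as intended, the counts can also be tabulated with the built-in tabulate command. This is only a sketch that re-enters the three indicators and q_internet from the example above (labels omitted); it is a cross-check, not a replacement for the tabout table:

```stata
* Sketch only: same 15 districts as above, value labels omitted
clear
input float(online electronic physical q_internet)
1 1 1 0
0 0 0 1
1 0 0 2
1 1 0 0
1 1 1 1
1 1 1 0
1 0 0 3
1 1 1 1
1 1 0 0
1 0 0 0
0 0 0 0
1 1 1 1
1 0 0 0
1 1 0 3
1 1 0 0
end
gen district = _n

* Constant "format" that every district has, so it counts all districts
gen total = 1

* One row per district-format pair, with total treated as a fourth category
rename (online electronic physical total) var=
reshape long var, i(district) j(cat) string

* The row for cat == "total" gives the number of districts in each quartile
tabulate cat q_internet if var == 1
```

        The "total" row should read 8, 4, 1, 2 across the four quartiles, which are the district totals asked about in #3.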
