  • Creating Composite Variable (from non-mutually exclusive categorical variables)

    Hi All,

    I am quite new to Stata. I have five variables which each ask whether cannabis has been consumed in a certain way in the past 12 months. They are: vaped, dabbed, smoked, eaten, and drank. Obviously these are not mutually exclusive (i.e. if someone has vaped that does not mean they haven't used cannabis by any other method in the past 12 months). I am wondering if there is a way to create a meaningful composite variable? I would like to know which of these methods a person has used in the past 12 months. Is there a way to create a composite variable to measure this? I imagine it would have to be a variable with a bunch of different categories to account for different drug combinations. In such a variable, my guess is that the cell count would be super low for certain sections (like dabbed, vaped, and smoked, for example). Is there anything I can do here?

    Best,
    Nick
    Last edited by Nick Wineberg; 23 Nov 2020, 09:13.

  • #2
    One approach is suggested in this article in the Stata Journal. You could value label each of your categorical variables with something like:

    Code:
    label define Vaped 0 "Did not vape" 1 "Vaped"
    label values vaped Vaped
    And then use:

    Code:
    order vaped dabbed smoked eaten drank
    egen wanted = group(vaped-drank),label
    This generates a composite variable listing the ways each individual did or did not consume cannabis in the past 12 months.
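
    The snippet above labels only vaped. A minimal sketch that labels all five variables before grouping might look like the following. It assumes each variable is coded 0 = no and 1 = yes; the label names (vaped_lbl and so on) and the new variable name combo are made up for illustration.

    ```stata
    * Assumption: each variable is coded 0 = no, 1 = yes
    foreach v in vaped dabbed smoked eaten drank {
        * `v'_lbl is a hypothetical label name, one per variable
        label define `v'_lbl 0 "not `v'" 1 "`v'"
        label values `v' `v'_lbl
    }

    * One category per observed combination, labelled from the value labels
    egen combo = group(vaped dabbed smoked eaten drank), label

    * Frequencies in descending order, to spot sparse combinations
    tabulate combo, sort
    ```

    By default, group() leaves combo missing for any observation with a missing value in one of the five variables; the missing option of group() changes that if needed.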
    Last edited by Ali Atia; 23 Nov 2020, 09:48.


    • #3
      The Stata aspect of this is almost certainly a non-issue: whatever you want to do can be done. A helpful answer, though, especially for a person new to Stata, will require seeing an example from your data set. To learn how to generate a useful example, take another look at the Statalist FAQ for new participants, and search for -dataex-, a command that provides a convenient way to supply a data example. Seeing your example will enable an answer that takes account of the data types and variable names in your data set.
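
      As a rough illustration of what such a data example could look like, the command below would list the first 20 observations of the five variables in a form that can be pasted into a post. The variable names are taken from the original question and the range 1/20 is an arbitrary choice.

      ```stata
      * Produce a copy-and-paste data example for the five variables
      * (variable names as given in post #1; 1/20 is an arbitrary range)
      dataex vaped dabbed smoked eaten drank in 1/20
      ```

      dataex ships with recent Stata releases; on older installations it may first need to be installed with ssc install dataex.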

      That said, the *real* issue is deciding what you and people in your research area think is a meaningful way to measure cannabis use, in the context of your research problem. For example, other researchers might use a simple count of different modes of cannabis use, or they might weight certain modes more heavily, etc. I doubt that Statalist is a great place to get advice from other cannabis researchers, but it is a good place to learn how to create whatever measure you want, given knowledge of the structure of your data set.
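
      The "simple count" idea could be sketched as below, assuming the five variables are coded 0/1. The names n_modes and any_use are made up for illustration, and whether a count is a meaningful measure remains a substantive question.

      ```stata
      * Assumption: each variable is coded 0 = no, 1 = yes
      * Number of different modes reported (rowtotal treats missing as 0)
      egen n_modes = rowtotal(vaped dabbed smoked eaten drank)

      * 1 if any mode was reported, 0 if none, missing if all five are missing
      egen any_use = rowmax(vaped dabbed smoked eaten drank)

      tabulate n_modes
      ```

      Because rowtotal() treats missing values as zero, respondents with missing answers are counted as not having used that mode; its missing option returns a missing total when all five variables are missing.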


      • #4
        Thanks Ali, this was super helpful!


        • #5
          I agree with Mike Lacy and cannot myself offer substantive expertise or experience in this territory.

          In Stata terms, a string variable like this might also be useful, especially for tables and graphs. Listing the variables from most common to least common might work better.
          Code:
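          * Start with an empty string variable
          gen habits = ""

          * Append the name of each method that is coded 1
          foreach v in vaped dabbed smoked eaten drank {
                replace habits = habits + "`v' " if `v' == 1
          }

          * Drop the trailing space, then mark observations with no method coded 1
          replace habits = trim(habits)
          replace habits = "(none)" if missing(habits)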
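
          One way to use the result in a table, assuming habits was created as above, is a one-way tabulation sorted by frequency:

          ```stata
          * Combinations of methods, most frequent first
          tabulate habits, sort
          ```

          Sparse combinations then appear at the bottom of the table, which speaks to the low cell counts anticipated in post #1.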
