  • How can I split a variable?

    Hello everybody

    I have a problem that I am trying to solve. In my dataset, one variable holds several values in each observation. I want to split them so that each
    value becomes a new binary variable coded 1 for yes and 0 for no. A screenshot is attached.
    Thank you for helping me.

  • #2
    Your picture is not useful for testing code, so the example below uses invented data. You can learn from it and modify it to solve your problem.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id str15 names
    101 "Alice Bob"      
    102 "Alice Chris"    
    103 "Bob Alice"      
    104 "Alice Bob Chris"
    105 "Fred"          
    end
    split names, generate(name)
    list, clean
    reshape long name, i(id) j(j)
    drop if missing(name)
    drop j
    list, clean
    generate byte value = 1
    reshape wide value, i(id) j(name) string
    list, clean abbreviate(12)
    mvencode value*, mv(0)
    rename (value*) (*)
    list, clean
    Code:
    . split names, generate(name)
    variables created as string:
    name1  name2  name3
    
    . list, clean
    
            id             names   name1   name2   name3  
      1.   101         Alice Bob   Alice     Bob          
      2.   102       Alice Chris   Alice   Chris          
      3.   103         Bob Alice     Bob   Alice          
      4.   104   Alice Bob Chris   Alice     Bob   Chris  
      5.   105              Fred    Fred                  
    
    . reshape long name, i(id) j(j)
    (note: j = 1 2 3)
    
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                        5   ->      15
    Number of variables                   5   ->       4
    j variable (3 values)                     ->   j
    xij variables:
                          name1 name2 name3   ->   name
    -----------------------------------------------------------------------------
    
    . drop if missing(name)
    (5 observations deleted)
    
    . drop j
    
    . list, clean
    
            id             names    name  
      1.   101         Alice Bob   Alice  
      2.   101         Alice Bob     Bob  
      3.   102       Alice Chris   Alice  
      4.   102       Alice Chris   Chris  
      5.   103         Bob Alice     Bob  
      6.   103         Bob Alice   Alice  
      7.   104   Alice Bob Chris   Alice  
      8.   104   Alice Bob Chris     Bob  
      9.   104   Alice Bob Chris   Chris  
     10.   105              Fred    Fred  
    
    . generate byte value = 1
    
    . reshape wide value, i(id) j(name) string
    (note: j = Alice Bob Chris Fred)
    
    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                       10   ->       5
    Number of variables                   4   ->       6
    j variable (4 values)              name   ->   (dropped)
    xij variables:
                                      value   ->   valueAlice valueBob ... valueFred
    -----------------------------------------------------------------------------
    
    . list, clean abbreviate(12)
    
            id   valueAlice   valueBob   valueChris   valueFred             names  
      1.   101            1          1            .           .         Alice Bob  
      2.   102            1          .            1           .       Alice Chris  
      3.   103            1          1            .           .         Bob Alice  
      4.   104            1          1            1           .   Alice Bob Chris  
      5.   105            .          .            .           1              Fred  
    
    . mvencode value*, mv(0)
      valueAlice: 1 missing value recoded
        valueBob: 2 missing values recoded
      valueChris: 3 missing values recoded
       valueFred: 4 missing values recoded
    
    . rename (value*) (*)
    
    . list, clean
    
            id   Alice   Bob   Chris   Fred             names  
      1.   101       1     1       0      0         Alice Bob  
      2.   102       1     0       1      0       Alice Chris  
      3.   103       1     1       0      0         Bob Alice  
      4.   104       1     1       1      0   Alice Bob Chris  
      5.   105       0     0       0      1              Fred
    To improve the quality of your future posts, please now take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
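As a minimal sketch of the dataex advice above, here is how a small pasteable data example can be produced. It uses Stata's bundled auto dataset rather than real data, and the choice of variables and observations is only an illustration.

```stata
* dataex ships with recent Stata releases; on older ones: ssc install dataex
sysuse auto, clear
* print the first five observations of two variables as -input- code
* that can be pasted between [CODE] and [/CODE] in a post
dataex make price in 1/5
```

Restricting the variable list and the observation range keeps the posted example short while still letting others run code against it.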

    Added afterwards: I see now that some of your data contain names that are unlikely to work as Stata variable names. After the first reshape you could include a command that alters the inappropriate values. Perhaps something like this will help
    Code:
    replace name = ustrtoname(name)
    although I have not tested it on your data because I do not have your data.

    Another reason to provide usable example data with your question.
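To see what ustrtoname() does before relying on it, a small sketch on invented values may help. The three strings and the variable name safe are made up for illustration and do not come from the poster's data.

```stata
* invented values that are not legal Stata variable names as they stand
clear
input str20 name
"radio tv"
"réseaux-sociaux"
"2nd choice"
end
* keep the original text and build a candidate variable name beside it
generate safe = ustrtoname(name)
list, clean
```

It is worth checking the result with tabulate, because two different original values can end up with the same converted name, and the second reshape would then treat them as one category.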
    Last edited by William Lisowski; 14 Mar 2020, 18:19.


    • #3
      Hello William

      Thanks a lot for your help.
      I take note.


      • #4
        Hello William

        Could you please explain what i(id) and j(j) mean in this piece of code: reshape long name, i(id) j(j)? I can't quite understand it, and that's where I'm stuck.
        Thank you for helping me.


        • #5
          Have you read the output of help reshape?

          The i(id) option specifies that the existing variable id in my example data is the required distinct identifier for each observation. If you don't have an identifier (or several variables that, taken together, are a distinct identifier), you can create one with
          Code:
          generate id = _n
          which assigns the observation number to the identifier variable. The variable does not need to be named id.

          The j(j) option creates a variable in the reshaped data that records whether each name came from name1, name2, or name3 (in this example). That is of no use for what we are doing, so I drop it from the dataset. Again, the variable does not need to be named j.
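A short sketch may make the two options concrete. It uses invented data with no identifier, and the name position for the j() variable is chosen here only to show that any name works.

```stata
clear
input str15 names
"Alice Bob"
"Fred"
end
* no identifier exists yet, so build one from the observation number
generate int id = _n
* isid exits with an error if id does not uniquely identify observations
isid id
split names, generate(name)
* position will hold 1 or 2: whether the value came from name1 or name2
reshape long name, i(id) j(position)
list, clean
```

Listing the data before dropping the j() variable shows one row per id and position, including the blank row for Fred's missing second name.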


          • #6
            I am a beginner in Stata. I am sending you my dataset; could you please help me split q404_ou_avezvous_entendu so that each category becomes a new variable? Please also attach the do-file, since working from it will make it easier for me to follow your approach and understand the process. The categories are separated by spaces.

            • #7
              Thank you.
              With great difficulty I succeeded; it works now.
              God bless you.

              You can still send me the do-file so that I can compare the two results. Thank you very much once again.


              • #8
                Here is my code, with annotation.
                Code:
                // note the option "encoding(utf8)" to correctly read non-ASCII characters
                import delimited "~/Downloads/Base des données étude CAP Tesky KOBA ESP 2020.csv", encoding(utf8)
                // generate an id variable for reshape
                generate int id = _n
                // split into entendu1 entendu2 ... entendu10 - up to 10 values were given in one observation
                split q404_ou_avezvous_entendu, generate(entendu)
                // reshape long - 10 observations for each original observation - one for each value
                reshape long entendu, i(id) j(j)
                // drop observations with a blank value - because there weren't that many values given
                drop if entendu==""
                // we don't need the j variable - it doesn't matter what order the values were in
                drop j
                // generate an indicator variable
                generate byte entendu_ = 1
                // reshape wide - one observation for each id, as it was when we began
                reshape wide entendu_, i(id) j(entendu) string
                // replace missing values with 0 - those were the values that were not chosen
                mvencode entendu_*, mv(0)
                // don't need the id variable - it was only to put the pieces together again
                drop id
                // reshape assigns variable labels that are not helpful to you, so we remove them
                foreach v of varlist entendu_* {
                    label variable `v'
                    }
                // this is what we have
                describe entendu*, fullnames
                Here are the results.
                Code:
                . // note the option "encoding(utf8)" to correctly read non-ASCII characters
                . import delimited "~/Downloads/Base des données étude CAP Tesky KOBA ESP 2020.csv", encoding(
                > utf8)
                (184 vars, 348 obs)
                
                . // generate an id variable for reshape
                . generate int id = _n
                
                . // split into entendu1 entendu2 ... entendu10 - up to 10 values were given in one observation
                . split q404_ou_avezvous_entendu, generate(entendu)
                variables created as string: 
                entendu1   entendu3   entendu5   entendu7   entendu9
                entendu2   entendu4   entendu6   entendu8   entendu10
                
                . // reshape long - 10 observations for each original observation - one for each value
                . reshape long entendu, i(id) j(j)
                (note: j = 1 2 3 4 5 6 7 8 9 10)
                
                Data                               wide   ->   long
                -----------------------------------------------------------------------------
                Number of obs.                      348   ->    3480
                Number of variables                 195   ->     187
                j variable (10 values)                    ->   j
                xij variables:
                        entendu1 entendu2 ... entendu10   ->   entendu
                -----------------------------------------------------------------------------
                
                . // drop observations with a blank value - because there weren't that many values given
                . drop if entendu==""
                (2,453 observations deleted)
                
                . // we don't need the j variable - it doesn't matter what order the values were in
                . drop j
                
                . // generate an indicator variable
                . generate byte entendu_ = 1
                
                . // reshape wide - one observation for each id, as it was when we began
                . reshape wide entendu_, i(id) j(entendu) string
                (note: j = amies autres ecole eglise frères lecture_personnelle nesaitpas parents personnel_medi
                > cal radio_tv reco reseaux_sociaux sœurs)
                
                Data                               long   ->   wide
                -----------------------------------------------------------------------------
                Number of obs.                     1027   ->     342
                Number of variables                 187   ->     198
                j variable (13 values)          entendu   ->   (dropped)
                xij variables:
                                               entendu_   ->   entendu_amies entendu_autres ... entendu_sœurs
                -----------------------------------------------------------------------------
                
                . // replace missing values with 0 - those were the values that were not chosen
                . mvencode entendu_*, mv(0)
                entendu_am~s: 139 missing values recoded
                entendu_au~s: 336 missing values recoded
                entendu_ec~e: 191 missing values recoded
                entendu_eg~e: 296 missing values recoded
                entendu_fr~s: 324 missing values recoded
                entendu_le~e: 229 missing values recoded
                entendu_ne~s: 329 missing values recoded
                entendu_pa~s: 293 missing values recoded
                entendu_pe~l: 247 missing values recoded
                entendu_ra~v: 212 missing values recoded
                entendu_reco: 330 missing values recoded
                entendu_re~x: 205 missing values recoded
                entendu_sœ~s: 288 missing values recoded
                
                . // don't need the id variable - it was only to put the pieces together again
                . drop id
                
                . // reshape assigns variable labels that are not helpful to you, so we remove them
                . foreach v of varlist entendu_* {
                  2.     label variable `v'
                  3.     }
                
                . // this is what we have
                . describe entendu*, fullnames
                
                              storage   display    value
                variable name   type    format     label      variable label
                ------------------------------------------------------------------------------------------------
                entendu_amies   byte    %8.0g                 
                entendu_autres  byte    %8.0g                 
                entendu_ecole   byte    %8.0g                 
                entendu_eglise  byte    %8.0g                 
                entendu_frères  byte    %8.0g                 
                entendu_lecture_personnelle
                                byte    %8.0g                 
                entendu_nesaitpas
                                byte    %8.0g                 
                entendu_parents byte    %8.0g                 
                entendu_personnel_medical
                                byte    %8.0g                 
                entendu_radio_tv
                                byte    %8.0g                 
                entendu_reco    byte    %8.0g                 
                entendu_reseaux_sociaux
                                byte    %8.0g                 
                entendu_sœurs   byte    %8.0g
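The reshape output above reports 348 observations going in and 342 coming out, so six observations apparently had no value at all in q404_ou_avezvous_entendu and disappeared at the drop step. If those observations should be kept, one possible variation is sketched below on invented data. It is untested on the real file, and the names answers, src, heard_ and the placeholder category "none" are assumptions made for this illustration.

```stata
* invented data: id 2 gave no answer at all
clear
input int id str30 answers
1 "radio_tv amies"
2 ""
3 "ecole"
end
split answers, generate(src)
reshape long src, i(id) j(j)
* drop blank rows, but always keep the first row of each id
drop if src == "" & j > 1
* "none" is a placeholder category of my own, not a value from the real data
replace src = "none" if src == ""
drop j
generate byte heard_ = 1
reshape wide heard_, i(id) j(src) string
mvencode heard_*, mv(0)
list, clean
```

Every id survives the round trip, and heard_none marks the observations that had nothing to split. It can be dropped afterwards if an all-zero row is enough.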


                • #9
                  Thank you very much William for your help. God bless you.
                  Everything is clear now, thank you.
