
  • Cut name variable into two variables to get caste

    Hello

    I am trying to segregate the name variable into two separate name and caste variables. (The segregation is to get a list of castes so as to classify them according to affirmative action policy in India).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str77 farmersname
    "AJAJ"                              
    "IRFAN"                            
    "MANTI DEVI"                        
    "MD LATIF"                          
    "MD. JABIR"                                          
    "SURESH KR. RAM"                    
    "Umesh sharma urf Umesh kumar singh"
    "ASHFAQUE"                          
    "HALIMUDDIN"                        
    "IMTIYAZ ANSARI"                    
    "MOHAMMAD RIZWAN"                  
    "MUJAHID ALAM"                      
    "SALAHUDDIN ANSARI"                                          
    "JOHARA KHATUN"                    
    "MD .AKHTAR"                        
    "MD AKHATAR"                        
    "MD FAKHRUDDIN"                    
    "md ismail"                        
    "Md Nasim Akhtar"                                
    "URFAN"                            
    "VAKAR AHMAD"                      
    "Yoganand sah"                      
    "YOGANAND SAH"                      
    "Arun Kumar"                        
    "ISARAR"                                                    
    "KHURSHID ALAM"                    
    "MATIN"                            
    "MD MANJAR ALAM"                                  
    "MD ZAHID HUSSAIN"                  
    "MD. SHOEB ALALM"                  
    "NIYAJ"                                        
    "ARSHABA KHATUN"                                              
                                
    end
    The issue I am facing is that some entries have only a name and no caste. The variable has to be split so that only the last word enters the new caste variable (e.g., in MD LATIF, LATIF would be the caste). In observations with only one word, that word should go into the name variable and the caste variable should be left empty (e.g., AJAJ would enter the name variable and the caste variable would be empty).
    I am not sure whether the substring function can work here, or whether this requires a different approach.
    Please help.

    Thanks
    Smriti
    Last edited by Saini Smriti; 26 Apr 2019, 01:20.

  • #2
    The following creates a new variable containing the last word of your variable if and only if there is more than one word:
    Code:
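    * number of words in each name
    gen numw=wordcount(farmersname)
    * last word becomes caste, but only when there are at least two words
    gen caste=word(farmersname,numw) if numw>1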
    For more on these functions, see:
    Code:
    help word
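
    The two lines above fill only the caste variable. A minimal sketch of one way to also build the name variable (everything before the last word) is given here. The variable names clean_name, nw, name_part and caste_part are hypothetical, chosen so they do not clash with numw and caste above, and the sketch assumes a Stata version that has strtrim() and strlen() (older versions use trim() and length() instead). The final two lines, which upper-case the caste and tabulate it, are an optional extra for listing the candidate castes.

    ```stata
    * Sketch only: clean_name, nw, name_part, caste_part are hypothetical names
    * Remove leading/trailing blanks so the last word sits at the end of the string
    gen clean_name = strtrim(farmersname)
    gen nw = wordcount(clean_name)

    * Last word goes to caste_part only when there are at least two words
    gen caste_part = word(clean_name, -1) if nw > 1

    * Single-word entries go to name_part unchanged
    gen name_part = clean_name if nw <= 1
    * Otherwise name_part is the string with the last word cut off the end
    replace name_part = strtrim(substr(clean_name, 1, strlen(clean_name) - strlen(caste_part))) if nw > 1

    * Optional: ignore case differences and list the distinct candidate castes
    replace caste_part = upper(caste_part)
    tabulate caste_part, sort
    ```

    With this, "MD LATIF" would give name_part "MD" and caste_part "LATIF", while "AJAJ" would give name_part "AJAJ" and an empty caste_part. Entries such as "MD .AKHTAR" keep the stray period in the last word, so some manual cleaning may still be needed.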



    • #3
      Hello Rich

      This works well.

      Thank You
      Smriti
