I'm not very familiar with Stata's data management commands, and this one has been really tough to figure out from the manual. Below I have included a data set of responses that college students submitted to a questionnaire. You'll see various meta columns denoting how they took the survey, hours of computer use, etc. I've managed pretty well so far, but I'm having trouble with the undergraduate major component.
This may have been a flaw in my survey design, but I let the participants type in free text to denote their major. So you'll see numerous ways students write "Political Science", for example POSCI or POSC, and similarly for almost all the majors, where many students abbreviated the title. That makes it a problem to simply run encode major, gen(newvar), which is how I would naturally want to go about it. What is the easiest way to fix closely related words? Something like find-and-replace would almost work, but perhaps there is something more effective?
My second problem comes after I fix the names and have all majors unified and spelled consistently: I need to group them further. My sample size isn't nearly large enough to test the significance of individual majors against each other, so I wanted to group them into larger categories, for example "Humanities" to include the following majors "etc, etc". Or better yet, is there a way to have Stata examine which groups might belong together based on the outcome variable? I'm thinking, theoretically, of something between an ANOVA and a factor analysis to give a more objective idea of which majors "move together".
Terribly sorry about such a convoluted question but any help is very much appreciated.
Kind regards,
Ali
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double osdi int byr str53 mjr str13 browser str15 os str9 BrowserMetaInfoScreenResolut str3 idrps str11 cpu float age long female byte col_age
11.3636363636363 50 "Theater " "Safari iPhone" "iPhone" "375x667" "" "6 - 8 hours" 50 1 0
14.5833333333333 1986 "Political Science" "Chrome" "Windows NT 10.0" "1707x960" "" "6 - 8 hours" 31 1 0
4.16666666666666 1990 "Biomedical engineering " "Chrome" "Android 7.0" "360x740" "" "6 - 8 hours" 27 0 1
45.8333333333333 1983 "music" "Chrome" "Macintosh" "1280x800" "" "6 - 8 hours" 34 1 0
60.4166666666666 1990 "Animation and Digital Arts" "Chrome" "Macintosh" "2560x1440" "" "6 - 8 hours" 27 1 1
6.81818181818181 1991 "Environmental Studies" "Chrome" "Windows NT 10.0" "1536x864" "" "4 - 6 hours" 26 1 1
0 1989 "Engineering" "Chrome" "Macintosh" "1680x1050" "" "6 - 8 hours" 28 1 1
14.5833333333333 44 "Political Science" "Safari" "Macintosh" "1440x900" "" "4 - 6 hours" 44 1 0
4.16666666666666 1987 "Theatre" "Safari iPhone" "iPhone" "414x736" "" "6 - 8 hours" 30 1 1
27.7777777777777 1990 "Molecular biology " "Safari iPhone" "iPhone" "320x568" "" "6 - 8 hours" 27 1 1
62.5 1990 "Communications " "Safari iPhone" "iPhone" "414x736" "" "6 - 8 hours" 27 1 1
6.25 1980 "Computer science " "Safari iPhone" "iPhone" "375x667" "No" "4 - 6 hours" 37 1 0
20 1990 "Anthropology and Spanish" "Chrome" "Windows NT 10.0" "1920x1080" "No" "8 - 9 hours" 27 1 1
6.25 1995 "Accounting" "Safari iPhone" "iPhone" "320x568" "No" "8 - 9 hours" 22 1 1
27.7777777777777 1996 "Human Biology" "Chrome iPhone" "iPhone" "375x667" "Yes" "4 - 6 hours" 21 1 1
20 1999 "Psychology/Spanish" "Safari iPhone" "iPhone" "375x667" "Yes" "2 - 3 hours" 18 0 1
6.81818181818181 1996 "Sociology" "Chrome iPhone" "iPhone" "375x667" "No" "4 - 6 hours" 21 0 1
13.6363636363636 1996 "Human Biology and Spanish" "Chrome" "Macintosh" "1280x800" "No" "2 - 3 hours" 21 1 1
20.4545454545454 1985 "Media Studies" "Firefox" "Macintosh" "1440x900" "No" "6 - 8 hours" 32 1 0
4.16666666666666 1996 "POSC, LHC, SPAN" "Chrome" "Windows NT 10.0" "1366x768" "Yes" "2 - 3 hours" 21 1 1
29.5454545454545 1996 "Law, History, and Culture; Spanish" "Safari iPhone" "iPhone" "375x667" "No" "4 - 6 hours" 21 1 1
38.6363636363636 1997 "International Relations and Spanish " "Chrome" "Windows NT 10.0" "1366x768" "Yes" "6 - 8 hours" 20 1 1
11.3636363636363 1996 "POSC" "Safari iPhone" "iPhone" "320x568" "Yes" "8 - 9 hours" 21 1 1
45 1998 "Anthropology" "Safari iPhone" "iPhone" "320x568" "Yes" "4 - 6 hours" 19 1 1
4.16666666666666 1995 "human biology" "Chrome" "Macintosh" "1280x800" "No" "4 - 6 hours" 22 0 1
10.4166666666666 1991 "Political Science & Communication " "Safari iPhone" "iPhone" "414x736" "Yes" "10 hours" 26 0 1
29.5454545454545 1996 "Philosophy" "Chrome" "Android 6.0" "360x640" "Yes" "6 - 8 hours" 21 0 1
10.4166666666666 1994 "Anthropology" "Safari iPhone" "iPhone" "375x667" "No" "10 hours" 23 1 1
83.3333333333333 1997 "English and Anthro" "Safari iPhone" "iPhone" "375x667" "Yes" "6 - 8 hours" 20 1 1
8.33333333333333 1986 "sociology " "Safari iPhone" "iPhone" "320x568" "Yes" "8 - 9 hours" 31 1 0
6.25 1993 "Electronics and communications" "Chrome iPhone" "iPhone" "375x667" "Yes" "10 hours" 24 1 1
6.25 1982 "Music composition" "Safari iPad" "iPad" "768x1024" "No" "4 - 6 hours" 35 1 0
16.6666666666666 1995 "Sociology" "MSIE" "Windows NT 10.0" "1680x1050" "Yes" "8 - 9 hours" 22 1 1
12.5 1981 "Sociology" "Chrome" "Macintosh" "1920x1080" "Yes" "10 hours" 36 1 0
13.6363636363636 1996 "Political science" "Safari iPhone" "iPhone" "375x667" "No" "4 - 6 hours" 21 1 1
30 1987 "political science" "Chrome" "Windows NT 10.0" "1536x864" "Yes" "10 hours" 30 1 1
37.5 1996 "Sociology/Social Psychology" "Safari" "Macintosh" "1440x900" "Yes" "6 - 8 hours" 21 1 1
18.75 1994 "human biology" "Safari" "Macintosh" "1280x800" "No" "8 - 9 hours" 23 0 1
91.6666666666666 1996 "Anthropology" "Safari iPhone" "iPhone" "375x667" "Yes" "10 hours" 21 1 1
15 1983 "International Development Studies" "Chrome" "Macintosh" "1280x800" "No" "4 - 6 hours" 34 0 0
25 1986 "Women's Studies" "Chrome" "Macintosh" "1366x768" "Yes" "6 - 8 hours" 31 1 0
77.5 1992 "Communication" "Chrome" "Macintosh" "1280x800" "Yes" "6 - 8 hours" 25 0 1
14.5833333333333 1996 "Sociology" "Chrome" "Macintosh" "1920x1080" "No" "4 - 6 hours" 21 1 1
33.3333333333333 1995 "Dramatic Arts (Acting)" "Safari iPhone" "iPhone" "375x667" "Yes" "8 - 9 hours" 22 0 1
20.4545454545454 1995 "Psychology / neuroscience" "Safari iPhone" "iPhone" "414x736" "Yes" "10 hours" 22 1 1
8.33333333333333 1980 "Psychology " "Safari iPhone" "iPhone" "375x667" "No" "4 - 6 hours" 37 1 0
6.25 1985 "Creative writing " "Safari iPhone" "iPhone" "320x568" "No" "4 - 6 hours" 32 1 0
50 1992 "Biology" "Chrome" "Macintosh" "1280x800" "Yes" "2 - 3 hours" 25 1 1
18.75 1996 "BFA Acting" "Safari iPhone" "iPhone" "375x667" "Yes" "4 - 6 hours" 21 0 1
18.75 1980 "engineering" "Chrome" "Windows NT 10.0" "1280x720" "Yes" "8 - 9 hours" 37 1 0
8.33333333333333 1988 "Neuroscience " "Safari iPhone" "iPhone" "375x667" "Yes" "4 - 6 hours" 29 1 1
25 1996 "dramatic arts" "Safari iPhone" "iPhone" "375x667" "Yes" "6 - 8 hours" 21 1 1
16.6666666666666 1988 "Computer science" "Chrome" "Windows NT 6.1" "1920x1080" "Yes" "10 hours" 29 0 1
end
label values female female
label def female 0 "Male", modify
label def female 1 "Female", modify
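To make the two steps in the question concrete, here is a minimal sketch of the kind of cleanup and grouping being asked about, run against the example data above. It is not a definitive solution: mjr_clean and mjr_group are hypothetical variable names, the abbreviation list is incomplete, and the three groups are illustrative assumptions rather than a recommended classification. It also assumes Stata 14 or newer for strtrim(), stritrim() and strlower(); older versions have trim(), itrim() and lower().

```stata
* Sketch only: mjr_clean and mjr_group are hypothetical names, and the
* abbreviation list and groupings below are illustrative assumptions.

* 1. Normalize whitespace and case so "sociology " and "Sociology" match
generate str53 mjr_clean = strlower(stritrim(strtrim(mjr)))

* 2. Map known abbreviations and variant spellings onto one form
replace mjr_clean = "political science" if inlist(mjr_clean, "posc", "posci")
replace mjr_clean = "theatre" if mjr_clean == "theater"

* 3. Inspect what is left; remaining variants still need manual rules
tabulate mjr_clean, sort

* 4. Assign broad groups (double majors need a rule of their own)
generate byte mjr_group = .
replace mjr_group = 1 if inlist(mjr_clean, "philosophy", "music", "theatre", "creative writing", "dramatic arts")
replace mjr_group = 2 if inlist(mjr_clean, "political science", "sociology", "anthropology", "psychology", "communication")
replace mjr_group = 3 if strpos(mjr_clean, "engineering") | strpos(mjr_clean, "biology") | strpos(mjr_clean, "computer science")
label define mjr_group 1 "Humanities/Arts" 2 "Social sciences" 3 "STEM"
label values mjr_group mjr_group

* 5. Check how many responses remain unclassified
tabulate mjr_group, missing
```

Keeping the original mjr untouched and writing to new variables means every recode can be checked against what the respondent actually typed. Entries such as "POSC, LHC, SPAN" or "Psychology/Spanish" are left unclassified by this sketch, since assigning a double major to one group is a substantive decision rather than a spelling fix.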