
  • Grouping observations into four dummy variables

    Hello, I've been struggling lately with how to create/group observations into four dummy variables in the attached data set. I want to group the programs into four dummy variables called LCT, Information, Lenient and Strict. I've watched countless videos and tried different commands and syntax but can't seem to figure it out. I would be incredibly grateful for some guidance and expertise. Kind regards, Jon. Please comment if you need more information.
    LCT Information Lenient Strict
    Kenya: CT-OVC Zimbabwe: Manicaland Argentina: AUHPC Cambodia: ESSS
    Indonesia: JPS Burkina Faso: OVC Malawi: CCT for Schooling Philippines: Pantawid
    Honduras: PRAF Nicaragua: SAC Tanzania: CCT Mexico: Progresa
    Programs Dominican Republic: PS Paraguay: Tekopor Colombia: Familias en Accion Mexico: Oportunidades
    Morocco: Tayssir Cambodia: JFPRS Indonesia: KH
    Bangladesh: Shombhob Cambodia: Scholarship Pilot Jamaica: PATH
    Brazil: PETI Nicaragua: RPS
    Colombia: SCB (x 3)

  • #2
    Jonathan:
    if your data are in -long- format, you do not need four categorical variables, but a four-level categorical variable; see -help fvvarlist- for further details.
    As an aside, please note that posting screenshots is by no means the best approach to share what's the matter with your data (code, outcome or the like).
    Please take a look at the FAQ and see how to post more effectively.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Hi Carlo, thank you for your reply. I attached screenshots as I'm very new to Stata and econometrics in general, and don't know how to properly describe the data or issues with it. I'll elaborate a bit more on what I'm trying to do. I have a data set with 28 conditional cash transfer programs, and the effect sizes of each for primary and secondary school attendance. I want to see what happens to these effect sizes when the programs are grouped into the four categories listed before, which refer to the type of compliance mechanism employed to enforce conditionalities when children have poor attendance rates. My rough working model is as follows:
      Attendance_i = a + β_1 LCT_i + β_2 Education_i + β_3 Lenient_i + β_4 Strict_i + X_i + e_i
      where i refers to the program, the four dummy variables refer to the type of enforcement, X_i is a vector of controls, and e_i is the error term. To this end I need to create four dummy variables in the data set in Stata, but don't know how to code this.



      • #4
        I think what Carlo was pointing you to was the Statalist FAQ (12.2 explains how to post subsets of your data). Posting your data with dataex may answer some of the questions below:

        --It seems you want to group different programs into 5 categories (LCT, Education, Lenient, Strict, something else). Do you already have a variable with this grouping? Is it numeric or string? If you have this grouping, life will be easy. You just need to enter it into your statistical model with factor-variable notation rather than as dummy variables (essentially, Stata will automatically "dummy them out" for you). There are a lot of advantages to doing this instead of creating your own dummies. This is what Carlo suggests in #2 when he advises you to look at -help fvvarlist-. Your model might look something like:
        Code:
        regress attendance i.group_var
        --If you do not have a variable with the 5 category grouping, then you might have an issue. It looks like you have a variable with the name of the program as a string variable. Complicating things, it looks like the data were in Unicode but are now not (note the "boxes" instead of accented characters). The version of Stata you are using can affect the handling of Unicode characters. Do you have a numeric variable indicating the program (with or without labels)? If not, you are going to have to deal with this list of programs to create your grouping variable. This will take some effort.
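One way to build the grouping variable directly from the program names could look like the sketch below. It is only an illustration: the variable name enforce, the label name enforce_lbl and the numeric codes 1 to 4 are hypothetical, and the toy data hold just five of the program names from the thread (those whose spelling has no accented characters), assigned to the columns they occupy in the table in #1.

```stata
* Toy data: five program names copied from the thread; not the full data set
clear
input str53 program_name
"Cash Transfer for Orphans and Vulnerable Children"
"Manicaland HIV/STD Prevention Project"
"CCT for Schooling in Malawi"
"Progresa"
"Oportunidades"
end

* One categorical variable (hypothetical coding):
* 1 = LCT, 2 = Information, 3 = Lenient, 4 = Strict
gen byte enforce = .
replace enforce = 1 if program_name == "Cash Transfer for Orphans and Vulnerable Children"
replace enforce = 2 if program_name == "Manicaland HIV/STD Prevention Project"
replace enforce = 3 if program_name == "CCT for Schooling in Malawi"
replace enforce = 4 if inlist(program_name, "Progresa", "Oportunidades")

* Attach readable labels and check that no program was left unassigned
label define enforce_lbl 1 "LCT" 2 "Information" 3 "Lenient" 4 "Strict"
label values enforce enforce_lbl
tabulate enforce, missing
```

With string arguments, inlist() takes at most ten values per call, so a longer list of names has to be split over several replace lines. Names that contain damaged accented characters are hard to match exactly this way, which is one reason a numeric program identifier is easier to work with.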
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1



        • #5
          There are no variables with this grouping, and no numeric variables indicating the program. There are numeric variables e.g. effect size, that are unique to each program, so perhaps these could be used to indicate each program? I want to group programs into the four categories listed, although I would presumably create three dummies to avoid multicollinearity?
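On the multicollinearity point, a small sketch may help. The numbers below are made up purely for illustration and are not taken from the data set; enforce and enforce_lbl are hypothetical names. The sketch shows that factor-variable notation leaves out one category automatically, and that three hand-made dummies reproduce the same model.

```stata
* Made-up toy data: two observations per enforcement category
clear
input byte enforce float attendance
1 .02
1 .05
2 .08
2 .10
3 .03
3 .06
4 .04
4 .09
end
label define enforce_lbl 1 "LCT" 2 "Information" 3 "Lenient" 4 "Strict"
label values enforce enforce_lbl

* Factor-variable notation: Stata omits the base level (1 = LCT) by itself
regress attendance i.enforce

* Same model with a different reference category (4 = Strict)
regress attendance ib4.enforce

* Hand-made dummies d_1 to d_4; leaving out d_1 gives the same fit as i.enforce
tabulate enforce, generate(d_)
regress attendance d_2 d_3 d_4
```

Each reported coefficient is the difference between that category and the omitted one, so the choice of base level changes the interpretation but not the fit.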



          • #6
            Could I simply create a new numerical variable called 'label' for each program (i.e. 1-28), and then enter something along the lines of...
            gen LCT = 0
            replace LCT = 1 if label = 4, 5, 8, 12, 26 etc

            Then repeat to create the other dummies?



            • #7
              Apologies, here is an example using dataex:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str35 first_author str14 second_author str53 program_name str18 country int(year yrbaseline yrfollowup) byte yrs_treatment double(primaryattendanceef primaryattendancese) float(secondaryattendanceef secondaryattendancese)
              "Salvia, A."                          "Tu�on, I."      "Asignaci�n Universal por Hijo para Protecci�n Social"  "Argentina"      2014 2010 2012  2   .02199999988079071 .007000000216066837      .117        .04
              "Filmer, D."                          "Schady, N."     "Cambodia Education Sector Support Scholarship Program" "Cambodia"       2011 2005 2006  1                    .                   .      .278       .059
              "Ward, P."                            "Hurrell, A."    "Cash Transfer for Orphans and Vulnerable Children"     "Kenya"          2010 2007 2009  2 -.018200000748038292  .01140000019222498     .0507       .051
              "Baird, S."                           "McIntosh, C."   "CCT for Schooling in Malawi"                           "Malawi"         2011 2007 2009  2                    .                   .       .08       .035
              "Evans, D."                           "Hausladen, S."  "Tanzania Community Based CCT"                          "Tanzania"       2014 2009 2012  3 -.009999999776482582  .03999999910593033      -.01        .04
              "Departamento Nacional de Planeaci�n" "."              "Familias en Acci�n"                                    "Colombia"       2006 2002 2003  1   .03871169313788414 .028635870665311813 .17151244  .05859277
              "Filmer, D."                          "."              "Japan Fund for Poverty Reduction Scholarship Program"  "Cambodia"       2008 2004 2005  1                    .                   .      .313        .04
              "Sparrow, R."                         "."              "Jaring Pengamanan Social"                              "Indonesia"      2007 1998 1999  1  .012000000104308128 .006000000052154064      .018       .005
              "Robertson, L."                       "Mushati, P."    "Manicaland HIV/STD Prevention Project"                 "Zimbabwe"       2013 2009 2011  2   .07599999755620956 .032999999821186066      .104      .0033
              "Akresh, R. "                         "de Walque, D. " "Orphans and�Vulnerable�Children "                      "Burkina Faso"   2013 2008 2010  2    .1340000033378601  .04899999871850014         .          .
              "Parker, S."                          "Todd, P."       "Oportunidades"                                         "Mexico"         2006 2002 2003  1  .017000000923871994 .004999999888241291      .017       .005
              "Chaudhury, N."                       "Friedman, J. "  "Pantawid Pamilyang Pilipino Program"                   "Phillipines"    2013 2008 2011  3   .03799999877810478 .017000000923871994         .          .
              "Chaudhury, N."                       "Friedman, J. "  "Pantawid Pamilyang Pilipino Program"                   "Phillipines"    2013 2008 2011  3                    .                   . .06184186  .01448276
              "Barrera, F."                         "Filmer, D."     "Cambodia Scholarship Pilot (Merit Targeting)"          "Cambodia"       2012 2008 2010  2   .06850054115056992  .16699999570846558         .          .
              "Alatas, V."                          "."              "Keluarga Harapan"                                      "Indonesia"      2011 2007 2009  2 -.004999999888241291 .009999999776482582      .006        .01
              "De Souza, P."                        "Olinto, P."     "Programa de Asignaci�n Familiar II"                    "Honduras"       2005 2000 2002  2   .04600000008940697 .012000000104308128         .          .
              "Cardoso, E."                         "Souza, A."      "Programa Erradicacao do Trabalho Infantil"             "Brazil"         2004 1992 2002 10  .030500000342726707 .001414214028045535     .0305 .001414214
              "Dominican Republic Government"       "No author"      "Programa Solidaridad"                                  "Dominican Rep." 2008 2002 2007  5  .010999999940395355  .03099999949336052      .145       .059
              "Levy, D."                            "Ohls, J."       "Program of Advancement Through Health and Education"   "Jamaica"        2010 2004 2005  1   .02500000037252903 .010999999940395355      .025       .011
              "Skoufias, E."                        "Parker, S."     "Progresa"                                              "Mexico"         2001 1997 1999  2  .011093960143625736 .005734622944146395     .0765 .014849242
              "Dammert, A."                         "."              "Red de Protecci�n Social"                              "Nicaragua"      2009 2000 2002  2    .1293846219778061 .016641005873680115         .          .
              "Macours, K."                         "Vakis, R."      "Sistema de Atenci�n en Crisis"                         "Nicaragua"      2009 2005 2006  1   .05000000074505806  .01899999938905239       .05       .019
              "Barrera, F."                         "Bertrand, M"    "Subsidios Condicionados Bogota (Basic)"                "Colombia"       2011 2005 2006  1                    .                   .      .032       .007
              "Barrera, F."                         "Bertrand, M"    "Subsidios Condicionados Bogota (Savings)"              "Colombia"       2011 2005 2006  1                    .                   .      .027       .007
              "Barrera, F."                         "Bertrand, M"    "Subsidios Condicionados Bogota (Tertiary)"             "Colombia"       2011 2005 2006  1                    .                   .      .056        .02
              "Benhassine, N."                      "Devoto, F."     "Tayssir"                                               "Morocco"        2013 2007 2009  2  .014000000432133675 .014700000174343586         .          .
              "Perez, R."                           "Veras, F."      "Tekopor�"                                              "Paraguay"       2011 2005 2006  1  .032999999821186066 .029999999329447746         .          .
              "Ferr�, C."                           "Sharif, I."     "Shombhob Project"                                      "Bangladesh"     2014 2012 2013  1                .0365 .027857142857142858         .          .
              end



              • #8
                Yes, what you suggest would be doable. But:
                1) Make sure that the spelling is consistent for the program names.
                Code:
                tab program_name
                Do you get 28 categories? If Stata properly groups the same programs together in the tab command, then you can use
                Code:
                egen prog_num=group(program_name), label
                And from there you recode into dummy variables if you like.

                2) Be careful of missing values when creating dummy variables. If there are missing values on program_name, you need to decide whether these should be missing on all dummies or go into another category, and code accordingly:
                Code:
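                * start with every program coded 0
                gen LCT=0
                * set to 1 for the programs in the LCT group (check the codes against your own prog_num)
                replace LCT=1 if inlist(prog_num, 4, 5, 8, 10, 28)
                * keep the dummy missing where the program itself is unknown
                replace LCT=. if prog_num==.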
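The two steps above can be put together in a small self-contained sketch. The toy data are an assumption: a handful of program names from the thread, with one name deliberately entered a second time with a trailing blank to mimic inconsistent spelling. The codes 1 and 2 used for LCT apply only to this toy data, since group() numbers the names in sorted order.

```stata
* Toy data: names from the thread; the trailing blank on one entry is deliberate
clear
input str53 program_name
"Progresa"
"Cash Transfer for Orphans and Vulnerable Children"
"Jaring Pengamanan Social"
"Oportunidades"
"Pantawid Pamilyang Pilipino Program"
"Pantawid Pamilyang Pilipino Program "
end

* Leading or trailing blanks make identical names look distinct to -tab- and -egen-
replace program_name = strtrim(program_name)

* Numeric id, numbered in sorted order of the names, with the names as value labels
egen prog_num = group(program_name), label

* Show which number belongs to which program before hard-coding any numbers
label list prog_num

* Dummy in one step: 1/0 where prog_num is known, missing otherwise
gen byte LCT = inlist(prog_num, 1, 2) if !missing(prog_num)
tabulate prog_num LCT, missing
```

Checking label list before writing the inlist() line matters because the numbers depend on how the names sort, not on the order of the rows in the data. strtrim() is available from Stata 14; the older trim() should do the same job in earlier versions.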



                • #9
                  Thank you Carole, your code has worked perfectly!

