
  • Creating a dummy variable - new to Stata

    Hi everyone, Masters Economics student here, struggling with using Stata. Probably a very simple question to those that are competent in using Stata, but it's got me confused.

    I have a dataset (British Household Panel Survey), with an independent variable "qmastat", which is an individual's self-reported marital status. The variable is numeric and the measurements are nominal. The values are as follows:

    -9 = missing/wild
    -8 = inapplicable
    -2 = refused
    -1 = not answered
    0 = child under 16
    1 = married
    2 = living as couple
    3 = widowed
    4 = divorced
    5 = separated
    6 = never married
    7 = civil partnership
    8 = dissolved civil partnership

    I am trying to create a new dummy variable called "married", which should take on the value of 1 if married (qmastat = 1), 0 if not married (qmastat = 2, 3, 4, 5, 6, 7 or 8), and . if missing (qmastat < 0).

    I know that if I were to create a simpler dummy variable (such as gender), I would use the commands: generate female = 1 if qsex == 2 ; replace female = 0 if qsex == 1; replace female = . if qsex < 0.

    Please could someone offer some advice with regards to how I would create a dummy variable for married? I'm guessing it would be something along the lines of generate married = 1 if qmastat = 1; replace married = 0 if qmastat > 1 ?

    Also, if I wanted to create a similar variable for married or living as a couple (qmastat = 1 or 2), how would I go about programming that?

    Many thanks in advance.

    Will

  • #2
    -help recode-
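
    A minimal sketch of what recode might look like for the question in #1, run on made-up values rather than the BHPS data. The new variable name married_r is purely illustrative, and treating code 0 (child under 16) as missing is an assumption:

    ```stata
    * Toy data standing in for qmastat (values invented for illustration)
    clear
    input qmastat
    -9
    -1
    0
    1
    2
    8
    end

    * 1 stays 1, 2 through 8 become 0, everything from the minimum up to 0 becomes missing
    recode qmastat (1 = 1) (2/8 = 0) (min/0 = .), generate(married_r)

    * Check the result against the intended coding
    assert married_r == 1 if qmastat == 1
    assert married_r == 0 if inrange(qmastat, 2, 8)
    assert missing(married_r) if qmastat <= 0
    list, clean
    ```

    Using generate() leaves qmastat itself untouched, so the original codes remain available for checking.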



    • #3
      Code:
      gen married = qmastat == 1 if qmastat > 0
      assigns 1 if married, 0 if known not to be married and missing otherwise.

      Code:
      gen married2 = qmastat == 1
      lumps the unknowns in with the known non-marrieds.

      You can do both of these with a single generate; there is no harm in the replace, but no need for it either.
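
      The difference between the two commands above can be checked on a few made-up values (a sketch on toy data, not the BHPS file):

      ```stata
      * Toy data standing in for qmastat (values invented for illustration)
      clear
      input qmastat
      -9
      -1
      0
      1
      2
      6
      end

      * 1 if married, 0 if known not to be married, missing for codes <= 0
      gen married  = qmastat == 1 if qmastat > 0

      * 1 if married, 0 for everything else, including the non-positive codes
      gen married2 = qmastat == 1

      assert married == 1 if qmastat == 1
      assert married == 0 if qmastat > 1
      assert missing(married) if qmastat <= 0
      assert married2 == 0 if qmastat <= 0
      list, clean
      ```

      Note that the if qmastat > 0 qualifier also sets children under 16 (code 0) to missing.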

      Note that

      Code:
      search dummy variable
      points to resources, e.g.

      FAQ . . . . . . . . . . . . . . . . . . . . . . . Creating dummy variables
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . W. Gould
      7/16 How do I create dummy variables?
      http://www.stata.com/support/faqs/data-management/creating-dummy-variables/





      • #4
        Did you try:

        gen d_married = 0 // Sets 0 for all observations; assumes qmastat is observed for every observation.
        replace d_married = 1 if qmastat == 1 // Sets the dummy to 1 if an individual reported being married.


        In general, Stata syntax requires == rather than = to test equality in an if condition.

        Similar:

        gen d_couple = 0
        replace d_couple = 1 if qmastat == 1 | qmastat == 2

        | reads as "or" in Stata syntax. Do not confuse it with & ("and"). For longer lists of values, see also the inlist() function via: help inlist()
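
        A sketch of the inlist() alternative on made-up values. The if qmastat > 0 qualifier, which leaves non-positive codes missing instead of 0, is borrowed from #3 and is an assumption about what is wanted:

        ```stata
        * Toy data standing in for qmastat (values invented for illustration)
        clear
        input qmastat
        -9
        0
        1
        2
        3
        end

        * inlist() is true when the first argument equals any of the later ones
        gen d_couple = inlist(qmastat, 1, 2) if qmastat > 0

        assert d_couple == 1 if qmastat == 1 | qmastat == 2
        assert d_couple == 0 if qmastat > 2
        assert missing(d_couple) if qmastat <= 0
        list, clean
        ```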



        • #5
          Fantastic - thanks very much for the help everyone!

          How would this change if I just wanted to create a dummy variable based on one measurement in the original variable? If I have a question in the dataset on highest academic qualification attained (numeric variable, nominal measurement), called qqfachi, which takes on the following measurements:

          -9 = Missing/wild
          -8 = Inapplicable
          -7 = Proxy respondent
          1 = Higher degree
          2 = 1st degree
          3 = Vocational
          4 = Advance level
          5 = Ordinary level
          6 = CSE
          7 = No formal qualification

          If I wanted to create a dummy called degree for those who hold a degree, would it be:

          Code:
          gen degree = 1 if qqfachi == 2
          replace degree = 0 if qqfachi == 1 | > 2
          replace degree = . if qqfachi < 0

          Sorry for all the questions - I am only just beginning to get used to this. Many thanks in advance.



          • #6
            Using the generate and replace approach, the following is clearest.
            Code:
            generate degree = 0
            replace degree = 1 if qqfachi==2
            replace degree = . if qqfachi<0
            The code you provide will not work because your "or" condition is specified incorrectly: each side of | must be a complete expression. To correct your code,
            Code:
            gen degree = 1 if qqfachi == 2
            replace degree = 0 if qqfachi == 1 | qqfachi> 2
            And the last line is not needed because the generate command already leaves degree missing wherever qqfachi is not 2, and the replace only fills in 0 for positive values of qqfachi.
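
            The same result can be sketched in one line with the true-or-false approach from #3, here on made-up values. Whether a higher degree (code 1) should also count as holding a degree is not settled in the thread; the second variable, with the hypothetical name degree_any, assumes that it should:

            ```stata
            * Toy data standing in for qqfachi (values invented for illustration)
            clear
            input qqfachi
            -9
            -7
            1
            2
            3
            7
            end

            * 1 for a 1st degree, 0 for other valid answers, missing for negative codes
            gen degree = qqfachi == 2 if qqfachi > 0

            * Assumption: count higher degree (1) and 1st degree (2) as degree holders
            gen degree_any = inlist(qqfachi, 1, 2) if qqfachi > 0

            assert degree == 1 if qqfachi == 2
            assert degree == 0 if qqfachi == 1 | qqfachi > 2
            assert missing(degree) if qqfachi < 0
            assert degree_any == 1 if qqfachi == 1
            list, clean
            ```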



            • #7
              Good afternoon to everyone,
              I need your help for a similar problem. I have a panel dataset and I need to create a dummy variable standing for the economic shock. I cannot figure out how to create a dummy variable taking value = 1 when current economic growth exceeds 3% lagged economic growth.

              The following is an example of my dataset:
              Code:
              input str32 origin_name int year float growth_rate_origin str32 destination_name float(growth_rate_destination origin_gdp_growth destination_gdp_growth lgrowth_rate_origin)
              "Afghanistan" 1980     . "Afghanistan"     . . .     .
              "Afghanistan" 1981     . "Afghanistan"     . . .     .
              "Afghanistan" 1982     . "Afghanistan"     . . .     .
              "Afghanistan" 1983     . "Afghanistan"     . . .     .
              "Afghanistan" 1984     . "Afghanistan"     . . .     .
              "Afghanistan" 1985     . "Afghanistan"     . . .     .
              "Afghanistan" 1986     . "Afghanistan"     . . .     .
              "Afghanistan" 1987     . "Afghanistan"     . . .     .
              "Afghanistan" 1988     . "Afghanistan"     . . .     .
              "Afghanistan" 1989     . "Afghanistan"     . . .     .
              "Afghanistan" 1990     . "Afghanistan"     . . .     .
              "Afghanistan" 1991     . "Afghanistan"     . . .     .
              "Afghanistan" 1992     . "Afghanistan"     . . .     .
              "Afghanistan" 1993     . "Afghanistan"     . . .     .
              "Afghanistan" 1994     . "Afghanistan"     . . .     .
              "Afghanistan" 1995     . "Afghanistan"     . . .     .
              "Afghanistan" 1996     . "Afghanistan"     . . .     .
              "Afghanistan" 1997     . "Afghanistan"     . . .     .
              "Afghanistan" 1998     . "Afghanistan"     . . .     .
              "Afghanistan" 1999     . "Afghanistan"     . . .     .
              "Afghanistan" 2000     . "Afghanistan"     . . .     .
              "Afghanistan" 2001     . "Afghanistan"     . . .     .
              "Afghanistan" 2002     . "Afghanistan"     . . .     .
              "Afghanistan" 2003   8.7 "Afghanistan"   8.7 . .     .
              "Afghanistan" 2004    .7 "Afghanistan"    .7 . .   8.7
              "Afghanistan" 2005  11.8 "Afghanistan"  11.8 . .    .7
              "Afghanistan" 2006   5.4 "Afghanistan"   5.4 . .  11.8
              "Afghanistan" 2007  13.3 "Afghanistan"  13.3 . .   5.4
              "Afghanistan" 2008   3.9 "Afghanistan"   3.9 . .  13.3
              "Afghanistan" 2009  20.6 "Afghanistan"  20.6 . .   3.9
              "Afghanistan" 2010   8.4 "Afghanistan"   8.4 . .  20.6
              "Afghanistan" 2011   6.5 "Afghanistan"   6.5 . .   8.4
              "Afghanistan" 2012    14 "Afghanistan"    14 . .   6.5
              "Afghanistan" 2013   5.7 "Afghanistan"   5.7 . .    14
              "Afghanistan" 2014   2.7 "Afghanistan"   2.7 . .   5.7
              "Afghanistan" 2015     1 "Afghanistan"     1 . .   2.7
              "Afghanistan" 2016   2.2 "Afghanistan"   2.2 . .     1
              "Afghanistan" 2017   2.7 "Afghanistan"   2.7 . .   2.2
              "Afghanistan" 2018   2.3 "Afghanistan"   2.3 . .   2.7
              "Afghanistan" 2019     3 "Afghanistan"     3 . .   2.3
              "Afghanistan" 2020   3.5 "Afghanistan"   3.5 . .     3
              "Afghanistan" 2021     4 "Afghanistan"     4 . .   3.5
              "Afghanistan" 2022   4.5 "Afghanistan"   4.5 . .     4
              "Afghanistan" 2023     5 "Afghanistan"     5 . .   4.5
              "Albania"     1980   2.7 "Albania"       2.7 . .     5
              "Albania"     1981   5.7 "Albania"       5.7 . .   2.7
              "Albania"     1982   2.9 "Albania"       2.9 . .   5.7
              "Albania"     1983   1.1 "Albania"       1.1 . .   2.9
              "Albania"     1984     2 "Albania"         2 . .   1.1
              "Albania"     1985  -1.5 "Albania"      -1.5 . .     2
              "Albania"     1986   5.6 "Albania"       5.6 . .  -1.5
              "Albania"     1987   -.8 "Albania"       -.8 . .   5.6
              "Albania"     1988  -1.4 "Albania"      -1.4 . .   -.8
              "Albania"     1989   9.8 "Albania"       9.8 . .  -1.4
              "Albania"     1990   -10 "Albania"       -10 . .   9.8
              "Albania"     1991   -28 "Albania"       -28 . .   -10
              "Albania"     1992  -7.2 "Albania"      -7.2 . .   -28
              "Albania"     1993   9.6 "Albania"       9.6 . .  -7.2
              "Albania"     1994   9.4 "Albania"       9.4 . .   9.6
              "Albania"     1995   8.9 "Albania"       8.9 . .   9.4
              "Albania"     1996   9.1 "Albania"       9.1 . .   8.9
              "Albania"     1997 -10.9 "Albania"     -10.9 . .   9.1
              "Albania"     1998   8.8 "Albania"       8.8 . . -10.9
              "Albania"     1999  12.9 "Albania"      12.9 . .   8.8
              "Albania"     2000   6.9 "Albania"       6.9 . .  12.9
              "Albania"     2001   8.3 "Albania"       8.3 . .   6.9
              "Albania"     2002   4.5 "Albania"       4.5 . .   8.3
              "Albania"     2003   5.5 "Albania"       5.5 . .   4.5
              end

              I was thinking about
              Code:
               gen lgdpgrowth=l.gdpgrowth
              Code:
               forvalues i=1(1)8888{
              gen dummy`i'=1 if lgrowth_rate_origin<=0,03*growth_rate_origin
              replace dummy`i'=0 if lgrowth_rate_origin>0,03*growth_rate_origin
              }
              but it does not give me the right output.
              Can someone help me, please?
              Last edited by Giulia Rap; 18 Oct 2018, 07:56.



              • #8
                For this kind of question, you are better off using Stata's panel data tools to account for missing years across panels. I assume that your absolute growth rate values are in percent (at least that is how it looks to me).

                Code:
                encode origin_name, gen(origin)
                xtset origin year
                gen wanted= growth_rate_origin>=(L.growth_rate_origin +3) & !missing(L.growth_rate_origin)
                ADDED IN EDIT: Your variables are stored as floats, so you may run into precision issues. You can thus also try


                Code:
                gen wanted= float(growth_rate_origin)>=(float(L.growth_rate_origin +3)) & !missing(L.growth_rate_origin)
                Last edited by Andrew Musau; 18 Oct 2018, 09:27.
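
                One caveat worth sketching: Stata treats a numeric missing value as larger than any number, so a missing current growth rate would satisfy >= whenever the lag is observed. A possible variant, shown on a made-up two-panel dataset with the hypothetical names id, growth and shock, leaves the dummy missing whenever either value is unavailable. This differs from the code above, which returns 0 when the lag is missing:

                ```stata
                * Toy panel (values invented for illustration)
                clear
                input id year growth
                1 2000 1
                1 2001 5
                1 2002 .
                1 2003 2
                2 2000 4
                2 2001 4.5
                end

                xtset id year

                * 1 if growth is at least 3 points above its lag, 0 if not,
                * missing if the current or the lagged value is missing
                gen shock = growth >= L.growth + 3 if !missing(growth, L.growth)

                assert shock == 1 if id == 1 & year == 2001
                assert shock == 0 if id == 2 & year == 2001
                assert missing(shock) if year == 2000 | year == 2002 | year == 2003
                list, sepby(id)
                ```

                Because xtset declares the panel structure, L.growth never reaches back across panels, unlike a lag built from the previous row of the dataset.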



                • #9
                  Thanks, Andrew, for your quick reply and help.
                  Yes, I have annual percentage growth rates taken from the IMF database.

                  Thank you very much, it was much easier than I thought!



                  • #10
                    Hi everyone,

                    Can you give me some advice on creating dummy variables?

                    This is a panel data set covering multiple years from 2000 to 2010. The month in which the data were collected differs from year to year. I added the month of data collection to every year using
                    Code:
                     gen data_month=6 if Year==2010
                    and so on.

                    Now I want to create dummy variables and assign "0" for months in which data were not collected.

                    Can you please advise?

                    Many thanks!!





                    • #11
                      Hi Statalist. I am trying to generate a dummy for a categorical variable which is similar to the original post in this thread in that it relates to a survey question which provides the respondent with many options from which to respond. In my example, the question asks if the respondent is religious or not, and if so, to state their religion. As such, there are very many options for different religions and one option for "no religion". Each response option is given values, for example "1000 Buddhist", "2010 Anglican" "2030 Baptist" ...... "2330 Uniting Church" .... "3000 Hinduism" "4000 Islam" "5000 Judaism". In all there are about 30 response options, of which one is "7000 No Religion". (I note that I only have positive values as I removed the negative values associated with non-response).

                      How could I generate a dummy that equals 1 for Religion and 0 for No Religion? There are many values associated with being aligned with one of the many listed religions, and even after reading all the posts I could find on Statalist, I did not find a solution to this specific problem (not to say there isn't one). I am confident my attempt is flawed (see below)
                      Code:
                       gen relig=0 if religb==7000
                      replace relig=1 if religb!=7000
                      Any help is appreciated. Regards, Chris



                      • #12
                        Hi Chris,

                        Code:
                        gen relig = 1
                        replace relig = 0 if religb == 7000
                        This should work out for you if I got your plan correct and you dropped all non-response.



                        • #13
                          We can't be clear what the answer to #11 is because there is no data example. The explanation there is that "7000 No Religion" is a value of the variable, which if correct implies that the variable concerned is string, in which case

                          Code:
                          gen wanted = religb != "7000 No Religion"

                          is a way to do it. On the other hand, Melanie Boekholt is guessing that you don't mean what you say and that "7000 No Religion" is a value label, in which case her code will work if and only if 7000 is the corresponding numeric value.

                          Either way, note that

                          Code:
                          gen wanted = <true or false condition>
                          is a concise way to get values of 1 if the stated condition is true and 0 if it is false. For more, see

                          https://www.stata.com/support/faqs/d...rue-and-false/

                          https://www.stata.com/support/faqs/d...mmy-variables/

                          https://journals.sagepub.com/doi/ful...36867X19830921

                          For how to give a decent data example please see as always https://www.statalist.org/forums/help#stata
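
                          As a sketch of how the two cases could be told apart, the following checks the storage type before building the indicator. The toy data, the label name religb_lbl and the choice to leave missing values missing are all assumptions made for illustration:

                          ```stata
                          * Toy labelled numeric variable (values and label name invented)
                          clear
                          input religb
                          1000
                          7000
                          .
                          end
                          label define religb_lbl 1000 "Buddhist" 7000 "No Religion"
                          label values religb religb_lbl

                          * confirm sets _rc to 0 only if religb is a string variable
                          capture confirm string variable religb
                          if _rc == 0 {
                              * string case: compare with the full text, leave empty strings missing
                              gen wanted = religb != "7000 No Religion" if religb != ""
                          }
                          else {
                              * labelled numeric case: compare with the underlying number
                              gen wanted = religb != 7000 if !missing(religb)
                          }

                          assert wanted == 1 if religb == 1000
                          assert wanted == 0 if religb == 7000
                          assert missing(wanted) if missing(religb)
                          list, clean nolabel
                          ```

                          Running describe religb, or tabulate religb with the nolabel option, gives the same information interactively.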



                          • #14
                            Thank you Melanie Boekholt that looks correct. Yes I dropped all non-responses (#11). Just the reverse of what I did - I guess I should have figured that out. ...with time and practice I guess....

                            Nick Cox, my apologies for not providing more information. I've tried to capture the list of response options in the attached .png file as shown to me when I tabulate this variable. As you can see the values I listed in #11 with each response option are value labels (my apologies for not clarifying). After reading your links in #13, I thought I should note this is a survey question from the Hilda survey and many respondents do not answer it. Their responses are recorded as non-responses and the value labels for the various non-response options take on negative values which I've removed and are therefore not listed below. I hope this clarifies my post in #11. Thank you again.
                            [Attached image: religb_Stata.png]



                            • #15
                              After creating a dummy based on #12, 'relig' had four times the observations (176,564) compared to when I tabulate 'religb' (41,031) on which 'relig' is based. This is in spite of removing the non-responses via
                              Code:
                              foreach var of varlist _all {
                                  replace `var' = . if `var' < 0
                              }
                              #14 shows the list of value labels associated with this variable of which there are 41,031 observations. There are 135,533 missing values (non-responses) and it seems that based on the code in #12 the dummy variable (relig) includes these. [added] I now realise that I didn't remove the non-responses, I just converted them to '.' - missings.

                              So to remove the 'missings' I added the following code to #12 (below) and it seems to have addressed the problem.
                              Code:
                              gen relig = 1 if religb !=.
                              replace relig = 0 if religb == 7000 & religb !=.
                              Can someone please confirm that this is the correct approach?
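
                              On a few made-up values, the two commands above appear to behave as intended and match a single-command version. The sketch below compares the code from #12, the code in this post and a one-line equivalent; the names relig12 and relig_one are hypothetical:

                              ```stata
                              * Toy data standing in for religb (values invented for illustration)
                              clear
                              input religb
                              1000
                              2010
                              7000
                              .
                              end

                              * #12 as posted: observations with missing religb end up coded 1
                              gen relig12 = 1
                              replace relig12 = 0 if religb == 7000

                              * #15 as posted: missing religb stays missing
                              gen relig = 1 if religb != .
                              replace relig = 0 if religb == 7000 & religb != .

                              * Single-command equivalent; !missing() also catches .a to .z
                              gen relig_one = religb != 7000 if !missing(religb)

                              assert relig == relig_one
                              assert relig12 == 1 if missing(religb)
                              assert missing(relig) if missing(religb)
                              tabulate relig, missing
                              ```

                              The & religb != . in the replace line is harmless but redundant, since an observation equal to 7000 cannot be missing.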

                              Regards, Chris
