  • replace versus recode message

    A student noticed that when I showed two different ways to generate a new dummy variable, the first produced messages that made sense while the second produced a message that didn't seem to. Here are the frequencies for the variable we converted to a dummy.
        veteran |      Freq.
    ------------+-----------
              1 |          4
              3 |          1
              5 |          9
              6 |        101
              . |         20
    ------------+-----------
          Total |        135
    Here are the two methods I demonstrated to create a dummy variable for veteran (1=yes/0=no). Both methods "worked" in that they produced the intended dummy variable.
    Code:
    * Method 1
    ge vetcat1 = .
    replace vetcat1 = 1 if veteran==1 | veteran==3 | veteran==5
    replace vetcat1 = 0 if veteran==6

    * Method 2
    ge vetcat2 = veteran
    recode vetcat2 1/5=1 6=0

    The first method gave this output.
    Code:
    . ge vetcat1 = .
    (135 missing values generated)
    . replace vetcat1 = 1 if veteran==1 | veteran==3 | veteran==5
    (14 real changes made)
    . replace vetcat1 = 0 if veteran==6
    (101 real changes made)

    The second method gave this output.
    Code:
    . ge vetcat2 = veteran
    (20 missing values generated)
    . recode vetcat2 1/5=1 6=0
    (vetcat2: 111 changes made)

    So the question is: why does method 1 report 14 and 101 changes (which makes sense) while method 2 reports 111 changes (which is not the sum 14 + 101 = 115)?
    I couldn't figure it out. Have you run across this?

  • #2
    Because four observations do not need recoding, namely those that are already equal to 1. So 111 (recoded) + 4 (not recoded) + 20 (missing) = 135.

    Please follow advice on formatting your posts. That is very difficult to read.
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
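
As an illustration of point 3, the frequency table from the original post could be turned into a dataset that others can run by using input. This is only a minimal sketch: the helper variable freq is introduced here for convenience and is not part of the thread, and the real data presumably contain other variables as well.

```stata
* Rebuild the example data from the posted frequency table.
* freq is a hypothetical helper variable, not part of the original data.
clear
input veteran freq
1   4
3   1
5   9
6 101
. 20
end

* Turn each row into freq identical observations, then drop the helper.
expand freq
drop freq

* Should reproduce the frequencies shown in the original post.
tab veteran, missing
```

Pasting a block like this between code delimiters lets readers reproduce the problem without guessing at the data.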



    • #3
      Your syntax was a bit difficult to disentangle, since you rearranged things and didn't use code blocks. But the difference arises because four cases *already* had a value of 1, so they weren't changed by your -recode- syntax.
      Code:
      clear
      
      set obs 135
      gen veteran=.
      replace veteran=1 if _n<=4
      replace veteran=3 if _n==5
      replace veteran=5 if _n>5 & _n<15
      replace veteran=6 if _n>=15 & _n<116
      tab veteran, mi 
      
      * Method 1
      ge vetcat1 = .
      replace vetcat1 = 1 if veteran==1 | veteran==3 | veteran==5
      replace vetcat1 = 0 if veteran==6
      * Method 2
      ge vetcat2 = veteran
      recode vetcat2 1/5=1 6=0



      • #4
        Well, if a value doesn't change, Stata doesn't count it among the changes.
        In method 1 all values were missing to begin with, so every observation matched by an if condition received a new non-missing value and was counted: 14 + 101 = 115 changes. The 20 observations with missing veteran matched neither condition and stayed missing.
        In method 2 some values were preserved as original (ones remained ones), so they are not counted: 135 total = 111 changes + 20 missings unchanged + 4 ones unchanged.
        Best, Sergiy Radyakin.
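
The decomposition 135 = 111 + 20 + 4 can be checked by comparing the recoded variable with its source. The sketch below is not from the thread: it rebuilds the 135 observations with nested cond() calls purely for self-containment, and it assumes that "changed" means the stored value differs after recode.

```stata
* Rebuild the 135 observations: 4 ones, 1 three, 9 fives, 101 sixes, 20 missing.
clear
set obs 135
gen veteran = cond(_n <= 4, 1, cond(_n == 5, 3, cond(_n <= 14, 5, cond(_n <= 115, 6, .))))

* Method 2 from the original post.
gen vetcat2 = veteran
recode vetcat2 1/5=1 6=0

* Observations whose value actually differs after the recode (3, 5 and 6).
count if vetcat2 != veteran
assert r(N) == 111

* Non-missing observations left as they were (the ones that were already 1).
count if vetcat2 == veteran & !missing(veteran)
assert r(N) == 4

* Missing values are untouched by either recode rule.
count if missing(veteran)
assert r(N) == 20
```

If any assert failed, Stata would stop with an error, so a clean run confirms the three counts.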
