  • Replace function not working appropriately

    Hi I have the following string variables:
    [Image: string.PNG]


    Which have the following data: 0 or 1 or Null
    [Image: list.PNG]


    I have encoded it:

    encode thromboc_aspirin, gen(thromboc_aspirin_n)

    [Image: _n.PNG]

    I then tabulated thromboc_aspirin_n:

    [Image: tab.PNG]

    I would like to generate a new variable that flags only the observations with thromboc_aspirin_n == 1.

    code used:

    generate VTE_ChemProphylaxis = 0
    replace VTE_ChemProphylaxis = 1 if thromboc_aspirin_n == 1

    [Image: replace = 1.PNG]


    Instead of replacing just the 180,034 values, Stata replaces all the values, reporting 1.1-6.991 real changes made.

    I do not understand what I am doing wrong, or why Stata isn't replacing only the 180,034 values that contain a 1 in my thromboc_aspirin_n variable.



  • #2
    replace is a command, not a function. That's a detail.

    Everything hinges on the labelling created by encode. Consider this reproducible example.

    Code:
    . clear
    
    . set obs 3
    Number of observations (_N) was 0, now 3.
    
    . gen problem = word("0 1 NULL", _n)
    
    . list
    
         +---------+
         | problem |
         |---------|
      1. |       0 |
      2. |       1 |
      3. |    NULL |
         +---------+
    
    . encode problem, gen(solution)
    
    . list
    
         +--------------------+
         | problem   solution |
         |--------------------|
      1. |       0          0 |
      2. |       1          1 |
      3. |    NULL       NULL |
         +--------------------+
    
    . label list solution
    solution:
               1 0
               2 1
               3 NULL
    The list looks OK, but you need to look underneath the value labels.
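
    One way to look underneath the labels is the nolabel option, which shows the stored integers instead of the label text. This is a minimal sketch that rebuilds the same three-observation example; nothing here comes from the original poster's dataset.

    ```stata
    clear
    set obs 3
    gen problem = word("0 1 NULL", _n)
    encode problem, gen(solution)

    * nolabel displays the underlying integer codes, not the value labels
    list, nolabel
    tabulate solution, nolabel
    ```

    Here the stored codes are 1, 2 and 3, so a condition such as solution == 1 picks out the observations labelled "0".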

    With nothing else said, encode maps "0" to 1, "1" to 2 and "NULL" to 3. That's not what you're guessing, but it is implicitly what you asked for. If you wanted something different, you needed to define your value labels in advance of the encode.
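
    As a sketch of defining value labels in advance, the following uses a hypothetical label name, yesno, with codes chosen by hand, and passes it to encode through the label() option. The code 2 for "NULL" is an arbitrary choice made for illustration.

    ```stata
    clear
    set obs 3
    gen problem = word("0 1 NULL", _n)

    * yesno is a hypothetical label name; the integer codes are chosen by hand
    label define yesno 0 "0" 1 "1" 2 "NULL"

    * encode reuses the existing mapping instead of numbering from 1
    encode problem, gen(solution2) label(yesno)

    list, nolabel
    ```

    With this mapping, solution2 == 1 does select the observations whose string value is "1". Note that "NULL" is stored as 2 here, not as missing.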

    Note that destring, force would have mapped differently and here would work more like how you were expecting encode to work.



    • #3
      Thanks for pointing this out.
      However, I do not understand why I should use the 'force' command instead of 'encode'.



      • #4
        force is not a command. It's an option of destring.

        Code:
        destring problem, gen(SOLUTION) force 
        would with the example data in #2 have mapped

        "0" to 0
        "1" to 1
        "NULL" to system missing (.).

        That seems to be what you were expecting encode to do. Naturally you could use any legal variable name. I use SOLUTION here to make it emphatic that it is not the same as solution.

        Otherwise put, you have expressed surprise that replace did not work as you expected, and the answer lies in what encode did. You needed to do one of the following

        1. adjust to what encode did

        2. apply encode differently -- defining value labels in advance

        3. apply destring, force
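
        Routes 1 and 3 could be sketched as follows on the example data from #2. The names flag1 and flag3 are hypothetical and stand in for an indicator such as VTE_ChemProphylaxis. Route 1 compares against the label text using the "text":labelname syntax, so it does not depend on which integer encode happened to assign.

        ```stata
        clear
        set obs 3
        gen problem = word("0 1 NULL", _n)
        encode problem, gen(solution)

        * Route 1: keep encode's result, but test against the label text
        gen byte flag1 = 0
        replace flag1 = 1 if solution == "1":solution

        * Route 3: convert to numeric first; "NULL" becomes system missing
        destring problem, gen(SOLUTION) force
        gen byte flag3 = 0
        replace flag3 = 1 if SOLUTION == 1

        list, nolabel
        ```

        Both indicators are 1 only for the observation whose original string was "1".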


        Last edited by Nick Cox; 20 Apr 2022, 07:04.
