  • replace var = .c

    Hello,

    Reading someone else's code, I came across the expression "replace var = .c" for the first time, where I would have expected "replace var = .". At first I assumed this was a mistake, but when I ran the code myself, it worked. My question is thus: what exactly does this do, and why can't I find any reference to it online or in the help files?

    Code:
    sysuse auto, clear
    tab rep78, miss
    replace rep78 = .c if rep78 == 5
    tab rep78, miss
    Running this yields the following:

    . sysuse auto, clear
    (1978 automobile data)
    r; t=0.00 9:26:42

    . tab rep78, miss

         Repair |
    record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          2        2.70        2.70
              2 |          8       10.81       13.51
              3 |         30       40.54       54.05
              4 |         18       24.32       78.38
              5 |         11       14.86       93.24
              . |          5        6.76      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    r; t=0.00 9:26:42

    . replace rep78 = .c if rep78 == 5
    (11 real changes made, 11 to missing)
    r; t=0.00 9:26:42

    . tab rep78, miss

         Repair |
    record 1978 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          2        2.70        2.70
              2 |          8       10.81       13.51
              3 |         30       40.54       54.05
              4 |         18       24.32       78.38
              . |          5        6.76       85.14
             .c |         11       14.86      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    r; t=0.00 9:26:42

    .
    end of do-file
    Thus, one purpose seems to be to differentiate missing values already in the data from those replaced by us. Is there any other purpose?

    Thanks,
    Best,
    Hélder

  • #2
    Such values are referred to as extended missing values. They are useful in a number of situations, for example when you want to distinguish between "refused to answer" and "don't know" responses in a survey. See

    Code:
    help missing
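
    A minimal sketch of that survey case might look as follows. The variable income, the raw codes -8 and -9, and the label name incmiss are all hypothetical, invented here for illustration.

    ```stata
    clear
    input id income
    1 52000
    2 -8
    3 -9
    4 61000
    end

    * Assumed raw codes (hypothetical): -8 = refused, -9 = don't know
    replace income = .a if income == -8
    replace income = .b if income == -9

    * Extended missing values can carry their own value labels
    label define incmiss .a "Refused" .b "Don't know"
    label values income incmiss

    * tabulate with the missing option lists .a and .b separately
    tabulate income, missing

    * summarize ignores both, as it would a plain "."
    summarize income
    ```

    Both codes are still treated as missing by estimation and summary commands; the only thing gained is that the reason for missingness stays visible in the data.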



    • #3
      Thank you Andrew!



      • #4
        One place where the distinction between system missing values (i.e. the dot .) and extended missing values (.a, .b, etc.) often causes problems is multiple imputation (see [MI] Multiple Imputation). Within the mi framework, system missing values are regarded as "soft missing" while extended missing values are regarded as "hard missing". The latter are not imputed. I do not think this behavior is useful, but there you go.
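
        A rough sketch of how one could check this on the data from post #1 is given below. The imputation model (regressing rep78 on mpg and weight), the number of imputations, and the seed are arbitrary assumptions made only for illustration; this is not a recommended model.

        ```stata
        sysuse auto, clear
        replace rep78 = .c if rep78 == 5

        * Wide style stores imputation 1 of rep78 in _1_rep78
        mi set wide
        mi register imputed rep78

        * Arbitrary illustrative model, not a recommendation
        mi impute regress rep78 mpg weight, add(2) rseed(12345)

        * Soft missing (.) in the original should now be filled in
        count if rep78 == . & missing(_1_rep78)

        * Hard missing (.c) should be left untouched (expected: 11)
        count if rep78 == .c & _1_rep78 == .c
        ```

        If the .c values were meant to be imputed, they would have to be recoded to the system missing value before registering the variable.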



        • #5
          As you have found, that is perfectly legal, and as you have guessed, it is usually intended to flag some reason for being missing. In social research on individuals, researchers often want to distinguish "did not answer", "inapplicable", "don't know", and so on. But in any field there can be specific reasons. Records lost, records illegible, flood over-topped gauge, test-tube dropped on floor, contamination of sample: the list goes on and on.

          The scope for extended missing values is documented.

          Code:
          help missing 
          leads to a version of [U] 12.2.1.
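
          One point from that documentation is worth a quick sketch: missing values sort above all nonmissing numbers, with . < .a < .b < ... < .z, so a test against . alone does not catch extended missing values. Using the example from post #1:

          ```stata
          sysuse auto, clear
          replace rep78 = .c if rep78 == 5

          * Only the system missing value: 5 observations
          count if rep78 == .

          * Any missing value, system or extended: 16 observations
          count if missing(rep78)

          * Equivalent test, since every missing value is >= .
          count if rep78 >= .

          * Extended missing values only (here the .c cases): 11
          count if rep78 > .
          ```

          This is why conditions such as "if !missing(x)" or "if x < ." are safer than "if x != ." when extended missing values may be present.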

          EDIT: I had this open while doing something else and so did not see the overlapping answers from Andrew Musau and daniel klein until later.

          As the Bellman said in "The Hunting of the Snark", what we tell you three times is true.
          Last edited by Nick Cox; 04 Mar 2024, 05:22.
