  • How to correct/delete observations to create a Panel Data?

    Hello,

    I am working with a panel data comprising two waves, coded as SURVEY 1 and 2. The observations in survey 1 are recorded correctly. However, in survey 2 some observations have been recorded twice or more due to some error.

    As a result, I am getting more observations in the second wave.

          IHDS1 |
      (2005) or |
          IHDS2 |
         (2012) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
        IHDS1 1 |     29,397       49.30       49.30
        IHDS2 2 |     30,231       50.70      100.00
    ------------+-----------------------------------
          Total |     59,628      100.00

    I have attached an example of the data set using dataex. In the example, id1 8 appears four times instead of twice (as does id1 38). Is there a way for me to correct this? I have tried 'duplicates report' but it did not work.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int SURVEY double HHBASE float id1
    1 1010201010  1
    2 1010201010  1
    1 1010201020  2
    2 1010201020  2
    1 1010201030  3
    2 1010201030  3
    1 1010201040  4
    2 1010201040  4
    1 1010201050  5
    2 1010201050  5
    1 1010201070  6
    2 1010201070  6
    1 1010201080  7
    2 1010201080  7
    1 1010201090  8
    2 1010201090  8
    2 1010201090  8
    2 1010201090  8
    1 1010201100  9
    2 1010201100  9
    1 1010201120 10
    2 1010201120 10
    1 1010201130 11
    2 1010201130 11
    1 1010201140 12
    2 1010201140 12
    1 1010201160 13
    2 1010201160 13
    1 1010201170 14
    2 1010201170 14
    1 1010201180 15
    2 1010201180 15
    1 1010201190 16
    2 1010201190 16
    1 1010201200 17
    2 1010201200 17
    1 1010202010 18
    2 1010202010 18
    1 1010202020 19
    2 1010202020 19
    1 1010202030 20
    2 1010202030 20
    1 1010202040 21
    2 1010202040 21
    1 1010202060 22
    2 1010202060 22
    1 1010202070 23
    2 1010202070 23
    1 1010202100 24
    2 1010202100 24
    1 1010202110 25
    2 1010202110 25
    1 1010202140 26
    2 1010202140 26
    1 1010202150 27
    2 1010202150 27
    1 1010202160 28
    2 1010202160 28
    1 1010202170 29
    2 1010202170 29
    1 1010202180 30
    2 1010202180 30
    1 1010202190 31
    2 1010202190 31
    1 1010202200 32
    2 1010202200 32
    1 1010203020 33
    2 1010203020 33
    1 1010203050 34
    2 1010203050 34
    1 1010203060 35
    2 1010203060 35
    1 1010203070 36
    2 1010203070 36
    1 1010203090 37
    2 1010203090 37
    1 1010203100 38
    2 1010203100 38
    2 1010203100 38
    2 1010203100 38
    1 1010203110 39
    2 1010203110 39
    1 1010203120 40
    2 1010203120 40
    1 1010203130 41
    2 1010203130 41
    1 1010203140 42
    2 1010203140 42
    1 1010203150 43
    2 1010203150 43
    1 1010203160 44
    2 1010203160 44
    1 1010203170 45
    2 1010203170 45
    1 1010203190 46
    2 1010203190 46
    1 1010204010 47
    2 1010204010 47
    1 1010204040 48
    2 1010204040 48
    end
    label values SURVEY SURVEY
    label def SURVEY 1 "IHDS1 1", modify
    label def SURVEY 2 "IHDS2 2", modify

  • #2
    See 12.1 in https://www.statalist.org/forums/help#stata for advice never to say just "did not work". What happened (or did not happen)? Why was it not what you wanted or expected?


    Code:
    duplicates report
    works fine here, as does

    Code:
    duplicates drop
    to clean up your data example.
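
    As a rough sketch of how those two commands might fit into a fuller check on the data example: the code below assumes the surplus rows are exact copies on every variable, as they are in the posted example, and dup is a hypothetical helper variable that is not in the original data. Whether the same holds in the full dataset is something to verify, not assume.

    ```stata
    * Sketch only: assumes surplus rows are exact copies on all variables
    duplicates report SURVEY HHBASE               // count copies per household-wave
    duplicates tag SURVEY HHBASE, generate(dup)   // dup = number of surplus copies (hypothetical name)
    list SURVEY HHBASE id1 if dup > 0, sepby(HHBASE)   // inspect the repeated rows
    drop dup
    duplicates drop                               // removes only rows identical on every variable
    isid HHBASE SURVEY                            // stops with an error if any pair still repeats
    xtset id1 SURVEY                              // declare the panel once each pair is unique
    ```

    If isid still fails after this, the repeated rows differ on some other variable. In that case duplicates drop with a varlist and the force option would keep only the first row in each group and discard the rest, so it is safer to inspect those rows first.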

    Detail: "a panel dataset" (or data set) works fine as idiomatic English; "a panel data" not so well in my hearing.
