  • Some invalid responses in data

    Hi,

    I am using panel data in which individuals report, in dollars, their savings in an account (savinginaccount) and their cash savings (cashsaving). I then created totalsaving as the sum of the two. However, when I look at my data, some values do not make sense as savings. For instance, for id 745 savinginaccount is 1300268000, which is implausible, and for id 3092 savinginaccount is 14000225000. What would be the best thing to do in these circumstances? I have added both a screenshot and the data. I would rather not drop the observations, so any help would be appreciated. Thank you!
    [Screenshot: Screen Shot 2021-11-29 at 1.33.06 PM.png]

    input id float year double savinginaccount long cashsaving float totalsaving
    129 1 450000 125000 575000
    514 2 500000 400 500400
    514 3 500000 500 500500
    526 3 605000 5000 610000
    711 3 500000 6000 506000
    745 1 1300268000 200 1300268160
    762 3 1450025 400 1450425
    948 1 500000 10000 510000
    948 2 700000 10000 710000
    1226 1 2000000 50000 2050000
    1507 1 288467 288467 576934
    1543 2 300000 300000 600000
    1561 3 818250 460 818710
    1695 2 75000 300030000 300104992
    1790 3 20000400 2500 20002900
    2030 1 80001000 1200 80002200
    2210 2 500000 20000 520000
    2671 2 400000 400000 800000
    2671 3 400000286 400000 400400288
    3071 3 1000217 1300 1001517
    3092 1 14000225000 500 14000225280
    3289 1 2500000 5000 2505000
    3441 3 15300950 500 15301450
    3490 1 67000 1000000 1067000
    3552 3 1500120 125 1500245
    3570 1 500000 5000 505000
    3685 1 500000 2500 502500
    3744 1 700000 4000 704000
    3744 2 900000 500 900500
    3750 2 9300090 1100 9301190
    3842 3 678000 39500 717500
    3938 3 450000 100000 550000
    4143 1 500000 5000 505000
    4143 2 500000 1500 501500
    4312 2 500000 35 500035
    4327 3 10411041 400 10411441
    4437 1 2000000 10000 2010000
    4437 2 500000 10000 510000
    4437 3 800000 10000 810000
    4571 2 300060000 5000 300064992
    4817 1 500 500000 500500
    4838 3 500000 500000 1000000
    5061 1 350000 350000 700000
    5063 1 500000 200000 700000
    5063 3 1000000 10000 1010000
    5138 3 549000 500 549500
    5635 1 12000000 500000 1.25e+07
    5712 3 1000100 100 1000200
    6013 3 603000 6000 609000
    6158 2 120000 1000000 1120000
    6348 3 4972106 1500 4973606
    6553 3 2000000 2000000 4000000
    6723 3 600000 1000 601000
    6807 3 850000 300 850300
    6819 3 530000 1000 531000
    7498 2 1000000 500000 1500000
    7523 3 1300000 0 1300000
    7835 3 520000 75000 595000
    7927 3 500000 500 500500
    7958 3 -1000 600000 599000

  • #2
    Yaseminn:
    It would seem that the data entry was not very accurate.
    For instance:
    Code:
    4327 3 10411041 400 10411441
    sounds strange indeed.

    I would send a query to whoever set up and maintains that database, asking for clarification.
    If that attempt is unsuccessful, it's probably better to treat these values as missing and deal with them in the most appropriate way (-ipolate- or -mi-).
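
    As a rough sketch of that fallback, the lines below copy savinginaccount, set implausibly large values to missing, and interpolate within person. The cutoff of 100,000,000 and the names savinginaccount_clean and savinginaccount_ip are hypothetical, chosen only for illustration; the thread does not say which values count as implausible.

    ```stata
    * Hypothetical cutoff, for illustration only (not taken from the thread)
    local cutoff 100000000

    * Work on a copy so the raw responses are preserved
    generate double savinginaccount_clean = savinginaccount
    replace savinginaccount_clean = . if savinginaccount > `cutoff' & !missing(savinginaccount)

    * Linear interpolation across years within each person
    bysort id (year): ipolate savinginaccount_clean year, generate(savinginaccount_ip)
    ```

    Without the epolate option, -ipolate- fills only gaps that lie between two observed years for the same id, so people observed only once, or flagged in their first or last year, would stay missing. In that case -mi- may be the more suitable route.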
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      I find the following two observations interesting.
      Code:
      2671 2 400000 400000 800000
      2671 3 400000286 400000 400400288
      Same ID in two years. The savinginaccount value in year 3 is the same as the value in year 2 with "286" stuck onto the end. Clearly there was some sort of data entry difficulty, or some problem importing the data into Stata. You might want to look at these two observations in your input data and see if there's anything to be learned.
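
      One way to look for other cases of this pattern is to check whether a value's digits start with the same person's previous-year value. This is only a sketch: the variable names s_now, s_prev and appended are hypothetical, and it assumes savinginaccount holds whole-dollar amounts.

      ```stata
      * Digits of the current value as a string, with no decimals
      generate str20 s_now = string(savinginaccount, "%15.0f")

      * Previous year's digits for the same person (empty in the first year)
      bysort id (year): generate str20 s_prev = s_now[_n-1]

      * Flag values that begin with the previous value and carry extra digits
      generate byte appended = s_prev != "" & strpos(s_now, s_prev) == 1 & strlen(s_now) > strlen(s_prev)

      list id year savinginaccount if appended
      ```

      A flag here marks a candidate for inspection, not a confirmed error: a genuine jump from, say, 500 to 500000 would be flagged too.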

      Let me add that when you create totalsaving you will want to use
      Code:
      generate double totalsaving = savinginaccount + cashsaving
      because the default if you don't specify the storage type is the same as if you used
      Code:
      generate float totalsaving = savinginaccount + cashsaving
      and float cannot store every integer larger than 16,777,216 (2^24) with complete accuracy, which is why in the year 3 observation above, totalsaving should have been 400400286 but was shown as 400400288.
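
      The rounding can be reproduced directly with the float() function, which rounds its argument to float precision. A minimal sketch, using the values from the year 3 observation and the variable names from the data example:

      ```stata
      * Sum rounded to float precision: displays 400400288
      display %12.0f float(400000286 + 400000)

      * Same sum evaluated in double precision: displays 400400286
      display %12.0f 400000286 + 400000

      * Number of observations where the stored total differs from the exact sum
      count if totalsaving != savinginaccount + cashsaving
      ```

      At this magnitude neighbouring float values are 32 apart, so the exact sum is rounded to the nearest multiple of 32.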

      Here are the limits on storage of decimal integers with full accuracy in the various numeric storage types. The fixed-point variables lose the 27 largest positive values to missing value codes; the similar loss for floating point variables occurs only for the largest exponent, so it doesn't affect the much smaller integer values.
      type     precision   minimum                       maximum
      byte      7 bits     -127                          100
      int      15 bits     -32,767                       32,740
      long     31 bits     -2,147,483,647                2,147,483,620
      float    24 bits     -16,777,216                   16,777,216
      double   53 bits     -9,007,199,254,740,992        9,007,199,254,740,992
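
      The integer-type limits can be read back from Stata's c() system values, and float() shows what happens just past the float limit. A short sketch:

      ```stata
      * Limits of the integer storage types as reported by Stata
      display c(minbyte) "  " c(maxbyte)
      display c(minint) "  " c(maxint)
      display %12.0f c(minlong) "  " %12.0f c(maxlong)

      * 16777217 is one past the float limit and cannot be held exactly
      display %12.0f float(16777217)
      ```

      The last line displays 16777216, because 16777217 has no exact float representation.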
      Last edited by William Lisowski; 29 Nov 2021, 16:15.


      • #4
        Thank you, Carlo and William! I ended up treating these values as missing, since my request for clarification was not answered.
