
  • Puzzled by what Stata returns as mean value

    I used tabstat to summarize this inflation data. Stata says the mean is 277. I checked in Excel and the mean is 2.67% (more reasonable). Obviously I am doing something wrong in Stata, but I can't figure out what, and I am not sure how to fix it. Please advise.

    . summarize inflation, detail

                                inf var
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            6              2
     5%           27              3
    10%           55              4       Obs                 467
    25%          137              5       Sum of wgt.         467

    50%          280                      Mean           277.7131
                            Largest       Std. dev.       160.027
    75%          413            567
    90%          495            568       Variance       25608.63
    95%          526            571       Skewness       -.004418
    99%          566            572       Kurtosis       1.834808


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long country float year double inflation
    1 2010 390
    1 2011 414
    1 2012 241
    1 2013 346
    1 2014 350
    1 2015 206
    1 2016 182
    1 2017 264
    1 2018 260
    1 2019 220
    1 2020 127
    1 2021 386
    2 2010 252
    2 2011 413
    2 2012 349
    2 2013 298
    2 2014 219
    2 2015 133
    2 2016 132
    2 2017 308
    2 2018 271
    2 2019 212
    2 2020 190
    2 2021 376
    3 2010 321
    3 2011 435
    3 2012 383
    3 2013 160
    3 2014  79
    3 2015 100
    3 2016 270
    3 2017 314
    3 2018 303
    3 2019 196
    3 2020 119
    3 2021 343
    4 2010 491
    4 2011 532
    4 2012 505
    4 2013 522
    4 2014 524
    4 2015 564
    4 2016 560
    4 2017 429
    4 2018 440
    4 2019 443
    4 2020 410
    4 2021 554
    5 2010 243
    5 2011 389
    5 2012 210
    5 2013 137
    5 2014 259
    5 2015 162
    5 2016 194
    5 2017 217
    5 2018 325
    5 2019 265
    5 2020 118
    5 2021 423
    6 2010 193
    6 2011 420
    6 2012 393
    6 2013 245
    6 2014 484
    6 2015 471
    6 2016 445
    6 2017 319
    6 2018 342
    6 2019 359
    6 2020 398
    6 2021 475
    7 2010 406
    7 2011 507
    7 2012 364
    7 2013 365
    7 2014 263
    7 2015 197
    7 2016 297
    7 2017 216
    7 2018 305
    7 2019 388
    7 2020 341
    7 2021 140
    8 2010 326
    8 2011 427
    8 2012 404
    8 2013 300
    8 2014 387
    8 2015 489
    8 2016 542
    8 2017 470
    8 2018 411
    8 2019 433
    8 2020 355
    8 2021 432
    9 2010 149
    9 2011 327
    9 2012 426
    9 2013 323
    end
    format %ty year
    label values country code2
    label def code2 1 "Australia", modify
    label def code2 2 "Austria", modify
    label def code2 3 "Belgium", modify
    label def code2 4 "Brazil", modify
    label def code2 5 "Canada", modify
    label def code2 6 "Chile", modify
    label def code2 7 "China", modify
    label def code2 8 "Colombia", modify
    label def code2 9 "Croatia", modify
    label values inflation inflation
    label def inflation 79 "0.340002833", modify
    label def inflation 100 "0.561429153", modify
    label def inflation 118 "0.716999632", modify
    label def inflation 119 "0.740791812", modify
    label def inflation 127 "0.846905537", modify
    label def inflation 132 "0.891591753", modify
    label def inflation 133 "0.896563335", modify
    label def inflation 137 "0.938291898", modify
    label def inflation 140 "0.981015136", modify
    label def inflation 149 "1.030555053", modify
    label def inflation 160 "1.11309594", modify
    label def inflation 162 "1.125241361", modify
    label def inflation 182 "1.276990945", modify
    label def inflation 190 "1.381910634", modify
    label def inflation 193 "1.41071108", modify
    label def inflation 194 "1.428759547", modify
    label def inflation 196 "1.43681957", modify
    label def inflation 197 "1.437023809", modify
    label def inflation 206 "1.508366722", modify
    label def inflation 210 "1.515678231", modify
    label def inflation 212 "1.530895642", modify
    label def inflation 216 "1.593136001", modify
    label def inflation 217 "1.596884129", modify
    label def inflation 219 "1.60581183", modify
    label def inflation 220 "1.610767873", modify
    label def inflation 241 "1.762780156", modify
    label def inflation 243 "1.776871541", modify
    label def inflation 245 "1.78955554", modify
    label def inflation 252 "1.81353439", modify
    label def inflation 259 "1.906635907", modify
    label def inflation 260 "1.911400944", modify
    label def inflation 263 "1.921641628", modify
    label def inflation 264 "1.948647409", modify
    label def inflation 265 "1.949269024", modify
    label def inflation 270 "1.973852647", modify
    label def inflation 271 "1.998379814", modify
    label def inflation 297 "2.000001822", modify
    label def inflation 298 "2.000156169", modify
    label def inflation 300 "2.016992243", modify
    label def inflation 303 "2.053164999", modify
    label def inflation 305 "2.0747904", modify
    label def inflation 308 "2.081269114", modify
    label def inflation 314 "2.12597086", modify
    label def inflation 319 "2.182718469", modify
    label def inflation 321 "2.189299204", modify
    label def inflation 323 "2.216582064", modify
    label def inflation 325 "2.268225672", modify
    label def inflation 326 "2.272002279", modify
    label def inflation 327 "2.272727273", modify
    label def inflation 341 "2.419421895", modify
    label def inflation 342 "2.434889814", modify
    label def inflation 343 "2.440248511", modify
    label def inflation 346 "2.449888641", modify
    label def inflation 349 "2.485675622", modify
    label def inflation 350 "2.487922705", modify
    label def inflation 355 "2.526635001", modify
    label def inflation 359 "2.557544757", modify
    label def inflation 364 "2.619524326", modify
    label def inflation 365 "2.621050017", modify
    label def inflation 376 "2.766666667", modify
    label def inflation 383 "2.839663434", modify
    label def inflation 386 "2.863910422", modify
    label def inflation 387 "2.898837878", modify
    label def inflation 388 "2.899234164", modify
    label def inflation 389 "2.912135089", modify
    label def inflation 390 "2.918340027", modify
    label def inflation 393 "3.007448402", modify
    label def inflation 398 "3.045490848", modify
    label def inflation 404 "3.169301888", modify
    label def inflation 406 "3.175324753", modify
    label def inflation 410 "3.211768038", modify
    label def inflation 411 "3.240569329", modify
    label def inflation 413 "3.286579149", modify
    label def inflation 414 "3.303850156", modify
    label def inflation 420 "3.341216943", modify
    label def inflation 423 "3.395193185", modify
    label def inflation 426 "3.412073491", modify
    label def inflation 427 "3.415033448", modify
    label def inflation 429 "3.44637335", modify
    label def inflation 432 "3.495057574", modify
    label def inflation 433 "3.523019327", modify
    label def inflation 435 "3.532082107", modify
    label def inflation 440 "3.664850284", modify
    label def inflation 443 "3.732976212", modify
    label def inflation 445 "3.786193559", modify
    label def inflation 470 "4.314313257", modify
    label def inflation 471 "4.348773532", modify
    label def inflation 475 "4.524568383", modify
    label def inflation 484 "4.718675279", modify
    label def inflation 489 "4.989831158", modify
    label def inflation 491 "5.038726901", modify
    label def inflation 505 "5.40349914", modify
    label def inflation 507 "5.553898923", modify
    label def inflation 522 "6.204310666", modify
    label def inflation 524 "6.329040155", modify
    label def inflation 532 "6.636449622", modify
    label def inflation 542 "7.513460246", modify
    label def inflation 554 "8.301659756", modify
    label def inflation 560 "8.739143523", modify
    label def inflation 564 "9.029901024", modify

  • #2
    Your inflation variable is fubar. Well, actually not beyond all recognition, but not usable in its current form.

    Whoever created this data set initially imported it as a string variable. So far so good. Then they made the lethal mistake of -encode-ing it to try to make it numeric. But -encode-ing a string variable that is actually a string representation of the numeric values always ends in tears. The proper approach is to -destring- it. This is a common mistake people make when creating Stata data sets. -encode- has a different purpose altogether.

    By having -encode-d the string version of inflation, the resulting variable did not contain the values of inflation. Instead, it contained consecutive integers starting at 1, one for each distinct string value (assigned in sorted order), along with a value label associating those integers with the actual values for the sole purpose of display. The -summ- command does not know about those display labels and calculated the average of the integer codes. (Based on the results, it seems that some of the observations in the originally imported data set were not used in the -summarize- command.)
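
    To see the difference on a small scale, here is a minimal sketch with made-up values. The variable names s, coded, and value are hypothetical and not part of the original data set.

    ```stata
    * Toy data: numbers stored as text (illustrative values only)
    clear
    input str8 s
    "2.5"
    "0.9"
    "10.1"
    end

    encode s, generate(coded)      // integer codes plus a value label for display
    destring s, generate(value)    // the numbers themselves

    list s coded value, nolabel    // nolabel shows what is actually stored
    summarize coded value
    ```

    Because -encode- numbers the distinct strings in sorted order, coded holds the integers 1 to 3 and its mean is 2 whatever the strings say, whereas value holds the numbers themselves and has mean 4.5.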

    To fix this you have to undo the problem. So first you need to -decode- the inflation variable, and then -destring- the result of that:

    Code:
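    decode inflation, gen(infl)      // turn the value labels back into a string variable
    drop inflation                   // discard the integer codes
    destring infl, gen(inflation)    // convert the strings to true numeric values
    drop infl
    summ inflation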
    and this will give you a usable inflation variable.
    Code:
    . summ inflation
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
       inflation |        100     2.78995    1.720171   .3400028   9.029901
    For further information about this read -help label define-, -help encode-, and -help destring-. Pay careful attention to the difference between -encode- and -destring-. And more generally, remember that in Stata what you see is sometimes different from what Stata calculates on, because Stata gives you ways of displaying numbers in ways that are easier for you to grasp than the underlying numbers that are better for Stata's calculations.
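
    A quick way to see that gap between display and storage, as a sketch meant to be run on the example data exactly as posted in #1 (that is, before the fix above is applied):

    ```stata
    describe inflation                            // storage type and the attached value label
    list country year inflation in 1/5            // shows the labels, e.g. 2.918340027
    list country year inflation in 1/5, nolabel   // shows the stored integers, e.g. 390
    ```

    The nolabel option is what reveals the numbers that -summarize- actually works with.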


    • #3
      Anytime you do something to a variable using a command you're not sure of, make sure to look at the Data Editor to make sure you got what you wanted.

      And always with destring.
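
      One way to make that check routine, sketched under the assumption that the -decode- / -destring- fix from #2 has already been run. The local macro name lbl is just an illustrative choice.

      ```stata
      describe inflation                   // should be numeric with no value label listed
      local lbl : value label inflation    // name of the attached value label, empty if none
      assert "`lbl'" == ""                 // stops with an error if a label is still attached
      count if missing(inflation)          // missing values, if any, deserve a closer look
      browse country year inflation        // eyeball the result in the Data Browser
      ```

      A value label left on a variable that is supposed to hold measurements is the telltale sign of the -encode- problem discussed above.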


      • #4
        For those wanting to go beyond US slang, fubar naturally means fouled up beyond all recognition.

        For those wanting an overview of how to deal with strings, Clyde Schechter is too modest but I am not to mention

        https://journals.sagepub.com/doi/pdf...6867X180180041

        where the false entrapping enchantment of encode is analysed in favour of the deserving desirability of destring.


        • #5
          The link in #4 is broken. Sorry about that. Let me try again https://journals.sagepub.com/doi/pdf...867X1801800413


          • #6
            Thank you all very much. What would learners like us do without all your valuable help and support! I shall remember encode and destring for sure!
