Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Rounding to the nearest hundred while generating a table

    Hi,

    I'm working with confidential data and I've been asked to round all numbers to the nearest hundred (e.g., 1234 becomes 1200 and 1989 becomes 2000) in all my tables.

    Considering the large amount of tables I have, I would prefer a solution where I don't have rewrite all my code to build all my tables by hand. Is there something I could add as an option (I'm using tabout for export, but I can settle for something that print nicely in the log) or before generating a table? We're using a fairly recent version of Stata, 16 I think.

    I've asked if I could just round in Excel after the tables are generated and they've said no.

    I don't actually care if it's actual rounding. I can settle for a close approximation.

    Thank you,
    Samy
    Last edited by Samy Gallienne; 04 Jan 2023, 17:19.

  • #2
    Hi Samy,

    You say you "don't actually care if it's actual rounding." Would truncation work for you?

    Can you please provide a data example generated with the -dataex- command along with your code?

    Comment


    • #3
      Check out the help file for -format- and the help for the function -round()-.

      Comment


      • #4
        Originally posted by Daniel Schaefer View Post
        You say you "don't actually care if it's actual rounding." Would truncation work for you?
        Yes. Anything is fine as long as it's close enough to rounding.

        Originally posted by Daniel Schaefer View Post
        Can you please provide a data example generated with the -dataex- command along with your code?
        Sadly, I can't. I can only access that data from a computer without access to the Internet and getting anything out of there requires a very annoying and lengthy review process.

        I can give you a rough idea of what the code looks like, though:

        HTML Code:
        tabout Quebec year using "$table_path\Count of ids per year, weighted.tex", c(sum RPW) f(3) sum replace
        Where :
        • Quebec is a binary variable
        • year is a integer in [2008; 2017]
        • RPW is a a weight
        I have a lot of similar tables, which is why I'm trying to find a solution where I don't have to build the tables myself.

        Comment


        • #5
          Problems with confidentiality, privacy, security and the like are genuine but already addressed in the FAQ Advice you're asked to read before posting:

          If your dataset is confidential, then provide a fake example instead.
          I have been seeing several versions of this request recently and I doubt that there is a good solution short of StataCorp providing a rounding format for integers.

          One small trick is however possible.

          Code:
          . webuse nlswork, clear
          (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
          
          . tab occ_code
          
           Occupation |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |      3,008       10.59       10.59
                    2 |      1,494        5.26       15.84
                    3 |     10,974       38.62       54.47
                    4 |      1,323        4.66       59.12
                    5 |        438        1.54       60.67
                    6 |      4,309       15.17       75.83
                    7 |        571        2.01       77.84
                    8 |      4,300       15.13       92.98
                    9 |          6        0.02       93.00
                   10 |        144        0.51       93.50
                   11 |        194        0.68       94.19
                   12 |          7        0.02       94.21
                   13 |      1,645        5.79      100.00
          ------------+-----------------------------------
                Total |     28,413      100.00
          
          . bysort occ_code : gen wanted = round(_N, 100)
          
          
          
          . egen tag = tag(occ_code)
          
          . tab occ_code if tag [fw=wanted]
          
           Occupation |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |      3,000       10.60       10.60
                    2 |      1,500        5.30       15.90
                    3 |     11,000       38.87       54.77
                    4 |      1,300        4.59       59.36
                    5 |        400        1.41       60.78
                    6 |      4,300       15.19       75.97
                    7 |        600        2.12       78.09
                    8 |      4,300       15.19       93.29
                   10 |        100        0.35       93.64
                   11 |        200        0.71       94.35
                   13 |      1,600        5.65      100.00
          ------------+-----------------------------------
                Total |     28,300      100.00
          
          .
          You will note that in this case and many others, the total of rounded values is not a rounded version of the total. Also, percents and cumulative percents are necessarily wrong (otherwise, one could recover much of the information you are trying to suppress).

          Another line of attack is to change the units to hundreds and then display integers.

          Comment


          • #6
            Originally posted by Nick Cox View Post
            One small trick is however possible.
            Thanks, that was really helpful. That did the trick with a few minor tweaks.

            Comment

            Working...
            X