
  • Truncate a variable to [0, 1]

    Dear Stata experts,

    May I ask for your help with a variable transformation? I need to truncate a variable so that it lies between 0 and 1. Please suggest code for this. Summary statistics for the variable are as follows:

    ETR5 is the cash effective tax rate.

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
            ETR5 |    179,287    .1145609    16.64328  -3055.415   3911.789



    Thanks in advance

  • #2

    What you're asking for appears to be clip(ETR5, 0, 1), but with a variable whose range is enormously greater than that, it is hard for me to imagine that this is really what you want.
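A minimal sketch of what clip() does, on a few made-up values (the toy variable x and its values are hypothetical, not the poster's data). clip() is a function, so it has to be used inside a command such as generate. The last generate contrasts it with min(max(x, 0), 1), which looks equivalent but is not: Stata's max() ignores missing arguments, so a missing x would come out as 0 rather than missing.

```stata
* Toy data: hypothetical values standing in for ETR5
clear
input x
-3055.4
-0.5
0
0.25
1
3911.8
.
end

* clip(x, 0, 1): values below 0 become 0, values above 1 become 1,
* values in between are unchanged, and missing stays missing
generate double x01 = clip(x, 0, 1)

* Looks equivalent, but max() ignores missing arguments,
* so max(., 0) is 0 and the missing value is turned into 0
generate double x01_minmax = min(max(x, 0), 1)

list x x01 x01_minmax, clean
```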



    • #3
      Thanks, Nick Cox. Should I winsorize the variable before truncating it?
      Will the same code also work in Stata 16?



      • #4
        Probably not. It really depends on what exactly is in your variable and why you want to do that. My guess was:
        Code:
        gen new = (ETR5-r(min))/(r(max)-r(min))
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          I agree with Maarten Buis that what you should do depends on what you really want to do. But if your variable has the range reported, and all values above 1 and below 0 should be chopped, then Winsorizing first seems unlikely to make much difference.
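The point about Winsorizing can be checked directly. A minimal sketch on simulated data (not the poster's), assuming Winsorizing means pulling values beyond the 1st and 99th percentiles in to those percentiles; other cutoffs are possible. Whenever the 1st percentile is at or below 0 and the 99th at or above 1, as in the percentiles later posted in #6 (-1.87 and 2.75), Winsorizing first and then clipping gives exactly the same result as clipping alone.

```stata
* Simulated stand-in for ETR5 (hypothetical data, not the poster's)
clear
set obs 1000
set seed 2024
generate double ETR5 = rnormal(0.35, 5)

* 1st and 99th percentiles
_pctile ETR5, percentiles(1 99)
local p1 = r(r1)
local p99 = r(r2)

* Winsorize at those percentiles (clip() can do this too), then clip to [0, 1]
generate double ETR5_w = clip(ETR5, `p1', `p99')
generate double ETR5_wc = clip(ETR5_w, 0, 1)

* Clip alone
generate double ETR5_c = clip(ETR5, 0, 1)

* The two results are identical whenever p1 <= 0 and p99 >= 1
if `p1' <= 0 & `p99' >= 1 {
    assert ETR5_wc == ETR5_c
    display "Winsorizing first made no difference"
}
```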

          Can you show the results of

          Code:
          quantile ETR5
          I ask because I am still puzzled about what you really want here.



          • #6


            [Attachment: ETR5Graph.gph (quantile plot of ETR5)]

            -------------------------------------------------------------
                  Percentiles      Smallest
             1%     -1.87102      -423.1667
             5%    -.0456715      -329.7791
            10%     .0019417      -153.1813       Obs              31,451
            25%     .2558587      -131.7143       Sum of wgt.      31,451

            50%     .3639508                      Mean           .3541434
                                    Largest       Std. dev.      5.911417
            75%     .4051866       227.7098
            90%     .5053846          278.5       Variance       34.94485
            95%     .7262599       296.6667       Skewness       22.72979
            99%     2.749482       589.4225       Kurtosis       4757.437


            . univar ETR5
                                                    -------------- Quantiles --------------
            Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
            -------------------------------------------------------------------------------
                ETR5   31451     0.35     5.91  -423.17     0.26     0.36     0.41   589.42
            -------------------------------------------------------------------------------

            This is how I calculate the long-run (five-year) ETR.
            Could you please help me make sure that I am running the correct code?


            How do I make Stata use only observations for firms that have data for the past five years?
            Also, what is the code if I want observations with at most five years of data (that is, also including companies with 2, 3, or 4 years)?

            gen TXT5a = sum(txt) if inrange(Year, Year[_n]-4, Year[_n]) & _N >= 5
            gen PI5 = sum(pi) if inrange(Year, Year[_n]-4, Year[_n]) & _N >= 5
            gen SPI5 = sum(spi) if inrange(Year, Year[_n]-4, Year[_n]) & _N >= 5

            drop if PI5 < 0
            drop if SPI5 < 0

            generate PI_SPI5 = PI5- SPI5
            generate ETR5 = TXT5 / PI_SPI5



            Also, this code did not work with my data, and I don't know why:

            clip(ETR5, 0, 1)
            gen new = (ETR5-r(min))/(r(max)-r(min))


            I also attached the graph from
            quantile ETR5



            • #7
              Many thanks, Nick Cox and Maarten Buis, for your help; I appreciate it!
              I hope to get help with what I posted above.

              Nora



              • #8
                I don't work in economics or finance, so for that reason alone I am struggling to follow what you're asking here.

                The summarize results in #6 are quite different from those in #1. This isn't explained.

                clip(ETR5, 0, 1)
                gen new = (ETR5-r(min))/(r(max)-r(min))
                These are code suggestions from #2 and #4 and represent completely different guesses at what you want.

                At best clip(ETR5, 0, 1) needs to be fed to a command. See e.g. https://www.stata-journal.com/articl...article=dm0058 for a basic review of functions, including their relation to commands.

                So, this should work, i.e. produce results.

                Code:
                gen ETR5_2 = clip(ETR5, 0, 1)
                You haven't really clarified what is going on here that could produce values with a range of thousands (or hundreds) while only values between 0 and 1 are acceptable, which is what clip() ensures by brute force.
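Before clipping, it is worth seeing how many observations clip() would actually change. A minimal sketch on made-up values (hypothetical, loosely echoing the extremes posted in #6, not the poster's data). Note the !missing() guard: in Stata a missing value counts as larger than any number, so ETR5 > 1 is true for a missing ETR5.

```stata
* Hypothetical values standing in for ETR5
clear
input ETR5
-423.17
-0.05
0.26
0.36
0.41
589.42
.
end

* Observations that clip(ETR5, 0, 1) would set to 0
count if ETR5 < 0
local n_below = r(N)

* Observations that would be set to 1; without !missing() the
* missing value would be counted too, since . > 1 is true in Stata
count if ETR5 > 1 & !missing(ETR5)
local n_above = r(N)

display "below 0: `n_below'   above 1: `n_above'"
```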

                Maarten Buis's suggestion requires a prior summarize so that r(min) and r(max) are defined. Maarten's code would map your minimum to 0 and your maximum to 1.

                If you followed Maarten's suggestion, the shape of the distribution would thus be identical to that in the quantile plot; only the scale would change.
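A minimal runnable version of the rescaling from #4, with the prior summarize in place (the meanonly option is enough to leave r(min) and r(max) behind), on simulated data rather than the poster's. Because the transformation is linear with a positive slope, it keeps the order of the observations, which is why the quantile plot would keep its shape.

```stata
* Simulated stand-in for ETR5 (hypothetical data)
clear
set obs 500
set seed 1234
generate double ETR5 = rnormal(0.35, 5)

* summarize leaves r(min) and r(max) for the next command to use
summarize ETR5, meanonly
generate double new = (ETR5 - r(min)) / (r(max) - r(min))

* The minimum now maps to 0 and the maximum to 1
summarize new
```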

                Sorry, but necessarily I have no idea what your variables txt, pi, and spi are or what should be done with them.

                Code like

                Code:
                gen TXT5a = sum(txt) if inrange(Year, Year[_n]-4, Year[_n]) & _N >= 5
                is based on some wild guesses about how Stata works.

                _N in this context can only mean the number of observations in the entire dataset and will not mean the number of observations that enter each calculation.
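A small demonstration on a made-up panel (the variable names firm and Year and all values are hypothetical). Without by:, _N is the number of observations in the whole dataset; under by firm:, it is the number in each firm's group. Also, Year[_n] is simply the current observation's Year, so inrange(Year, Year[_n]-4, Year[_n]) is true for every non-missing Year, and the qualifier as written in #6 selects every observation here.

```stata
* Hypothetical panel: firm 1 has 6 years, firm 2 has 3
clear
input firm Year
1 2010
1 2011
1 2012
1 2013
1 2014
1 2015
2 2012
2 2013
2 2014
end

* _N outside by: is the size of the whole dataset (9 here)
generate n_all = _N

* Under by:, _N is the size of each group (6 for firm 1, 3 for firm 2)
bysort firm (Year): generate n_firm = _N

* Year[_n] is the current Year, so this condition is true everywhere
generate byte cond_as_written = inrange(Year, Year[_n] - 4, Year[_n]) & _N >= 5

list, sepby(firm) noobs
```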

                >= makes little sense here: I guess you intend <=.

                sum() is a function that computes a running or cumulative sum and won't do what I guess you want.
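To see what sum() actually returns, a short sketch on a hypothetical txt series for one firm. It gives the running total from the first observation onward (treating missing values as 0), so in 2015 it returns 1+2+3+4+5+6 = 21 rather than the five-year total 2+3+4+5+6 = 20.

```stata
* Hypothetical tax series for one firm, in time order
clear
input Year txt
2010 1
2011 2
2012 3
2013 4
2014 5
2015 6
end

* sum() is a running (cumulative) sum over all earlier observations
generate running = sum(txt)

list Year txt running, noobs
```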

                rangestat from SSC can be used to calculate moving sums. There are hundreds of threads here mentioning it.
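rangestat is the route suggested here. As an alternative that needs only official Stata, five-year moving sums can also be built with time-series operators after tsset. A minimal sketch, assuming a firm identifier (called firm here, hypothetical) and at most one observation per firm-year; the data are made up. The complete-window sums are missing unless all five years Year-4 to Year are present, which is one reading of the first question in #6. The partial-window version adds whatever years exist in the window and counts them, which is one reading of the second.

```stata
* Hypothetical panel: firm 1 has 2010-2015, firm 2 has 2012-2014
clear
input firm Year txt pi spi
1 2010 1 10 0
1 2011 2 10 0
1 2012 3 10 0
1 2013 4 10 0
1 2014 5 10 0
1 2015 6 10 0
2 2012 7 20 1
2 2013 8 20 1
2 2014 9 20 1
end
tsset firm Year

* Complete windows only: a missing lag (gap or early year) makes the sum missing
generate double TXT5 = txt + L1.txt + L2.txt + L3.txt + L4.txt
generate double PI5  = pi  + L1.pi  + L2.pi  + L3.pi  + L4.pi
generate double SPI5 = spi + L1.spi + L2.spi + L3.spi + L4.spi
generate double ETR5 = TXT5 / (PI5 - SPI5)
* To keep only firm-years with a complete window: drop if missing(TXT5)

* Windows of up to five years: add the non-missing values in
* Year-4 to Year and count how many years contributed
generate double TXT5_upto = cond(missing(txt), 0, txt)
generate byte nyears = !missing(txt)
forvalues k = 1/4 {
    replace TXT5_upto = TXT5_upto + cond(missing(L`k'.txt), 0, L`k'.txt)
    replace nyears = nyears + !missing(L`k'.txt)
}

list firm Year TXT5 ETR5 TXT5_upto nyears, sepby(firm) noobs
```

The same L1. to L4. pattern extends to pi and spi if partial windows are wanted for the denominator too.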

                (If you are using AI here, it is not working well for you.)
