  • Variability over time of positive and negative values

    Hello,

    I'm interested in calculating the variability over time of a variable that can take either positive or negative values. If the values were all positive, I could calculate the standard deviation and the coefficient of variation, but with both positive and negative values I'm really confused. I understand that computing a CV is weird here. What about the SD?

    Which kind of data transformation could I use to deal with this problem and be able to assess the variability of these values over time? (I have 4 positive or negative values every 6 months in a longitudinal study).

    Thank you very much for your help,

    Cheers
    PM

  • #2
    The SD is a direct measure of the variation in a variable, the CV is not. The CV measures variation indirectly and mashes it up with the mean. This is often useful, but not invariably so. It is most useful when relative error, rather than absolute error, is what matters.

    Your data clearly do not lend themselves to calculating the CV. I don't think transforming the data is the right approach: with different transformations you can probably get any imaginable result. I think you need to work with a different measure of variation.

    I don't know what the nature of your variable is. One context in which CV is ordinarily used is where you have a bunch of laboratory measurements of some analyte in a series of samples. Each sample is analyzed more than once, and, in ordinary situations where all values are necessarily positive, you become interested in the within-sample CV. Another approach, that is not disturbed by negative numbers or zeroes, and is a better measure of absolute, not relative, error, is the intra-class correlation (ICC). Stata has an -icc- command that can accommodate a few different study designs. Read -help icc- and the manual section linked therein to see if that would be suitable for your situation.



    • #3
      Cross-posted at https://stats.stackexchange.com/ques...egative-values in different form. Please note our policy on cross-posting, which is explicit in the FAQ Advice. You are asked to tell us about it. My wild guess is that had Clyde Schechter seen the existing comments on coefficients of variation on Cross Validated he would have felt that he didn't need to make essentially the same point all over again.

      There are different stories within and between these threads about what the data are, which is disconcerting.

      I don't know why you think occurrence of positive and negative values is a problem and why you think you need some kind of transformation. I am -- on the spectrum of statistically minded people -- very positive about transformations, but I don't think you have made any kind of case that one is needed here.



      • #4
        Hi Clyde Schechter Nick Cox

        Thank you for your answers. I understand perfectly +++ that cross-posting can lead to duplication of effort and I'll mention it in my future posts.

        Let me clarify my question.

        I'm part of a cardiology team interested right now in studying the impact of postural blood pressure variability on incident stroke. Every year we have a measure of lying blood pressure and 3 min after a measure of standing blood pressure. Every year during 5 years [both lying BP and standing BP just after] have been measured.

        We worked last year on the impact of blood pressure variability over time on stroke. It was easier because we had 5 measures of standing blood pressure (one measure every year) and we considered the impact of the variability of these 5 measures on stroke. We used several indicators as in previous research papers with high impact factors: SD and coefficient of variation but also average real variability, variability independent of mean, successive variation and so on.

        Here, the problem is more complicated. I first have to summarize the information on the postural change in blood pressure (for each year) and then consider the variability over time of this postural change. From lying to standing, the postural change can be either a rise or a fall in blood pressure.

        Initially, I wanted to calculate a delta standing - lying blood pressure and so have 5 deltas and work on the impact of the variability of the 5 deltas on stroke. The problem is that to my mind, it doesn't make sense to calculate a standard deviation or a coefficient of variation of something that can be either positive or negative (depending if rise or fall upon standing).

        So I decided to calculate a ratio standing BP/ lying BP so I have 5 ratios (one each year) and I can now work on the impact of their variability over time. And I can do the same thing as we did when we considered variability of blood pressure over time.

        I just wanted to know if you would have other ideas to deal with this problem? Data transformation?

        I know that I want to calculate all the indicators listed at the end of this post.

        But first I have to summarize the change from lying to standing (as a ratio or a delta) in a way that allows these calculations afterwards. If I choose a delta (which can be positive or negative), it seems to me that it's not possible...

        Thank you very much for your help,

        Cheers

        PM

        HTML Code:
        Standard deviation (SD)
         
        SD, the most commonly used index of visit-to-visit blood pressure variability (BPV), provides a global measure of the spread of BP measurements around the mean value. It is independent of the order in which the measurements are taken. However, if there is an underlying trend in blood pressure (BP) over time or if variability is highly correlated with mean levels (which it usually is), then alternative measures may be more appropriate.
         
        Coefficient of variation (CV) 
         
        Absolute levels of variability in BP are often positively correlated with mean levels. By expressing SD relative to mean levels, CV is often considered to correct for correlations between mean levels and SD. However, when considering visit-to-visit BPV, the correction is not always sufficient with CV also being positively correlated with mean levels in most cohorts [2–4].  
         
        Variation independent of mean (VIM)
         
        VIM is a transformation of SD that is defined to be uncorrelated with mean levels. It is calculated by fitting a curve of the form $\mathrm{SD} = a \cdot \mathrm{mean}^{p}$ through a plot of SD BP (y-axis) against mean BP (x-axis) for all individuals in the cohort. The parameter $p$ is estimated from the data and $k$ is a constant which can be chosen such that the values of VIM are on the same scale as values of SD. For example, if $M$ is the average value of mean BP in the cohort, then $k = M^{p}$ and the value of VIM for any individual is given by $\mathrm{VIM} = k \cdot \mathrm{SD} / \mathrm{mean}^{p}$.
         
        Residual standard deviation (RSD)
        When BP measurements are taken over a period of months, there may be an underlying trend over time, for example a trend for BP to decrease over time in relation to the initiation of anti-hypertensive treatment. In such cases, using SD as a measure of variability may over-estimate the extent of variability, with large SD values resulting from changes over time and not necessarily as a consequence of variability. If the relationship between BP and time is approximately linear, variability over and above that due to a trend can be estimated as the residual mean square after fitting a linear regression to BP against time. RSD can then be defined as the square root of this value.
         
        Average real variability (ARV)
         
        ARV, calculated as the average absolute difference between consecutive measurements, takes into account the order of individual BP measurements. In the presence of an underlying trend, ARV will tend to be less than the corresponding SD and can be estimated without making any assumptions regarding the shape of the relationship between BP values and time. ARV will be greater than SD when there is a tendency for BP measurements to have an alternating pattern of increases and decreases between adjacent measurements.
         
        Successive variation (SV)
        SV, defined as the square root of the average squared difference between successive BP measurements, is conceptually similar to ARV. SV is highly correlated with ARV, tending to be larger in absolute value and influenced to a greater extent by large discrepancies between successive measurements.





        • #5
          The problem is that to my mind, it doesn't make sense to calculate a standard deviation or a coefficient of variation of something that can be either positive or negative (depending if rise or fall upon standing).
          Well, with regard to standard deviation, what you say is simply not true. It is true, however, that calculating a coefficient of variation for a variable that straddles 0 is problematic.

          All of the different measures of variability you refer to, except for CV, are perfectly well defined and sensible for variables that can be positive or negative.
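To make that concrete, here is a minimal sketch in Python (not Stata, purely for illustration) that computes SD, ARV and SV on a series of signed postural changes. The five delta values are hypothetical, and the helper names `arv` and `sv` are made up for this example. The formulas follow the verbal definitions quoted in #4, with the averages taken over the number of consecutive differences.

```python
import math
import statistics

def arv(values):
    """Average real variability: mean absolute difference between consecutive values."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)

def sv(values):
    """Successive variation: root of the mean squared difference between consecutive values."""
    squares = [(b - a) ** 2 for a, b in zip(values, values[1:])]
    return math.sqrt(sum(squares) / len(squares))

# Hypothetical postural changes (standing minus lying, mmHg) at five yearly visits.
deltas = [-8.0, 4.0, -2.0, 6.0, -5.0]

mean_delta = statistics.mean(deltas)   # close to zero, and negative here
sd_delta = statistics.stdev(deltas)    # sample SD: unaffected by the mix of signs
cv_delta = sd_delta / mean_delta       # negative and unstable: the one measure to avoid

print("mean:", mean_delta)
print("SD:  ", round(sd_delta, 3))
print("ARV: ", arv(deltas))
print("SV:  ", round(sv(deltas), 3))
print("CV:  ", round(cv_delta, 3), "(not interpretable)")
```

SD, ARV and SV depend only on differences between values, so shifting every delta by a constant leaves them unchanged. The CV divides by a mean that can sit near zero or be negative, which is why it alone breaks down.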

          As for whether to do a difference or a ratio to operationalize postural change, there is no purely statistical answer to that. That is a substantive empirical question, and if others have not studied it before you, then you should do it both ways and see which way works better for your purposes. (But, for scientific integrity, be sure to report both sets of results.)



          • #6
            Among many other examples, let's recall that the standard normal is a distribution with mean 0 and SD 1 -- and possesses negative and positive values.



            • #7
              (lengthier version of previous)

              Among many other examples, let's recall that the standard normal is a distribution with mean 0 and SD 1 -- and possesses negative and positive values.

              So in principle there is no problem.

              As for the practice, I think Clyde Schechter made the most important points in #5, but further comments are possible.

              For concreteness, let's focus on the problem -- or rather problems -- that Pierre is sketching.

              From #1 here we have

              I have 4 positive or negative values every 6 months in a longitudinal study.
              From Cross Validated https://stats.stackexchange.com/ques...egative-values we have (apparently the same)

              4 measurements of blood pressure every 6 months.
              From #4 here we have

              Every year during 5 years [both lying BP and standing BP just after] have been measured.
              From Cross Validated we have

              I have 4 measurements of postural changes over time: year 1, year 2, year 4 and year 6.
              Let's presume that these are related studies or somehow different facets of the same study. Whether we are dealing with 4, 5, or 6 measurements or a time span of 2 or 5 or even 6 years could be important in practice even if at first sight the principles should be essentially the same.

              I have two difficulties to flag.

              Robustness or resistance. The SD of 6 values, and even more so of 5 or 4 values, is highly sensitive to even one outlier. That may be a feature -- it may be exactly the sensitivity you want -- but you need to think about it.

              Even the interquartile range is not especially resistant here. The recipes used by Stata are documented, but a demonstration is likely to be more effective.

              Code:
              . clear
              
              . set obs 6
              number of observations (_N) was 0, now 6
              
              . gen y = _n
              
              . tabstat y in 1/4, s(n p25 p75)
              
                  variable |         N       p25       p75
              -------------+------------------------------
                         y |         4       1.5       3.5
              --------------------------------------------
              
              . tabstat y in 1/5, s(n p25 p75)
              
                  variable |         N       p25       p75
              -------------+------------------------------
                         y |         5         2         4
              --------------------------------------------
              
              . tabstat y in 1/6, s(n p25 p75)
              
                  variable |         N       p25       p75
              -------------+------------------------------
                         y |         6         2         5
              --------------------------------------------
              I created a dataset with the values 1, 2, 3, 4, 5, 6. The IQR for a sample size of 4 will be influenced by any outlier -- lowest or highest value -- as it is

              upper quartile MINUS lower quartile =

              (average of highest and next highest value) MINUS (average of lowest and next lowest value)

              For a sample size of 5 or 6 the IQR behaves differently from the case of 4, but the same way in both. In either case it turns out to be

              second largest value MINUS second smallest value

              Hence it is robust to at most one outlier in each tail.
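The same point can be checked by replacing the largest value with a gross outlier. This is a minimal sketch in Python (not Stata). The function `stata_percentile` is a hypothetical helper that encodes my reading of the unweighted percentile rule documented for -summarize-: with P = N*p/100, average the P-th and (P+1)-th ordered values when P is an integer, otherwise take the next ordered value above P. It reproduces the quartiles in the log above, but treat it as an assumption rather than as Stata's code.

```python
import math

def stata_percentile(values, p):
    """Percentile under the assumed unweighted rule (valid for 0 < p < 100)."""
    xs = sorted(values)
    pos = len(xs) * p / 100
    if pos == int(pos):
        i = int(pos)
        # P is an integer: average the P-th and (P+1)-th ordered values.
        return (xs[i - 1] + xs[i]) / 2
    # Otherwise take the ordered value at position ceil(P).
    return xs[math.ceil(pos) - 1]

def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    return stata_percentile(values, 75) - stata_percentile(values, 25)

for n in (4, 5, 6):
    clean = list(range(1, n + 1))      # 1, 2, ..., n as in the log above
    spiked = clean[:-1] + [100]        # largest value replaced by an outlier
    print(n, "clean IQR:", iqr(clean), "with outlier:", iqr(spiked))
```

With 4 values the IQR jumps from 2 to 50 once the largest value becomes 100. With 5 or 6 values it stays at 2 and 3 respectively, because the extreme value no longer enters either quartile.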

              Again, the question is what you want a measure of variability to do for you.

              Trend. I am no clinician or even a biostatistician or medical statistician but if the time span is up to 6 years it seems evident that some patients may experience long-term change in health or fitness over such a period. Anybody reading this will appreciate that, but it complicates the use and interpretation of any measure of variability if some patients are just fluctuating and some are showing long-term trend.
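A small sketch of how a trend inflates the SD, again in Python and with hypothetical numbers. It compares the plain SD with the residual SD described in #4 (square root of the residual mean square after a straight-line fit against time). The helper `residual_sd` is written out by hand for this illustration and assumes a linear trend is an adequate description.

```python
import math
import statistics

def residual_sd(times, values):
    """Square root of the residual mean square from a least-squares line of values on times."""
    n = len(values)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, values))
    return math.sqrt(ss_res / (n - 2))   # n - 2 residual degrees of freedom

years = [1, 2, 3, 4, 5]
# Hypothetical patient whose postural change drifts steadily upwards over 5 years.
trending = [-8.0, -5.0, 0.0, 3.0, 8.0]

sd_trending = statistics.stdev(trending)
rsd_trending = residual_sd(years, trending)

print("SD: ", round(sd_trending, 3))
print("RSD:", round(rsd_trending, 3))
```

For this patient the SD is about ten times the residual SD, so almost all of what the SD reports is steady drift, not fluctuation. Whether drift should count as variability is the substantive question raised above.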

              Note. Although coefficient of variation is now deprecated by all discussants to date, that wasn't so for the original version of the question on Cross Validated. I will add in here a cross-reference

              https://stats.stackexchange.com/ques...t-of-variation

              to a longer discussion making the (standard and utterly unoriginal) case that

              coefficient of variation being natural and useful

              corresponds to

              analysis on logarithmic scale being natural and useful

              which corresponds to

              all values being positive (or at most having the same sign, which can be treated as conventional).
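The correspondence between CV and the logarithmic scale can be seen numerically. In this minimal Python sketch the blood pressure values are hypothetical. For positive values with modest relative spread, the SD of the logged values is close to the CV, and both are unchanged by rescaling. For values of mixed sign the logarithm is simply undefined.

```python
import math
import statistics

# Hypothetical standing blood pressures (mmHg), all positive.
bp = [120.0, 125.0, 118.0, 122.0, 121.0]

cv = statistics.stdev(bp) / statistics.mean(bp)
sd_log = statistics.stdev([math.log(v) for v in bp])
print("CV:        ", round(cv, 4))
print("SD of logs:", round(sd_log, 4))

# Rescaling (a change of units, say) leaves both measures essentially unchanged.
bp_scaled = [10 * v for v in bp]
cv_scaled = statistics.stdev(bp_scaled) / statistics.mean(bp_scaled)
sd_log_scaled = statistics.stdev([math.log(v) for v in bp_scaled])

# With a negative value, as in a signed postural change, the log scale is unavailable.
try:
    math.log(-5.0)
    log_defined = True
except ValueError:
    log_defined = False
print("log of a negative value defined:", log_defined)
```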





              • #8
                Thank you very much Clyde Schechter and Nick Cox. Yes, there were some slight differences between my 2 posts because we are still thinking about how many measures to consider and the length of exposure. And yes, this has never been published so far, and it's a very interesting question from a medical point of view. So I'll use the ratio to summarize postural changes before calculating variability, and also the delta (without calculating a CV on positive and negative values, I see).

                Thanks again,
                Best

                Pierre
