Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Rayleigh test and circsummarize command in STATA

    I try to apply Rayleigh test using circular module in Stata. I try to test whether my data have a monthly peak. Specifically, I try to check whether patients die more often from a specific cause at specific days of month, specific months of year etc. I have already applied chi-square goodness-of-fit and I know that the distribution of my patients is not uniform and I also tried Edward's Seasonality test. However, I know that there is one peak only and I would like to use Rayleigh test too. My variables are months (I tried 1,2,3,4... at the beginning but it didn't work, and I suppose it is due to the fact that circular commands assume that data are recorded in degrees & then I chose to record months as 360/12=30 degrees each) and count (which is the number of patients recorded in each month).


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte count
    74
    73
    89
    86
    76
    57
    67
    58
    73
    45
    54
    75
    end

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int month
      0
     30
     60
     90
    120
    150
    180
    210
    240
    270
    300
    330
    end

    I used the following code after installing the CIRCULAR statistics module

    Code:
     circsu month [fweight=count], rayleigh
    All I get is varlist not allowed r(101); I am sure that there is a mistake at the command I use but I cannot find any proper example.

    I would appreciate the help
    Last edited by Maria Mihailovits; 01 Oct 2019, 11:27.

  • #2
    Please post a data example that can copied and pasted, not an image (FAQ Advice #12; see also #18).

    Comment


    • #3
      Thanks for that. I would do this.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte count
      74
      73
      89
      86
      76
      57
      67
      58
      73
      45
      54
      75
      end
      
      gen month = (_n - 1) * 30
      expand count
      circsummarize month , detail
      
        +------------------------------------------------+
        |           n   mean   strength   median   range |
        |------------------------------------------------|
        | month   827   65.8      0.096     90.0   330.0 |
        +------------------------------------------------+
      
        +----------------------------------------------------+
        |         lower   upper   Rayleigh   Kuiper      Rao |
        |----------------------------------------------------|
        | month    36.1    95.4     0.0005   0.0000   0.0000 |
        +----------------------------------------------------+
      I don't which version you are using, but it won't be this one.

      Code:
      *! 3.0.0 NJC 24 March 2013
      * 2.1.0 NJC 15 April 2009
      * 2.0.3 NJC 2 June 2004
      * 2.0.2 NJC 6 May 2004
      * 2.0.1 NJC 30 March 2004
      * 2.0.0 NJC 13 January 2004
      * 1.3.3 NJC 10 January 1999
      * 1.3.2 NJC 1 January 1999
      * 1.3.1 NJC 15 December 1998
      * 1.3.0 NJC 15 July 1998
      * 1.2.2 NJC 29 October 1996
      * mean direction, vector strength, median and circular range of circular data
      * optionally: confidence interval for mean; Rayleigh, Kuiper and Rao tests
      
      * from 3.0.0 includes rewrite of circmedian
      * 1.0.2 NJC 15 April 2009
      * 1.0.1 NJC 31 March 2004
      * 1.0.0 NJC 20 January 2004
      
      * from 3.0.0 includes rewrite of circrao
      * 2.0.3 NJC 14 April 2009
      * 2.0.2 NJC 6 May 2004
      * 2.0.1 NJC 30 March 2004
      * 2.0.0 NJC 15 January 2004
      * 1.0.0 NJC 7 August 2001
      
      program circsummarize
              version 9.2
              syntax varlist(numeric) [if] [in] ///
          [ , BY(varname) Level(int $S_level) Detail MISSing variablelabels *]
      
          quietly {
              count `if' `in'
              if r(N) == 0 error 2000
      
              local nvars : word count `varlist'
          
                  if `nvars' > 1 & "`by'" != "" {
                          di as err "too many variables specified"
                          exit 103
                  }
      
              if `nvars' > _N {
                  preserve
                  set obs `nvars'
              }
      
              tempvar what
              gen `what' = ""
              if "`by'" != "" {
                  local which : var label `by'
                  if `"`which'"' == "" local which "`by'"
                  char `what'[varname] `"`which'"'
              }
              else char `what'[varname] "  "
      
              foreach dname in nobs vecmean vecstr median range {
                  tempvar `dname'  
                  gen ``dname'' = .
                  local dnames `dnames' ``dname''  
              }
      
              char `nobs'[varname] "n"
              char `vecmean'[varname] "mean"
              char `vecstr'[varname] "strength"
              char `median'[varname] "median"
              char `range'[varname] "range"
      
                      format `nobs'   %1.0f  
              format `vecmean' `median' `range' %2.1f                                          
                      format `vecstr'   %4.3f                                        
      
              if "`detail'" != "" {
                  foreach iname in ll ul ray kui rao {
                      tempvar `iname'
                      gen ``iname'' = .
                      local inames `inames' ``iname''
                  }
      
                  format `ll' `ul' %2.1f
                  format `ray' `kui' `rao' %5.4f
      
                  char `ll'[varname] "lower"
                  char `ul'[varname] "upper"
                  char `ray'[varname] "Rayleigh"
                  char `kui'[varname] "Kuiper"
                  char `rao'[varname] "Rao"
              }
              
              local i = 0
                  foreach v of local varlist {
                      tempvar use
                          mark `use' `if' `in'
                          markout `use' `v'
                  if "`by'" != "" {
                      tempvar group
                      egen `group' = group(`by') if `use', label `missing'
                  }
                  else local group `use'
      
      /// don't want to -summarize- here, as it would give r-class results
      mata: st_local("max", strofreal(max(st_data(., "`group'"))))  
      
                          forval j = 1/`max' {
                      local ++i          
                      tempvar thisuse
                      gen byte `thisuse' = `group' == `j'
                  
                      if "`by'" != "" {
                                          local this : label (`group') `j'
                                      }
                                  else {
                          if "`variablelabels'" != "" {
                              local this : var label `v'
                          }
                          if `"`this'"' == "" local this "`v'"
                      }  
                      replace `what' = `"`this'"' in `i'
                      local this
      
      mata: summarize("`v'", "`thisuse'", "`dnames'", `i')
                  
                      if "`detail'" != "" {
      mata: infer("`v'", "`thisuse'", "`inames'", `i', `level')
                      }
                      
                      drop `thisuse'
                  }
      
                  drop `use'
                  capture drop `group'
                  }
          }
      
          list `what' `dnames' in 1/`i', noobs subvarname abbrev(16)
      
          if "`detail'" != "" {
              list `what' `inames' in 1/`i', noobs subvarname abbrev(16)
          }
      end
      
      mata:
      
      // NJC 23 March 2013
      void summarize(string scalar varname,
                     string scalar usename,
                 string scalar dnames,
                     real scalar obsno)
      {
          real colvector work, gaps, sumdev, diff  
          real scalar nobs, xsum, ysum, vecmean, vectstr, range, i  
      
          work = st_data(., varname, usename)
          nobs = length(work)
      
          xsum = sum(sin((work * pi()) / 180))
          ysum = sum(cos((work * pi()) / 180))
          vecmean = myatan2(xsum, ysum)
      
          vecstr = sqrt(xsum^2 + ysum^2) / nobs
      
          _sort(work, 1)  
          gaps = work[2..nobs] - work[1..nobs-1]
          gaps = gaps \ (work[1] - work[nobs] + 360)
          range = 360 - max(gaps)
      
          sumdev = J(nobs, 1, .)
          
          for(i = 1; i <= nobs; i++) {
              diff = abs(work :- work[i])
              diff = rowmin((diff, 360 :- diff))
              sumdev[i] = mean(diff)
          }
      
          work = select(work, sumdev :== min(sumdev))
          xsum = sum(sin((work * pi()) / 180))
          ysum = sum(cos((work * pi()) / 180))
          median = myatan2(xsum, ysum)
      
          st_store(obsno, tokens(dnames), (nobs, vecmean, vecstr, median, range))
          st_numscalar("r(N)", nobs)  
          st_numscalar("r(vecmean)", vecmean)  
          st_numscalar("r(vecstr)", vecstr)  
          st_numscalar("r(median)", median)  
          st_numscalar("r(mean_dev)", min(sumdev))  
          st_numscalar("r(range)", range)  
      }
      
      // NJC 23 March 2013
      void infer(string scalar varname,
                 string scalar usename,
             string scalar inames,
                 real scalar obsno,
             real scalar level)
      {
          real colvector work, cos2, sumdev, diff, rank
          real scalar nobs, xsum, ysum, vecmean, vecstr, disp, sem, mult, ul, ll,
              Z, factor, P_Ray, V_n, V, P_Kui, U, U_pc, P_Rao
      
          work = st_data(., varname, usename)
          nobs = length(work)
      
          xsum = sum(sin((work * pi()) / 180))
          ysum = sum(cos((work * pi()) / 180))
          vecmean = myatan2(xsum, ysum)
      
          vecstr = sqrt(xsum^2 + ysum^2) / nobs
      
          cos2 = cos((work :- vecmean) * pi() / 90)
          disp = (1 - mean(cos2)) / (2 * vecstr^2)
          sem = sqrt(disp / nobs)
          mult = invnormal((100 + level) / 200)
          ul = mod(vecmean + (180 / pi()) * asin(mult * sem), 360)
          ll = mod(vecmean - (180 / pi()) * asin(mult * sem), 360)
      
          Z = nobs * vecstr^2
          factor = 1 + (2 * Z - Z^2) / (4 * nobs) -
          (24 * Z - 132 * Z^2 + 76 * Z^3 - 9 * Z^4) / (288 * nobs^2)
          P_Ray = exp(-Z) * factor
      
          _sort(work, 1)
           gaps = work[2..nobs] - work[1..nobs-1]
          gaps = gaps \ (work[1] - work[nobs] + 360)
              gaps = gaps :- 360 / nobs    
          U = sum(select(gaps, (gaps :> 0)))
          U_pc = 100 * U / 360
          mean = 360 / exp(1)
          sd = 360 * sqrt(2 * exp(-1) - 5 * exp(-2)) / sqrt(nobs)
          if (U > mean) P_Rao = normal(-(U - mean) / sd)  
          else P_Rao = 1 - normal((U - mean) / sd)
       
          work = work / 360
          rank = (1::nobs)            
          V_n = max((rank / nobs) :- work) + max(work :- ((rank :- 1) / nobs))
          V  = V_n * (sqrt(nobs) + 0.155 + 0.24 / sqrt(nobs))
          // Stephens 1970 p.118: preferable to tabulated levels in Fisher
          // 3 dp accuracy for P < 0.447 (V > 1.26)
          P_Kui = (8 * V^2 - 2) * exp(-2 * V^2)
      
          st_store(obsno, tokens(inames), (ll, ul, P_Ray, P_Kui, P_Rao))
          st_numscalar("r(vm_ll)", ll)  
          st_numscalar("r(vm_ul)", ul)  
          st_numscalar("r(P_Ray)", P_Ray)  
          st_numscalar("r(V_n)", V_n)  
          st_numscalar("r(V)", V)  
          st_numscalar("r(P_Kui)", P_Kui)  
          st_numscalar("r(U)", U)
          st_numscalar("r(U_pc)", U_pc)
          st_numscalar("r(P_Rao)", P_Rao)  
      }
      
      // NJC 23 March 2013 NB: result in degrees
      real scalar myatan2(real scalar sine, real scalar cosine) {
          real scalar s1, s2, work  
      
          s1 = sign(sine); s2 = sign(cosine)
          work = (180 / pi()) * atan(sine/cosine)
      
          if ((s1 == 1 & s2 == 1) | (s1 == 0 & s2 == 1)) return(work)
          else if (s1 == 1 & s2 == 0)                    return(90)
          else if (s1 == -1 & s2 == 0)                   return(270)
          else if (s2 == -1)                             return(180 + work)  
          else if (s1 == -1 & s2 == 1)                   return(360 + work)
          else if (s1 == 0 & s2 == 0)                    return(.)  
      }
      
      end

      Comment


      • #4
        Thank you very much. Your response was very helpful.

        Comment


        • #5
          I am using circsummarize to describe time of the day data (24 h clock) with the statistics mean direction and confidence intervals for variables regarding times of mosquitoes biting activity reported by housedholders in a Colombian malaria endemic region. I am no sure how the data should be recorded (degrees? or radians?), before to apply circsummarize.
          In the help for circsummarize in Stata the description says that the statistics for circular variables with scales between 0 and 360 degrees. In contrast, in the help for circular I found: "Stata expects angles to be in radians (pi radians = 180°)" an the factors _pi / 180 and 180 / _pi are mentioned for conversion between angles and radians.
          I apply circsummarize to my data recorded in both degrees and radians (degrees converted to radians) and althought the statistic mean direction an confidence intervals (converted to hours) are very similar, they are no equal. I would appreciate very much your help.


          Comment


          • #6
            "Stata expects" means that its trigonometric functions expect arguments in radians. This is in my experience just about universal across software, such that if a program offers a function to take degrees it should be flagged as such.

            circsummarize expects arguments, meaning data as variables, in degrees. It doesn't employ any intelligence to guess whether you're using anything else, and, some extreme cases aside, nothing else will work correctly.

            I have thought of allowing other arguments, but for conversions from time of day it is a matter of simple multiplication and division, so I reckon users should be up to that. Given units of hours, multiply by 15 first and divide results by 15 afterwards. This includes hours and minutes so a time of 10:30 should be fed as 10.5 hours, and so forth.

            Comment


            • #7
              Dr Cox. So, the input data are in degrees (previous conversion of time, hh:mm:ss, to hours and its fraction). Thanks a lot. It has been very kind of you.

              Comment

              Working...
              X