  • Compositional data analysis.

    Just a cursory enquiry as to whether there are any packages currently available or in development that permit compositional data analysis (i.e. the analysis of variables that comprise non-negative values that form part of a finite total, such as the proportions of time in 24 hours spent undertaking different activities).

    I'm aware of a few packages in R (e.g. 'compositions'), but suspect there isn't a Stata equivalent at present.

  • #2
    I think you're right. There is nothing so developed in Stata. I wrote a few basic Mata functions but certainly have no full-blown package, and I have not noticed any other.

    I found this literature interesting, indeed fascinating, but also irritating and frustrating because of its repeated denial that zeros exist. The first step is to see that log ratios preserve the information and provide a framework for analysis -- so long as there aren't zeros.
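
    For reference, the three log-ratio transformations computed by the functions below (cda_alr, cda_clr and cda_ilr) can be written as follows for a composition x with D strictly positive parts. The symbols x, D and g are notation assumed here for illustration, and the ilr shown is only the particular basis that the code uses; other choices of basis exist.

    ```latex
    \begin{aligned}
    \operatorname{alr}(x)_j &= \ln\frac{x_j}{x_D}, & j &= 1,\dots,D-1,\\
    \operatorname{clr}(x)_j &= \ln\frac{x_j}{g(x)}, \quad
        g(x) = \Bigl(\prod_{k=1}^{D} x_k\Bigr)^{1/D}, & j &= 1,\dots,D,\\
    \operatorname{ilr}(x)_j &= \frac{1}{\sqrt{j(j+1)}}
        \Bigl(\sum_{k=1}^{j}\ln x_k - j\,\ln x_{j+1}\Bigr), & j &= 1,\dots,D-1.
    \end{aligned}
    ```

    Every expression takes the logarithm of a part, so a single zero part leaves the transformed values undefined. That is why zeros have to be dealt with before any of these transformations is applied.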

    This is what I did. (They took longer to write than they do to read.)

    Code:
    // compositional data analysis 
    
    mata : 
    
    mata drop cda_*() 
    
    // NJC 1 Sept 2008 
    // rows scaled to sum to 1 
    real matrix function cda_closure(real matrix X) { 
            return(X :/ rowsum(X)) 
            
    } 
    
    // NJC 1 Sept 2008 
    // ln(all but last column / last column) 
    real matrix function cda_alr(real matrix X) { 
            real scalar c, cm1  
            c = cols(X); cm1 = c - 1 
            return(ln(X[, (1 .. cm1)]) :- ln(X[, c])) 
    }
    
    // NJC 1 Sept 2008 
    // ln(all / row geometric means) 
    real matrix function cda_clr(real matrix X) { 
            return(ln(X) :- mean(ln(X'))') 
    } 
    
    // NJC 1 Sept 2008 
    // centring 
    real matrix cda_centre(real matrix X) { 
            real rowvector centre, invcentre
            centre = cda_closure(exp(mean(ln(X)))) 
            invcentre = cda_closure((1 :/ centre))  
            return(cda_closure(X :* invcentre)) 
    } 
    
    // NJC 3 Sept 2008 
    // column geometric means 
    real matrix cda_colgmean(real matrix X) { 
            return(exp(mean(ln(X)))) 
    } 
    
    // NJC 3 Sept 2008 
    // row geometric means 
    real matrix cda_rowgmean(real matrix X) { 
            return(exp(mean(ln(X'))')) 
    } 
    
    // NJC 2 Sept 2008 
    // multiplicative replacement for rounded zeros 
    real matrix cda_mrzero(real matrix X, real rowvector delta, | real scalar total) { 
            real matrix iszero  
            if (total == .) total = 1 
            iszero = X :== 0 
            return((iszero :* delta) + ((!iszero) :* X :* (1 :- rowsum(iszero :* delta) :/ total)))
    }
    
    // NJC 10 Oct 2008 
    // isometric log-ratio transformation 
    real matrix function cda_ilr(real matrix X) { 
            real scalar c, j  
            real matrix Y, lnX 
            c = cols(X)
            Y = X[, (1 .. c - 1)]; lnX = ln(X)
            for (j = 1; j < c; j++) { 
                    Y[, j] = rowsum(lnX[, (1 .. j)]) - j * lnX[, j + 1] 
                    Y[, j] = (1 / sqrt(j * (j + 1))) * Y[, j]
            } 
            return(Y) 
    }
    
    end
    Last edited by Nick Cox; 12 Dec 2016, 04:17.
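
    A minimal usage sketch, assuming the functions above have already been defined in the current Mata session. The matrix X, the row W and the replacement values in delta are made-up illustrative numbers, not data or recommended settings from this thread.

    ```mata
    mata :

    // hypothetical compositions: hours per day spent on three activities
    X = (8, 8, 8 \ 6, 10, 8 \ 9, 6, 9)

    // close each row so that its parts sum to 1
    P = cda_closure(X)
    rowsum(P)

    // centred log ratios: each row sums to zero up to rounding error
    Zclr = cda_clr(P)
    rowsum(Zclr)

    // additive and isometric log ratios: one column fewer than P
    Zalr = cda_alr(P)
    Zilr = cda_ilr(P)
    cols(Zalr), cols(Zilr)

    // a row with a zero part: replace the zero by an assumed small
    // value before taking logs; the optional total defaults to 1
    W = cda_closure((0, 12, 12))
    delta = (0.01, 0.01, 0.01)
    R = cda_mrzero(W, delta)
    R
    rowsum(R)

    end
    ```

    The replaced row still sums to 1, because the nonzero parts are shrunk by exactly the amount given to the zero part. How small delta should be is a substantive choice that this sketch does not settle.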

    • #3
      Dear Craig,

      Adding to Nick's comment, I believe that there is a multivariate version of -fracreg- that implements the model developed by John Mullahy. I do not know if this is now publicly available, but John should be able to help.

      Best wishes,

      Joao

      • #4
        Based on only a quick glance at what John Mullahy wrote, it looks like fmlogit, as available on SSC; see also dirifit. See also this talk I gave at the 2010 German Stata Users' meeting: http://maartenbuis.nl/presentations/berlin10.pdf
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        • #5
          Originally posted by Maarten Buis
          Based on only a quick glance at what John Mullahy wrote, it looks like fmlogit, as available on SSC; see also dirifit. See also this talk I gave at the 2010 German Stata Users' meeting: http://maartenbuis.nl/presentations/berlin10.pdf
          Incredibly helpful. Thank you.