Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • ML estimation of Zipf-Mandelbrot law

    Dear Statalisters,

    I want to estimate by maximum likelihood the parameters of the Zipf-Mandelbrot "law" by fitting a list of frequencies to a right truncated zeta distribution.

    The Zipf-Mandelbrot law (http://en.wikipedia.org/wiki/Zipf%E2...Mandelbrot_law) has two parameters alpha and beta.

    If beta=0 then the Zipf-Mandelbrot law becomes the Zipf law. I already implemented this case and it works perfect and quite fast.
    However, when I try to implement the ZM law, my program does not achieve convergence.


    The log likelihood is defined as:

    L = (-1)*alpha * SUM(f(r) * log(rank+beta) - T*log(SUM(rank+beta)^alpha)

    where SUM is the sum starting from rank one to the maximum rank N. f(r) is the frequency of the observation with rank r.
    T is defined as the sum of all individual frequencies. alpha & beta are the parameters to estimate.

    Here is the code I am using:

    ********************* START *********************
    capture program drop zipf_mandelbrot
    program zipf_mandelbrot
    version 12.1
    args lnf alpha beta
    tempname H f_log_rank
    quietly {

    gen double `f_log_rank'=frequency*log(rank+`beta')
    /* calculate the sum */
    sum `f_log_rank'
    scalar m=r(sum)

    /* calculate the sum of all frequencies */
    sum frequency
    scalar T=r(sum)


    generate double `H'=1/((rank+`beta')^`alpha')
    /* calculate the log sum */
    sum `H'
    scalar H=log(r(sum))

    /* calculate the contribution to log-likelihood */
    replace `lnf'=(-1)*`alpha'*m - T*H
    }
    end

    /* example data set */
    sysuse auto, clear
    gen frequency=price
    keep frequency


    /* generate a variable that contains the ranks */
    gsort -frequency
    gen rank=_n



    ml model lf zipf_mandelbrot () ()
    ml search
    ml maximize, iterate(1000)

    ********************* END *********************

    Various tests with different data sets using both my Stata program and an R package (using mle) show that my program sometimes gets (very close to) the right estimates, but does not converge and is generally very slow. [For the example above, R estimates alpha=0.1206387 & beta=83.2256729]
    Since this is my fist time that I am working with Stata's ml command (and with ml estimation in general), I am not sure what is going wrong.
    So if anyone can help me, please let me know, I would be very grateful.

    Many thanks

    Ali
    Last edited by Alexander Koplenig; 02 Jul 2014, 05:33.

  • #2
    Dear Statalisters,

    Since I was not able to solve the problem myself, I still would be very grateful for any ideas.
    I think there could be a very simple error somewhere in the code presented above, maybe the ml model, search and maximize options were not used correctly?

    In an R package (called 'tolerance' by DS Young (http://cran.r-project.org/web/packag.../tolerance.pdf)) the corresponding log likelihood is defined as

    ll.zima <- function(alpha, beta) sum(frequency * (alpha * log(rank + beta) +
    log(sum(1/(rank + beta)^alpha))))

    Has anyone an idea how to fit the same log likelihood with Stata?

    Thanks in advance

    Ali

    Comment


    • #3

      Dear Statalisters,

      I think I finally found a solution that I want to share with anyone interested.


      Ali

      Here is the code:
      ********************* START *********************
      capture program drop zipf_mandelbrot
      program zipf_mandelbrot
      version 12.1
      args lnf alpha beta
      tempname H
      quietly {
      /* calculate the harmonic number */
      egen double `H'=total(1/((rank+`beta')^`alpha'))

      /* calculate the contribution to log-likelihood */
      replace `lnf'=(-1)*freq*`alpha'*log(rank+`beta')-freq*log(`H')
      }
      end

      /* example data sets */
      sysuse auto, clear
      gen frequency=mpg
      keep frequency


      /* generate a variable that contains the ranks */
      gsort -frequency
      gen rank=_n


      /* fit the model */
      ml model lf zipf_mandelbrot (alpha: ) (beta: )
      ml search
      ml maximize, iterate(100) difficult

      ********************* END *********************

      Comment

      Working...
      X