Dear Statalisters,
I want to estimate by maximum likelihood the parameters of the Zipf-Mandelbrot "law" by fitting a list of frequencies to a right truncated zeta distribution.
The Zipf-Mandelbrot law (http://en.wikipedia.org/wiki/Zipf%E2...Mandelbrot_law) has two parameters alpha and beta.
If beta=0 then the Zipf-Mandelbrot law becomes the Zipf law. I already implemented this case and it works perfect and quite fast.
However, when I try to implement the ZM law, my program does not achieve convergence.
The log likelihood is defined as:
L = (-1)*alpha * SUM(f(r) * log(rank+beta) - T*log(SUM(rank+beta)^alpha)
where SUM is the sum starting from rank one to the maximum rank N. f(r) is the frequency of the observation with rank r.
T is defined as the sum of all individual frequencies. alpha & beta are the parameters to estimate.
Here is the code I am using:
********************* START *********************
capture program drop zipf_mandelbrot
program zipf_mandelbrot
version 12.1
args lnf alpha beta
tempname H f_log_rank
quietly {
gen double `f_log_rank'=frequency*log(rank+`beta')
/* calculate the sum */
sum `f_log_rank'
scalar m=r(sum)
/* calculate the sum of all frequencies */
sum frequency
scalar T=r(sum)
generate double `H'=1/((rank+`beta')^`alpha')
/* calculate the log sum */
sum `H'
scalar H=log(r(sum))
/* calculate the contribution to log-likelihood */
replace `lnf'=(-1)*`alpha'*m - T*H
}
end
/* example data set */
sysuse auto, clear
gen frequency=price
keep frequency
/* generate a variable that contains the ranks */
gsort -frequency
gen rank=_n
ml model lf zipf_mandelbrot () ()
ml search
ml maximize, iterate(1000)
********************* END *********************
Various tests with different data sets using both my Stata program and an R package (using mle) show that my program sometimes gets (very close to) the right estimates, but does not converge and is generally very slow. [For the example above, R estimates alpha=0.1206387 & beta=83.2256729]
Since this is my fist time that I am working with Stata's ml command (and with ml estimation in general), I am not sure what is going wrong.
So if anyone can help me, please let me know, I would be very grateful.
Many thanks
Ali
I want to estimate by maximum likelihood the parameters of the Zipf-Mandelbrot "law" by fitting a list of frequencies to a right truncated zeta distribution.
The Zipf-Mandelbrot law (http://en.wikipedia.org/wiki/Zipf%E2...Mandelbrot_law) has two parameters alpha and beta.
If beta=0 then the Zipf-Mandelbrot law becomes the Zipf law. I already implemented this case and it works perfect and quite fast.
However, when I try to implement the ZM law, my program does not achieve convergence.
The log likelihood is defined as:
L = (-1)*alpha * SUM(f(r) * log(rank+beta) - T*log(SUM(rank+beta)^alpha)
where SUM is the sum starting from rank one to the maximum rank N. f(r) is the frequency of the observation with rank r.
T is defined as the sum of all individual frequencies. alpha & beta are the parameters to estimate.
Here is the code I am using:
********************* START *********************
capture program drop zipf_mandelbrot
program zipf_mandelbrot
version 12.1
args lnf alpha beta
tempname H f_log_rank
quietly {
gen double `f_log_rank'=frequency*log(rank+`beta')
/* calculate the sum */
sum `f_log_rank'
scalar m=r(sum)
/* calculate the sum of all frequencies */
sum frequency
scalar T=r(sum)
generate double `H'=1/((rank+`beta')^`alpha')
/* calculate the log sum */
sum `H'
scalar H=log(r(sum))
/* calculate the contribution to log-likelihood */
replace `lnf'=(-1)*`alpha'*m - T*H
}
end
/* example data set */
sysuse auto, clear
gen frequency=price
keep frequency
/* generate a variable that contains the ranks */
gsort -frequency
gen rank=_n
ml model lf zipf_mandelbrot () ()
ml search
ml maximize, iterate(1000)
********************* END *********************
Various tests with different data sets using both my Stata program and an R package (using mle) show that my program sometimes gets (very close to) the right estimates, but does not converge and is generally very slow. [For the example above, R estimates alpha=0.1206387 & beta=83.2256729]
Since this is my fist time that I am working with Stata's ml command (and with ml estimation in general), I am not sure what is going wrong.
So if anyone can help me, please let me know, I would be very grateful.
Many thanks
Ali
Comment