Last digits

Nick Cox

Join Date: Mar 2014
Posts: 35698

22 Jan 2024, 05:40

The thread https://www.statalist.org/forums/for...-mata-function meandered from its title, which was natural enough, but makes starting a new thread sensible for what I am going to post.

The thread latterly focused on the Myers index for measuring age heaping, i.e. the tendency of people to report inaccurate ages that err in the last digit.

Chen Samulsion unearthed a nice-looking program by German Rodriguez that is more serious than my own token effort in that thread.

However, I did a little more work and changed the focus and direction of my code to be about last digits of integer numeric variables. So, possible last digits with nothing else said are 0 1 ... 8 9.

I calculate what I know as the dissimilarity index 1/2 SUM | p_j - 0.1 | where p_j is the proportion reporting digit j and 0.1 is the expectation if frequencies were equal. I now leave on one side whether this is, or is a relative of, the Myers index.

I also add a standard chi-square test.

At this moment I am not sure that I want to take this much further and can't even summon the inclination and energy to write a help file.

Code:

. sysuse auto, clear
(1978 automobile data)

. lastdigit mpg

  +---------------------------+
  | digit   counts   percents |
  |---------------------------|
  |     0        5        6.8 |
  |     1        7        9.5 |
  |     2        7        9.5 |
  |     3        3        4.1 |
  |     4       11       14.9 |
  |---------------------------|
  |     5        9       12.2 |
  |     6        7        9.5 |
  |     7        4        5.4 |
  |     8       12       16.2 |
  |     9        9       12.2 |
  +---------------------------+

Dissimilarity index: .15405405

Chi-square:          10.324324
P-value:             .32487296

. lastdigit price

  +---------------------------+
  | digit   counts   percents |
  |---------------------------|
  |     0        8       10.8 |
  |     1        3        4.1 |
  |     2        7        9.5 |
  |     3        4        5.4 |
  |     4        7        9.5 |
  |---------------------------|
  |     5       11       14.9 |
  |     6        7        9.5 |
  |     7        7        9.5 |
  |     8        4        5.4 |
  |     9       16       21.6 |
  +---------------------------+

Dissimilarity index: .17297297

Chi-square:          17.621622
P-value:             .03982599

Code:

*! 1.0.0 NJC 22 Jan 2024
* nod to Daniel Klein for earlier comments
program lastdigit, rclass
    version 9
    syntax varname(numeric) [if] [in]
    marksample touse
    
    * integer argument?
    assert `varlist' == floor(`varlist') if `touse'
    
    quietly count if `touse'
    if r(N) == 0 error 2000
    local N = r(N)
    
    tempvar digits counts percents which
    
    quietly {
        gen byte `digits' = mod(`varlist', 10)
        gen `counts' = .
        gen `percents' = .
        gen `which' = _n - 1 in 1/10
    }
    
    quietly forval i = 0/9 {
        local I = `i' + 1
        count if `touse' & `digits' == `i'
        replace `counts' = r(N) in `I'
        replace `percents' = 100 * r(N)/`N' in `I'
     }
    
    char `which'[varname]    "digit"
    char `counts'[varname]   "counts"
    char `percents'[varname] "percents"
    
    format `percents' %2.1f
    
    list `which' `counts' `percents' in 1/10, noobs subvarname  
    
    mata : _dissim("`counts'")
    di _n "Dissimilarity index: " r(dissim)
    
    mata : _chisq("`counts'")
    di _n "Chi-square:          " r(chisq)
    di    "P-value:             " r(P_value)
    
    return scalar dissim = r(dissim)
    return scalar chi_sq = r(chisq)
    return scalar P_value = r(P_value)
end

mata :

void _dissim(string scalar countsname) {
    real vector counts, props
    counts = st_data((1::10), countsname)    
    props = counts :/ sum(counts)
    
    st_numscalar("r(dissim)", sum(abs(props :- 0.1))/2)
}

void _chisq(string scalar countsname) {
    real vector counts, props
    real scalar n, expected, chisq
    counts = st_data((1::10), countsname)
    n = sum(counts); expected = n/10
    chisq = sum((counts :- expected):^2 :/ expected)
    
    st_numscalar("r(chisq)", chisq)
    st_numscalar("r(P_value)", chi2tail(9, chisq))
}

end

Tags: None

Announcement

Last digits