This is just a public service announcement that Mata's mean() function might not work the way you expect when your data contain missing values. The behavior is actually documented in the help file (twice), but that assumes you thought to check the Mata help file for mean() in the first place to see whether your program was working as intended.
In particular, when calculating the mean of each column, Mata only considers rows that have no missing values in any column. The easiest way to see what this means is the example below, where every column mean is missing even though every column has nonmissing values. In other words, if row 1 has a missing value in column 1, row 1 is discarded for all columns.
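To make the row-wise deletion concrete, here is a small sketch with a hypothetical matrix B (it is not part of the original example) in which only row 1 contains a missing value. Assuming the documented behavior, mean() does not return missing here; it silently averages rows 2 and 3 only, which is arguably harder to spot than an all-missing result. The second call reproduces the same numbers by keeping only complete rows with select() and rowmissing().

```stata
mata:
// Hypothetical matrix: only row 1 has a missing value (in column 3)
B = (1, 2, . \ 3, 4, 5 \ 5, 6, 7)

// Row 1 is dropped for every column, so column 1 averages 3 and 5,
// not 1, 3 and 5; returns (4, 5, 6)
mean(B)

// Same result: keep rows with zero missing values, then average
mean(select(B, rowmissing(B) :== 0))
end
```

Note that the first column's mean comes out as 4 rather than 3, even though column 1 itself has no missing values.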
I just hope this saves someone from spending hours debugging a program when the cause is this simple, as I did over the past few weeks.
As a related question: is there a function in Mata that calculates column means without row-wise deletion?
Code:
: A = (1, 2, . \ 3, ., 5 \ ., 1, 2)

: A
+-------------+
|  1   2   .  |
|  3   .   5  |
|  .   1   2  |
+-------------+

: mean(A)
+-------------+
|  .   .   .  |
+-------------+
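For the question above, I'm not certain Mata has a dedicated built-in, but one possible workaround is sketched below. It relies on colsum(), which treats missing values as zero, and colnonmissing(), which counts the nonmissing entries in each column; dividing one by the other gives each column's mean over its own nonmissing values. This is unweighted only.

```stata
mata:
A = (1, 2, . \ 3, ., 5 \ ., 1, 2)

// colsum() adds up each column, counting missing values as 0;
// colnonmissing() counts nonmissing values per column.
// Element-wise division gives per-column means; returns (2, 1.5, 3.5)
colsum(A) :/ colnonmissing(A)
end
```

If a column is entirely missing, the division is 0/0, which Mata evaluates to missing, so that column's mean comes back as `.` rather than raising an error.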