Missing values when calculating z score

Nitsan Machlis

Join Date: Jun 2022

Posts: 19
#1

17 Jun 2022, 02:47

Hi, I have a database of grades organized by subject and exam year. I have calculated z scores for each grade relative to subject and year, using the following command:

Code:

egen z_grade = std(grade), by(subject year)

The code seems to work; however, the variable z_grade has missing values for about a third of the observations in the database, even though these observations are not missing grade, subject or year data. Does anyone know why this may be happening? I have tried various solutions to no avail. Thank you.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

17 Jun 2022, 02:54

Nitsan:
is it a cross-sectional or a panel dataset?

Kind regards,
Carlo
(Stata 19.0)
Nitsan Machlis

Join Date: Jun 2022

Posts: 19
#3

17 Jun 2022, 02:59

Cross-sectional. I have summarized the missing values and think the problem may be that many of the observations with missing z score values have 0 as their grade. In this case, what should the z score be?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

17 Jun 2022, 03:01

Nitsan:
your intuition seems correct:

Code:

. set obs 3
Number of observations (_N) was 0, now 3.

. g id=_n

. g grade=0

. egen wanted=std(grade), by(id)
(3 missing values generated)

. list

     +---------------------+
     | id   grade   wanted |
     |---------------------|
  1. |  1       0        . |
  2. |  2       0        . |
  3. |  3       0        . |
     +---------------------+

.

Kind regards,
Carlo
(Stata 19.0)

Andrew Musau

Join Date: Oct 2014

Posts: 10190
#5

17 Jun 2022, 04:13

The issue is not observations with test scores of zero, but no variance within a particular group defined by subject and year. Therefore, the standard deviation is zero and you cannot compute a z-score. To identify such groups:

Code:

bys subject year (grade): gen zerovariance = grade[1] == grade[_N]
list if zerovariance, sepby(subject year)
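
A complementary check, offered only as a rough sketch: computing the group size and group standard deviation directly can show which situation applies. The variable names grp_n and grp_sd are hypothetical, and the sketch assumes z_grade was created as in #1. A group with a single nonmissing grade has a missing standard deviation, while a group of identical grades has a standard deviation of 0; in either case std() cannot return a value.

```stata
* Sketch only: grp_n and grp_sd are hypothetical variable names
egen grp_n  = count(grade), by(subject year)
egen grp_sd = sd(grade), by(subject year)

* Group sizes among observations with a grade but no z-score
tabulate grp_n if missing(z_grade) & !missing(grade)

* Constant groups (SD of 0) versus single-observation groups (SD missing)
count if grp_sd == 0 & !missing(grade)
count if missing(grp_sd) & !missing(grade)
```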
Nick Cox

Join Date: Mar 2014

Posts: 35696
#6

17 Jun 2022, 05:57

I agree with Andrew Musau, but I think it not absurd to argue that if all values are the same then a z-score of 0 for all makes sense, but you have to code that explicitly.
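
One way to code that explicitly might look like the sketch below. The names mean_g, sd_g and z_grade0 are hypothetical, and the choice of 0 for constant groups is the convention argued for above, not something std() does for you. Groups with a single observation have a missing standard deviation and are left as missing here; whether they should also be set to 0 is a separate judgment call.

```stata
* Sketch only: mean_g, sd_g and z_grade0 are hypothetical variable names
egen mean_g = mean(grade), by(subject year)
egen sd_g   = sd(grade), by(subject year)

* Usual z-score where the group standard deviation is positive
gen z_grade0 = (grade - mean_g) / sd_g

* Explicit convention: all grades in the group are equal, so assign 0
replace z_grade0 = 0 if sd_g == 0 & !missing(grade)
```

If the grades are not integers, comparing the sorted first and last values within each group, as in #5, may be a safer test for a constant group than sd_g == 0.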
Nitsan Machlis

Join Date: Jun 2022

Posts: 19
#7

19 Jun 2022, 22:38

Yes this was my thinking too. Thanks.