
  • Missing values when calculating z score

    Hi, I have a database of grades organized by subject and exam year. I have calculated z scores for each grade relative to subject and year, using the following command:

    Code:
    egen z_grade = std(grade), by(subject year)
    The code seems to work; however, the variable z_grade has missing values for about a third of the observations in the database, even though these observations are not missing grade, subject or year data. Does anyone know why this may be happening? I have tried various solutions to no avail. Thank you.
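A minimal sketch of what the `egen` command above computes, assuming `std()` standardizes each grade as (grade minus group mean) divided by the group standard deviation. The toy data and the helper variables `mean_grade`, `sd_grade` and `z_manual` are hypothetical, chosen only to reproduce the symptom. Stata returns missing when it divides by zero, so a subject-year group whose grades are all identical gets missing z-scores. A group with a single observation also ends up missing, because the sample standard deviation divides by n-1 and is therefore missing for n = 1.

```stata
clear
* Hypothetical toy data: one group with spread, one constant group, one singleton
input subject year grade
1 2020 55
1 2020 70
1 2021 0
1 2021 0
2 2020 80
end

* Reproduce egen std() by hand: z = (grade - group mean) / group SD
egen mean_grade = mean(grade), by(subject year)
egen sd_grade   = sd(grade),   by(subject year)
gen z_manual = (grade - mean_grade) / sd_grade

* sd_grade is 0 for the constant group and missing for the singleton,
* so z_manual is missing in both cases
list subject year grade sd_grade z_manual, sepby(subject year)
```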

  • #2
    Nitsan:
    Is it a cross-sectional or a panel dataset?
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Cross-sectional. I have summarized the missing values and think the problem may be that many of the observations with missing z score values have 0 as their grade. In this case, what should the z score be?

      • #4
        Nitsan:
        Your intuition seems correct:
        Code:
        . set obs 3
        Number of observations (_N) was 0, now 3.
        
        . g id=_n
        
        . g grade=0
        
        . egen wanted=std(grade), by(id)
        (3 missing values generated)
        
        . list
        
             +---------------------+
             | id   grade   wanted |
             |---------------------|
          1. |  1       0        . |
          2. |  2       0        . |
          3. |  3       0        . |
             +---------------------+
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          The issue is not observations with test scores of zero, but the absence of variance within a particular group defined by subject and year. When every grade in a group is the same, the standard deviation is zero, and a z-score, which divides by the standard deviation, cannot be computed. To identify such groups:

          Code:
          bys subject year (grade): gen zerovariance= grade[1]==grade[_N]
          list if zerovariance, sepby(subject year)
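To check the point above on hypothetical toy data: the zero in subject 1 sits in a group with some spread and gets an ordinary z-score, while the zeros in subject 2 form a constant group and get missing values. The final `assert` confirms that, in this small example, every missing z-score falls in a group flagged by the `zerovariance` check.

```stata
clear
* Hypothetical toy data: a zero in a group with spread, and an all-zero group
input subject year grade
1 2020 0
1 2020 10
2 2020 0
2 2020 0
end

egen z_grade = std(grade), by(subject year)

* After sorting by grade within each group, the first and last
* values are equal only if all values in the group are equal
bysort subject year (grade): gen byte zerovariance = grade[1] == grade[_N]

* A grade of 0 still gets a z-score when its group has some spread
list if grade == 0, sepby(subject year)

* Every missing z-score falls in a constant-grade group
assert zerovariance if missing(z_grade)
```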

          • #6
            I agree with Andrew Musau, but I think it is not absurd to argue that if all values are the same, a z-score of 0 for all of them makes sense. You have to code that explicitly, though.
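If you adopt that convention, one way to code it is a sketch like the following, on hypothetical toy data and reusing the `zerovariance` flag from #5: fill in 0 only where the grade is present, the z-score is missing and the group is flagged as constant. Whether 0 is the right value is a modelling choice; `egen` itself leaves these observations missing.

```stata
clear
* Hypothetical toy data: one constant group and one group with spread
input subject year grade
1 2020 0
1 2020 0
1 2021 40
1 2021 60
end

egen z_grade = std(grade), by(subject year)

* Flag groups in which every grade is identical
bysort subject year (grade): gen byte zerovariance = grade[1] == grade[_N]

* Convention under discussion: treat "all scores equal" as z = 0
replace z_grade = 0 if zerovariance & missing(z_grade) & !missing(grade)
list, sepby(subject year)
```

Note that the flag is also 1 for a subject-year group with a single observation, so those observations would be set to 0 as well. If that is not wanted, add a group-size condition such as `_N > 1` inside a `by subject year:` prefix.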

            • #7
              Yes, this was my thinking too. Thanks.
