
  • dtable missing values

    Hi Statalist

    I'm encountering a problem with the dtable command and would appreciate your help. When I run the following code, the table includes missing values that dtable does not exclude automatically. How can I exclude the missing values?

    Code:
    dtable, svy factor(categorical_var, stat(fvfrequency fvpercent)) novarlabel nofvlabel
    Even when I first run
    Code:
    drop if categorical_var==.
    it does not work: the table still shows counts that include the missing values.

  • #2
    Provide a data example. At the moment, it is difficult to follow your question.



    • #3
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte cat_var
      end
      * The data rows (and the other variables used below) were not included in the post.
      
      svyset [pweight = finwgt], psu(psu) strata(stratum) singleunit(centered) vce(robust)
      collect clear
      
      dtable, svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel
      
      dtable, svy by(school_level, tests missing) factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel

      The results should be something like this:

      Code:
      . dtable, svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel
      
      ------------------
                Summary
      ------------------
      N       27,412,530
      cat_var          
        0         83,090
        1         29,216
        2         31,809
        3          6,609
        4         10,574
        5          4,428
        7          7,529
        8          1,409
        9          4,830
        10        44,598
        11        43,589
      ------------------
      
      .
      . dtable, svy by(school_level, tests missing) factor(cat_var, stat(fvfrequency)) novarlabel no
      > fvlabel
      note: using test pearson across levels of school_level for cat_var.
      
      --------------------------------------------------------------------------------------
                                              Education Level                              
                 Middle school       High school           .              Total         Test
      --------------------------------------------------------------------------------------
      N       11,932,960 (43.5%) 15,331,806 (55.9%) 147,764 (0.5%) 27,412,530 (100.0%)      
      cat_var                                                                              
        0                 16,424             66,061            606              83,090 0.011
        1                  7,082             22,134              0              29,216      
        2                  7,250             24,559              0              31,809      
        3                  3,148              3,462              0               6,609      
        4                      0              6,060          4,514              10,574      
        5                  3,326                785            317               4,428      
        7                    871              6,659              0               7,529      
        8                    525                884              0               1,409      
        9                      0              4,029            801               4,830      
        10                   284             39,482          4,832              44,598      
        11                 8,016             33,940          1,633              43,589      
      --------------------------------------------------------------------------------------

      As the second table shows, observations with a missing school_level appear in several levels of cat_var. I can specify "nomissing" in the by() option, but for the dtable call without by(), I cannot find a way to exclude those observations from the frequencies.



      • #4
        Your -dtable- command specifies the -missing- suboption in the -by()- option, thereby specifically telling Stata to include the missing category of school_level. To exclude them, change -missing- to -nomissing- and the data with missing values of school_level will be omitted from the table.
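
        As a sketch of that change, this is the second command from #3 with only -missing- swapped for -nomissing- (untested here, since the data were not posted):

        Code:
        * same command as in #3, but observations with missing school_level are left out
        dtable, svy by(school_level, tests nomissing) factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel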



        • #5
          That's right. But I want the first command, the one without by(), to exclude the missing values:

          dtable, svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel



          • #6
            Try
            Code:
            dtable if !missing(school_level), svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel



            • #7
              Thanks, Clyde, I appreciate it. I'm not sure why I overlooked it. I was trying "if !missing(cat_var)" and found no change. I had to use "school_level". Thanks.


              I have one more question about the same code and results. I'd like to round the svy frequencies down to the nearest 10,000. For example, 16,424 should become 10,000. I posted this question a few days ago but didn't receive a response, so I thought I'd ask again here in hopes of finding a solution. Any guidance you could offer would be greatly appreciated.
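
              To make the rounding rule concrete, the arithmetic I have in mind is just the following, using Stata's floor() function. This is only a sketch of the calculation itself; it does not show how to apply it to the frequencies inside the dtable output, which is the part I am asking about.

              Code:
              * round down to the nearest 10,000: divide, drop the fraction, multiply back
              display floor(16424/10000)*10000
              * displays 10000
              display floor(83090/10000)*10000
              * displays 80000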



              • #8
                I'm afraid I don't know the answer to this other question. Hopefully somebody else does and will respond.
