dtable missing values

Sa Fe

Join Date: Jun 2020

Posts: 93
#1


11 Mar 2024, 10:54

Hi Statalist

I'm encountering a problem with the dtable command and would appreciate your help. When I run the following code, the table includes missing values that dtable does not exclude automatically. How can I exclude the missing values?

Code:

dtable, svy factor(categorical_var, stat(fvfrequency fvpercent)) novarlabel nofvlabel

Even when I first run

Code:

drop if categorical_var==.

it does not work: the counts shown still include the missing values.
Tags: collect, dtable, missing values
Andrew Musau

Join Date: Oct 2014

Posts: 9944
#2

11 Mar 2024, 11:12

Provide a data example. At the moment, it is difficult to follow your question.

Sa Fe

Join Date: Jun 2020
Posts: 93
#3

11 Mar 2024, 11:59

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte cat_var
* data rows omitted
end

svyset [pweight = finwgt], psu(psu) strata(stratum) singleunit(centered) vce(robust)
collect clear

dtable, svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel

dtable, svy by(school_level, tests missing) factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel

The results should be something like this:

Code:

. dtable, svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel

------------------
          Summary
------------------
N       27,412,530
cat_var          
  0         83,090
  1         29,216
  2         31,809
  3          6,609
  4         10,574
  5          4,428
  7          7,529
  8          1,409
  9          4,830
  10        44,598
  11        43,589
------------------

.
. dtable, svy by(school_level, tests missing) factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel
note: using test pearson across levels of school_level for cat_var.

--------------------------------------------------------------------------------------
                                        Education Level                              
           Middle school       High school           .              Total         Test
--------------------------------------------------------------------------------------
N       11,932,960 (43.5%) 15,331,806 (55.9%) 147,764 (0.5%) 27,412,530 (100.0%)      
cat_var                                                                              
  0                 16,424             66,061            606              83,090 0.011
  1                  7,082             22,134              0              29,216      
  2                  7,250             24,559              0              31,809      
  3                  3,148              3,462              0               6,609      
  4                      0              6,060          4,514              10,574      
  5                  3,326                785            317               4,428      
  7                    871              6,659              0               7,529      
  8                    525                884              0               1,409      
  9                      0              4,029            801               4,830      
  10                   284             39,482          4,832              44,598      
  11                 8,016             33,940          1,633              43,589      
--------------------------------------------------------------------------------------

As is apparent, observations with a missing school_level appear at several levels of the categorical variable. We can specify "nomissing" in the "by" option, but for the dtable without the "by" option, I cannot find a way to exclude those observations from the frequencies.


Clyde Schechter

Join Date: Apr 2014

Posts: 29792
#4

11 Mar 2024, 12:06

Your -dtable- command specifies the -missing- suboption in the -by()- option, thereby specifically telling Stata to include the missing category of school_level. To exclude them, change -missing- to -nomissing- and the data with missing values of school_level will be omitted from the table.
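
A minimal sketch of that change, assuming the same survey settings and variable names as in the earlier post (school_level, cat_var). Only the -missing- suboption is swapped for -nomissing-; everything else is copied from the original command:

```stata
* Same table as before, but observations with missing school_level are left out
dtable, svy by(school_level, tests nomissing) factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel
```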
Sa Fe

Join Date: Jun 2020

Posts: 93
#5

11 Mar 2024, 13:24

That's right. But I want the first command, the one without the by() option, to exclude the missing values as well:

Code:

dtable, svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel

Clyde Schechter

Join Date: Apr 2014
Posts: 29792
#6

11 Mar 2024, 14:29

Try

Code:

dtable if !missing(school_level), svy factor(cat_var, stat(fvfrequency)) novarlabel nofvlabel


Sa Fe

Join Date: Jun 2020

Posts: 93
#7

11 Mar 2024, 14:51

Thanks, Clyde, I appreciate it. I'm not sure why I overlooked that. I had been trying "if !missing(cat_var)" and saw no change; the condition needed to use "school_level" instead.

I have one more question about the same code and results. I'd like to round the svy frequencies down to the nearest 10,000. For example, 16,424 should become 10,000. I posted this question a few days ago but did not receive a response, so I thought I'd ask again here in the hope of finding a solution. Any guidance you could offer would be greatly appreciated.
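
To make the requested rounding concrete, here is a minimal sketch of the arithmetic only, using Stata's floor() function on two frequencies taken from the tables above. It does not show how to apply the rounding to the cells that dtable produces, which is the part of the question that remains open.

```stata
* Round down to the nearest 10,000: divide, take the floor, multiply back
display floor(16424/10000)*10000
display floor(83090/10000)*10000
```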
Clyde Schechter

Join Date: Apr 2014

Posts: 29792
#8

11 Mar 2024, 16:37

I'm afraid I don't know the answer to this other question. Hopefully somebody else does and will respond.
