
  • Drop variables based on their observation count

    Hi,

    I am trying to drop variables with fewer than 2,000 observations, in order to keep only frequently observed variables.
    Could somebody help me out?

    Best
    JB

  • #2
    Julian:
    do you mean something along the following lines?
    Code:
    . set obs 3000
    Number of observations (_N) was 0, now 3,000.
    
    . g id=1 in 1/2000
    
    . replace id=2 in 2001/3000
    
    . bysort id: drop if _N<2000
    
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      When you say "observations", I think you mean "non-missing values". In Stata, an observation is a row, record or case in the dataset, and every variable is defined for every observation.

      missings from the Stata Journal reports the number of missing values in each variable, which is just the complement of the number of non-missing values.

      Code:
      . webuse nlswork, clear
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . missings report
      
      Checking missings in all variables:
      15082 observations with missing values
      
      -------------------
                |      #
      ----------+--------
            age |     24
            msp |     16
        nev_mar |     16
          grade |      2
       not_smsa |      8
         c_city |      8
          south |      8
       ind_code |    341
       occ_code |    121
          union |   9296
         wks_ue |   5704
         tenure |    433
          hours |     67
       wks_work |    703
      -------------------
      
      . ret li
      
      scalars:
                        r(N) =  28534
      
      macros:
                  r(varlist) : "age msp nev_mar grade not_smsa c_city south ind_code occ_code union wks_ue tenure.."
      
      . missings report , min(5000)
      
      Checking missings in all variables:
      15082 observations with missing values
      
      -----------------
              |      #
      --------+--------
        union |   9296
       wks_ue |   5704
      -----------------
      
      . ret li
      
      scalars:
                        r(N) =  28534
      
      macros:
                  r(varlist) : "union wks_ue"
      
      . net describe dm0085_2, from(http://www.stata-journal.com/software/sj20-4)
      
      ---------------------------------------------------------------------------------------------------------------
      package dm0085_2 from http://www.stata-journal.com/software/sj20-4
      ---------------------------------------------------------------------------------------------------------------
      
      TITLE
            SJ20-4 dm0085_2. Update: A set of utilities for ...
      
      DESCRIPTION/AUTHOR(S)
            Update: A set of utilities for managing missing
              values
            by Nicholas J. Cox, Department of Geography,
                 Durham University, Durham City, UK
            Support:  [email protected]
            After installation, type help missings
            DOI:  10.1177/1536867X20976342
      
      INSTALLATION FILES                             (type net install dm0085_2)
            dm0085_2/missings.ado
            dm0085_2/missings.sthlp
      
      ANCILLARY FILES                                (type net get dm0085_2)
            dm0085_2/missings.do
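
      To go from the report to actually dropping variables, here is a minimal sketch rather than a definitive solution. It assumes the threshold of 2,000 non-missing values from the original question, and that missings is installed as described above. The local macro name threshold is just an illustrative choice. The first part uses only official Stata commands; the second part reuses r(varlist) as left behind by missings report.

      ```stata
      * Part 1: drop every variable with fewer than 2,000 non-missing values.
      * The varlist is expanded once, before the loop starts, so dropping
      * variables inside the loop is safe.
      webuse nlswork, clear
      local threshold 2000
      foreach v of varlist _all {
          quietly count if !missing(`v')
          if r(N) < `threshold' {
              drop `v'
          }
      }

      * Part 2: drop the variables that -missings report- lists for a
      * given minimum number of missing values (here 5,000, as in the log above).
      * Note: -drop- will fail if r(varlist) turns out to be empty.
      quietly missings report, min(5000)
      drop `r(varlist)'
      ```

      In nlswork every variable has far more than 2,000 non-missing values, so the first loop drops nothing there; the second part drops union and wks_ue, the two variables shown in the min(5000) report.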
