  • mean of frequency of each value

    I have a variable, ID, with 26000 observations. It takes more than 2000 distinct values, some of which are repeated. I want to find the mean frequency of the values, that is, the average number of times each distinct value occurs. The code I tried keeps showing "too many values", but the values themselves are not what I am looking for. What I want is the next step: the mean of the frequencies of the values. How can I solve this?

  • #2
    -collapse, count- or -egen, count- will probably help you do what you want.

    If you do not figure out how, post a sample of your data using -dataex-, and explain with reference to this sample what you want the outcome to be.
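
    For readers following along, here is a minimal sketch of both routes. It assumes a toy dataset invented purely for illustration (the six values below are not from the thread) and uses the variable name ID from #1. Note that -collapse- replaces the data in memory, so wrap it in -preserve- and -restore- if the original data are still needed.

    ```stata
    * Sketch only: the toy values are invented; ID is the variable name from #1
    clear
    input long ID
    101
    101
    102
    103
    103
    103
    end

    * Route 1: -egen, count()- stores each value's frequency on every observation
    bysort ID: egen freq = count(ID)
    * tag() marks exactly one observation per distinct value
    egen first = tag(ID)
    * mean over distinct values: 6 observations / 3 values = 2
    summarize freq if first

    * Route 2: -collapse (count)- reduces the data to one row per distinct value
    generate byte one = 1
    collapse (count) n = one, by(ID)
    * same mean as Route 1
    summarize n
    ```

    Restricting to one tagged observation per value matters in Route 1: without it, frequent values would be counted once per observation and the mean would be biased upward.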



    • #3
      help limits shows that in current Stata (17) tabulate will accept up to 3000 rows in a one-way table, so an immediate answer could be


      Code:
      quietly tabulate ID 
      
      return list 
      
      di r(N) / r(r)
      We can't comment on the code you tried, because you don't show any of it.

      A more detailed discussion was given in

      Code:
      SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
              (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
              Q4/08   SJ 8(4):557--568
              shows how to answer questions about distinct observations
              from first principles; provides a convenience command
      accessible at https://www.stata-journal.com/articl...article=dm0042

      Code:
      search distinct, sj
      will show the latest update of distinct, which at the time of writing is


      Code:
      SJ-20-4 dm0042_3  . . . . . . . . . . . . . . . . Software update for distinct
              (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
              Q4/20   SJ 20(4):1028--1030
              sort() option has been added



      • #4
        I have a variable called host_id.
        Here is part of the data. The full dataset has 65000 observations, and some values are repeated. I want to get the frequency of each value, as in the second table, and then calculate the mean of those frequencies.
        host_id
        533062
        533062
        3113849
        3488642
        3488642
        6993205
        6993205
        7489816
        7696302
        4237084
        5557443
        4493874
        7790897
        729259
        6532783
        9975678
        9975678
        9975678
        9975678
        9975678
        frequency
        2
        1
        2
        2
        1
        1
        1
        1
        1
        1
        1
        1
        5



        • #5
          I've already made a suggestion. Clearly your variable name is host_id, not ID as I gathered from #1.
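
          As a hedged illustration of how the -tabulate- suggestion from #3 would look with the corrected variable name, the sketch below re-enters the 20 sample values posted in #4 and divides the number of observations by the number of distinct values. It is only a sketch on the posted sample, not on the full dataset.

          ```stata
          * Sketch only: re-enters the 20 sample values posted in #4
          clear
          input long host_id
          533062
          533062
          3113849
          3488642
          3488642
          6993205
          6993205
          7489816
          7696302
          4237084
          5557443
          4493874
          7790897
          729259
          6532783
          9975678
          9975678
          9975678
          9975678
          9975678
          end

          * r(N) = number of observations, r(r) = number of distinct values (table rows)
          quietly tabulate host_id
          * mean frequency = observations per distinct value
          display r(N) / r(r)
          ```

          On this sample there are 20 observations and 13 distinct values, so the ratio is 20/13, about 1.538.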



          • #6
            What if I want to generate a new variable that holds the frequency of each host_id value?



            • #7
              See concurrent thread https://www.statalist.org/forums/for...-in-a-variable



              • #8
                Code:
                . sort host
                
                . by host: gen freq = _N
                
                . egen tag = tag(host)
                
                . summ freq if tag
                
                    Variable |        Obs        Mean    Std. Dev.       Min        Max
                -------------+---------------------------------------------------------
                        freq |         13    1.538462    1.126601          1          5



                • #9
                  Or alternatively

                  Code:
                  . contract host
                  
                  . list, sep(0)
                  
                       +-----------------+
                       | host_id   _freq |
                       |-----------------|
                    1. |  533062       2 |
                    2. |  729259       1 |
                    3. | 3.1e+06       1 |
                    4. | 3.5e+06       2 |
                    5. | 4.2e+06       1 |
                    6. | 4.5e+06       1 |
                    7. | 5.6e+06       1 |
                    8. | 6.5e+06       1 |
                    9. | 7.0e+06       2 |
                   10. | 7.5e+06       1 |
                   11. | 7.7e+06       1 |
                   12. | 7.8e+06       1 |
                   13. | 1.0e+07       5 |
                       +-----------------+
                  
                  . summ _freq
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         _freq |         13    1.538462    1.126601          1          5
