  • count total observations per variable group

    I have an ID variable for firms (IMP), the destination of their exports (countryX), and the year (year). I need to count how many firms export to each destination in each year. I got this by running tabulate countryX year, but that gives a clumsy layout with countries in rows and years in columns. Ideally I would generate a new variable that holds the total number of firms for each destination and year. I'd appreciate any help.

  • #2
    One way to do this is:
    Code:
     egen N_destyear = count(IMP), by(countryX year)
    - Mike


    • #3
      Thanks, Mike! But I keep getting a type mismatch error whenever I do this. Any idea why?


      • #4
        bysort year countryX: gen howmany=_N


        • #5
          Based on the error, IMP is a string variable, which causes egen count() to barf. The command I gave doesn't use IMP at all; it just counts the number of observations within each country-year. Mike's code would be slightly preferable if you had missing values on IMP (after, say, running encode on IMP to turn it into a numeric variable), because count() skips missing values. If you have no missing values on IMP, the two approaches give the same results.
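
          A minimal sketch of that comparison, using invented toy data (the values and the new variable name IMP_num are assumptions for illustration, not from the original poster's dataset). One observation has an empty IMP, so the two counts differ only in that country-year cell:

          ```stata
          * Toy data, invented for illustration; one IMP is deliberately blank
          clear
          input str3 IMP str2 countryX year
          "F01" "DE" 2001
          "F02" "DE" 2001
          "F03" "FR" 2001
          ""    "FR" 2001
          "F01" "DE" 2002
          end

          * Counts every observation in the country-year cell, blank IMP included
          bysort year countryX: gen howmany = _N

          * encode turns the string into a labeled numeric; empty strings become missing
          encode IMP, generate(IMP_num)

          * count() skips missing values, so the blank IMP is not counted
          egen N_destyear = count(IMP_num), by(countryX year)

          list, sepby(year countryX)
          ```

          Note that both commands count observations, not distinct firms. If a firm can appear more than once within a country-year, neither gives the number of firms directly.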


          • #6
            For the record

            Code:
             
            egen foo = count(IMP != "")
            would work too.
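
            A hedged note on what this counts, with invented toy data. The expression IMP != "" evaluates to 0 or 1 and is never missing, so count() of it returns the number of observations in the group, the same as _N. If the aim is to count only observations with a non-empty IMP, total() of the same expression is one way to do it. The by() option and the variable names n_all and n_nonblank are assumptions added here so that the counts are per country-year:

            ```stata
            * Toy data, invented for illustration; one IMP is deliberately blank
            clear
            input str3 IMP str2 countryX year
            "F01" "DE" 2001
            "F02" "DE" 2001
            "F03" "FR" 2001
            ""    "FR" 2001
            end

            * The expression is 0/1 and never missing, so this counts all observations
            egen n_all = count(IMP != ""), by(countryX year)

            * Summing the 0/1 expression counts only the non-empty values of IMP
            egen n_nonblank = total(IMP != ""), by(countryX year)

            list, sepby(countryX year)
            ```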


            • #7
              Thanks Ben, Nick! This works!


              • #8
                Hello, I've been looking for a command to create the variable highlighted in yellow. Using "bysort folio2 ls04: gen mujeres=_N" I created "count", but I don't know how to create the variable "mujeres por hogar". Can you help me, please?
                [Attached image: question stata.jpg]


                • #9
                  #8 also asked and answered at http://www.statalist.org/forums/foru.../1345771-count


                  • #10
                    Hi,
                    I have variables with the responses yes (1), no (2), and don't know (99). I would like to know what percentage of each response value is present in each observation. My dataset looks like this:
                    id var1 var2 var3
                    1 1 2 99
                    2 2 99 .
                    3 99 . 99
                    4 2 1 2
                    Please help me with this.
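
                    One possible reading of this question, sketched with the data shown above: for each observation, count how often each response value occurs across var1-var3 and divide by the number of non-missing answers. Using non-missing answers as the denominator is an assumption, as are the new variable names n_answered, n_1, pct_1, and so on:

                    ```stata
                    * Data as posted above
                    clear
                    input id var1 var2 var3
                    1 1 2 99
                    2 2 99 .
                    3 99 . 99
                    4 2 1 2
                    end

                    * Number of non-missing answers in each observation (assumed denominator)
                    egen n_answered = rownonmiss(var1 var2 var3)

                    * For each response value, count its occurrences across the row
                    * and express that count as a percentage of answered items
                    foreach v of numlist 1 2 99 {
                        egen n_`v' = anycount(var1 var2 var3), values(`v')
                        gen pct_`v' = 100 * n_`v' / n_answered
                    }

                    list
                    ```

                    If the percentage should instead be out of all three variables, replace n_answered with 3 in the division.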


                    • #11
                      I have a problem related to the original one but slightly different. In my dataset an observation is a firm (f_id), product (p_id), country (c_id). I want to count the number of products per firm (regardless of how many countries each product is shipped to).
                      One solution is to create a firm-product identifier, drop all duplicates, and count firm-product observations per firm using "collapse". After that I can merge the resulting file with my original dataset. Is there a way to do the same while avoiding collapse and merge?


                      • #12

                        Code:
                        egen tag = tag(p_id f_id)
                        egen wanted = total(tag), by(f_id)
                        tabdisp f_id, c(wanted)
                        For discussion see https://www.stata-journal.com/articl...article=dm0042

                        dm0042 is then an otherwise unpredictable search term for this forum. I count 117 hits, including this post.
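
                        A small worked sketch of the code above on invented toy data, alongside one alternative that uses only by: and a running sum. The data values and the name wanted2 are assumptions for illustration, and the alternative assumes p_id is numeric with no missing values:

                        ```stata
                        * Toy data, invented for illustration: firm 1 ships 2 products, firm 2 ships 3
                        clear
                        input f_id p_id c_id
                        1 10 100
                        1 10 200
                        1 20 100
                        2 10 100
                        2 30 300
                        2 30 400
                        2 40 100
                        end

                        * tag() flags exactly one observation per firm-product pair
                        egen tag = tag(p_id f_id)

                        * Summing the tags within firm gives the number of distinct products
                        egen wanted = total(tag), by(f_id)

                        * Alternative: within firm, sorted by product, add 1 each time p_id changes
                        bysort f_id (p_id): gen wanted2 = sum(p_id != p_id[_n-1])

                        * Spread the final running total to every observation of the firm
                        by f_id: replace wanted2 = wanted2[_N]

                        * Both approaches should agree on this toy data
                        assert wanted == wanted2
                        ```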


                        • #13
                          Thanks Nick


                          • #14
                            Hello! I have a similar issue, with a small difference. I want to count by category, but also by combinations of categories.
                            The data looks something like this:
                            id age fund _N (by fund age)
                            11 20 A 2
                            11 20 B 2
                            22 20 A 1
                            33 20 C 1
                            I need to know, by age, how many ids have each fund, and also how many have each combination of 2 funds. In this case, age 20 would have 1 id with fund A, 1 id with fund B, 1 id with fund C, but also 1 id with fund A and fund B (note that id 11 would not count as "A" or "B" alone, but as someone who has 2 funds, with the combination A and B). I need to know how many people have 1 fund (and how many have each fund) and how many have 2 funds (and how many at each age).
                            Thank you very much.


                            • #15
                              #14 is perhaps
                              Code:
                              bysort id age: gen wanted = _N
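
                              A possible extension, sketched on the data posted in #14, for the part of the question about combinations: build a string listing each id's funds, then count ids per age and combination. This is only one reading of what is wanted, and the names nfunds, combo, and n_ids are assumptions for illustration:

                              ```stata
                              * Data as posted in #14 (without the _N column)
                              clear
                              input id age str1 fund
                              11 20 "A"
                              11 20 "B"
                              22 20 "A"
                              33 20 "C"
                              end

                              * Number of funds held by each id at each age
                              bysort id age (fund): gen nfunds = _N

                              * Build the combination cumulatively in alphabetical order, e.g. "A+B"
                              by id age: gen combo = fund if _n == 1
                              by id age: replace combo = combo[_n-1] + "+" + fund if _n > 1

                              * Copy the complete combination to every observation of the id
                              by id age: replace combo = combo[_N]

                              * Tag one observation per id-age, then count ids per age and combination
                              egen tag = tag(id age)
                              egen n_ids = total(tag), by(age combo)

                              * One row per id: each combination against age
                              tabulate combo age if tag
                              ```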
