  • Count firms in panel data when the number of firms is not equal to the panel id

    Hello,

    I would like to report in my results the number of firms that are included in my regression.

    I am using panel data and firms are not my panel id. My panel id represents firm-country dyads.
    Code:
    egen panel_id=group(firm country2)
    egen temp=group(panel_id year)
    bysort temp : egen temp1=count(year)
    drop if temp1>1
    xtset panel_id year
    When I run a fixed-effects model, many panel ids are dropped because they have only one observation per group or all-zero outcomes.
    Code:
    xtpoisson dv c.x1##c.x2 $controls i.year, fe
    note: 25704 groups (25704 obs) dropped because of only one obs per group
    note: 47820 groups (187270 obs) dropped because of all zero outcomes
    
    Conditional fixed-effects Poisson regression         Number of obs    = 75,676
    Group variable: panel_id                             Number of groups = 14,083
    
                                                         Obs per group:
                                                                      min =      2
                                                                      avg =    5.4
                                                                      max =      9
    My question is the following: how can I find out how many firms were included in that regression? The number of firms does not equal the number of groups, and it is unclear which groups were dropped. Is there a way to count the distinct values of my firm variable in the estimation sample after running the regression?

    I tried to replicate the dropped groups by dropping singletons and panel ids with zero outcomes in all years in my dataset (with the intention to run codebook firm to identify the number of unique firms).
    Code:
    *Delete all outcomes == 0
    bysort panel_id: egen temp2 = sum(dv)
    gen temp3=1 if temp2>0
    replace temp3=0 if temp2==0
    drop if temp2==0
    *Delete singletons
    bysort panel_id : drop if _N==1
    I am getting close but the regression still drops more groups than I identified with the two previous commands.

    Code:
    xtpoisson dv c.x1##c.x2 $controls i.year, fe
    note: 2896 groups (2896 obs) dropped because of only one obs per group
    note: 580 groups (1896 obs) dropped because of all zero outcomes
    
    Conditional fixed-effects Poisson regression         Number of obs    = 75,676
    Group variable: panel_id                             Number of groups = 14,083
    
                                                         Obs per group:
                                                                      min =      2
                                                                      avg =    5.4
                                                                      max =      9

    Thank you.

  • #2
    I find it very difficult to understand what you have done and why from your explanation.

    Let's back up. You want to run -xtpoisson-, so you have to -xtset- your data with the panel_id and I'm guessing you also want to have only a single observation per panel per year. Do I have that right?

    But your existing data set, I suppose, sometimes has multiple observations for some panel-year combinations. Well, if those observations all agree on all of the other variables, then there is a simple enough way to reduce to one per panel-year: -duplicates drop- will get you there in one line. BUT, you should first establish why all those exact duplicate observations are there. Usually it is a sign of errors in the data management that created the data set, and where one error has been found, others may lurk. So you should, before just forging ahead, go back and review the creation of this data set, fix the errors you find, and re-create it correctly.

    If the surplus observations for panel-year combinations do not agree on all the other variables, then you have a serious problem. Just arbitrarily deleting some of the observations is going to affect the results you get from your subsequent analyses. So you need to actually review the observations in question and figure out which ones you need to keep and which ones are errors. (Or sometimes no one of them is correct and you need to combine results from the different ones to come up with a single correct record for that panel-year combination.) In any case, the specific code to fix this problem depends on the actual data and the nature of the surplus observations. To see the surplus observations so that you can begin the process you can run:
    Code:
    duplicates tag panel_id year, gen(flag)
    browse if flag
    Other snippets of code that may be helpful:
    Code:
    // TO IDENTIFY PANELS WITH ALL-ZERO OUTCOMES
    by panel_id (dv), sort: egen byte all_zero = min(dv == 0 | missing(dv))
    
    // TO IDENTIFY SINGLETONS
    by panel_id, sort: gen byte singleton = _N == 1
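
    One plausible reason a manual replication can still fall short (an assumption about these data, not something the output confirms) is that -xtpoisson- first discards observations with a missing value in any model variable, which can turn a two-observation panel into a singleton. A minimal sketch on made-up toy data, where dv and x1 stand in for the full variable list of the model:
    Code:
    // TOY DATA (HYPOTHETICAL VALUES): PANEL 2 HAS A MISSING x1 IN 2016
    clear
    input panel_id year dv x1
    1 2015 0 1
    1 2016 0 2
    2 2015 3 1
    2 2016 1 .
    3 2015 2 1
    3 2016 4 2
    end
    // FLAG COMPLETE CASES ON THE MODEL VARIABLES
    egen byte nmiss = rowmiss(dv x1)
    gen byte complete = nmiss == 0
    // RECOMPUTE THE FLAGS WITHIN COMPLETE CASES ONLY
    by panel_id, sort: egen byte n_complete = total(complete)
    gen byte singleton = n_complete == 1 if complete
    by panel_id, sort: egen byte all_zero = min(dv == 0) if complete
    // COUNT PANELS (NOT OBSERVATIONS) OF EACH KIND
    egen byte panel_tag = tag(panel_id) if complete
    count if panel_tag & singleton == 1
    count if panel_tag & all_zero == 1 & singleton == 0
    Here panel 2 becomes a singleton only once its incomplete 2016 observation is set aside. rowmiss() needs a plain variable list, so it would not accept factor-variable notation if the controls contain any.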



    • #3
      My question is the following: How can I know how many firms were included in that regression?
      So your problem has two parts:
      • Which observations were included in your regression?
      • In just those observations, how many distinct firms are there?
      If you scroll to the end of the output of help xtpoisson, you will find the list of stored estimation results. One of them is e(sample), a function that marks the estimation sample: it returns 1 for observations that were used in the estimation and 0 for those that were omitted.
      Code:
      generate used = e(sample)
      Then you can use, for example,
      Code:
      codebook firm if used==1
      and see the number of distinct values in the output on the line labelled "Unique values".

      See the output of
      Code:
      help estimation
      for a fuller description of the results stored by estimation commands.
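
      If you only need the count rather than the full codebook report, one alternative is to tag a single observation per firm within the estimation sample and count the tags. The sketch below uses made-up toy data in which used is typed in by hand; in your data it would come from generate used = e(sample), run directly after xtpoisson.
      Code:
      * Toy data (hypothetical values); used stands in for e(sample)
      clear
      input firm country2 year used
      1 1 2015 1
      1 1 2016 1
      1 2 2015 1
      1 2 2016 1
      2 1 2015 1
      2 1 2016 1
      2 2 2015 0
      end
      * Tag one observation per distinct firm among the used observations
      egen byte firm_tag = tag(firm) if used == 1
      * The number of tagged observations is the number of distinct firms
      count if firm_tag == 1
      display "Distinct firms in the estimation sample: " r(N)
      In this toy example two firms appear among the used observations. The community-contributed distinct command (ssc install distinct) should report the same number with distinct firm if used == 1.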



      • #4
        William, it worked. I believe the e(sample) function was what I was looking for. I ran your code after the regression.

        Code:
        gen used = e(sample)
        codebook firm if used==1
        
        --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        firm                                                                                                                                                                       (unlabeled)
        --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        
                          Type: Numeric (long)
        
                         Range: [1078,326872]                 
                 Unique values: 2,362
        The output of help estimation is also very useful: I noticed that we can run summarize (sum) and other commands on just the observations that were used in the regression.
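
        A minimal sketch of that idea, using the auto dataset that ships with Stata and regress rather than the data and model from this thread (both are stand-ins chosen only so the example runs anywhere):
        Code:
        sysuse auto, clear
        regress price mpg rep78
        * rep78 has missing values, so some cars are left out of the estimation
        * Restrict any later command to the observations the model used
        summarize price if e(sample)
        count if e(sample)
        The number of observations reported by summarize should match e(N), the sample size stored by the estimation command.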


        Clyde, I apologize for the confusion. Please allow me to restate the issue so it is clearer for everyone on the forum.


        You want to run -xtpoisson-, so you have to -xtset- your data with the panel_id and I'm guessing you also want to have only a single observation per panel per year. Do I have that right?

        This is correct. However, I did not have an issue with "multiple observations for some panel-year combinations".

        My data are structured as follows: firms have business partners in various countries, so my panel id is the firm-country pair. For instance:
        Microsoft-China 2015
        Microsoft-China 2016
        Microsoft-Japan 2015
        Microsoft-Japan 2016
        IBM-China 2015
        IBM-China 2016
        IBM-Japan 2015
        IBM-Japan 2016

        When using fixed effects at the firm-country (panel_id) level, many observations are dropped because certain firm-country combinations have only one observation or because their outcome is zero in all years.
        Code:
           
        xtpoisson dv c.x1##c.x2 $controls i.year, fe
        note: 2896 groups (2896 obs) dropped because of only one obs per group
        note: 580 groups (1896 obs) dropped because of all zero outcomes
        
        Conditional fixed-effects Poisson regression         Number of obs    = 75,676
        Group variable: panel_id                             Number of groups = 14,083
        A firm in my dataset might be present in various panel_ids (i.e., several countries) so I cannot rely on the "Number of groups" shown in the output above.

        My question was about finding the number of unique firms used in that regression after many groups were dropped. In the second part of my post, I was trying to manually drop the same groups that the fixed-effects estimation drops, so that I could retrieve the number of firms used in the regression. Anyway, that attempt was confusing, and William's solution is much handier.

        Thank you.



        • #5
          Thank you very much for the clarification. I'm glad that William Lisowski was able to help you solve your problem.
