Hello everyone,
This is my first post so I apologize in advance if I missed anything in the protocol for posting.
In my panel data analysis (S&P 500 firms over 6 years), I am using a count variable that measures the number of reviews on Glassdoor that talk about a certain topic (variable name: GDReviews). Next, I create a variable for the proportion of reviews posted in that firm-year. This variable, called prop, is GDReviews divided by the total number of reviews posted on Glassdoor (TotalReviews) in that particular year.
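To make the definition concrete, a minimal sketch of how prop could be built is shown here. It assumes the firm-year data are already in memory and that firm_id is the firm identifier; that name is hypothetical and does not come from my dataset description above.

```stata
* Sketch only: firm_id is a hypothetical name for the firm identifier.
xtset firm_id Year

* Share of topic reviews among all reviews in the firm-year.
* Firm-years with no reviews at all are left missing rather than set to 0.
generate double prop = GDReviews / TotalReviews if TotalReviews > 0

* Check the range and count the boundary values (exact 0s and 1s).
summarize prop, detail
count if prop == 0
count if prop == 1
```

The two count commands show how many firm-years sit exactly on the boundaries, which matters for the choice of model in Case B.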
I am looking for guidance on which model and Stata commands to use, choosing between the following two cases. I use Stata 17 on Windows.
Case A: To test whether a binary firm-level variable (ProSocialGoal) causes GDReviews to increase, should I use the absolute count of reviews with fixed effects and control for TotalReviews? In this case I would use a Poisson model with fixed effects and robust standard errors, even though my data are overdispersed with many zeros at the firm-year level. Histogram is as below. The zeros are true zeros and not missing values.
I control for Assets and Net Income, both of which are log-transformed.
Case B: I've been told that the better way to test this relationship is to use the proportion rather than the count, since counts can be meaningless to compare across firms. I thought using fixed effects would resolve the comparison problem, but I am not sure. So, to test the hypothesis with the proportion instead of the count, I use betareg. My proportion contains zeros and ones, which I handled by adding .0000000000000001, as advised in other posts online. However, betareg has no functionality for panel analysis (and hence for fixed effects). An older post, https://www.statalist.org/forums/for...both-inclusive, discusses this question, but I am not sure whether fractional logit is appropriate for my variables.
For Case B, my betareg specification controls for as many factors as I could (though of course this may not be equivalent to using fixed effects).
Please help me understand what might be the best approach. I am happy to provide more details about the data and hypotheses. Thank you for your guidance.
Code for Case A:
xtpoisson GDReviews l1.ProSocialGoal i.Year l_Assets l_NI TotalReviews, fe robust
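One possible way to connect the two cases, offered only as a sketch: instead of entering TotalReviews as an ordinary regressor, it can be passed to xtpoisson as an exposure variable. The exposure() option enters ln(TotalReviews) with its coefficient fixed at 1, so the model describes topic reviews per posted review, which is close in spirit to the proportion in Case B while keeping firm fixed effects. The sketch assumes the panel is already declared with xtset (firm_id is a hypothetical identifier name) and drops firm-years with TotalReviews equal to zero, because the log of the exposure is undefined there.

```stata
* Sketch only: assumes the panel is declared, e.g. xtset firm_id Year
* (firm_id is a hypothetical name for the firm identifier).
* exposure(TotalReviews) adds ln(TotalReviews) with its coefficient fixed at 1,
* so the coefficients describe the rate of topic reviews per posted review.
xtpoisson GDReviews l1.ProSocialGoal i.Year l_Assets l_NI if TotalReviews > 0, fe exposure(TotalReviews) vce(robust)
```

Whether fixing that coefficient at 1 is reasonable is itself an assumption. Comparing this against the unrestricted version with TotalReviews (or its log) as a regressor would show how much the restriction matters.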
Code for Case B:
betareg prop l1.ProSocialGoal i.Year i.industry l_Assets l_NI NumberofEmployees , vce(robust)
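For comparison, here is a minimal sketch of what the fractional logit from the linked post might look like with these variables. Stata's fracreg accepts a dependent variable anywhere in the closed interval from 0 to 1, so the sketch uses the unadjusted proportion, without the small constant added. As assumptions: firm_id is a hypothetical firm identifier, prop_raw and lagProSocialGoal are illustrative variable names, and clustering by firm stands in for, but is not equivalent to, firm fixed effects.

```stata
* Sketch only: firm_id, prop_raw and lagProSocialGoal are hypothetical names.
xtset firm_id Year

* Unadjusted proportion; exact 0s and 1s are allowed by fracreg.
generate double prop_raw = GDReviews / TotalReviews if TotalReviews > 0

* Build the one-year lag explicitly so the model line uses a plain variable.
generate byte lagProSocialGoal = L1.ProSocialGoal

* Fractional logit with standard errors clustered by firm.
* This does NOT include firm fixed effects.
fracreg logit prop_raw lagProSocialGoal i.Year i.industry l_Assets l_NI NumberofEmployees, vce(cluster firm_id)

* Average marginal effect of the lagged indicator on the expected proportion.
margins, dydx(lagProSocialGoal)
```

The margins line reports the effect on the scale of the proportion itself, which may be easier to interpret than the logit coefficients.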