Statistical hypotheses test for independent count data and dependent continuous variables

Tom Jay

#1

Statistical hypotheses test for independent count data and dependent continuous variables

29 Aug 2021, 04:32

Dear Stata Community,

I am looking for the right statistical model to test hypotheses with the following variables:
Independent variable: Count data at the level of US states (e.g. Michigan = 304) and of US ZIP codes (e.g. 98661 = 1).
Dependent variables: Continuous but NOT normally distributed, with both negative and positive values, e.g. return on assets.
Eventually I will also need to include some control variables.

Best,

Thomas
Maria Boutchkova

#2

29 Aug 2021, 05:00

Hello Tom,
Your question is not Stata-specific but concerns empirical research design. A good starting point is an existing paper, published in a reputable academic journal, that performs a similar analysis. Replicate the specification it uses with your data and then modify it based on any differences in your research question.
I tried to imagine what your RQ could be given the information you provide. The first thing that occurs to me is some kind of state-level measure of urban development (number of ZIP codes) and how it depends on the profitability of local businesses. As a quick and dirty step when using count variables, we use log transformations. For the appropriate controls, again refer to a relevant existing paper.
Good luck!
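A minimal sketch of the log transformation mentioned above, written for a do-file. The variable name n_firms_zip is hypothetical (it does not appear in the thread) and stands for the count of same-industry firms in a ZIP code.

```stata
* Hypothetical variable: n_firms_zip = number of same-industry firms in the ZIP code
* Natural log of the count; the result is missing wherever the count is 0
generate ln_n_firms_zip = ln(n_firms_zip)

* Alternative that keeps zero counts: log of (1 + count)
* Note that this changes the interpretation of the coefficient
generate ln1p_n_firms_zip = ln(1 + n_firms_zip)
```

If every firm is counted in its own ZIP code, the count is at least 1 and the first version loses no observations.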
Felix Bittmann

#3

29 Aug 2021, 05:03

For count variables, Poisson regression is usually the appropriate model, for example:

Code:

poisson depvar indepvars

However, if your data is multilevel (from your description I am not really sure), you could also use mepoisson to model this kind of data structure better. See the documentation for details and examples. And of course, as Maria said, using published papers as a starting point is usually a good idea for guidance.

Best wishes

(Stata 16.1 MP)
Joro Kolev

#4

29 Aug 2021, 05:45

Based on your description (continuous dependent variable, count variable as the main regressor), linear regression is an appropriate method to analyse such data, and you can use the count data as it is.
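A minimal sketch of such a linear regression, for a do-file. All variable names (roa, n_firms_state, n_firms_zip) are hypothetical placeholders, and the robust standard errors are an assumption added here, not something specified in the post.

```stata
* Hypothetical variables:
*   roa           = return on assets (continuous, may be negative)
*   n_firms_state = count of same-industry firms in the state
*   n_firms_zip   = count of same-industry firms in the ZIP code
* The counts enter as they are, untransformed
regress roa n_firms_state n_firms_zip, vce(robust)
```

OLS does not require the dependent variable to be normally distributed; in large samples, inference relies on asymptotic results rather than on normality of the outcome.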
Tom Jay

#5

29 Aug 2021, 05:46

Dear Stata users,

Let me be clearer about my RQ, design, etc.
The preliminary RQ is: How do cluster characteristics and compositions influence firm performance?

For this I use US cluster data and data from Compustat for US firms. I am fine with the former, but I want to ask about the latter.

I want to see whether the number of firms of the same industry in one state and within the same ZIP code has an effect on firm performance (so far RoA and RoE). As I said, RoA and RoE are not normally distributed. Additionally, I have count data, i.e. the number of firms, for each state and for each ZIP code, grouped by industry.

Best,

Thomas

Edit: I ran an OLS regression with standard errors clustered by industry code, but I want to know whether there are more "elaborate" techniques.

Last edited by Tom Jay; 29 Aug 2021, 05:51.
Joro Kolev

#6

29 Aug 2021, 07:16

What you have done is the appropriate thing to do. There are always more elaborate techniques, but OLS is appropriate in your case.

The only caveat is that you need to cluster at the level at which your main independent variable varies. From what you describe, Industry X Zip_code seems to be the appropriate level of clustering.

If you get significant results and you have, say, more than 60 industries, you can keep the clustering at the industry level too.
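A minimal sketch of the two clustering choices described above, for a do-file. The variable names (roa, n_firms_zip, size, leverage, industry, zipcode) are hypothetical, and the controls are placeholders only.

```stata
* Hypothetical variables: roa, n_firms_zip, size, leverage, industry, zipcode
* Build one identifier per industry x ZIP code combination
egen long ind_zip = group(industry zipcode)

* OLS with standard errors clustered at the industry x ZIP code level
regress roa n_firms_zip size leverage, vce(cluster ind_zip)

* Alternative: cluster at the industry level
regress roa n_firms_zip size leverage, vce(cluster industry)
```

The coefficient estimates are identical in both regressions; only the standard errors change with the clustering level.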
