
  • GLM Regression Family

    Hello everyone,

    I have a statistical question regarding the GLM regression and how to choose the right distribution family.

    For my research, my dependent variable is the first-day stock return, which theoretically ranges between -1 and ∞: the share price can decline by at most 100% but can increase by more than 100%. Hence, my DV is characterized as follows:
    • Not normally distributed (based on Shapiro-Wilk test)
    • Not non-negative (it takes both positive and negative values)
    • Continuous
    Now, I do not know which distribution family I should choose for my GLM regression as most of the families do not fit:
    • Binomial --> no, because DV is not binary
    • Gaussian --> no, because DV is not normally distributed
    • Poisson --> no, because DV is not integer and not non-negative
    • Gamma --> no, because DV is not non-negative
    • NBinomial --> no, because DV is not non-negative
    • Tweedie --> no, because DV is not non-negative
    Do you have any other ideas on how to proceed with this issue?

    Many thanks in advance!

  • #2
    A non-normally distributed Y is not a problem; ordinary regression will do. Truncated regression might also be of interest. Robust standard errors will address the presumed presence of heteroskedasticity.
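
    A minimal sketch of both suggestions, assuming a return variable ret and covariates x1 and x2 (all names hypothetical):
    Code:
    * plain regression with heteroskedasticity-robust standard errors
    regress ret x1 x2, vce(robust)

    * truncated regression, with -1 as the theoretical lower bound
    truncreg ret x1 x2, ll(-1)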

    Comment


    • #3
      The marginal distribution is pertinent but not at all decisive. So pushing it through e.g. a Shapiro-Wilk test is not really helpful. That's often true of plain regression too.

      What's closer to the issue is what is plausible about conditional distributions. The delicacy involved can be seen by considering that a logarithmic link goes with a Poisson distribution, yet data suitable for that pairing often include zeros; so how is the occurrence of zeros to be reconciled with a logarithmic link? The resolution is that the functional form y = exp(Xb) implies that the means of y conditional on X are always positive, so there is no contradiction: positive means don't rule out some zero or even negative values.

      I'd say that the choice of link comes first, and that of family second. If means are expected to be positive, then a logarithmic link is natural, at least to try. If the fit is any good, then standard errors won't depend much on which family you go with, and you can always ask for robust standard errors, as George Ford points out.
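
      For instance, a minimal sketch in Stata, assuming a nonnegative outcome y (possibly including zeros) and covariates x1 and x2 (names hypothetical):
      Code:
      * logarithmic link paired with the Poisson family; vce(robust) keeps
      * the standard errors valid even though y is not a count variable
      glm y x1 x2, family(poisson) link(log) vce(robust)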

      It would be possible to define your own link, log1p(), but I've never seen that done.

      Comment


      • #4
        https://stats.stackexchange.com/ques...duals-be-i-i-d isn't asking quite the same question, but it has much good explanation.

        Comment


        • #5
          Michael:
          welcome to this forum.
          As an aside to the previous helpful advice, I would say that, given the characteristics of your dependent variable, you're forced to go -gaussian- with an -identity- link and -robust- standard errors, if heteroskedasticity has to be tamed:
          Code:
          . use "C:\Program Files\Stata18\ado\base\a\auto.dta"
          (1978 automobile data)
          
          . glm price mpg i.foreign, family(gaussian) link(identity) robust
          
          Iteration 0:  Log pseudolikelihood = -683.35997  
          
          Generalized linear models                         Number of obs   =         74
          Optimization     : ML                             Residual df     =         71
                                                            Scale parameter =    6405686
          Deviance         =  454803694.6                   (1/df) Deviance =    6405686
          Pearson          =  454803694.6                   (1/df) Pearson  =    6405686
          
          Variance function: V(u) = 1                       [Gaussian]
          Link function    : g(u) = u                       [Identity]
          
                                                            AIC             =   18.55027
          Log pseudolikelihood = -683.3599714               BIC             =   4.55e+08
          
          ------------------------------------------------------------------------------
                       |               Robust
                 price | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   mpg |  -294.1955   59.50419    -4.94   0.000    -410.8216   -177.5695
                       |
               foreign |
              Foreign  |   1767.292   599.3555     2.95   0.003     592.5771    2942.007
                 _cons |   11905.42   1343.753     8.86   0.000     9271.709    14539.12
          ------------------------------------------------------------------------------
          
          . regress price mpg i.foreign, robust
          
          Linear regression                               Number of obs     =         74
                                                          F(2, 71)          =      12.72
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.2838
                                                          Root MSE          =     2530.9
          
          ------------------------------------------------------------------------------
                       |               Robust
                 price | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   mpg |  -294.1955   60.33645    -4.88   0.000     -414.503   -173.8881
                       |
               foreign |
              Foreign  |   1767.292   607.7385     2.91   0.005     555.4961    2979.088
                 _cons |   11905.42   1362.547     8.74   0.000     9188.573    14622.26
          ------------------------------------------------------------------------------
          
          .
          Except for a slight difference in the robust SEs, the results, as expected, overlap those obtained via -regress-.
          Kind regards,
          Carlo
          (StataNow 18.5)

          Comment


          • #6
            Are there many, or any, returns that actually hit -1?

            Comment


            • #7
              I don't know the stock return literature, but I presume your return measures are something like
              Code:
              r(t+1) = (p(t+1) - p(t))/p(t) = (p(t+1)/p(t)) - 1

              If that's the case, why not model
              Code:
              p(t+1)/p(t)
              as a non-negative outcome (e.g. using Poisson regression) and then transform to the return metric by simply subtracting 1, since
              Code:
              E[r(t+1)|x] = E[(p(t+1)/p(t))|x] - 1
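
              A sketch of that approach in Stata, assuming variables popen and pclose for the opening and closing prices and covariates x1 and x2 (all names hypothetical):
              Code:
              * model the price ratio, a non-negative outcome, by Poisson QMLE
              gen double ratio = pclose/popen
              glm ratio x1 x2, family(poisson) link(log) vce(robust)

              * recover predictions on the return metric by subtracting 1
              predict double ratiohat, mu
              gen double rethat = ratiohat - 1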

              Comment


              • #8
                Hi everyone,

                Thank you very much!

                @Carlo, thanks a lot, and great to be here.

                @Jeff, not many hit -1 exactly, but quite a lot lie between -0.5 and 0.

                John, thanks a lot. Maybe I did not get it, but the return could potentially be negative. In the literature, my DV is defined as (closing price - opening price)/opening price. As the closing price can be below the opening price, the values can be negative.

                Today, I also tried an ln transformation. Now it seems quite reasonable to assume a normal distribution, although the Shapiro-Wilk test rejects the null hypothesis.

                Comment


                • #9
                  What John is saying is that you can model p(t+1)/p(t) using an exponential function and the Poisson quasi-MLE. Subtracting the constant one won't affect any conclusions. Of course, dividing by p(t) requires that the price doesn't hit zero.

                  When you say you used the log transformation, was that to p(t+1)/p(t), so that you are approximating the rate of return with the change in logs? That's certainly done a lot in economics. I'm guessing a zero price is a true anomaly, and that you just drop the return when that happens. But I don't know how much of a problem it is.
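
                  For reference, a sketch of the log-return version, again with hypothetical variable names:
                  Code:
                  * the change in logs approximates the rate of return when the
                  * price ratio is near 1; undefined if either price is zero
                  gen double lnret = ln(pclose/popen)
                  regress lnret x1 x2, vce(robust)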

                  Comment


                  • #10
                    Jeff's comment raises the interesting point that if the density of p(t) has "too much" probability mass near zero, then moments of 1/p(t) (and presumably of p(t+1)/p(t)), such as the mean and variance, may not be finite. If so, I doubt whether the assumptions required for consistency of the Poisson QMLE or related approaches would be satisfied, but I would defer to Jeff on this point.
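
                    As a quick illustration of the point (not from the thread): if p is uniform on (0,1), then E[1/p] diverges, and the sample mean of 1/p never settles down as the sample grows:
                    Code:
                    * simulate p ~ U(0,1); the integral of 1/p over (0,1) diverges
                    clear
                    set seed 2024
                    set obs 100000
                    gen double p = runiform()
                    gen double invp = 1/p
                    summarize invp   // sample mean dominated by a few huge values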

                    This was the theme of several papers by Mandelbrot in the 1960s (e.g. https://www.jstor.org/stable/1829014 ). See also https://www.jstor.org/stable/2684999 .

                    Comment


                    • #11
                      #9 doesn't address the point made in #3: a single test of normality of the marginal distribution would not be decisive. In any case, what makes you say that the test leads to rejection but the normality assumption is "quite reasonable"? That might be the indication from (say) a normal quantile plot too, but otherwise people who take significance tests seriously should respect the result!
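
                      For example, the two checks side by side in Stata (lnret is hypothetical):
                      Code:
                      * formal test versus a graphical check of normality
                      swilk lnret    // Shapiro-Wilk test
                      qnorm lnret    // normal quantile plot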

                      Comment
