Dear STATALIST,
My first post, so sorry if anything is unclear.
My problem is that logit and probit models are failing to converge. I have a solution and wanted to check why it worked, as well as get a better idea of why I have this problem in the first place. I searched the forum archives and couldn't find anything very helpful.
I am running logit and probit models on a pooled UK dataset of households (a combination of the Survey of English Housing and House Price Index data). My dependent variable is a dummy indicating whether the household owns its house (the alternative being to rent). The mean of this dummy is 0.87 (so 87% are 1s and 13% are 0s) and my sample size is 103,821. I have 29 controls and a constant.
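This is my logit input; the probit is analogous:
Code:
logit own hpl1 rnl1 crrl1 voll1 rntl1 w_inc savings age old npers ndepch R2 R3 R4 R5 R6 R7 R8 R9 R10 G2 G3 G4 G5 G6 G7 G8 G9 St_5
When I enter this code I get many hundreds of iterations, get bored and press Break after a while. If I add the option
Code:
, iter(20)
the following message is received: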
"Note: 0 failures and 119 successes completely determined.
Warning: convergence not achieved"
I assume the latter line is the more relevant one. My (shaky) understanding of why this would happen is that one or more of the coefficients estimated in my regression is not converging to a single value. In other words, when Stata uses whatever numerical method it uses to maximise the likelihood, there is no single value that maximises the likelihood (or rather there are several). Question 1: have I got this completely wrong, or is there something I'm missing here?
I played around with the number of iterations a bit, and found that the coefficients were the same whether I allowed 10 or 50 iterations (or any number in between), but I always got the warning above. My understanding is, however, that this does not make these results robust in any way. I mention it anyway in case it's helpful.
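In case it helps, here is a rough sketch of how this iteration check could be run from a do-file. It is an illustration rather than my exact commands: the local macro name xvars is made up, and I am assuming the stored results e(converged) and e(ic) plus the trace maximisation option are the right tools for seeing which coefficient is still moving.

```stata
* Hypothetical macro holding the 29 controls from the command above
local xvars hpl1 rnl1 crrl1 voll1 rntl1 w_inc savings age old npers ndepch ///
    R2 R3 R4 R5 R6 R7 R8 R9 R10 G2 G3 G4 G5 G6 G7 G8 G9 St_5

* Cap the number of iterations and print the coefficient vector at each step
logit own `xvars', iterate(20) trace

* 1 if the optimiser declared convergence, 0 otherwise
display "converged  = " e(converged)

* Number of iterations actually performed
display "iterations = " e(ic)
```

Comparing the traced coefficient vectors across iterations should show whether one coefficient (w_inc, say) keeps drifting while the others settle.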
I was advised to play around with my regression to see if I could identify the variable that 'caused' the convergence failure. I found that when I omitted the weekly income variable (w_inc) there were no problems with convergence. As I wanted to include weekly income, and essentially on a whim, I thought it might help to get rid of some of the extreme values of weekly income. On removing the top 250 people, who all earn above £6000 ($8952) a week and of whom 22 have a value of 0 and 228 have a value of 1 for my dependent variable, I found that my logit and probit worked fine. The results confirmed the sign and significance of the coefficients estimated from a linear probability model, as well as those from the runs where I pre-defined the number of iterations. Note that I still get 16 successes completely determined.
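For reference, the trimming step can be sketched roughly as follows. This is a reconstruction, not my exact commands: the flag name top250 is made up for illustration, and any ties in w_inc at the cutoff are broken arbitrarily by the sort.

```stata
* Sort by weekly income, highest first
* (gsort places missing values last in descending order by default)
gsort -w_inc

* Hypothetical flag marking the 250 highest-income households
generate byte top250 = (_n <= 250)

* Cross-tabulate ownership against the flag to see the 0/1 split
tabulate own top250

* Re-estimate without the flagged households
logit own hpl1 rnl1 crrl1 voll1 rntl1 w_inc savings age old npers ndepch ///
    R2 R3 R4 R5 R6 R7 R8 R9 R10 G2 G3 G4 G5 G6 G7 G8 G9 St_5 ///
    if top250 == 0
```

Using an if qualifier rather than dropping the observations keeps the full dataset in memory, so the trimmed and untrimmed models can be compared in the same session.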
Question 2: Is this a good fix? What likely problems might this cause?
If it would be helpful for me to give more information or more detail on my commands, or to upload some histograms or data, I can do so.
Thanks in advance,
Joe