Hello,
I have a large dataset of 59,000,000 observations and 100+ variables.
I ran a regression, the purpose of which is to obtain a predicted cost for each of the 59,000,000 observations (these are people).
I ran the regression and saved the model estimates to a .ster file using estimates save full_model, replace.
After running the regression and obtaining the predictions, I realised that some observations (around 100,000) had missing predictions. This is because some of the independent variables had missing values. I replaced the missing values. I then used the .ster file to apply the saved parameter estimates to the full dataset (this was done as a quick fix to get predictions for all observations, before re-fitting the model at a later date).
However, when I obtained the predictions, I still had the same number of missing predictions. I understood that I could use saved model estimates against an updated dataset - but despite replacing missing values for certain variables, I am still getting missing predictions for the same observations, even though they no longer have missing values.
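To make the steps concrete, here is a rough sketch of the workflow described above. Only estimates save full_model, replace is taken from what was actually run; the command regress, the variable names cost, x1, x2, x3, cost_hat and cost_hat2, and the way the missing values are filled in are placeholders for illustration.

```stata
* Sketch only: regress and all variable names are placeholders,
* not the actual model or data.

* Step 1: fit the model, save the estimates, and predict
regress cost x1 x2 x3
estimates save full_model, replace
predict double cost_hat
count if missing(cost_hat)      // the observations with missing predictions

* Step 2: replace the missing values in the independent variables
* (placeholder rule; the real replacement may differ)
replace x1 = 0 if missing(x1)

* Step 3: load the saved estimates and predict again into a new variable
estimates use full_model
predict double cost_hat2
count if missing(cost_hat2)     // compare with the count from step 1
```

Comparing the two counts is how the number of missing predictions before and after the replacement can be checked.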
I have read the documentation, and I think it has something to do with setting e(sample) - but I am not sure I quite follow.
Is anyone able to explain why I still get missing predictions when using saved estimates despite replacing missing values in the data?