  • Make a comparative table showing characteristics of the regression and specific variables across different models

    Hello everyone,

    I am working on a 2SLS instrumental-variables estimation with cross-country data. Specifically, we are interested in comparing how 3 types of instruments perform in each stage of the regression under different specifications. What we want to compare is the performance of the instrument in the first stage (p-value, coefficient, F-test, R-squared of the regression), and then of the endogenous variable in the second stage (p-value, coefficient, adjusted R-squared of the regression).

    I am familiar with the esttab command, which makes it possible to compare a regression under different specifications, reporting the coefficients and p-values of each variable in a simple regression, like this:

    quiet reg euro1900 settlerpotential latitude, cluster(latitude)
    estimate store Euro1

    quiet reg euro1900 settlerpotential latitude landlock temp1 humid1, cluster(latitude)
    estimate store Euro2

    quiet reg euro1900 settlerpotential latitude landlock temp1 humid1 f_brit f_french, cluster(latitude)
    estimate store Euro3

    cd "$tables"

    esttab Euro1 Euro2 Euro3 using Table111.rtf, se r2 replace label


    However, we would like to compare not the regression models but rather the behavior of a specific variable in those regressions (and under different specifications). I have been reading about how to build more complex comparative tables of regressions by using command and collect, but it seems that all these comparisons are always at the level of the regression model and cannot combine statistics of the regression as a whole (like the R-squared) with statistics of specific variables (like the p-value and coefficient).

    Thus, what I expect to get is a table that looks something like this, and an analogous one for the second stage of the regression. Moreover, we have 3 versions of the endogenous variable and would need to repeat the exercise of this table for all of those.

    Is there any way to produce such a table? Maybe by first building a matrix and then getting some content of that matrix into a table? Or some kind of loop?

    I have been running ivregress 2sls and manually copying this specific information into comparative tables myself (with estat firststage to get the information on the F-test), but I have been asked to find a way to automate this process so that it is straightforward to update if we change details of the regressions.

    I hope I explained my inquiry clearly. Thank you very much in advance.
    Last edited by Exequiel Caceres; 05 Jan 2023, 09:00.

  • #2
    Well, yes, it is definitely possible to dynamically build a table in the way that you describe. The difficulty of this task will depend on your level of programming experience, although the fact that you are asking such a high-level question makes me think this will be fairly challenging, possibly taking several days. The best-case scenario is that you can find a simple high-level solution with esttab. If not, then on the bright side, completing tasks like this will ultimately make you a much stronger programmer, if that is a skill you want to develop. I think you are on the right track when you discuss loops and matrices above. Here is some rough pseudo-code that might help get you started:

    Code:
    results = a preallocated matrix large enough to store your results.
    for i = 1; while i is less than or equal to the number of regression equations to estimate; run the following code block, then increment i {
        estimate the ith regression
        extract the statistics of interest from the estimated regression
        update the results matrix with the extracted statistics
    }
    convert the results matrix to an exportable table
    Obviously each of these steps presents its own difficulties. That last line might have a single command solution, or you may end up needing a more complicated solution using (e.g.) the -putexcel- command. I'm not sure. That said, generally speaking, breaking large problems down into smaller steps like this will make complicated problems much more manageable.
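
    A minimal Stata sketch of those steps, for the first stage only. Everything specific here is an assumption made for illustration: it uses the auto data shipped with Stata, treats mpg as the endogenous variable and weight as the instrument, and invents three control sets. It also runs the first stage by hand with regress and test rather than through estat firststage, simply because the values are easy to pick up that way. The putexcel lines assume Stata 14 or later.

```stata
* Hypothetical roles, for illustration only:
* endogenous variable = mpg, instrument = weight
sysuse auto, clear

* three made-up specifications (sets of exogenous controls)
local spec1 "foreign"
local spec2 "foreign headroom"
local spec3 "foreign headroom trunk"

* preallocate: one row per specification, one column per statistic
matrix results = J(3, 4, .)
matrix colnames results = coef pvalue F r2
matrix rownames results = spec1 spec2 spec3

forvalues i = 1/3 {
    * first stage: endogenous variable on instrument plus controls
    quietly regress mpg weight `spec`i'', vce(robust)

    * statistics for the instrument itself
    matrix results[`i', 1] = _b[weight]
    matrix results[`i', 2] = 2*ttail(e(df_r), abs(_b[weight]/_se[weight]))

    * F-test on the excluded instrument
    quietly test weight
    matrix results[`i', 3] = r(F)

    * statistic for the regression as a whole
    matrix results[`i', 4] = e(r2)
}

matrix list results, format(%9.3f)

* one possible export route; the file name is arbitrary
putexcel set firststage.xlsx, replace
putexcel A1 = matrix(results), names
```

    Each row mixes variable-level statistics (coefficient, p-value) with regression-level ones (F, R-squared), which is the combination the original question asks for.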

    -- Some Other General Programming Advice --

    I'm not sure how much of this is relevant to you; this is just some stuff I've learned over the years that you might find helpful.

    When you loop, think carefully about what exactly you are iterating over. For example, if you only need to change a single variable on each iteration of the loop, then use a foreach block and loop over those variable names (forvalues is for stepping through a range of numbers). People who are new to writing loops often do not have a clear sense of what, exactly, the loop is iterating over, which can lead to all kinds of conceptual problems.
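
    To make the distinction concrete, here is a small sketch of the two loop types side by side. The variables come from the auto data shipped with Stata and are stand-ins only; weight, length and displacement play the role of three candidate instruments.

```stata
sysuse auto, clear

* Iterating over variable names: foreach ... of varlist
foreach z of varlist weight length displacement {
    quietly regress mpg `z' foreign
    display "`z'" _col(16) %9.4f _b[`z']
}

* Iterating over a range of numbers: forvalues
forvalues i = 1/3 {
    display "iteration `i'"
}
```

    In the first loop the macro z holds a variable name on each pass; in the second, i holds a number.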

    Keep in mind that it is fairly easy to get bogged down and overwhelmed when you try to write the complete, ideal solution the first time around. You want to start as simply as possible, then slowly build up the complexity as you go. For example, just start by writing a loop that prints out the number of the current iteration and make sure that the loop actually iterates as many times as you think it should. Next, just noisily estimate each regression, making sure you are estimating the correct regression equation on each iteration. Then, before trying to store anything in a matrix, extract the return values from the regression and print them to the console, checking to make sure each value is what you expect. Be sure to test things as you go by printing values of interest to the console or by using the -assert- command.
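
    As a rough illustration of that build-up, the sketch below prints the iteration number, estimates noisily, displays the values it would later store, and checks them with -assert-. The data set and variable names are assumptions (the auto data shipped with Stata), and the particular checks are only examples of the kind of thing worth asserting.

```stata
sysuse auto, clear

local n_iter = 0
foreach z of varlist weight length displacement {
    local ++n_iter
    display as text "iteration `n_iter': instrument = `z'"

    * run noisily at first, so the full output can be inspected
    regress mpg `z' foreign

    * print the values of interest before storing them anywhere
    display "b = " _b[`z'] "  se = " _se[`z'] "  N = " e(N)

    * fail loudly if an observation was dropped or a value is missing
    assert e(N) == _N
    assert !missing(_b[`z'])
}

* the loop should have run once per candidate instrument
assert `n_iter' == 3
```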

    My own bias is showing here: my undergrad was in computer science, and I tend to think of loops as the basics, whereas statistical languages treat them as advanced concepts. If you are more of a statistical programmer, maybe start without the loop and just write out all of the regressions line by line, extracting the important return values and storing them in a matrix as you go. If that is simpler for you, then you might want to put off generalizing to a loop while you focus on getting the basic solution in place. Do whatever works best for you.

    I would also start with a specific example and generalize from there: just write a do file that solves this for one and only one version of the endogenous variable. When you have that working from start to finish, try to generalize a little bit more by taking your solution and putting it inside another loop that goes through and solves the problem for each of the three versions of your endogenous variable.
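
    A sketch of what that outer loop could look like for the second stage. All names are hypothetical: mpg, gear_ratio and turn from the auto data stand in for the three versions of the endogenous variable, weight for the instrument, and the two control sets are made up. One matrix is built per version of the endogenous variable.

```stata
sysuse auto, clear

* hypothetical stand-ins for the three versions of the endogenous variable
local endog "mpg gear_ratio turn"

* made-up specifications (sets of exogenous controls)
local spec1 "foreign"
local spec2 "foreign headroom"

foreach x of local endog {
    * one results matrix per version of the endogenous variable
    matrix S = J(2, 3, .)
    matrix colnames S = coef pvalue adj_r2
    matrix rownames S = spec1 spec2

    forvalues i = 1/2 {
        quietly ivregress 2sls price `spec`i'' (`x' = weight), vce(robust)
        matrix S[`i', 1] = _b[`x']
        * ivregress 2sls reports z statistics by default, hence normal()
        matrix S[`i', 2] = 2*normal(-abs(_b[`x']/_se[`x']))
        * adjusted R-squared of the regression (can be missing in IV models)
        matrix S[`i', 3] = e(r2_a)
    }

    matrix stage2_`x' = S
    matrix list stage2_`x', format(%9.3f)
}
```

    The inner loop is the single-variable solution; the outer loop only swaps in a different endogenous variable and saves the result under a different name.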

    You might eventually get to the point where you define your own generalized Stata command that will (e.g.) allow you to dynamically build tables like this with an arbitrary number of regression equations and with any data set while only requiring minor tweaks to the code. This is a great ideal to aim for, particularly in cases where you expect your project requirements will change 6 months from now. But of course, don't try to start with such a solution.
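
    For a sense of where that could end up, here is a small sketch of a user-written command. The name firststage_row and its argument order (endogenous variable, then instrument, then controls) are invented for this example, and it computes the first stage by hand with regress rather than wrapping estat firststage.

```stata
capture program drop firststage_row
program define firststage_row, rclass
    * varlist order (an assumption of this sketch):
    * endogenous variable, instrument, then any controls
    syntax varlist(min=2) [, vce(passthru)]
    gettoken endog rest : varlist
    gettoken inst controls : rest

    quietly regress `endog' `inst' `controls', `vce'
    return scalar coef = _b[`inst']
    return scalar p    = 2*ttail(e(df_r), abs(_b[`inst']/_se[`inst']))
    return scalar r2   = e(r2)

    * F-test on the excluded instrument
    quietly test `inst'
    return scalar F = r(F)
end

* usage with the auto data as a stand-in
sysuse auto, clear
firststage_row mpg weight foreign, vce(robust)
return list
```

    A loop can then call the command once per specification and copy r(coef), r(p), r(F) and r(r2) into a row of the results matrix.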
