  • Creating a marginsplot with my Dependent variable

    Dear all,

    I'm writing my thesis for my Master's degree. I am using OLS regressions to test the relationship between International diversification and Firm performance and how this relationship is moderated by Cultural diversity. International diversification is measured by the percentage of foreign subsidiaries and by a self-created international diversification score. The measure for firm performance is ROA. Control variables in this model are Number of different countries and Number of employees. Cultural diversity is measured by the first four Hofstede dimensions.
    Now I have run all my regressions, testing for a linear relationship and for U- and S-shaped relationships. I want to create marginsplots to check my results and visualize them. I have already found several posts about marginsplot and tried to do it on my own, but it won't work. I want to plot my dependent variable over my independent variable and, if possible, the moderator.

    Below are the results of three regressions:

    If I try to compute margins, I type the following:
    margins, at(Dependent=(-30(0)30) Independent1 = (0(0.5)1))

    Or variations of this, but every time Stata gives me the following:
    variable 'Dependent' not found in list of covariates
    or
    Independent1: factor variables may not contain noninteger values

    I tried to Google for solutions and searched this website, but I did not find an answer and now I don't know what to do anymore.
    I hope one of you has the solution so I can finish my thesis.

    Thank you very much in advance for your time.

    Kind regards,
    Christiaan van der Rijsen
    Attached Files

  • #2
    Welcome to the Stata Forum / Statalist.

    Please take a look at the FAQ, particularly the topic about sharing data/command/output.

    That said, with -margins-, if we want margins for a categorical variable, we need to use factor-variable notation in the regression command. Finally, the margins reflect the predicted values of the Dependent variable, hence you are not supposed to type the yvar in the -margins- command. You can see nice examples of -margins- by typing - help margins - in the command window. Hopefully that helps.
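
    For example, a minimal sketch with the auto dataset that ships with Stata (not your data; the variables here are just stand-ins):

    Code:
    sysuse auto, clear
    regress price i.foreign c.weight     // factor-variable notation for the categorical predictor
    margins foreign                      // predicted price at each level of foreign; no yvar is typed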
    Last edited by Marcos Almeida; 29 Dec 2017, 08:59.
    Best regards,

    Marcos

    • #3
      As you have seen, you cannot specify the Dependent variable in the -margins- command: -margins- identifies the Dependent variable automatically from the posted results of the preceding regression. And you can't constrain the dependent variable to specific values for plotting: you constrain the predictors to specific values and then the corresponding values of the Dependent variable are whatever they are.
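
      As a minimal sketch of that pattern, with placeholder names echoing your post (Dependent, Independent1, and the controls stand in for your variables, which I cannot verify):

      Code:
      regress Dependent Independent1 Control1 Control2
      margins, at(Independent1 = (0(0.25)1))   // values are set for the predictor only
      marginsplot                              // plots the predicted Dependent at those values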

      I'm confused by your post because you describe your research goal as including investigating moderation of one effect by another, but none of your regressions includes an interaction term, so there is no moderation being modeled. Also, the error message "Independent1: factor variables may not contain noninteger values" does not make any sense in the context of the commands you have shown because none of those commands are even using factor variable notation. Now, it wouldn't be the first time that Stata has given the wrong error message, when something else is actually the problem. But it is more likely that that error message does not actually arise from any of the commands you are showing and that you are actually doing something else.

      If you want help, you have to show what you actually did and what Stata actually gave you. Descriptions do not provide enough detailed information. You must show the code and complete results exactly: do not edit them in any way. There are no "small" details in code. For helpful advice, please post back showing these. To make it readable, please be sure to surround the code and results with code delimiters. (Read the Forum FAQ, and especially #12, for instructions on using code delimiters if you are not familiar with them.) Please do not use screen shots in the future: the one you have shown in #1 is just barely readable on my computer; often they are altogether unreadable.

      Added: Crossed with #2 which gives most of the same advice, more concisely.

      • #4
        Thank you Marcos and Clyde for your quick reply!
        So basically what you're saying is that I cannot add my Dependent variable to the -margins- command?

        After reading your posts I just did the following, a linear regression without the moderator:
        . regress Dependent Independent2 Control1 Control2 Dummy2 Dummy3 Dummy4 Dummy5 Dummy6 Dummy7 Dummy8 Dummy9 Dummy10 Dummy11
        . margins, at(Independent2 = (0 0.05 0.1)) <- because the values of my Independent variable are between 0 and 0.1
        . marginsplot
        Then it gives me the following:

        So if I interpret this graph correctly, it says that when International diversification increases, my Dependent variable ROA increases?

        Now I also want to do that for the other two regressions, which test the quadratic and cubic relationships. Should I use this command:
        . margins, at(Independent2 = (0 0.05 0.1) Ind2squared = -0.1 0 0.3)

        Regarding my moderator, I have now uploaded another regression with the moderator and interaction term included.
        I hope they are readable; otherwise I will upload them in a Word file.

        Thank you
        Attached Files

        • #5
          Code:
          margins, at(Independent2 = (0 0.05 0.1) Ind2squared = -0.1 0 0.3)
          will give you incorrect results. For -margins- to handle a quadratic correctly, it has to know that you are dealing with a variable and its square: it cannot infer this from the presence of "squared" at the end of the variable name. To do this, you have to first re-run your regression using factor variable notation, and then you can run margins, as follows:

          Code:
          regress Dependent ... c.Independent2##c.Independent2...etc.
          margins, at(Independent2 = (0 0.05 0.1))
          The use of the factor-variable term c.Independent2##c.Independent2 tells Stata that you want the regression to include both Independent2 and its square, and that these are continuous variables. With that, -margins- will handle this correctly. Note that you do not specify any values for the square of Independent2 in the -margins- command here. Stata will read the results of the preceding -regress- command and it will understand that Independent2 enters the model with both linear and quadratic terms, and it will automatically include the quadratic term in the margin calculations. Do read -help fvvarlist- for more information about factor-variable notation.
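
          As a concrete, self-contained sketch of that pattern using the auto dataset that ships with Stata (so it runs anywhere; substitute your own variables):

          Code:
          sysuse auto, clear
          regress mpg c.weight##c.weight           // enters weight and its square
          margins, at(weight = (2000(500)4500))    // only the base variable is listed
          marginsplot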

          Your screenshot is unreadable on my computer this time. I imagine it contains regression output and margins output. As it happens, it wasn't necessary to see the details of that for present purposes, so the unreadability of the screenshot did no harm. But, in general, screenshots are strongly deprecated here. Again I ask you to please read the entire FAQ for the best advice available on how to get the most out of this Forum. If you follow the advice given there, you make it easier for people who want to help you to do so. By providing insufficient information or providing information in inaccessible ways, you are only dragging out the process of getting help. You will get better answers more quickly if you follow the advice in the FAQ.

          • #6
            Thank you very much Mr Schechter!
            Now I have all the marginsplots.

            • #7
              I have almost finished my thesis and have interpreted the results of the marginsplots.
              I have added two marginsplots, plotting two regressions testing the linear relationship between International diversification and firm performance.

              In the marginsplot LinearInd1, the percentage of foreign subsidiaries is used as the independent variable and ROA as the dependent variable. This relationship tested as not significant. If I interpret the marginsplot correctly, the line is positive and linear, but due to the curving lines of the confidence interval it is not significant?

              For the marginsplot LinearInd2, the international diversification score is used as the independent variable and ROA as the dependent variable. This independent variable is calculated as the geographical distance of each country times the percentage of foreign subsidiaries in that country. This relationship tested as significant. The marginsplot also shows a positive linear line, and the confidence intervals fluctuate differently.

              I hope one of you can help me with the interpretation of these marginsplots and tell me whether my interpretation is correct.

              Your help is much appreciated, thank you in advance.

              Christiaan
              Attached Files

              • #8
                What do you want: the post-regression confidence interval for the conditional mean, or the prediction interval for specific values of the independent variables?

                Tom

                • #9
                  Hello Tom,

                  Thank you for your reply.
                  I am looking for the prediction interval for specific values of the independent variables. What I am now saying in my thesis is that my findings are not significant for the relationship between percentage of foreign subsidiaries and ROA, but that the marginsplot visualizes a linear relationship. The same holds for the new plots I have added: adding the moderating variable cultural diversity does not make my findings significant. I hypothesized that cultural diversity would have a negative influence on the relationship between international diversification and firm performance, and the marginsplots do show a decrease when cultural diversity gets higher.
                  And what I want to add, but I don't know if that is correct, is that my findings are not significant because of the large CIs and their fluctuations?

                  Thank you.
                  Attached Files

                  • #10
                    Re #7. You cannot ascertain the statistical significance of an effect from looking at these plots. The curved trace suggested by the ends of the confidence intervals has nothing to do with it. That curve arises from the way confidence intervals are calculated, and you will see it whether the slope of the line is "statistically significant" or not. The plots enable you to understand the relationships among the variables according to your model and data, and in particular, when there are interactions, how one variable modifies the effect of another on the outcome. But they do not tell you about statistical significance.

                    To judge statistical significance of effects you have to look at the regression output or the -margins, dydx()- outputs. What the graphs in #7 show you is that the best estimate from your data and model is that the slope of the ROA-Ind1 relationship is positive. That it is not statistically significant in the regression output simply means that the quantity and precision of the data allow you to only estimate this in a very approximate way, the precision of the slope estimate being inadequate to even confidently assess the sign of the slope: it could be zero or perhaps a small negative and still be consistent with the data.
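
                    For example, a sketch of the kind of command meant here (placeholder names taken from your posts; I cannot run this against your data):

                    Code:
                    regress Dependent Independent1 Control1 Control2
                    margins, dydx(Independent1)    // average marginal effect of Independent1, with its CI and p-value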

                    Re #9. Both graphs show that as Moderator increases, the predicted value of ROA decreases. And given that the graphs show all parallel lines (at least as far as is perceptible visually to me), this effect is the same at all values of the Ind1 and Ind2 predictors. That is, these graphs exhibit no interaction effect at all. That is possible, but rather unusual for a model with an interaction term in it: the interaction effect may be small and not statistically significant, but it is uncommon for it to be exactly zero (or so close to zero that the difference is not visible at all). Did you perhaps omit the interaction term from the model you ran for these?
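
                    If the interaction term was indeed omitted, here is a sketch of a model that includes it (placeholder names and purely illustrative at() values):

                    Code:
                    regress Dependent c.Independent1##c.Moderator Control1 Control2
                    margins, at(Independent1 = (0 0.5 1) Moderator = (1 2 3))
                    marginsplot, xdimension(Independent1)    // non-parallel lines would indicate moderation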

                    • #11
                      Thank you very much for your answer, Clyde. I think I now know what I did wrong: in my regressions I used other variables to calculate the interaction term, because the advice of my thesis supervisor was to build the interaction term with centered variables. Therefore I created new variables by subtracting the Independent mean from the Independent variable and the Moderator mean from the Moderator. Those values multiplied would be my interaction effect.

                      Now if I do it differently without centering the variables my regression is:
                      . regress Dependent Independent1 Moderator c.Independent1##c.Moderator Control1 Control2 Dummy2 Dummy3 Dummy4 Dummy5 Dummy6 Dummy7 Dummy8 Dummy9 Dummy10 Dummy11

                      And the marginsplot looks like:

                      I think this is what it should look like? So that means that I cannot use the centered variables of the Independent and Moderator.

                      Because doing it like this won't work?
                      . regress Dependent Independent1 Moderator c.centered_ind1##c.centered_mod Control1 Control2 Dummy2 Dummy3 Dummy4 Dummy5 Dummy6 Dummy7 Dummy8 Dummy9 Dummy10 Dummy11
                      Attached Files
                      Last edited by Christiaan Rijsen; 05 Jan 2018, 09:08.

                      • #12
                        So your new plot shows that the effect of Ind1 changes from positive at low values of Moderator to negative at high values of Moderator. The diverging lines are a good example of an interaction effect graph.

                        There is no reason you can't use the centered variables if you want to. But if you do that, when you run -margins-, the -at()- options need to specify the values of the centered variables that correspond to the values of the original variables in the original -at()- options so that you are showing the same range of actual data.

                        Added: Also, the code -regress Dependent Independent1 Moderator c.centered_ind1##c.centered_mod Control1...- is incorrect. The problem is that you have both Independent1 and Moderator and the centered variables in the model. The uncentered and centered variables are necessarily collinear, and Stata will omit something. As you have written it, Stata will leave in the uncentered variables, omit c.centered_ind1 and c.centered_mod themselves, but retain the interaction term c.centered_ind1#c.centered_mod. So you are left with a model that contains an interaction term but does not include its main effects, which is a mis-specification. Moreover, Stata will not know that the centered interaction term is related to the uncentered variables Independent1 and Moderator, so -margins- will not treat them accordingly. If you use centered variables in your model, you must remove all mention of their uncentered counterparts from the model. Mixing and matching definitely does not work here!
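
                        A minimal sketch of the all-centered approach, assuming centering at the sample means (the control variables and the at() values are only illustrative):

                        Code:
                        summarize Independent1, meanonly
                        generate centered_ind1 = Independent1 - r(mean)
                        summarize Moderator, meanonly
                        generate centered_mod = Moderator - r(mean)

                        regress Dependent c.centered_ind1##c.centered_mod Control1 Control2
                        * at() values must be the centered counterparts of the original values of interest
                        margins, at(centered_ind1 = (-0.05 0 0.05) centered_mod = (-1 0 1))
                        marginsplot, xdimension(centered_ind1)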
                        Last edited by Clyde Schechter; 05 Jan 2018, 10:58.

                        • #13
                          So if I understand it correctly:
                          If I want to use the interaction term with centered variables, the regression can have a normal Dependent variable but needs a centered Independent and Moderating variable? All the other variables do not need to be centered.
                          What I've read about centering variables is that it would help to make the interpretation of the results easier and better. But can I do the normal regression without centered variables and, in the second regression when the moderating variable and interaction term are added, use the centered ones? Or does that influence the way I can interpret my results?

                          Thank you.

                          • #14
                            If I want to use the interaction term with centered variables, the regression can have a normal Dependent variable but needs a centered Independent and Moderating variable? All the other variables do not need to be centered.
                            Correct.

                            What I've read about centering variables is that it would help to make the interpretation of the results easier and better.
                            Sometimes it does and sometimes it doesn't. If you don't use the -margins- command and rely directly on the regression output to interpret your results, you are stuck with the fact that in a model with an A#B interaction, the coefficient of A no longer represents "the effect of A." It reflects "the effect of A when B = 0." Now in many situations, B = 0 is actually impossible, or in any case outside the observed range of values of B, or perhaps occasionally observed but unusual and not really relevant. So you are left with regression coefficients that are, by themselves, not interesting, and considerable algebraic manipulation is required to get to the heart of the matter. In that situation, centering solves the problem, because B_centered = 0 corresponds to B = mean value of B in the data, which is almost always an interesting value of B. So the coefficient of A_centered corresponds to the effect of A when B is at its mean value.

                            But if B = 0 is an interesting value of B (and similarly if A = 0 is an interesting value of A), then centering doesn't really add clarity. And just the fact that people will have to think about whether A and A_centered are different things, adds complexity to the situation.

                            There are other reasons why centering can be helpful in certain situations. Sometimes, centering a variable will help an estimation that is not converging to converge. And in multi-level models, if you are looking at the correlations among the random effects, the results you get are actually very sensitive to whether or not you center the variables (and, if so, whether you center them at the mean or at some other key point in the distribution), and your interpretation has to take that into account. So centering, broadly speaking, is a fairly complex topic whose pros and cons have to be considered for each particular model.
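
                            Putting the A and B point above into a small sketch (Y, A, and B are placeholders, centered at the sample means):

                            Code:
                            * Uncentered: the coefficient on A is the effect of A when B == 0
                            regress Y c.A##c.B

                            * Centered: the coefficient on A_c is the effect of A when B is at its sample mean
                            summarize A, meanonly
                            generate A_c = A - r(mean)
                            summarize B, meanonly
                            generate B_c = B - r(mean)
                            regress Y c.A_c##c.B_c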

                            But can I do the normal regression without centered variables and, in the second regression when the moderating variable and interaction term are added, use the centered ones?
                            Yes, provided, as noted earlier, that when you use the interaction term built from the centered variables you also exclude the non-centered IV and moderator. The only material difference between a linear interaction model with centered IV and moderator and an interaction model with uncentered IV and moderator will be in the constant term, which will typically be irrelevant to the research questions anyway.

                            • #15
                              Thank you very much for your help Clyde!

                              Yes, provided, as noted earlier, that when you use the interaction term built from the centered variables you also exclude the non-centered IV and moderator. The only material difference between a linear interaction model with centered IV and moderator and an interaction model with uncentered IV and moderator will be in the constant term, which will typically be irrelevant to the research questions anyway.
                              So, I will do my regressions again tomorrow when I'm back at the University.
                              Normal linear regression: regress Dependent Independent1 Control1 Control2 Dummy1......
                              Centered regression: regress Dependent Centered_Ind1 Centered_Mod c.Centered_Ind1##c.Centered_Mod Control1 Control2 Dummy1.....

                              And then afterwards the marginsplots to visualize the results. I tried to use margins, dydx(Independent1), but those plots did not work correctly; all the lines were flat. The standard margins, at(.....) works.

                              Again thank you very much, this really helps me out.
