
  • Test the effect that the combined level of education in couples has on finances

    Hi Statalist.

    I would like to test the effect of the combined level of education of each partner in a couple on finances, etc. At present I run regressions on the education level of partner 1 (educm) and partner 2 (educf):
    Code:
    xtreg totasset educm educf
    However, if I want to test interaction effects of education with another variable, I currently run these separately, but I wanted to know if there is a more effective way. I thought of grouping these variables, but given there are seven levels of education, this leads to a variable with too many combinations.

    I appreciate your thoughts/suggestions.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(educm educf) float totasset
    3 1   284289
    1 3    55030
    1 2     8700
    3 3   313100
    3 3   320038
    1 1   180500
    1 1   177500
    5 7  4425050
    5 7 11344318
    4 1    66100
    3 2   618133
    3 2  1332300
    3 3  1014100
    2 2     7745
    5 5   252088
    5 5   427500
    5 4  2735000
    5 4   474850
    7 4   863800
    7 4  1233019
    7 4  1265062
    1 3   200506
    end
    label values educm educ1
    label def educ1 1 "[1] up to 11 years (up to Year 11)", modify
    label def educ1 2 "[2] 12 years (Year 12)", modify
    label def educ1 3 "[3] 13 years (Cert 3/4)", modify
    label def educ1 4 "[4] 14 years (Diploma/Adv Dip)", modify
    label def educ1 5 "[5] 17-19 years (Bachelor/Honours)", modify
    label def educ1 7 "[7] 20-25 years (Master, Doctorate)", modify
    label values educf educ2
    label def educ2 1 "[1] up to 11 years (up to Year 11)", modify
    label def educ2 2 "[2] 12 years (Year 12)", modify
    label def educ2 3 "[3] 13 years (Cert 3/4)", modify
    label def educ2 4 "[4] 14 years (Diploma/Adv Dip)", modify
    label def educ2 5 "[5] 17-19 years (Bachelor/Honours)", modify
    label def educ2 7 "[7] 20-25 years (Master, Doctorate)", modify
    Stata v.15.1. I am using panel data.
    Last edited by Chris Boulis; 12 Mar 2021, 20:47. Reason: Clarification.

  • #2
    Originally posted by Chris Boulis
    I would like to test the effect of the combined level of education of each partner in a couple on finances . . . if I want to test interaction effects of education with another variable, I currently run these separately, but wanted to know if there is a more effective way?
    You might get a handful of "identifies no observations in the sample" and "omitted because of collinearity" messages, which result from the correlation between spouses' educational histories, but what's the problem with just
    Code:
    xtreg totasset i.educm##i.educf##i.another_variable, i(household_id)
    testparm i.educm#i.educf#i.another_variable
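    Before fitting the full three-way interaction, it can help to count how many educm × educf × another_variable combinations actually occur, since empty cells are what trigger those messages. The sketch below uses simulated data in which spouses' education is strongly correlated; the variable region stands in for another_variable, and the sample size, seed and data-generating process are assumptions for illustration only.

```stata
* Sketch with simulated data: how many educm x educf x region cells are observed?
* The data-generating process and the variable region are hypothetical.
clear
set seed 2021
set obs 500
generate byte educm = runiformint(1, 7)
* Assortative mating: the partner's level is within one step of educm
generate byte educf = min(max(educm + runiformint(-1, 1), 1), 7)
generate byte region = runiformint(1, 4)

* egen group() numbers only the combinations that actually occur
egen int cell = group(educm educf region)
summarize cell, meanonly
display "occupied cells: " r(max) " of " 7*7*4 " possible"

* Count cells that are occupied but thin (fewer than 5 couples)
bysort cell: generate int cellsize = _N
egen byte firstobs = tag(cell)
count if firstobs & cellsize < 5
```

    With your own data, drop the simulation lines; because -egen, group()- numbers only the combinations present, its maximum is the number of occupied cells.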



    • #3
      A useful way of thinking about "combining" variables to simplify your model is to realize that this is equivalent to imposing a constraint. One such constraint that I use often is to constrain the education of both partners to have the same effect. Whether this makes sense depends on the exact application, and you obviously should not use it if it does not make sense. Moreover, constraints are things you can test.
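      As a sketch of testing such a constraint, assuming simulated data (the outcome, effect sizes and seed are made up): fit separate categorical effects for each partner, then use -test- to check jointly whether each education level has the same coefficient for both partners.

```stata
clear
set seed 78457
set obs 3000
generate byte educm = runiformint(1, 7)
generate byte educf = runiformint(1, 7)
* Hypothetical outcome in which both partners' education has the same effect
generate double totasset = 10 + 2*educm + 2*educf + rnormal(0, 5)

* Unconstrained model: a separate set of level effects for each partner
regress totasset i.educm i.educf

* Joint Wald test of "level k has the same effect for both partners"
test (2.educm = 2.educf) (3.educm = 3.educf) (4.educm = 4.educf) ///
     (5.educm = 5.educf) (6.educm = 6.educf) (7.educm = 7.educf)
display "F(" r(df) ", " r(df_r) ") = " r(F) ", p = " r(p)
```

      A small p-value would be evidence against the constraint; a large one means the data do not contradict it, which is not the same as showing it is true.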
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Thanks Joseph Coveney. Does this provide the effect of a change in the level of education of both partner 1 and partner 2 in different locations/regions (another variable) on total assets by household id? This isn't really clear to me. Does it assume the levels of education of both partners are the same to start with?



        • #5
          Thanks Maarten Buis. When I run the regression with c.educm and c.educf on total assets (as in #1), male education generally has a larger (and positive) effect on total assets than female education. But when I interact education with another variable, say race or religion (separately), female education tends to have a larger (and positive) effect on total assets than male education. When I run the three-way interaction suggested in #2, 50% of the results show an inverse relationship, which is quite different from what I obtained interacting these separately. I'm not sure of the best way to proceed. Is it better to run the three-way interaction?



          • #6
            Originally posted by Chris Boulis
            Thanks Joseph Coveney. Does this provide the effect of a change in the level of education of both partner 1 and partner 2 in different locations/regions (another variable) on total assets by household id?
            Yes.

            Does it assume the levels of education of both partners are the same to start with?
            No.

            It's liable to be tough going if you want to use indicator variables to identify levels of education, because the profile plots (using, say, -marginsplot-) will be busy and you'll need to drill down to isolate the simple main effects that you're interested in looking at in the plots. And if you don't have all combinations of spouses' educational history represented in the dataset, then there will be inestimable gaps. But it would look something along the lines of that below.

            .
            . version 15.1

            .
            . clear *

            .
            . set seed `=strreverse("1597678")'

            .
            . quietly input byte(educm educf) long totasset

            .
            . quietly poisson totasset c.educm##c.educf, vce(robust) nolog

            .
            . drop _all

            .
            . quietly set obs 750

            . generate int hid = _n

            . generate double hid_u = rnormal() // spousal correlation for outcome

            .
            . generate byte edm = runiformint(1, 7)

            . generate byte edf = runiformint(1, 7)

            .
            . generate byte loc = runiformint(1, 4) // different locations/regions

            .
            . quietly expand 3

            . bysort hid: generate byte tim = _n

            .
            . foreach var of varlist ed? {
              2.     quietly replace `var' = min(`var' + runiformint(0, 2), 7) if tim > 1
              3. }

            .
            . tempvar xb

            . generate double `xb' = hid_u + ///
            >         _b[_cons] + ///
            >         _b[educm] * edm + _b[educf] * edf + ///
            >                 _b[c.educm#c.educf] * edm * edf

            . generate long has = rpoisson(exp(`xb'))

            .
            . *
            . * Begin here
            . *
            . generate double lha = ln(has)

            . quietly xtreg lha i.edm##i.edf##i.loc i.tim, i(hid) re

            .
            . contrast edm#loc edf#loc edm#edf edm#edf#loc tim

            Contrasts of marginal linear predictions

            Margins      : asbalanced

            ------------------------------------------------
                         |         df        chi2     P>chi2
            -------------+----------------------------------
                 edm#loc |         18       28.13     0.0601
                         |
                 edf#loc |         18       10.48     0.9149
                         |
                 edm#edf |         36    1.16e+07     0.0000
                         |
             edm#edf#loc |        108      155.28     0.0020
                         |
                     tim |          2        2.07     0.3555
            ------------------------------------------------

            . /* contrast q.edm#loc q.edf#loc q.edm#q.edf#loc tim */
            .
            . exit

            end of do-file


            .


            I've hidden the regression model's output because it's quite voluminous with a 7-level × 7-level × 4-level three-way interaction. Also, I've commented out an alternative contrast using orthogonal polynomial components that might be more of interest to your research questions.
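            To drill down to simple effects without wading through the full three-way output, -contrast- accepts the @ operator. The sketch below is a scaled-down, hypothetical version of the simulation above: three education levels instead of seven, -regress- instead of -xtreg-, and made-up effect sizes.

```stata
clear
set seed 206
set obs 1200
generate byte edm = runiformint(1, 3)
generate byte edf = runiformint(1, 3)
generate byte loc = runiformint(1, 4)
* Hypothetical outcome: male education matters more in location 2
generate double lha = 0.2*edm + 0.1*edf + 0.3*edm*(loc == 2) + rnormal()

quietly regress lha i.edm##i.edf##i.loc

* Joint test of male education within each location (simple effects)
contrast edm@loc
* Each level of male education versus the base level, within each location
contrast r.edm@loc
* Orthogonal polynomial (linear, quadratic) components within each location
contrast q.edm@loc
```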



            • #7
              In response to #5: That is very hard to say. Remember that the education of partners is one of the strongest "natural" correlations in the social sciences. That alone often makes it pretty hard to disentangle the effects (multicollinearity); add two-way and three-way interactions to that, and all you "find" is probably just random noise.
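              The cost of that correlation can be illustrated with simulated data (all values are hypothetical): when each partner's education is within one step of the other's, adding educf to the model inflates the standard error of the educm coefficient considerably.

```stata
clear
set seed 1234
set obs 2000
generate byte educm = runiformint(1, 7)
* Strong assortative mating: the partner's level is within one step
generate byte educf = min(max(educm + runiformint(-1, 1), 1), 7)
generate double y = educm + educf + rnormal(0, 10)

correlate educm educf
scalar rho_ed = r(rho)

* Standard error of the educm coefficient without and with the partner's education
quietly regress y educm
scalar se_alone = _se[educm]
quietly regress y educm educf
scalar se_both = _se[educm]
display "SE ratio (with educf / without): " scalar(se_both) / scalar(se_alone)
```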

              Since you want to include education as a categorical variable (a very good idea in most educational systems), adding it as c.educm and c.educf is not going to do what you want. A quick way to constrain the effects of two categorical variables to be equal is this:

              Code:
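              forvalues i = 1/7 {
                  * educp`i' = number of partners (0, 1 or 2) with education level `i'
                  gen educp`i' = `i'.educm + `i'.educf
              }
              drop educp2 // reference category
              * each educp coefficient is the common effect of that level for either partner
              reg totasset educp*
              See this Stata tip on why this works: http://maartenbuis.nl/publications/sum_constr.html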
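              One way to check the trick is to fit the same model with explicit equality constraints via -cnsreg- and compare the estimates. This sketch uses simulated data (hypothetical effect sizes and seed) and writes the indicators as (educm == `i') instead of `i'.educm; with no missing values these should give the same regressors.

```stata
clear
constraint drop _all
set seed 4040
set obs 1000
generate byte educm = runiformint(1, 7)
generate byte educf = runiformint(1, 7)
generate double totasset = 100 + 20*educm + 15*educf + rnormal(0, 50)

* (a) Summed indicators. Note: (educm == `i') is 0, not missing, when educm is missing
forvalues i = 1/7 {
    generate byte educp`i' = (educm == `i') + (educf == `i')
}
regress totasset educp1 educp3-educp7      // level 2 is the reference
scalar b_sum = _b[educp7]

* (b) Same model: separate indicators plus explicit equality constraints
foreach i in 1 3 4 5 6 7 {
    constraint `i' `i'.educm = `i'.educf
}
cnsreg totasset ib2.educm ib2.educf, constraints(1 3/7)
scalar b_cns = _b[7.educm]

display "educp7 = " scalar(b_sum) "   constrained 7.educm = " scalar(b_cns)
```

              The two coefficients should agree up to rounding error, because both parameterizations describe the same constrained model.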



              • #8
                Thank you Maarten Buis and Joseph Coveney for your replies. To clarify, I don't want to constrain education in couples to be equal.

                There appears to be a positive relationship between education level and assets in households, but I want to test whether this relationship is influenced by other factors (e.g. race, country of birth, regional factors, religion). Is this achieved with
                Code:
                xtreg totasset i.educm##i.educf##i.another_variable, i(household_id)
                (as suggested in #2 by Joseph) or is it too messy because there are too many levels?

                Would it be better for me to create a simpler categorical variable for education, such as "0 = less than university level education, 1 = university level education"? Or maybe include a third category for post-graduate education? I appreciate your thoughts.



                • #9
                  Originally posted by Chris Boulis
                  . . . is it too messy because there are too many levels?

                  Would it be better for me to create a simpler categorical variable for education . . .? I appreciate your thoughts.
                  Try it with the greatest granularity and see whether the model yields an interpretable result. If the correlation between spouses' educational levels results in too many omitted combinations, or in estimates that are just noise (a possibility that Maarten mentions), then coarsen the educational level and see whether that helps.
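                  A sketch of coarsening and then re-checking the cells, assuming simulated data and a hypothetical variable region. The cut-points follow the three categories described later in #13 (levels 1-2, 3-4 and 5-7).

```stata
clear
set seed 555
set obs 500
generate byte educm = runiformint(1, 7)
generate byte educf = min(max(educm + runiformint(-1, 1), 1), 7)
generate byte region = runiformint(1, 4)

* 1-2 = up to 12 years, 3-4 = certificate/diploma, 5-7 = bachelor or higher
recode educm (1/2 = 1) (3/4 = 2) (5/7 = 3), generate(educm3)
recode educf (1/2 = 1) (3/4 = 2) (5/7 = 3), generate(educf3)
tabulate educm3

* Occupied cells before and after coarsening
egen int cells7 = group(educm educf region)
egen int cells3 = group(educm3 educf3 region)
summarize cells7, meanonly
display "7 levels: " r(max) " of " 7*7*4 " cells occupied"
summarize cells3, meanonly
display "3 levels: " r(max) " of " 3*3*4 " cells occupied"
```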

                  I recommend including a time variable in the model as in #6 above. It assumes that there is no educational level × time interaction, which is why no such interaction is included in the model.
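                  A sketch of including wave indicators, and of relaxing the no-interaction assumption so that it can be tested, assuming a simulated panel of 400 couples over 5 waves with a couple-level random effect. All names and values are hypothetical; with 19 waves, i.wave adds 18 coefficients and the interaction terms many more.

```stata
clear
set seed 1919
set obs 400
generate long couple = _n
generate double u = rnormal()              // couple-level random effect
generate byte educm = runiformint(1, 3)    // coarsened education, 3 levels
generate byte educf = runiformint(1, 3)
expand 5
bysort couple: generate byte wave = _n
generate double lasset = u + 0.3*educm + 0.2*educf + 0.05*wave + rnormal()
xtset couple wave

* Wave indicators absorb shifts common to all couples over time
xtreg lasset i.educm i.educf i.wave, re

* Relax the "no education x time interaction" assumption and test it
quietly xtreg lasset i.educm i.educf i.wave i.educm#i.wave i.educf#i.wave, re
testparm i.educm#i.wave i.educf#i.wave
```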



                  • #10
                    So what I got from your question is that you had too many categories. The solution is to reduce the number of categories. There are several ways of doing that, and all of them will result in a loss of information. That is not necessarily bad: a model is supposed to simplify things. However, it is a real trade-off. My suggestion to constrain the effects of education to be equal for partners is one way of reducing the number of categories (it halves the number of education effects you need to estimate). It may or may not work for you. That depends on your research question (e.g. if you are interested in gender issues, then this is obviously horrible) and the exact thing you are studying (e.g. if there are really prominent gender differences in that topic in the society you are studying, then this constraint would probably be a bad idea).

                    Coarsening education would be an alternative way of reducing the number of categories. You have to really look at the educational system of the society you are studying. For the ones I am familiar with, a university / non-university split would often be too coarse. If I wanted to add another level, postgraduate would not be the one: there are just way too few of them. Adding a level would almost always mean breaking up the non-university group. But this all depends on what you know about the educational system you are studying.
                    Last edited by Maarten Buis; 17 Mar 2021, 06:20.



                    • #11
                      I forgot to mention that when running the model in #2, I received a message from Stata saying "matsize too small". As the default is 400, I increased it to 1000 and was able to run the regression. Very few of the results were significant at the 5% (or 10%) level, which may in part be because some of the combinations have no observations and others very few. (Note that I use a variable called "couple", which groups id and p_id. Is this needed, given that the variables are male/female versions of the education level and I tsset on couple and wave?) After I created a new categorical variable with only three categories ((1) <=12 years, (2) 13-14 years (cert/diploma), (3) 17+ (under/postgrad)), I re-ran the regression, but none of the estimates were significant, which makes me feel this is the wrong way to go.

                      Regarding
                      I recommend including a time variable in the model as in #6 above. It does make the assumption that there is no educational level × time interaction, and so that is not included in the model.
                      Is this a reasonable assumption, given that education changes over time and I have 19 waves in my panel dataset? If so, could I use my wave variable, and how would I code that?

                      I noticed you used both -poisson- and -xtreg-. As I have panel data, I should use -xtpoisson-, right? Do you think -xtpoisson- is more suitable for my analysis than -xtreg-? I must admit I'm getting a little confused here. Am I misunderstanding the advice, or is there no effective way to test whether the effect of education is affected by, say, race? Because I am analysing couples (not only individuals), should I group educm and educf and interact that with i.another_variable, such as race?

                      I'm not sure if this helps, but in terms of differences in education within couples, only about 30% of couples have the same level of education; in 40% the male partner's education is higher, and in 30% the female partner's education is higher.
                      Last edited by Chris Boulis; 18 Mar 2021, 21:40.



                      • #12
                        Your problem seems to be a lack of statistical power. That is not surprising, as with interactions you lose (a lot of) statistical power. One way to try to maximize power is to make sure that the three educational categories are approximately equally large, i.e. each about 1/3 of the sample. (I assume that your category 3 is 15+, not 17+.) This is, however, not a hard rule: the categorization also has to make sense. This is the trade-off you need to make when coarsening your educational variables. It does not seem that you can gain anything more by coarsening your variable further, but depending on the distribution you might get a bit more out of it by moving the cut-points. However, your categorization seems fairly standard, so I don't hold much hope there.

                        I know you don't like this idea, but now you need to find other ways of simplifying your model, and the only one I can think of is to constrain the effects of education to be the same for males and females. Notice that this does not mean that the male and female have the same education, just that the effects of the education they have are the same.

                        If that does not work (or if you don't want to do that), then I am afraid that it is time to accept defeat. If the information is not present in your data, then no amount of statistical trickery can get it out. To quote John Tukey (1986, pp. 74-75): "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

                        John Tukey (1986), "Sunset salvo". The American Statistician, 40(1):72-76.



                        • #13
                          Thanks for your reply Maarten Buis. Here's the composition of the three categories for males and females:
                          Code:
                          Cat  Level of education                                                  Male (%)  Female (%)
                          1    Up to 12 years (incl. "up to grade 11" + "grade 12"; 2 cats)              34          46
                          2    13-14 years (incl. cert 3/4 & dip & adv dip; 2 cats)                      41          26
                          3    15+ years (incl. bach, grad cert/dip, honours, master & dr; 3 cats)       25          28
                               Total                                                                  100         100
                          It does not reflect 1/3 shares, as you noted; however, given the differences between the male and female compositions, especially in the first and second categories, I'm not sure that would be possible. I'm not sure how I could group the levels differently in a way that still makes sense. Given this, I think the shares appear reasonable.

                          Regarding constraining the effects: I understand this does not mean they have the same education level, but that the effects of each education level are the same for males and females. My only point is that when I interacted these (e.g. "xtreg totasset educm#religion educf#religion"), the significant results for males tend to be positive, but negative for females. As such, I'm not sure this assumption would be reasonable.

                          Does this mean there is not an effective way to test how other demographic factors may influence the effect of education (by interacting them) on financial asset values and the like (DV) in couples? Thank you and regards, Chris



                          • #14
                            Originally posted by Chris Boulis
                            Given this, I think the shares appear reasonable.
                            I agree

                            Originally posted by Chris Boulis
                            My only point is that when I interacted these the significant results for males tend to be positive, but negative for females. As such, I'm not sure this assumption would be reasonable.
                            The purpose of a model is to simplify reality. Statistical significance does not tell you anything about whether or not a difference is important. So you can simplify by constraining the effects to be equal if you can argue that the difference is not important for the question you are trying to answer.
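                            One way to judge importance rather than significance is to estimate the size of the male-female difference, with its confidence interval, using -lincom-. This is a sketch with simulated data; the effect sizes are assumptions (10 per level for males, 8 for females, so a true difference of 4 at the top category).

```stata
clear
set seed 340
set obs 3000
generate byte educm = runiformint(1, 3)
generate byte educf = runiformint(1, 3)
* Hypothetical effects: 10 per level for males, 8 per level for females
generate double asset = 50 + 10*educm + 8*educf + rnormal(0, 20)
quietly regress asset i.educm i.educf

* Size of the male-female difference for the top category, with its CI
lincom 3.educm - 3.educf
```

                            Whether a difference of that size matters is a substantive judgement about your question, which the p-value cannot make for you.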

                            Originally posted by Chris Boulis
                            Does this mean there is not an effective way to test how other demographic factors may influence the effect of education (by interacting them) on financial asset values and the like (DV) in couples?
                            You need to think of this as a trade-off: if you want to maintain the statistical power of your model and you want to make it more flexible in one way (e.g. include interactions), then you need to make the model less flexible in another (e.g. constrain the effects of education to be the same for males and females). If your model is simple enough and/or your dataset is large enough that you don't care about statistical power, then you can make the model more flexible without compensation, but for every dataset there comes a point where making the model even more flexible will lead to an unacceptable loss of statistical power. You seem to have reached that point with your dataset. So you need to think of reasonable simplifications to your model to compensate for your interactions. If you cannot find a reasonable simplification (which is a real possibility), then your data are not suitable for your question.



                            • #15
                              Hi Maarten Buis. I've returned to investigating the effect of a couple's level of education on assets and other financial variables. I ran your code in #7, which combines the partners' education by creating educp`i' (with educp2 dropped as the reference group), and obtained the following output:
                              Code:
                              . xtreg asset educp*
                              
                              Random-effects GLS regression                   Number of obs     =     21,473
                              Group variable: id                              Number of groups  =      8,212
                              
                              R-sq:                                           Obs per group:
                                   within  = 0.0227                                         min =          1
                                   between = 0.0867                                         avg =        2.6
                                   overall = 0.0746                                         max =          5
                              
                                                                              Wald chi2(6)      =    1087.34
                              corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                              
                              ------------------------------------------------------------------------------
                                     asset |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                    educp1 |  -6235.733   25474.09    -0.24   0.807    -56164.04    43692.58
                                    educp3 |   74522.69   26269.58     2.84   0.005     23035.26    126010.1
                                    educp4 |   356952.6   32289.92    11.05   0.000     293665.5    420239.7
                                    educp5 |   338992.3   28246.98    12.00   0.000     283629.2    394355.4
                                    educp6 |   579156.9   36669.78    15.79   0.000     507285.4    651028.4
                                    educp7 |     796663   37092.34    21.48   0.000     723963.4    869362.7
                                     _cons |   609103.8   41067.81    14.83   0.000     528612.3    689595.2
                              -------------+----------------------------------------------------------------
                                   sigma_u |  944832.83
                                   sigma_e |  847984.18
                                       rho |  .55386319   (fraction of variance due to u_i)
                              ------------------------------------------------------------------------------
                              The results appear to make sense, in that there appears to be a positive relationship between assets and education.
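                              Under the sum constraint, each educp coefficient is the effect of one partner having that level rather than level 2, so the effect for a couple adds up over both partners. A sketch with -lincom- on simulated data (hypothetical values; the same -lincom- statements should also work after -xtreg-):

```stata
clear
set seed 389
set obs 2000
generate byte educm = runiformint(1, 7)
generate byte educf = runiformint(1, 7)
generate double asset = 500 + 40*educm + 40*educf + rnormal(0, 200)
forvalues i = 1/7 {
    generate byte educp`i' = (educm == `i') + (educf == `i')
}
quietly regress asset educp1 educp3-educp7    // level 2 is the reference

* One partner at level 7, the other at level 2, versus both at level 2
lincom educp7
* Both partners at level 7 versus both at level 2
lincom 2*educp7
* Partners at levels 7 and 5 versus both at level 2
lincom educp7 + educp5
```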

                              With regard to your comments in #10 on the composition of education levels:
                              Code:
                              . tab educ1 (male partner educ)
                              
                                                             educ1 |      Freq.     Percent        Cum.
                              -------------------------------------+-----------------------------------
                                [1] up to 11 years (up to Year 11) |     19,841       23.08       23.08
                                            [2] 12 years (Year 12) |      9,007       10.48       33.56
                                           [3] 13 years (Cert 3/4) |     26,584       30.93       64.49
                                    [4] 14 years (Diploma/Adv Dip) |      8,505        9.89       74.38
                                [5] 17-19 years (Bachelor/Honours) |     11,969       13.92       88.30
                              [6] 19-20 years (Grad dip/grad cert) |      4,871        5.67       93.97
                               [7] 20-25 years (Master, Doctorate) |      5,184        6.03      100.00
                              -------------------------------------+-----------------------------------
                                                             Total |     85,961      100.00
                              
                              . tab educ2 (female partner educ)
                              
                                                             educ2 |      Freq.     Percent        Cum.
                              -------------------------------------+-----------------------------------
                                [1] up to 11 years (up to Year 11) |     28,232       31.66       31.66
                                            [2] 12 years (Year 12) |     12,337       13.84       45.50
                                           [3] 13 years (Cert 3/4) |     14,220       15.95       61.45
                                    [4] 14 years (Diploma/Adv Dip) |      9,163       10.28       71.73
                                [5] 17-19 years (Bachelor/Honours) |     14,667       16.45       88.18
                              [6] 19-20 years (Grad dip/grad cert) |      6,363        7.14       95.32
                               [7] 20-25 years (Master, Doctorate) |      4,177        4.68      100.00
                              -------------------------------------+-----------------------------------
                                                             Total |     89,159      100.00
                              After further consideration of how to test the effect of each partner's level of education on asset wealth, I want to know whether I can sum the level of education of both partners (by summing the values of educ1 and educ2); e.g. the highest level in a couple would be 14. I would then like to interact this 'new' educ variable with demographic factors, such as race or religion, to see whether these explain education levels in couples. Do you know how I could code such a variable?
                              Code:
                              * Example generated by -dataex-. To install: ssc install dataex
                              clear
                              input float couple byte(wave educ1 educ2 values) float asset
                                 60 18 1 1  1   37208
                              10753 18 2 2  1   43730
                               3344 18 1 1  1   18282
                               7962 18 1 2  1 2061050
                              10620 18 1 1  1 1418426
                               9252 18 1 1  1     452
                              11873 18 2 2  1    9001
                                373  2 3 1  2   30000
                               4738 10 1 1  2  225202
                               6489 10 2 2  2   10150
                               3666 10 2 2  2   55862
                               9129 10 1 2  2  130358
                                987 14 2 2  2   38182
                              10419 14 1 3  2 1115950
                               4435 14 3 2  2  970679
                                603  6 3 2  3  228998
                               1320  6 3 2  3   21700
                               2067  6 1 1  3     905
                               4340  6 2 3  3    4530
                               7826 10 2 2  3  772293
                               8200 14 2 2  3  197823
                               9066 14 3 1  4   13744
                              10648 18 2 2  4   18529
                               6315 10 1 5  7   98675
                               7492 10 1 2  7   35280
                               1657 10 1 1  7   33137
                               5016  6 1 2  8   21870
                                453  6 1 1  8    4030
                                760 14 1 3  8    7003
                               9880 18 2 2  8   22890
                               3579 10 2 2  9   32975
                               9524 10 2 2  9  123373
                               5504 14 2 3  9    2300
                                492 18 3 2  9    9997
                               7786 18 2 2  9  264234
                               2018 18 2 2  9  357362
                              11531 18 1 1  9  410774
                               1472 10 1 2 10   12720
                                435 10 2 2 10   31819
                               6355 10 1 1 10     451
                               7683 10 1 1 10   24506
                                2662 18 1 3 10  119371
                              10916 18 1 1 10  821803
                               8615 18 1 1 10   73488
                              10973 18 3 4 10   11494
                               1822  2 2 2 11   48650
                               6977  6 3 2 11   42626
                                499  6 2 3 11   62018
                               9267 10 2 5 11    9306
                               2507 10 3 2 11   15197
                              10186 14 2 3 11 2365820
                              10050 14 1 3 11    4020
                              end
                              Stata v15.1. I am using panel data.
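                              A minimal sketch of the summed variable, assuming simulated data and a hypothetical grouping variable religion. Adding the codes treats them as equally spaced, although the year ranges in the labels are not (e.g. 14 years for code 4 versus 17-19 years for code 5), and it gives both partners the same linear effect, so it is a stronger constraint than the one in #7. With panel data, -xtset couple wave- followed by -xtreg ..., re- would take the place of -regress-.

```stata
clear
set seed 420
set obs 2000
generate byte educ1 = runiformint(1, 7)     // male partner
generate byte educ2 = runiformint(1, 7)     // female partner
generate byte religion = runiformint(1, 3)  // hypothetical grouping variable
generate double asset = 1000 + 50*(educ1 + educ2) + 20*religion*(educ1 + educ2) + rnormal(0, 300)

* Combined education: 2 (both at level 1) to 14 (both at level 7)
generate byte educsum = educ1 + educ2 if !missing(educ1, educ2)
label variable educsum "Sum of partners' education codes (2-14)"

* Let the slope of combined education differ by religion
regress asset c.educsum##i.religion
margins religion, dydx(educsum)
```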

