  • Why is there no official command for log-linear models in Stata?

    Dear Stata users,

    This question has annoyed me for a long time. Log-linear models for cross-tabulations are used a lot in sociology, especially in social mobility research. However, there is no official command specially designed for them in Stata. User-written commands such as -loglin- (D. H. Judson, 1992, from stb8) and -ipf- (Adrian Mander, 2009, from SSC) are old and do not perform well. Some scholars, for example German Rodriguez and Maarten Buis, suggest using -logit-, -poisson-, or -glm- as a substitute:
    http://data.princeton.edu/wws509/notes/c5.pdf
    http://maartenbuis.nl/presentations/london15b.pdf
    But compared with SPSS and other statistical software, these substitutes are unsatisfying in their model-building options, output (form of results, parameters), and interpretation. It is a mystery to me why Stata does not provide, and does not plan to provide, an official command for log-linear models.
    Last edited by Chen Samulsion; 18 Oct 2018, 19:29.

  • #2
    This is for StataCorp really, but you have partly answered your own question by alluding to poisson, glm and so forth. I don't know the basis for your statement that StataCorp is not planning additions of this kind.

    I haven't used SPSS in this century and never used it for log-linear models, so I can't follow what you're missing.

    FWIW, one of the reasons I originally wrote contract was to get datasets into a shape standard for these models, but over the following 20 years I have not noticed much interest in them (e.g. on Statalist). I don't doubt that they are heavily used in some fields.


    • #3
      StataCorp never tells us what its plans are, so we don't know that it does not plan to provide official commands for log-linear models. Moreover, StataCorp is the only entity that can tell us about the reasons behind its past decisions, but we can guess.

      My guess is that it isn't implemented because it is not used that much. In most disciplines it is not used at all. Even in the sub-discipline of social stratification research that you mention (and that I work in), the number of people who use it is fairly limited. If you go to the big conferences in this field (RC28, ECSR), you will see some talks using log-linear models, but the vast majority of talks will not use them. Moreover, those who want to fit log-linear models can do so by using contract in combination with poisson or glm, and with the current factor variable notation that has become even easier. Those two facts together reduce the added value of a separate command. StataCorp needs to prioritize what it spends its resources on, and, although I would like it to be high on their list of priorities, I understand why that is not the case.
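
      As a minimal sketch of the contract-plus-poisson route (an illustration added here, not code posted in this thread): the example assumes the nlsw88 dataset shipped with Stata, and the choice of race and collgrad as the two table dimensions is purely hypothetical.

      ```stata
      * Illustrative sketch: a log-linear independence model via contract + poisson
      sysuse nlsw88, clear

      * Collapse to one row per cell of the race x collgrad table.
      * contract stores the cell counts in _freq; zero keeps empty cells.
      contract race collgrad, zero

      * Independence model: main effects only, using factor-variable notation
      poisson _freq i.race i.collgrad

      * Deviance and Pearson goodness-of-fit tests for the fitted model
      estat gof
      ```

      Replacing the right-hand side with i.race##i.collgrad would add the interaction and give the saturated model for this two-way table; larger tables allow intermediate models in the same way.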

      In principle we don't need to wait for StataCorp to implement a model. I have thought about writing such a command and decided that a basic implementation was doable, but not worth it. What would be interesting is a wider suite of log-linear models: e.g. taking care of missing values with EM or an RC2 model. I decided that implementing such a suite of log-linear commands myself would cost me too much time.

      So, why does SPSS have log-linear models? My guess is that it has to do with the age of SPSS and Stata. There was a time before the widespread use of logit and probit models for analyzing categorical data, and at that time the go-to method for analyzing categorical data was log-linear analysis. SPSS is older than Stata and was in part written at that time, so it included a module for log-linear analysis. Stata got started in the mid-'80s, by which time log-linear models were being replaced by logit and probit models, which for most applications was a clear improvement.

      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Nick, an observation about quantitative sociology: scholars relied heavily on SPSS when social mobility research was dominated by a variety of log-linear models (roughly 1974 to 1992), and as that tidal wave receded they turned to Stata.
        Last edited by Chen Samulsion; 19 Oct 2018, 05:40.


        • #5
          Thanks, Maarten Buis, for your remarkable explanation, covering both the modeling technique and the history.


          • #6
            A student asked me about this in 2011. Michael Hout has a little green Sage book, "Mobility Tables", published in 1983. With a little bit of work I was able to replicate much of his analysis. Granted, it could be easier, but it is not impossible to do with existing Stata commands. One advantage of doing it the hard way is that it does force you to understand the models a bit better. Here is the code, entirely self-contained. You'll of course understand it better if you have Hout's book handy. (Incidentally, I do not teach this in my courses because I see so little demand for it. If somebody really really really wants to do it I might recommend that they check out SPSS or another package if they think Stata is too hard.)

            Code:
            * Reproduce Analyses from Hout, 1983, Mobility Tables,
            * Little Green Sage Book
            ***********************************************************************
            clear all
            input float(freq fathocc sonocc)
            1414 1 1
             724 2 1
             798 3 1
             756 4 1
             409 5 1
             521 1 2
             524 2 2
             648 3 2
             914 4 2
             357 5 2
             302 1 3
             254 2 3
             856 3 3
             771 4 3
             441 5 3
             643 1 4
             703 2 4
            1676 3 4
            3325 4 4
            1611 5 4
              40 1 5
              48 2 5
             108 3 5
             237 4 5
            1832 5 5
            end
            label values fathocc Occupation
            label values sonocc Occupation
            label def Occupation 1 "Upper Nonmanual", modify
            label def Occupation 2 "Lower Nonmanual", modify
            label def Occupation 3 "Upper Manual", modify
            label def Occupation 4 "Lower Manual", modify
            label def Occupation 5 "Farm", modify
            
            
            * Reproduce Hout Table 1, p. 11 
            tab2  fathocc sonocc [fw = freq]
            * Reproduce Hout Table 2, p. 12
            tab2  fathocc sonocc [fw = freq], nofreq col
            tab2  fathocc sonocc [fw = freq], nofreq row
            
            * Hout, Perfect Mobility, p. 15. See the stats for Deviance, Pearson, and residual d.f.
            glm freq  i.fathocc  i.sonocc, family(poisson) link(log)
            * Hout, p. 14. Predicted frequencies under perfect mobility
            predict pm_xb
            list  pm_xb fathocc sonocc
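            * Added sketch (not part of the original post): glm keeps its fit
            * statistics in e(), so the perfect-mobility test can be shown directly.
            * e(deviance) is the deviance, e(deviance_p) the Pearson statistic,
            * and e(df) the residual degrees of freedom.
            display "Deviance = " e(deviance) ", Pearson = " e(deviance_p) ", residual d.f. = " e(df)
            display "p-value (deviance) = " chi2tail(e(df), e(deviance))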
            
            * Quasi-Perfect Mobility - Hout p. 23
            * Create a dummy var for each diagonal element
            foreach j of numlist 1/5 {
                gen cell`j'`j' = fathocc == `j' & sonocc == `j'
            }
            glm freq  i.fathocc  i.sonocc  cell11-cell55, family(poisson) link(log)
            predict qpm_xb
            list  qpm_xb fathocc sonocc
            * The predicted values are the same as Hout reports except for the diagonals. To
            * get his predicted values:
            preserve
            foreach var of varlist  cell11-cell55 {
                replace `var' = 0
            }
            predict qpmhout_xb
            list  qpmhout_xb fathocc sonocc
            restore
            
            * Corners model, Hout p. 25
            gen cell12 = fathocc == 1 & sonocc == 2
            gen cell21 = fathocc == 2 & sonocc == 1
            gen cell45 = fathocc == 4 & sonocc == 5
            gen cell54 = fathocc == 5 & sonocc == 4
            glm freq  i.fathocc  i.sonocc  cell11-cell55 cell12-cell54, family(poisson) link(log)
            predict cm_xb
            list  cm_xb fathocc sonocc
            * get Hout's expected Freqs
            preserve
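            * List the two ranges separately: cell11-cell54 would also sweep up
            * qpm_xb, which sits between cell55 and cell12 in variable order
            foreach var of varlist  cell11-cell55 cell12-cell54 {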
                replace `var' = 0
            }
            predict cmhout_xb
            list  cmhout_xb fathocc sonocc
            restore
            -------------------------------------------
            Richard Williams, Notre Dame Dept of Sociology
            StataNow Version: 19.5 MP (2 processor)
            WWW: https://www3.nd.edu/~rwilliam


            • #7
              Dear Richard Williams, thanks a lot for your attention and kind help. I sympathize with your student about the inconvenience. I have Hout's book, and your code and suggestions are very helpful (and practical, including the option of resorting to SPSS).


              • #8
                A lot of loglinear models should be easy to replicate because the table is usually right in the publication. So, if I were trying to do this sort of thing, I would first check to see if I could replicate the work of others, and then use my code as a template for my own analysis. Of course, when possible, making sure you can replicate work is a good way to learn any method. I like the Stata Press books and The Stata Journal because they almost always show you how to replicate analyses.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)
                WWW: https://www3.nd.edu/~rwilliam


                • #9
                  If you type -findit loglinear- you will see that there are various user-written loglinear routines. I've used one of them, Adrian Mander's ipf, and it was fine for my purposes at that time. See pp. 7-9 of

                  https://www3.nd.edu/~rwilliam/stats1...ical-Stata.pdf.

                  There is an article on the ipf command (pp. 10-12) at

                  https://www.stata.com/products/stb/journals/stb55.pdf

                  You might see if ipf or some of the other user-written programs would meet your needs. Mobility table analyses involve all these tricky models, so I'm not sure if ipf would be any easier than poisson or glm for such purposes.
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  StataNow Version: 19.5 MP (2 processor)
                  WWW: https://www3.nd.edu/~rwilliam


                  • #10
                    "I would first check to see if I could replicate the work of others, and then use my code as a template for my own analysis. Of course, when possible, making sure you can replicate work is a good way to learn any method."
                    So do I. We think alike.
