  • Logit model including industry FE

    Hi all,

    I am working with trade data and have one observation per firm, year, and destination. I would like to predict the probability of trading using
    Code:
    logit exporter distance ... i.industry if inrange(year,2016,2019), vce(robust)
    ,
    where my outcome variable exporter is an indicator variable.
    Afterwards, I would like to make an out-of-sample prediction:
    Code:
    gen prediction_exporter = normprob(_b[cons]+_b[distance]xdistance + ... + _b[_industry__1]+_b[_industry__2] + ... _b[_industry__99]) if year == 2012
    I have two questions:
    1) Can I use the regular logit command? I found forum entries stating that if there are enough observations within each industry and not too many industry categories, one can use xtlogit and include industry dummies. How are "enough" and "not too many" defined? I have 99 categories, and around half of them are omitted when estimating the logit. The smallest industry has 220 observations; most industries have many more. I cannot use the xtlogit command because I have one observation per firm, year, and destination rather than one per firm and year.
    If the regular logit command is not appropriate, what would be an alternative?
    2) Something with my prediction is wrong, but I don't know what causes the problem. (I tried an in-sample prediction and compared it to
    Code:
    predict prediction_exporter, pr
    , which does not give me the same values.)


    Unfortunately, I am not able to provide a data example due to confidentiality reasons.

    Best,
    Kathrin

  • #2
    How many observations per industry? If few, you may have an incidental parameters problem.

    By default, xtlogit is an RE method, not an FE method (its fe option fits a conditional fixed-effects logit instead).

    What's wrong with using predict rather than generate?


    • #3
      George Ford : Thank you for your answer.

      As indicated in the post, the smallest industry has 220 observations; 4 industries are below 1,000 observations; the largest has 7 million observations. Most categories are a 6-digit number.

      How would you specify the predict command when the logit is estimated for 2016-2019 but the probability should be predicted for 2012?

      Best,
      Kathrin


      • #4
        Kathrin:
        you may want to use -logit-, clustering your standard errors on -industry-.
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          predict prhat2012 if year == 2012
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------


          • #6
            Carlo Lazzaro Maarten Buis : Thanks for your input.

            Fair point with the standard errors. Carlo Lazzaro : Can I "go logit" although more than half of all industry dummies are omitted in the estimation? (I have ~50 industries being omitted, and ~35 that are not.) If I cannot "go logit", what would you suggest instead?
            Also, Stata notes "676 failures and 0 successes completely determined", not sure what this means...

            Maarten Buis : Sorry. I thought I had tried it and it didn't work, but I just looked at it again and it does work. The number of predicted values is quite low compared to the total number of observations in 2012 (~900,000 out of 1,400,000 observations have a non-missing prediction in 2012). I would still be curious to understand how you would do it manually (to understand the mechanics behind the predict command).


            • #7
              Kathrin:
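              1) if you have at least 30 (surviving) industries, you can use -vce(cluster industry)- standard errors (see Cameron_Miller_Cluster_Robust_October152013.pdf (ucdavis.edu));
              2) it means that you have limited variation across observations: for those 676 observations, the covariates predict a failure (exporter = 0) with a fitted probability that is numerically 0.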
              Kind regards,
              Carlo
              (StataNow 18.5)


              • #8
                Code:
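                sysuse auto, clear

                * Fit the logit and store the coefficient vector
                logit foreign mpg weight
                matrix b = e(b)
                * Build the linear index xb by hand, then apply the logistic transformation
                matrix score double xb = b
                gen p = invlogit(xb)
                * Reference values from predict, for comparison with xb and p
                predict xb_predict, xb
                predict p_predict, pr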
                PS. Borrowed from Clyde.
                HTML Code:
                https://www.statalist.org/forums/forum/general-stata-discussion/general/1633917-manually-producing-probabilites-after-logit


                • #9
                  Originally posted by Kathrin Me
                  Hi all,

                  I am working with trade data and have one observation per firm, year, and destination. I would like to predict the probability of trading using
                  Code:
                  logit exporter distance ... i.industry if inrange(year,2016,2019), vce(robust)
                  ,
                  where my outcome variable exporter is an indicator variable.
                  Afterwards, I would like to make an out-of-sample prediction:
                  Code:
                  gen prediction_exporter = normprob(_b[cons]+_b[distance]xdistance + ... + _b[_industry__1]+_b[_industry__2] + ... _b[_industry__99]) if year == 2012
                  Something with my prediction is wrong, but I don't know what causes the problem. (I tried an in-sample prediction and compared it to
                  Code:
                  predict prediction_exporter, pr
                  , which does not give me the same values.)
                  You are mixing up logit and probit. The latter applies a normal transformation from \(Xb\) to \(Pr(Xb)\), which is what you have with normprob(), whereas the former applies a logistic transformation (invlogit() in Stata). The logistic function is very tractable, so you can do the computations by hand. You can find some illustrations here in my old lecture notes.
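
                  To make the distinction concrete, here is a short worked comparison of the two transformations. The symbols \(\Lambda\) and \(\Phi\) are notation introduced only for this illustration.

                  \[ \text{logit: } \Pr(y=1 \mid X) = \Lambda(Xb) = \frac{\exp(Xb)}{1+\exp(Xb)}, \qquad \text{probit: } \Pr(y=1 \mid X) = \Phi(Xb) = \int_{-\infty}^{Xb} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz \]

                  At \(Xb = 0\) both give 0.5, but at \(Xb = 1\) the logistic function gives \(1/(1+e^{-1}) \approx 0.731\) while the normal CDF gives \(\Phi(1) \approx 0.841\). Applying the normal CDF to logit coefficients therefore cannot reproduce the values from predict.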




                  • #10
                    Okay, thank you Carlo Lazzaro, George Ford and Andrew Musau!
                    Last edited by Kathrin Me; 22 Jun 2024, 10:02.


                    • #11
                      Adding in Andrew's suggestion:

                      Code:
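                      sysuse auto, clear

                      * Fit the logit and store the coefficient vector
                      logit foreign mpg weight
                      matrix b = e(b)
                      * Build the linear index xb by hand, then apply the logistic transformation
                      matrix score double xb = b
                      gen p = invlogit(xb)
                      * Reference values from predict, for comparison with xb and p
                      predict xb_predict, xb
                      predict p_predict, pr
                      * Same probability written out term by term with _b[]
                      gen p2 = invlogit(_b[_cons] + _b[mpg]*mpg + _b[weight]*weight)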
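
                      The same check can be extended to the setup in #1, with a categorical regressor and an out-of-sample prediction. The following is only a sketch under stated assumptions: the nlswork example dataset stands in for the confidential trade data, union for exporter, race for industry, and the years 80-88 and 77 for 2016-2019 and 2012. None of these choices come from the thread. The point it illustrates is that each observation receives only the coefficient of its own category, so the category coefficients are not all added together.

                      ```stata
                      * Illustration only: dataset, variables, and years are stand-ins (assumptions)
                      webuse nlswork, clear

                      * Estimate on the later years only
                      logit union age grade i.race if inrange(year, 80, 88), vce(robust)

                      * Out-of-sample prediction for an earlier year with predict
                      predict double p_predict if year == 77, pr

                      * The same prediction by hand: constant, continuous terms, and only the
                      * coefficient of the observation's own race category
                      * (the base category, race == 1, contributes nothing)
                      generate double p_manual = invlogit(_b[_cons] + _b[age]*age + _b[grade]*grade ///
                          + _b[2.race]*(race == 2) + _b[3.race]*(race == 3)) if year == 77

                      * Both columns should agree up to rounding
                      summarize p_predict p_manual
                      assert reldif(p_manual, p_predict) < 1e-8 if !missing(p_predict)
                      ```

                      The /// line continuation works in a do-file; typed interactively, the generate command would need to go on one line. With 99 industries, matrix score as shown earlier avoids writing out every term by hand.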
