Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to use lnmvnormalden(M,V,X) function (natural logarithm of the multivariate normal density)?

    I need to calculate the (logs of) bivariate normal densities for a large number of observations on two variables R and S (this is part of some likelihood evaluator code). My searching suggests that I should use the lnmvnormalden function. (I'm using Stata code in the evaluator rather than Mata code.) The online help description is as follows:

    Code:
    lnmvnormalden(M,V,X)
           Description:  the natural logarithm of the multivariate normal density
    
                         M is the mean vector, V is the covariance matrix, and X
                         is the random vector.
           Domain M:     1 x n and n x 1 vectors
           Domain V:     n x n, positive-definite, symmetric matrices
           Domain X:     1 x n and n x 1 vectors
           Range:        -8e+307 to 8e+307
    I'm assuming that in the bivariate case, that "n" = 2. But, if that is the case, how do I calculate the log densities for a large number of observations? I was looking for functionality analogous to the use of lnnormalden in the univariate normal density case -- with this one can feed the function a variable name and calculate logs of densities for a large number of observations (as shown in the example below).

    Code:
    lnnormalden(x,m,s)
    Description: the natural logarithm of the normal density with mean m and standard deviation s
    
    lnnormalden(x,0,s) = lnnormalden(x,s) and lnnormalden(x,m,s) = lnnormalden((x-m)/s)
    - ln(s)
    Domain x: -8e+307 to 8e+307
    Domain m: -8e+307 to 8e+307
    Domain s: 1e-323 to 8e+307
    Range: 1e-323 to 8e+307
    My attempts to use lnmvnormalden are shown below, and you'll see that I am getting missing values even in a case in which I think the matrices and vectors are compatible with the help description about their dimensions. At the end of the post is a dataex extract of the data set I've been using.

    I suspect that I am misunderstanding something fundamental about lnmvnormalden and would appreciate being put right. Thanks. (Stata version 15.1 on a Windows server.)

    Code:
    . mkmat R S, matrix(X)
    
    . mat list X
    
    X[50,2]
                 R          S
     r1  9.5391407  9.4915266
     r2  9.6193991  9.7069855
     r3  10.415143  10.585219
     r4  9.5922642  9.5764217
     r5  9.4610987  9.5772448
     r6  9.5327864   9.477334
     r7   8.244071  7.9390125
     r8  9.4064007  9.4318829
     r9  9.1348619  9.1652555
    r10  9.3621168  9.3593121
    r11  9.6324663  9.2575293
    r12   9.980217  9.9595852
    r13  10.310984  10.339585
    r14  9.9761333   9.931035
    r15  9.5503778  9.5343771
    r16  8.4362001  8.4235115
    r17  8.4213428  8.4067535
    r18  10.700206   10.55044
    r19  9.5288668  9.4493494
    r20  8.4703112  9.3805895
    r21  8.1053076  8.8629341
    r22  9.7116613   9.647891
    r23  10.173896  9.8595171
    r24  10.020025  10.067356
    r25  10.999881  11.042206
    r26  10.408376  10.587072
    r27  9.3610849  9.3850803
    r28  10.240103  10.201797
    r29  9.7658339   9.614254
    r30  9.3755159  9.4409666
    r31  9.8051577   9.865303
    r32  8.1886892  8.1493082
    r33  9.2007952  9.1513138
    r34  8.4523344  9.4964209
    r35  8.7395363  8.7519808
    r36  9.8634462  9.9078016
    r37  8.7152243  8.5628929
    r38  9.9405422  9.9521141
    r39  9.8146019  9.8067474
    r40  9.9920931  10.283386
    r41  10.207658  10.085143
    r42  10.494436  10.502324
    r43  9.5510178  8.7480259
    r44  8.7528973  8.5524721
    r45  10.170954   10.15915
    r46  9.9867716  10.123063
    r47  9.8254719  9.8778028
    r48  9.9557476  9.8533163
    r49  8.8417377  9.2087393
    r50  10.170341  10.035743
    
    .
    . matrix Y = (10.170341 , 10.035743)  // last row of X
    
    . mat list Y  
    
    Y[1,2]
               c1         c2
    r1  10.170341  10.035743
    
    .
    . matrix M = (9.8 , 9.8)
    
    . mat list M
    
    M[1,2]
         c1   c2
    r1  9.8  9.8
    
    .
    . scalar v11 = .6
    
    . scalar v22 = (1-.4)^2*(.6)^2 + (.3)^2
    
    . scalar v12 = ((1-.4)^2*(.6)^2)/( .6* sqrt( (1-.4)^2*(.6)^2 + (.3)^2 )  )
    
    . matrix V = (v11, v12 \ v12, v22)
    
    . mat list V
    
    symmetric V[2,2]
               c1         c2
    r1         .6
    r2  .46093277      .2196
    
    .
    . mat dir
                V[2,2]
                M[1,2]
                Y[1,2]
                X[50,2]
    
    .
    . gen v1 = normalden(R, 9.8, .6)
    
    . su v1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              v1 |         50    .4439967    .2291873   .0123145   .6648793
    
    .
    . ge v2 = lnnormalden(R, 9.8, .6)
    
    . su v2
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              v2 |         50   -1.141031    1.069473  -4.396977  -.4081499
    
    .
    . ge v3 = lnmvnormalden(M, V, X)
    (50 missing values generated)
    
    . su v3
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              v3 |          0
    
    .
    . ge v4 = lnmvnormalden(M, V, Y)
    (50 missing values generated)
    
    . su v4
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              v4 |          0


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(R S)
     9.539141  9.491527
     9.619399  9.706985
    10.415143  10.58522
     9.592264  9.576422
     9.461099  9.577245
     9.532786  9.477334
     8.244071  7.939013
     9.406401  9.431883
     9.134862  9.165256
     9.362117  9.359312
     9.632466  9.257529
     9.980217  9.959585
    10.310984 10.339585
     9.976133  9.931035
     9.550378  9.534377
       8.4362 8.4235115
     8.421343  8.406754
    10.700206  10.55044
     9.528867  9.449349
     8.470311 9.3805895
     8.105308  8.862934
     9.711661  9.647891
    10.173896  9.859517
    10.020025 10.067356
     10.99988 11.042206
    10.408376 10.587072
     9.361085   9.38508
    10.240103 10.201797
     9.765834  9.614254
     9.375516  9.440967
     9.805158  9.865303
     8.188689  8.149308
     9.200795  9.151314
     8.452334  9.496421
     8.739536  8.751981
     9.863446  9.907802
     8.715224  8.562893
     9.940542  9.952114
     9.814602  9.806747
     9.992093 10.283386
    10.207658 10.085143
    10.494436 10.502324
     9.551018  8.748026
     8.752897  8.552472
    10.170954  10.15915
     9.986772 10.123063
     9.825472  9.877803
     9.955748  9.853316
     8.841738  9.208739
    10.170341 10.035743
    end





  • #2
    Stephen: In your particular example the V matrix you specify is not positive definite. I wonder if this may be what is causing the result?

    Comment


    • #3
      Thanks, John: very helpful observation. If I change matrix V to be the 2x2 identity matrix (to guarantee positive definitiveness in this toy example), I get a non-missing return from the lnmvnormalden function call when I supply it matrix Y (2x1) as an argument (see below). However, I still get missing values if I supply it X (50x1) -- mostly likely because 50 > 2. Thus I am still still left with the problem of how to calculate multivariate log densities for lots of obs!

      [PS for those interested, I am trying to fit to UK data the measurement error models used by Kapteyn & Ypma ("Measurement error and misclassification: a comparison of survey and administrative data", Journal of Labor Economics, 2007, 25(3), 523-551]


      Code:
      . matrix M = (9.8 , 9.8)
      
      . mat list M
      
      M[1,2]
           c1   c2
      r1  9.8  9.8
      
      . 
      . 
      . scalar v11 = 1
      
      . scalar v22 = 1
      
      . scalar v12 = 0
      
      . matrix V = (v11, v12 \ v12, v22)
      
      . mat list V
      
      symmetric V[2,2]
          c1  c2
      r1   1
      r2   0   1
      
      . 
      . matrix symeigen ev lambda = V
      
      . mat list lambda
      
      lambda[1,2]
          e1  e2
      r1   1   1
      
      . 
      . mat D = det(V)
      
      . mat li D
      
      symmetric D[1,1]
          c1
      r1   1
      
      . 
      . 
      . mat dir
                  D[1,1]
             lambda[1,2]
                 ev[2,2]
                  V[2,2]
                  M[1,2]
                  Y[1,2]
                  X[50,2]
      
      . 
      . gen v1 = normalden(R, 9.8, .6)
      
      . su v1
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                v1 |         50    .4439967    .2291873   .0123145   .6648793
      
      . 
      . ge v2 = lnnormalden(R, 9.8, .6)
      
      . su v2
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                v2 |         50   -1.141031    1.069473  -4.396977  -.4081499
      
      . 
      . ge v3 = lnmvnormalden(M, V, X)
      (50 missing values generated)
      
      . su v3
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                v3 |          0
      
      . 
      . ge v4 = lnmvnormalden(M, V, Y)
      
      . su v4
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
                v4 |         50   -1.934241           0  -1.934241  -1.934241

      Comment


      • #4
        In #3, I wrote
        Thus I am still still left with the problem of how to calculate multivariate log densities for lots of obs!
        Isabel Canette of StataCorp has kindly pointed me towards the Mata function __lnmvnormalden() to do what I want. (Note the 2 underscores at the front of the name.) I somehow missed this function when looking for it. But now I see that it's documented via help mata __lnmvnormalden()

        Illustrative code:

        Code:
        cscript
        input float(R S)
         9.825472  9.877803
         9.955748  9.853316
         8.841738  9.208739
        10.170341 10.035743
        end
        
        matrix Y = (10.170341 , 10.035743)  // last row of X 
        mat M = (9.8, 9.8) 
        mat V = (.8, 0 \ 0, .7)
        
        local a  = lnmvnormalden(M, V, Y)
        display "`a'"
        
        putmata X1 =(R S)
        mata:
            M1 = st_matrix("M")
            V1 = st_matrix("V")
            myeval = __lnmvnormalden(M1,V1,X1)
            myeval
        end
        
        getmata a = myeval
        list

        Comment


        • #5
          Thanks for sharing this, Stephen.

          Yet another good example of an "undocumented" feature of Stata that should perhaps be documented. See
          Code:
          help undocumented
          in which __lnmvnormalden() is listed.

          Comment

          Working...
          X