  • Take the log of zero values

    Hi everyone,

    I have a question regarding taking the log of zero values. I want to log-transform two variables which are not continuous; however, there are zero values in both variables. In this case, we need to scale up the variables to avoid the complication of zero values. However, I am wondering:

    1. When taking the log of a variable, do we need to scale it up as soon as it contains any zero value, or only when there are too many zero values? And what does "many" mean?
    2. Is there any difference between taking the log of discrete variables and continuous ones?
    3. What are the specific methods of dealing with zero values, and when should each be used?

    I read a post about this in this forum, but still do not understand.

    I really appreciate your help. Thank you.

  • #2
    I assume that by "scaling up" you mean something like using log(1+x) to avoid problems with x = 0. While there are others on this Forum whom I respect who disagree with me, in my opinion this is never appropriate to do. The problem is that the 1 comes from nowhere: it's just an arbitrary constant. You could use log(0.1 + x), or log(571823+x), etc. And the results of subsequent analyses using this variable could be very sensitive to which constant you used.
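
    To make the sensitivity concrete, here is a minimal Stata sketch. The x values are made up for illustration, and the three constants are simply the ones mentioned above; nothing here comes from the poster's data.

    ```stata
    * Illustrative values only (an assumption, not data from this thread)
    clear
    input double x
    0
    1
    10
    100
    end

    * The same variable under three arbitrary choices of the added constant
    generate double ln_c01  = ln(0.1 + x)
    generate double ln_c1   = ln(1 + x)
    generate double ln_cbig = ln(571823 + x)
    list, noobs

    * Distance between x = 0 and x = 1 on each transformed scale
    * c = 0.1: about 2.40
    display ln(0.1 + 1) - ln(0.1)
    * c = 1: about 0.69
    display ln(1 + 1) - ln(1)
    * c = 571823: about 0.0000017
    display ln(571823 + 1) - ln(571823)
    ```

    The step from 0 to 1 is large on one scale and almost invisible on another, and any coefficient estimated on the transformed variable inherits that choice.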

    There are various approaches to dealing with this situation, but they really depend on why you are thinking about log-transforming in the first place, and how you wish to use the resulting variable in your analyses. To cover all of the possibilities would lead to a very lengthy post that would probably just confuse you more. If you post back explaining the why and how of your particular situation, a clearer and more focused answer can be given.



    • #3
      Hi Clyde

      Thank you so much for your answer. I am sorry, but for confidentiality reasons I cannot give the true names of the variables. I am running a nested logit model of location choice. There are 6 nests based on the variable "region". The four main variables of interest are industry-specific: Z1, Z2, Z3, Z4. The variables with zero values are X3 (the number of patents) and X5 (the number of harbours). X2 is a dummy variable. X4 and X7 are percentages.
      I ran the following command:
      nlogit province_chosen X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11||region:Z1 Z2 Z3 Z4, base(Red River)||province_n:, noconstant case(firm_id)

      However, the model does not converge. After iteration 191, it says "cannot compute an improvement -- flat region encountered".

      I was advised to transform the data (except the dummy one) to log as it is easier to interpret and all variables will be in the same scale (percentage), then I should try each variable one by one in the model to see when the model started choking.
      For variables Z1 to Z4, as they are pretty small, I multiplied them by 1 000 000 and then took the log transformation.

      I am not sure if my post is clear enough now as I am also a novice.

      Really look forward to your help.



      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double firm_id str22 province_n byte province_chosen double X1 byte(X2 X3) double X4 byte X5 double(X6 X7 X8) long X9 int X10 double(X11 Z2 Z1 Z3 Z4) str17 region
      101744855 "Ninhbinh province"      0 61.86 0  2   138.0373361392978  0    868 26.4  59.97504938143258   1376  694   731.1 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Thanhhoa province"      0 62.46 1  1  52.893358070754246  4 1572.5 19.9  63.57634578490013  17653  319  2541.3 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Centrals"   
      101744855 "Quangnam province"      0 65.41 1  0   62.94837810376283  3  753.6 18.1  58.86999598339805  14886  141  1164.6 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Centrals"   
      101744855 "Bentre province"        0 66.69 0  0   72.03610268258544  0  236.7 12.9  64.30093944896187   1183  529  1242.7 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Mekong River"     
      101744855 "Bacninh province"       0 64.36 1  0  1548.9397459568413  0  616.9 22.4  60.53324555628703  10998 1477  1326.8 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "haiduong province"      0 60.36 1  0   923.8725838845442  0 1114.5 20.8  59.59494797752184  20288 1077  2193.5 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "namdinh province"       0 61.43 1 53   113.0099754072265  2    868 15.3 56.758215075810725  23384 1111  1914.8 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Vinhlong province"      0 66.07 0  0  33.456064742076954  1  123.7 15.7  58.13178442201485  12057  688   737.9 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Mekong River"     
      101744855 "Phuyen province"        0 60.59 1  0  15.420546631642019  1  772.5   18  56.87748783724016   3076  180     911 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Centrals"   
      101744855 "Thainguyen province"    0 64.45 0  1   4621.874959892429  0  943.8   26   60.7919687674289  56854  356  1130.1 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Midlands"   
      101744855 "Phutho province"        0 62.55 0  0  154.79796206354897  0  608.6 21.8  60.42070500394859  17886  394    1688 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Midlands"   
      101744855 "Hochiminh city"         0 65.19 1  0   158.9816991596538 39 8454.8 36.6  54.22044857068422 462552 4097 14314.3 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "South East"       
      101744855 "Binhphuoc province"     0  56.7 1  0  153.69592961075796  0  186.6   14  61.24471049643926   1862  141  1040.7 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "South East"       
      101744855 "Tiengiang province"     0 61.44 0  1   115.0420811562781  2    401 10.2  62.92384975453819   9616  698    41.9 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Mekong River"     
      101744855 "Thaibinh province"      0 61.97 1  0   366.3631763119754  1  102.4 15.5  62.52302539771141  20635 1129  1426.6 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Hungyen prvince"        0 59.09 1  0   279.0736156988338  0  694.2 19.9 60.877327212445806   1738 1265    97.4 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Hanam province"         0 61.97 0  0    540.925064950953  0  539.8 18.1  60.49398038972321   5201  935   797.9 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Dongnai province"       0 63.15 1  2  245.08250885339345 13   3147 21.5  57.43401711095696  24652  516  3042.7 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "South East"       
      101744855 "Tayninh province"       0 63.82 1  0   218.2669626125968  0 1000.7 14.1 61.463328005682826   1404  279  1416.5 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "South East"       
      101744855 "Longan province"        0  66.7 1  0  204.59950607561512  0  124.3 14.6  66.56199893105291   2647  333    79.1 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Mekong River"     
      101744855 "Danang city"            0 70.11 1  0   79.27334161423045  8 2728.2 40.9  55.89700216145099 105745  828    1569 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Centrals"   
      101744855 "Nghean province"        0 63.52 1  3   31.61252435457886  5 1972.5 19.9  61.04812697601635  43612  190  2731.3 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Centrals"   
      101744855 "Baria-vungtau province" 0 64.43 1  0   69.82567985431474 37  269.1 26.5  55.06535947712419   8097  556    1360 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "South East"       
      101744855 "Haiphong city"          0 65.15 1  1   458.0888509438664 43 5701.6 33.8  56.86539520448516  40262 1279  1915.4 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Backan province"        0 58.82 0  0 .013656223324658354  0   30.5 17.2   64.5730198019802    376   67    21.3 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Midlands"   
      101744855 "Hanoi city"             1 64.71 1  1  109.31573758707371  0 7332.2 44.2 54.235118125092654 594898 2209    2167 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Bacgiang province"      0  62.2 0  0   944.7606859042183  0    611 17.9  63.52723363592929   6083  430  1393.4 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Midlands"   
      101744855 "Quangninh province"     0 70.69 1  0   911.7441016908404 12  786.3 32.9  58.54776455451914   5297  201  1980.5 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Vinhphuc province"      0  64.9 1  0   756.0539836231155  0  843.7 22.1  57.92496526169523  34953  874   834.9 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Red River"        
      101744855 "Binhduong province"     0 64.47 1  1  354.39337527452966  1 2050.8 18.7  72.04732013520038  32613  769  2889.3 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "South East"       
      101744855 "Dongthap province"      0 68.78 0  0  48.509745820287804  3  152.7 15.3  57.71756492930249  14461  500    1402 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Mekong River"     
      101744855 "Lamdong province"       0  63.5 0  0   19.57593373391529  0 1613.6   15  56.47855878050658  15926  133  1342.9 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "Central Highlands"
      101744855 "Quangngai province"     0 63.16 1  0  24.730290707890703  5 1339.5 18.3  59.20259987317692  10602  245  1032.3 .0025468464009463787 .0052201030775904655  .003338340437039733 .007653000298887491 "North Centrals"   
      107952393 "Binhduong province"     0 64.47 1  1  354.39337527452966  1 2050.8 18.7  72.04732013520038  32613  769  2889.3 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "South East"       
      107952393 "Longan province"        0  66.7 1  0  204.59950607561512  0  124.3 14.6  66.56199893105291   2647  333    79.1 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Mekong River"     
      107952393 "Hochiminh city"         0 65.19 1  0   158.9816991596538 39 8454.8 36.6  54.22044857068422 462552 4097 14314.3 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "South East"       
      107952393 "Bacninh province"       0 64.36 1  0  1548.9397459568413  0  616.9 22.4  60.53324555628703  10998 1477  1326.8 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Hanoi city"             1 64.71 1  1  109.31573758707371  0 7332.2 44.2 54.235118125092654 594898 2209    2167 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Nghean province"        0 63.52 1  3   31.61252435457886  5 1972.5 19.9  61.04812697601635  43612  190  2731.3 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Centrals"   
      107952393 "Phutho province"        0 62.55 0  0  154.79796206354897  0  608.6 21.8  60.42070500394859  17886  394    1688 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Midlands"   
      107952393 "Ninhbinh province"      0 61.86 0  2   138.0373361392978  0    868 26.4  59.97504938143258   1376  694   731.1 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Binhphuoc province"     0  56.7 1  0  153.69592961075796  0  186.6   14  61.24471049643926   1862  141  1040.7 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "South East"       
      107952393 "Quangngai province"     0 63.16 1  0  24.730290707890703  5 1339.5 18.3  59.20259987317692  10602  245  1032.3 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Centrals"   
      107952393 "Tayninh province"       0 63.82 1  0   218.2669626125968  0 1000.7 14.1 61.463328005682826   1404  279  1416.5 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "South East"       
      107952393 "Vinhphuc province"      0  64.9 1  0   756.0539836231155  0  843.7 22.1  57.92496526169523  34953  874   834.9 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Thaibinh province"      0 61.97 1  0   366.3631763119754  1  102.4 15.5  62.52302539771141  20635 1129  1426.6 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Tiengiang province"     0 61.44 0  1   115.0420811562781  2    401 10.2  62.92384975453819   9616  698    41.9 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Mekong River"     
      107952393 "Bentre province"        0 66.69 0  0   72.03610268258544  0  236.7 12.9  64.30093944896187   1183  529  1242.7 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Mekong River"     
      107952393 "Hanam province"         0 61.97 0  0    540.925064950953  0  539.8 18.1  60.49398038972321   5201  935   797.9 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Dongthap province"      0 68.78 0  0  48.509745820287804  3  152.7 15.3  57.71756492930249  14461  500    1402 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Mekong River"     
      107952393 "Backan province"        0 58.82 0  0 .013656223324658354  0   30.5 17.2   64.5730198019802    376   67    21.3 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Midlands"   
      107952393 "Quangninh province"     0 70.69 1  0   911.7441016908404 12  786.3 32.9  58.54776455451914   5297  201  1980.5 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Baria-vungtau province" 0 64.43 1  0   69.82567985431474 37  269.1 26.5  55.06535947712419   8097  556    1360 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "South East"       
      107952393 "namdinh province"       0 61.43 1 53   113.0099754072265  2    868 15.3 56.758215075810725  23384 1111  1914.8 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Hungyen prvince"        0 59.09 1  0   279.0736156988338  0  694.2 19.9 60.877327212445806   1738 1265    97.4 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Danang city"            0 70.11 1  0   79.27334161423045  8 2728.2 40.9  55.89700216145099 105745  828    1569 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Centrals"   
      107952393 "haiduong province"      0 60.36 1  0   923.8725838845442  0 1114.5 20.8  59.59494797752184  20288 1077  2193.5 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107952393 "Quangnam province"      0 65.41 1  0   62.94837810376283  3  753.6 18.1  58.86999598339805  14886  141  1164.6 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Centrals"   
      107952393 "Vinhlong province"      0 66.07 0  0  33.456064742076954  1  123.7 15.7  58.13178442201485  12057  688   737.9 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Mekong River"     
      107952393 "Bacgiang province"      0  62.2 0  0   944.7606859042183  0    611 17.9  63.52723363592929   6083  430  1393.4 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Midlands"   
      107952393 "Phuyen province"        0 60.59 1  0  15.420546631642019  1  772.5   18  56.87748783724016   3076  180     911 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Centrals"   
      107952393 "Thanhhoa province"      0 62.46 1  1  52.893358070754246  4 1572.5 19.9  63.57634578490013  17653  319  2541.3 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Centrals"   
      107952393 "Lamdong province"       0  63.5 0  0   19.57593373391529  0 1613.6   15  56.47855878050658  15926  133  1342.9 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Central Highlands"
      107952393 "Dongnai province"       0 63.15 1  2  245.08250885339345 13   3147 21.5  57.43401711095696  24652  516  3042.7 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "South East"       
      107952393 "Thainguyen province"    0 64.45 0  1   4621.874959892429  0  943.8   26   60.7919687674289  56854  356  1130.1 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "North Midlands"   
      107952393 "Haiphong city"          0 65.15 1  1   458.0888509438664 43 5701.6 33.8  56.86539520448516  40262 1279  1915.4 .0048759086057543755  .008318087086081505  .009550430811941624 .010374494828283787 "Red River"        
      107997813 "Danang city"            0 70.11 1  0   79.27334161423045  8 2728.2 40.9  55.89700216145099 105745  828    1569  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Centrals"   
      107997813 "Thainguyen province"    0 64.45 0  1   4621.874959892429  0  943.8   26   60.7919687674289  56854  356  1130.1  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Midlands"   
      107997813 "Phuyen province"        0 60.59 1  0  15.420546631642019  1  772.5   18  56.87748783724016   3076  180     911  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Centrals"   
      107997813 "Longan province"        0  66.7 1  0  204.59950607561512  0  124.3 14.6  66.56199893105291   2647  333    79.1  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Mekong River"     
      107997813 "Binhduong province"     0 64.47 1  1  354.39337527452966  1 2050.8 18.7  72.04732013520038  32613  769  2889.3  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "South East"       
      107997813 "Bacninh province"       0 64.36 1  0  1548.9397459568413  0  616.9 22.4  60.53324555628703  10998 1477  1326.8  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Quangninh province"     0 70.69 1  0   911.7441016908404 12  786.3 32.9  58.54776455451914   5297  201  1980.5  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Vinhphuc province"      0  64.9 1  0   756.0539836231155  0  843.7 22.1  57.92496526169523  34953  874   834.9  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Baria-vungtau province" 0 64.43 1  0   69.82567985431474 37  269.1 26.5  55.06535947712419   8097  556    1360  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "South East"       
      107997813 "Hungyen prvince"        0 59.09 1  0   279.0736156988338  0  694.2 19.9 60.877327212445806   1738 1265    97.4  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Phutho province"        0 62.55 0  0  154.79796206354897  0  608.6 21.8  60.42070500394859  17886  394    1688  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Midlands"   
      107997813 "Dongnai province"       0 63.15 1  2  245.08250885339345 13   3147 21.5  57.43401711095696  24652  516  3042.7  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "South East"       
      107997813 "Ninhbinh province"      0 61.86 0  2   138.0373361392978  0    868 26.4  59.97504938143258   1376  694   731.1  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Vinhlong province"      0 66.07 0  0  33.456064742076954  1  123.7 15.7  58.13178442201485  12057  688   737.9  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Mekong River"     
      107997813 "Bentre province"        0 66.69 0  0   72.03610268258544  0  236.7 12.9  64.30093944896187   1183  529  1242.7  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Mekong River"     
      107997813 "Lamdong province"       0  63.5 0  0   19.57593373391529  0 1613.6   15  56.47855878050658  15926  133  1342.9  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Central Highlands"
      107997813 "Hochiminh city"         0 65.19 1  0   158.9816991596538 39 8454.8 36.6  54.22044857068422 462552 4097 14314.3  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "South East"       
      107997813 "namdinh province"       0 61.43 1 53   113.0099754072265  2    868 15.3 56.758215075810725  23384 1111  1914.8  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Thaibinh province"      0 61.97 1  0   366.3631763119754  1  102.4 15.5  62.52302539771141  20635 1129  1426.6  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Thanhhoa province"      0 62.46 1  1  52.893358070754246  4 1572.5 19.9  63.57634578490013  17653  319  2541.3  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Centrals"   
      107997813 "Tayninh province"       0 63.82 1  0   218.2669626125968  0 1000.7 14.1 61.463328005682826   1404  279  1416.5  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "South East"       
      107997813 "Binhphuoc province"     0  56.7 1  0  153.69592961075796  0  186.6   14  61.24471049643926   1862  141  1040.7  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "South East"       
      107997813 "Bacgiang province"      0  62.2 0  0   944.7606859042183  0    611 17.9  63.52723363592929   6083  430  1393.4  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Midlands"   
      107997813 "Nghean province"        0 63.52 1  3   31.61252435457886  5 1972.5 19.9  61.04812697601635  43612  190  2731.3  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Centrals"   
      107997813 "Backan province"        0 58.82 0  0 .013656223324658354  0   30.5 17.2   64.5730198019802    376   67    21.3  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Midlands"   
      107997813 "Dongthap province"      0 68.78 0  0  48.509745820287804  3  152.7 15.3  57.71756492930249  14461  500    1402  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Mekong River"     
      107997813 "Quangnam province"      0 65.41 1  0   62.94837810376283  3  753.6 18.1  58.86999598339805  14886  141  1164.6  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Centrals"   
      107997813 "Quangngai province"     0 63.16 1  0  24.730290707890703  5 1339.5 18.3  59.20259987317692  10602  245  1032.3  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "North Centrals"   
      107997813 "haiduong province"      0 60.36 1  0   923.8725838845442  0 1114.5 20.8  59.59494797752184  20288 1077  2193.5  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Tiengiang province"     0 61.44 0  1   115.0420811562781  2    401 10.2  62.92384975453819   9616  698    41.9  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Mekong River"     
      107997813 "Hanoi city"             1 64.71 1  1  109.31573758707371  0 7332.2 44.2 54.235118125092654 594898 2209    2167  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Haiphong city"          0 65.15 1  1   458.0888509438664 43 5701.6 33.8  56.86539520448516  40262 1279  1915.4  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      107997813 "Hanam province"         0 61.97 0  0    540.925064950953  0  539.8 18.1  60.49398038972321   5201  935   797.9  .002417945768684149  .013264483772218227 .0034943532664328814  .01645873300731182 "Red River"        
      108058196 "Ninhbinh province"      0 61.86 0  2   138.0373361392978  0    868 26.4  59.97504938143258   1376  694   731.1 .0023350135888904333  .003866626648232341 .0026982827112078667 .003946760203689337 "Red River"        
      end






      • #4
        The title of the thread and the question in #1 are general enough that people may come back to it. I am optimistic that a general answer could be of moderately wide interest -- and welcome complementary or contradictory replies.

        That is for this post.

        Then I hope to come back to the quite specific details now revealed in #3 in a later post.

        The context I take to be any context where working on logarithmic scale seems likely to be a good idea but zeros are present. The topic is enormous, so I leave aside the even more challenging question in which negative values are also present!

        To be facetious but serious at the same time, logarithms seem natural if nature or society seems to be using exp() as a good first approximation -- except that nature and society are usually more complicated than that. Even in physics exponential growth or decay is often at best a good phenomenological approximation to stuff more complicated than it may seem.

        The answers understandably vary with the details, such as

        1. Is a variable discrete or continuous?

        2. Is a variable an outcome or a predictor?

        3. Is the goal at the moment exploration or modelling?

        4. Is the perceived problem one or more of

        + skewness

        + outliers

        + heteroscedasticity

        + nonlinearity

        (not intended as a complete list, but I would say that getting closer to a defensible functional form is often far more important than anything else)

        5. How far is the readership (and even the researcher) comfortable with transformations and able to defend/explain/understand what is going on?

        Here is a series of statements not intended to be dogmatic. The spirit is pragmatic, not purist. The question is what works as well as or better than anything else we can imagine.

        If plotting on a logarithmic scale works well, except for the zeros, I am moderately happy to plot using a scale such as log (y + 1) or more generally log (y + c). Here the notation y isn't meant to imply "outcomes only". The question of why that c and not something else is a good one, and some answers are (a) c = 1 sometimes seems to work well for counts, and sometimes for measured variables too; (b) c as half the smallest observable positive quantity is a convention in some fields.

        A warning: log (y + very small quantity) is close to log y for y >> 0 but utterly awful for y near zero!

        Another warning: log (y + 1) if the units are dollars and log (y + 1) if the units are million dollars are quite different transformations, as some economists have had occasion to underline.
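
        Both warnings can be checked with a few made-up numbers. In this sketch the dollar amounts are hypothetical, and 1e-6 stands in for a "very small quantity".

        ```stata
        * Hypothetical amounts in dollars (an assumption for illustration)
        clear
        input double dollars
        0
        1
        1000000
        50000000
        end
        generate double millions = dollars / 1e6

        * log(y + 1) with y measured in two different units
        generate double ln1_dollars  = ln(dollars + 1)
        generate double ln1_millions = ln(millions + 1)

        * If the two differed only by a shift, this gap would be constant
        generate double gap = ln1_dollars - ln1_millions

        * log(y + very small quantity)
        generate double ln_tiny = ln(dollars + 1e-6)

        format ln1_dollars ln1_millions gap ln_tiny %9.3f
        list, noobs
        ```

        The gap is not constant across rows, so the two versions of log (y + 1) are not the same transformation up to a location shift. With ln_tiny, the zero lands at about -13.8, as far below one dollar (about 0) as a million dollars (about 13.8) is above it, so the zeros become extreme outliers.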

        Instead of the above, plotting using square roots (a long history of this over a century or more) or cube roots may or may not appeal as alternatives. (Those who want to mutter ad hoc could be right. A tongue-in-cheek translation of ad hoc is fit for purpose.)

        If an observed outcome includes 0 values, the best answer is often Poisson regression, with no need for transformation at all. See also generalized linear models, etc.
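
        As a minimal sketch of that idea, the following simulates a count outcome with plenty of zeros and fits it directly, with no transformation. The data are simulated purely for illustration; the variable names, the seed, and the coefficients 0.5 and 0.3 are assumptions of this example.

        ```stata
        * Simulated data (an assumption for illustration only)
        clear
        set obs 500
        set seed 12345
        generate double x = rnormal()
        generate y = rpoisson(exp(0.5 + 0.3 * x))

        * How many zeros are there in the outcome?
        count if y == 0

        * Poisson regression on the untransformed outcome
        poisson y x, vce(robust)

        * The same model written as a generalized linear model
        glm y x, family(poisson) link(log) vce(robust)
        ```

        The coefficient on x is on the log scale of the mean of y, so zeros in y need no special handling. Robust standard errors are a common hedge when the outcome is not strictly Poisson.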

        Sometimes, zeros are qualitatively different as well as quantitatively in which case some kind of two-part model often fits the substantive logic, as in modelling first whether someone is a smoker, and then if they do smoke how much they do.

        A trick I have seen mentioned for predictors that call out for logarithmic transformation except for some zeros:

        * replace 0 with the smallest positive amount that makes sense

        * but also use a (0, 1) indicator for observations so modified

        I have never seen this in the literature: any references, anyone?
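
        For concreteness, the trick might be coded as below. The data are hypothetical, and taking the smallest observed positive value as the replacement is an assumption of this sketch; a substantively meaningful minimum could be used instead.

        ```stata
        * Hypothetical predictor with some zeros (an assumption for illustration)
        clear
        input double x
        0
        0
        2
        5
        40
        end

        * Indicator for the observations that will be modified
        generate byte x_zero = (x == 0)

        * Smallest positive value observed, left behind in r(min)
        summarize x if x > 0, meanonly

        * Log of x, with zeros replaced by that smallest positive value
        generate double ln_x = ln(cond(x == 0, r(min), x))
        list, noobs
        ```

        Both ln_x and x_zero would then enter the model as predictors, so that the coefficient on x_zero can absorb whatever is distinctive about the zeros.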

        Around 1950, there were many clever papers by statisticians such as Francis Anscombe with the flavour: if sqrt (y) or log (y) is a good idea, is sqrt (y + c) or log (y + c) better? The specific problem was often getting closer to a normal distribution. Enthusiasm for such fudges seems to have waned, on various grounds, the most important being that the exact character of marginal distributions is usually not that important. After all, a (0, 1) predictor can't help being skew if it is, and no transformation will do anything but flip the sign of the skewness. (But outliers can matter!)

        There is more to be said, but I will stop there. In summary, I am willing to use log (y + c) for visualization but much more reluctant to use it in modelling.

        EDIT A partial answer to Clyde Schechter is that visualizations let you see if results are just an artefact of some arbitrary choice, so don't do that then!

        Leaving the variable as it comes -- because no transformation seems defensible -- is also a choice that carries the risk of a poor analysis.

        Last edited by Nick Cox; 07 Feb 2023, 03:19.



        • #5
          Now to #3. Advice has to be qualified given

          * (in my case) not being very familiar with the model here

          * the data being presumably just a small sample

          * subject-matter experts (not me either) might need to know more about what the variables are measuring to give advice

          Clearing aside one detail. log(1 million X) is just log (1 million) + log X and unlikely to be much of an improvement on log X.
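
          This is easy to verify. The sketch below uses three Z1 values taken from the data example in #3; the new variable names are illustrative.

          ```stata
          * Three Z1 values copied from the data example in #3
          clear
          input double Z1
          .0052201030775904655
          .008318087086081505
          .013264483772218227
          end

          generate double ln_Z1        = ln(Z1)
          generate double ln_Z1_scaled = ln(1e6 * Z1)

          * The difference is the same constant, ln(1e6), in every row
          generate double shift = ln_Z1_scaled - ln_Z1
          list, noobs
          display ln(1e6)
          ```

          The variable shift equals ln(1e6), about 13.8155, for every observation, so the rescaling moves the logged variable by a constant and leaves its variation untouched.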

          I used transplot from SSC to look at X3 and X5. For more see

          https://www.statalist.org/forums/for...dable-from-ssc

          The use of a normal distribution as reference is just a psychological convenience and -- as above in #4 -- not an implication that normal distributions are attainable or even desirable (just, like affection and respect, welcome if you encounter them). You can see that just about any of the transformations shown -- square root, cube root, log(x + 1) -- pulls in the outliers somewhat and might help. I don't think the evidence is that clearcut. I might even leave X3 and X5 as they arrive.
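For anyone who wants to try the same candidate transformations without transplot, a rough sketch follows. The data are invented (a skewed count with zeros, loosely in the spirit of X3), and the new variable names are hypothetical.

```stata
* Toy data: a skewed count with several zeros
clear
input x
0
0
0
1
2
5
12
53
end

* The candidate transformations mentioned above; all are defined at zero
generate double sqrt_x = sqrt(x)
generate double cbrt_x = x^(1/3)
generate double ln_x1  = ln(x + 1)

* Compare skewness before and after each transformation
foreach v of varlist x sqrt_x cbrt_x ln_x1 {
    quietly summarize `v', detail
    display "`v'" _col(12) "skewness = " %6.3f r(skewness)
}
```

Skewness is only a crude summary; a quantile plot of each variable (for example with qnorm) shows much more about how the outliers are pulled in.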


          [Image: X3.png]



          [Image: X5.png]


          Next, just some quantile plots of Z1 to Z4. I can't think that these variables would be responsible for the model not converging. I think the problem lies elsewhere. Why so few distinct values?

          [Image: Z.png]


          • #6
            So, it seems that the real goal here is to find a way to get the model to converge.

            "I was advised to transform the data (except the dummy one) to log as it is easier to interpret and all variables will be in the same scale (percentage),"
            This sounds like bad advice to me. Yes, if the variable is always positive, then you can interpret the coefficient, divided by 100, as a good approximation to the absolute difference in the outcome (or, in the case of a logit model, the log-odds of the outcome) associated with a 1% change in the independent variable. But that does not work when the untransformed independent variable can be zero, because a 1% change from zero still leaves you at zero! In fact, any percentage change from zero still leaves you at zero, which is closely related mathematically to why the logarithm of 0 is undefined.
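To put numbers on the 1% interpretation: with a coefficient b on ln(x), a 1% increase in x changes the linear predictor by b*ln(1.01), which is roughly b/100. The value of b below is purely hypothetical, not an estimate from this thread.

```stata
* b is a made-up coefficient on ln(x), for illustration only
scalar b = 0.8

* Exact change in the linear predictor for a 1% increase in x
display "exact, b*ln(1.01): " b * ln(1.01)

* The usual approximation
display "approx, b/100:     " b / 100

* Starting from zero, a 1% increase is still zero,
* and Stata returns missing for ln(0)
display 0 * 1.01
display ln(0)
```

The last two lines show why the percentage reading has nothing to say about observations at zero.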

            Moreover, if the real goal is to get the model estimation to converge, I doubt that a log transform (or log(1+x)) will help. That kind of transformation is useful when you have independent variables that differ greatly in order of magnitude. And rescaling can sometimes solve that problem. But look:
            Code:
            . summ X* Z*
            
                Variable |        Obs        Mean    Std. dev.       Min        Max
            -------------+---------------------------------------------------------
                      X1 |        100     63.6396    2.977015       56.7      70.69
                      X2 |        100         .66    .4760952          0          1
                      X3 |        100           2     9.04534          0         53
                      X4 |        100    416.1968    821.8734   .0136562   4621.875
                      X5 |        100         5.4    11.33512          0         43
            -------------+---------------------------------------------------------
                      X6 |        100    1465.519    1970.614       30.5     8454.8
                      X7 |        100      21.399    8.017412       10.2       44.2
                      X8 |        100    60.02085    3.687472   54.22045   72.04732
                      X9 |        100    48699.95    124680.2        376     594898
                     X10 |        100      746.14    754.8894         67       4097
            -------------+---------------------------------------------------------
                     X11 |        100    1761.657    2349.591       21.3    14314.3
                      Z2 |        100    .0032708    .0011335    .002335   .0048759
                      Z1 |        100    .0088835    .0033514   .0038666   .0132645
                      Z3 |        100    .0054334    .0029055   .0026983   .0095504
                      Z4 |        100    .0114199    .0037579   .0039468   .0164587
            The only variables that seem to be seriously differently scaled from the others are the Z's, X9, and possibly X6. So I would first try just scaling down X9, say dividing it by 1000. If that doesn't help, the more radical compression provided by log transforming (which is not problematic for X9 as it is bounded away from 0) might be more effective. Scaling up the Z's by multiplying them by 1,000 or perhaps a somewhat greater scale factor (as you have already mentioned doing), might also help. My next step would be rescaling, or, if need be, log-transforming X6. I don't think it is sufficiently different in scale from the other variables to cause convergence issues, but, if you don't get convergence with the other measures, it is worth trying. And a last resort variable to look at for rescaling or log transformation would be X4. The variables that take on zero values are X2, X3, and X5. X2 is a 0/1 dichotomy, and X3 and X5 have pretty tame maximum values, and while they are pretty skewed, especially X3, I don't think that they are likely to be causing convergence issues.
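The rescaling steps suggested above might look like the sketch below. The two data rows are invented by pairing the minima and maxima from the summarize output, so they are not real observations, and the new variable names (X9_k, Z1_k and so on, ln_X9) are hypothetical.

```stata
* Toy data: two made-up rows built from the reported minima and maxima
clear
input X9 Z1 Z2 Z3 Z4
376    .0038666 .002335  .0026983 .0039468
594898 .0132645 .0048759 .0095504 .0164587
end

* First try: scale X9 down by 1000
generate double X9_k = X9 / 1000

* Scale the Z's up by 1000 so they sit nearer the other predictors
foreach v of varlist Z1-Z4 {
    generate double `v'_k = `v' * 1000
}

* Fallback if rescaling is not enough: X9 is strictly positive, so ln() is safe
generate double ln_X9 = ln(X9)

summarize X9_k Z1_k-Z4_k ln_X9
```

Linear rescaling changes only the units of the corresponding coefficients, so the fitted model is the same; the point is to give the optimizer numbers of comparable magnitude.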

            Note: All of the above suggestions are conditional on the distributions of these variables in the full data set being similar to what is observed in the example data. Also, like Nick, I am not familiar with -nlogit- and there could be special considerations for estimating its parameters that I am unaware of.

            Look, I could be wrong about all of these judgments. Fixing convergence problems is more an art than science, and even when you have a lot of experience dealing with them, a lot of trial and error is involved. And sometimes, you have to give up and just drop some variables if they are too problematic and you can't find a fix for them. But I think log(C+x) transforming X3 and X5 is among the things least likely to help with the convergence problem.

            I should also add that the specific non-convergence here is that Stata has gotten stuck in a flat region. That may mean that the model is simply unidentifiable. So before manipulating the variables, O.P. should invest time thinking about whether there are, in fact, relationships among the chosen variables that make the model unidentifiable. That can only be done by somebody who knows what the variables are and understands the connections that may exist among them, and is familiar with the nested logit model being estimated. I don't qualify on any of those criteria.

            Nick Cox: The small number of values of the Z variables is probably because they are invariant within firm_id. And with the full data sample, there may well be a large number of firms so that even the Z variables will have an appreciable number of different values.
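Whether a Z variable really is constant within firm_id can be checked directly. The sketch below uses invented data in long form; alt is a hypothetical identifier for the alternatives, and the Z1 values are placeholders.

```stata
* Toy data: two firms, three alternatives each, Z1 constant within firm
clear
input firm_id alt Z1
1 1 .0038666
1 2 .0038666
1 3 .0038666
2 1 .0132645
2 2 .0132645
2 3 .0132645
end

* Sort by Z1 within firm; if first and last values agree, Z1 is constant there
bysort firm_id (Z1): assert Z1[1] == Z1[_N]

* Count distinct values of Z1 and distinct firms
egen byte z1_tag   = tag(Z1)
egen byte firm_tag = tag(firm_id)
count if z1_tag
count if firm_tag
```

If the assert passes and the two counts are close, the small number of distinct Z values in the example data is just a consequence of it containing few firms.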



            • #7
              Hi Nick Cox and Clyde Schechter. Thank you so much for your thoughtful and considerate answers, which give me further direction to explore. I really appreciate your help.



              • #8
                https://stats.stackexchange.com/ques...can-faithfully is a thread in which I used log(1 + y) for visualization, and indeed it is far from the only choice.



                • #9
                  Thank you Nick!

