  • Convert 'accessibility' driving time to logarithmic variable, but it contains zeros

    Hello everyone,

    I am currently running a convergence regression and want to include a variable on the accessibility of a region. My data contain several accessibility variables, all of which are driving times in minutes by car to the nearest highway access, train station, airport, or one of several measures of agglomeration centers. However, some values are zero, indicating that there is on average no driving time.

    The model is a log-log model, so all variables except the dummy variables have been log-transformed. When I transform the accessibility variables that contain zeros, the logs of those zeros are of course generated as missing values.

    My question is, what should I do?
    A. Do nothing and leave the missing values out, but that would mean that all the regions which are very accessible are excluded from the regression.
    B. Replace all the missing values with 0, thus "replace lnACC_MAJOR = 0 if (lnACC_MAJOR == .)"; however, that does not seem like good practice to me.
    C. I should not convert the accessibility variables to logarithms.
    D. Other, .....

    Additionally, I want to include commuter flows. I am trying to use net flows, which are negative for some regions (higher outflow than inflow of commuters), and negative values cannot be log-transformed. What should I do here?

    Thanks in advance.
    Last edited by Jantje Beton; 23 Dec 2015, 03:20.

  • #2
    D. This variable being recorded as zero is a measurement problem. You need some systematic way to impute positive values, e.g. based on the areal extent of each place. Absent a good method, the least bad guess for zeros is possibly half the smallest positive value observed otherwise.

    On B: Replacing missings for logarithms with 0 implies that the original value is 1 (minute), which sounds like a way to create massive outliers. For example, suppose a short journey is 10 minutes and a long one 100 minutes: their natural logarithms are about 2.3 and 4.6, so a logged value of 0 sits far below both. In general, plot the logged values to see what your results look like.
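    A minimal Python sketch of the two points above, for illustration only. The driving times are made-up values and the function name is hypothetical. It imputes zeros as half the smallest positive value before taking logs, and shows that setting a log to 0 is equivalent to assuming a driving time of 1 minute.

```python
import math

def log_with_half_min(values):
    """Natural log of non-negative values, with zeros imputed as
    half the smallest positive value observed (a fallback guess only)."""
    positives = [v for v in values if v > 0]
    if not positives:
        raise ValueError("need at least one positive value")
    floor = min(positives) / 2
    return [math.log(v if v > 0 else floor) for v in values]

# Hypothetical driving times in minutes (not from the thread's data)
times = [0, 10, 25, 100]
logged = log_with_half_min(times)  # the zero is treated as 5 minutes

# Option B sets the log to 0, which implies exp(0) = 1 minute
implied_minutes = math.exp(0)

for t, lt in zip(times, logged):
    print(f"{t:>5} min -> log {lt:.3f}")
print("log = 0 implies", implied_minutes, "minute")
```

    Plotting or listing the logged values, as suggested above, makes it easy to see how far any imputed value sits from the rest.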


    • #3
      Thanks for answering my question. For point D, I replaced all zeros with 0.5 and then took logs. Regarding point B, I made a scatterplot, which of course showed a lot of outliers. However, some of the measures turned out to have an insignificant effect, and since other accessibility variables are available I decided to use those instead.

      What remains is the net commuter flows and migration flows, which can of course be negative, and negative values cannot be log-transformed. Is there a practice I can use to include these data in my log-log regression model? Or should I include two separate variables, one for the inflow and one for the outflow?

      Thanks in advance.


      • #4
        It's hard to see that a log-log model makes any sense if variables on any side of the equation can be negative. There isn't a fudge around that.

        Even on travel time, are you implying that the shortest journeys take 30 seconds = 0.5 minute?


        • #5
          Regarding travel time, it is indeed odd that there were values of 0, or of 0.5 after my own replacement. This would basically mean that everyone lives next to a train station or highway access. That is why I decided to use another measure instead.

          Regarding migration and commuting, I want to know whether they had an effect on regional economic performance. These are just two variables I am trying to include, especially migration from East to West Germany in the years after the fall of the Berlin Wall.

          The simplest version is: reg lnGDP (from '96 to '12) lnGDP96 (GDP in '96) EAST (East Germany), plus something capturing migration or commuting.


          • #6
            What you are trying to do is not especially clear to me. The thread started with an example on travel time (my guess: data are for individuals or small places) and now appears to be about predicting GDP (my guess: data are for regions or countries).

            But it seems that you have grounds for transforming just some of your variables. I wouldn't call that a log-log model. The question then is: why do you think you need to transform net commuter flows and migration flows?

            Transformations that "pull in" extreme values, work symmetrically on absolute values, preserve sign, and behave smoothly around zero include

            1. cube roots and more generally odd integer roots

            2. inverse hyperbolic sine

            3. neglog

            More in transint (SSC). (The last public version is not the latest version in my files, but it should give you a start.)
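            A minimal Python sketch of the three transformations listed above, for illustration only; it is not taken from transint. The function names and the sample net flows are hypothetical, and neglog is assumed here to mean sign(x) * log(1 + |x|).

```python
import math

def signed_cube_root(x):
    """Cube root that keeps the sign of x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def inverse_hyperbolic_sine(x):
    """asinh(x) = log(x + sqrt(x^2 + 1)); close to sign(x) * log(2|x|) for large |x|."""
    return math.asinh(x)

def neglog(x):
    """Assumed definition: sign(x) * log(1 + |x|)."""
    return math.copysign(math.log1p(abs(x)), x)

# Hypothetical net commuter flows (inflow minus outflow)
net_flows = [-5000, -120, 0, 80, 20000]

for x in net_flows:
    print(x, signed_cube_root(x), inverse_hyperbolic_sine(x), neglog(x))
```

            All three map 0 to 0, are odd functions (f(-x) = -f(x)), and compress large absolute values, so negative net flows can be kept as a single variable.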
            Last edited by Nick Cox; 30 Dec 2015, 08:49.
