  • Wrong code for generating variables

    Dear all,

    My data are an unbalanced panel of Belgian firms covering 2011-2020.

    For one of my independent variables, I have created a dummy that equals 1 if productivity has decreased in 2 consecutive years. As a stricter version, I have created the same dummy for a decrease in 3 consecutive years. As you can see below, the first observation of each firm always has to be blank, simply because the dummy cannot be computed when there is only one value of productivity.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID float(Productiviteit_w5 DalingProductiviteit2J DalingProductiviteit3J)
    1   4587718 . .
    1   4587718 0 0
    1   4587718 0 0
    1   4587718 0 0
    1   4587718 0 0
    1   4587718 0 0
    1   4587718 0 0
    1   4587718 0 0
    1   4587718 0 0
    1   4587718 0 0
    2  158699.6 . .
    2  51460.25 0 0
    2  51460.25 0 0
    2  51460.25 0 0
    2  51460.25 0 0
    2   4587718 0 0
    2   4587718 0 0
    2   4587718 0 0
    2   4587718 0 0
    2   4587718 0 0
    3  634812.6 . .
    3  763892.2 0 0
    3  875231.2 0 0
    3  962520.2 0 0
    3  993217.7 0 0
    3 1248302.9 0 0
    3 2232884.5 0 0
    3 2598562.5 0 0
    3   3146280 0 0
    3 3596431.5 0 0
    4   4587718 . .
    4   4587718 0 0
    4   4587718 0 0
    4   4587718 0 0
    4   4587718 0 0
    4   4587718 0 0
    4   4587718 0 0
    4   4587718 0 0
    4   4587718 0 0
    4   4587718 0 0
    5   2506160 . .
    5   2446999 0 0
    5 2313214.8 1 0
    5 2397515.3 0 0
    5 2179901.3 0 0
    5 2510287.5 0 0
    5 2437122.5 0 0
    5 2380639.3 1 0
    5 2267758.8 1 1
    5 2345807.8 0 0
    end
    I created these dummies with the following commands:
    Code:
    gen DalingProductiviteit2J = 0
    replace DalingProductiviteit2J = . if Productiviteit_w5 == .
    bysort ID (Jaar): replace DalingProductiviteit2J = 1 if Productiviteit_w5 < L.Productiviteit_w5 & L.Productiviteit_w5 < L2.Productiviteit_w5 & ID==L.ID & ID==L2.ID
    gen DalingProductiviteit3J = 0
    replace DalingProductiviteit3J = . if DalingProductiviteit2J == .
    replace DalingProductiviteit3J = 1 if DalingProductiviteit2J == 1 & L.DalingProductiviteit2J == 1
    Sorry that the variable names are in Dutch (Productiviteit = productivity, Daling = decrease, 2J/3J = 2/3 years, Jaar = year), but the idea behind the dummy should be clear. If not, please let me know.
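Written out, with $x_{it}$ the productivity of firm $i$ in year $t$, the commands above compute the following. This is a reading of the code, not notation from the original post:

```latex
\begin{aligned}
D^{2J}_{it} &= \mathbf{1}\left[\,x_{it} < x_{i,t-1} < x_{i,t-2}\,\right] \\
D^{3J}_{it} &= \mathbf{1}\left[\,D^{2J}_{it} = 1 \wedge D^{2J}_{i,t-1} = 1\,\right]
             = \mathbf{1}\left[\,x_{it} < x_{i,t-1} < x_{i,t-2} < x_{i,t-3}\,\right]
\end{aligned}
```

So the 2-year dummy needs two lags and the 3-year dummy three. Where those lags do not exist, the code leaves the dummy at 0 rather than missing (unless $x_{it}$ itself is missing). The `ID==L.ID & ID==L2.ID` conditions matter here. Stata treats a missing value as larger than any number, so without them `L.x < L2.x` would be true in a firm's second year simply because `L2.x` is missing.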


    As a robustness check, I defined productivity differently and tried to create exactly the same dummies with this alternative measure. This time I used operating result (bedrijfsresultaat) per employee:
    Code:
    gen DalingBedrijfsresultaatperwn2J = 0
    replace DalingBedrijfsresultaatperwn2J = . if Bedrijfsresultaatperwn_w1 == .
    bysort ID (Jaar): replace DalingBedrijfsresultaatperwn2J = 1 if Bedrijfsresultaatperwn_w1 < L.Bedrijfsresultaatperwn_w1 & L.Bedrijfsresultaatperwn_w1 < L2.Bedrijfsresultaatperwn_w1 & ID==L.ID & ID==L2.ID
    gen DalingBedrijfsresultaatperwn3J = 0
    replace DalingBedrijfsresultaatperwn3J = . if DalingBedrijfsresultaatperwn2J == .
    replace DalingBedrijfsresultaatperwn3J = 1 if DalingBedrijfsresultaatperwn2J == 1 & L.DalingBedrijfsresultaatperwn2J == 1
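Because the two blocks differ only in variable names, one way to make sure they really run identical logic is to write the definition once and loop over the measures. The sketch below is not the poster's code. It uses firm 5's values from both -dataex- examples in this post, assumes the rows are the years 2011-2020 in the order listed (Jaar is not in the examples), and adds `xtset`, which the `L.`/`L2.` operators require. The `src`/`stub` macro names are made up for illustration.

```stata
* Sketch only: one definition applied to both productivity measures.
* Data: firm 5 from the two -dataex- examples in this post, side by side.
* ASSUMPTION: Jaar is not in the examples; rows are taken to be 2011-2020.
clear
input long ID float(Productiviteit_w5 Bedrijfsresultaatperwn_w1)
5   2506160 236874.33
5   2446999  159015.8
5 2313214.8 120722.27
5 2397515.3 108236.05
5 2179901.3  13566.07
5 2510287.5  81998.32
5 2437122.5  45384.85
5 2380639.3  -97713.3
5 2267758.8  26576.12
5 2345807.8  98425.99
end
gen int Jaar = 2010 + _n    // hypothetical year variable (single firm)
xtset ID Jaar               // required for the L. and L2. operators

* Source variables and the matching dummy name stubs, in the same order
local src  Productiviteit_w5 Bedrijfsresultaatperwn_w1
local stub DalingProductiviteit DalingBedrijfsresultaatperwn
forvalues i = 1/2 {
    local x : word `i' of `src'
    local d : word `i' of `stub'
    * 2-year dummy: decrease in t and in t-1 (same conditions as the posted code)
    gen `d'2J = 0
    replace `d'2J = . if missing(`x')
    replace `d'2J = 1 if `x' < L.`x' & L.`x' < L2.`x' & ID == L.ID & ID == L2.ID
    * 3-year dummy: the 2-year dummy holds in t and in t-1
    gen `d'3J = 0
    replace `d'3J = . if missing(`d'2J)
    replace `d'3J = 1 if `d'2J == 1 & L.`d'2J == 1
}
list Jaar *2J *3J, noobs
```

On this data both 2-year dummies get the same first-year value (0, not missing), so identical code cannot by itself explain why the two sets differ in the first year.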
    Problem: the first dummy correctly leaves the first observation of each firm (ID) empty. As you can see below, the second dummy, created with exactly the same code, puts 0 in the first observation instead. This is wrong, but I just can't find the cause. Does anyone see the problem?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID float(Bedrijfsresultaatperwn_w1 DalingBedrijfsresultaatperwn2J DalingBedrijfsresultaatperwn3J)
    1  64952.91 0 0
    1  99873.05 0 0
    1  68731.36 0 0
    1  76606.58 0 0
    1  75386.92 0 0
    1 102321.81 0 0
    1 102532.42 0 0
    1 123674.22 0 0
    1 108004.75 0 0
    1 115439.52 0 0
    2 14075.665 0 0
    2  49803.37 0 0
    2  52598.84 0 0
    2  52297.27 0 0
    2  57116.21 0 0
    2  34775.42 0 0
    2  45138.53 0 0
    2  44305.94 0 0
    2  164799.1 0 0
    2  86502.11 0 0
    3  421836.3 0 0
    3 466955.25 0 0
    3 571562.75 0 0
    3  553545.3 0 0
    3 510486.25 1 0
    3  824580.4 0 0
    3  595216.2 0 0
    3  684050.1 0 0
    3  781644.9 0 0
    3  851280.6 0 0
    4 233238.34 0 0
    4    303409 0 0
    4  242142.4 0 0
    4 180557.23 1 0
    4  392744.7 0 0
    4  386430.1 0 0
    4  354275.7 1 0
    4 188446.25 1 1
    4 129381.95 1 1
    4  59840.48 1 1
    5 236874.33 0 0
    5  159015.8 0 0
    5 120722.27 1 0
    5 108236.05 1 1
    5  13566.07 1 1
    5  81998.32 0 0
    5  45384.85 0 0
    5  -97713.3 1 0
    5  26576.12 0 0
    5  98425.99 0 0
    end
    Kind regards,
    Jordi

  • #2
    Using Stata 16 (Windows 11)

    • #3
      Fixed it. Thanks anyway.

      • #4
        I am surprised the code worked in the first instance. When I run the code on your example data (I have to generate Jaar because you neglected to include it, and I add the necessary xtset command so the lag operators work) I get 0 in the first Jaar for each ID.
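The setup described here can be sketched as follows. It assumes, as the reply does, that each firm's ten rows are 2011-2020 in the listed order; `Jaar` and the helper variable `obsnr` are constructed for the sketch and are not part of the posted data. Firms 3 and 5 of the first example are used to keep it short.

```stata
* Sketch of the check described above (firms 3 and 5 of the first example).
* ASSUMPTION: each firm's 10 rows are the years 2011-2020 in the listed order.
clear
input long ID float(Productiviteit_w5)
3  634812.6
3  763892.2
3  875231.2
3  962520.2
3  993217.7
3 1248302.9
3 2232884.5
3 2598562.5
3   3146280
3 3596431.5
5   2506160
5   2446999
5 2313214.8
5 2397515.3
5 2179901.3
5 2510287.5
5 2437122.5
5 2380639.3
5 2267758.8
5 2345807.8
end
gen long obsnr = _n                            // remember the listed order
bysort ID (obsnr): gen int Jaar = 2010 + _n    // hypothetical year variable
xtset ID Jaar                                  // needed for L. and L2.

* The commands from the first post, unchanged
gen DalingProductiviteit2J = 0
replace DalingProductiviteit2J = . if Productiviteit_w5 == .
bysort ID (Jaar): replace DalingProductiviteit2J = 1 if Productiviteit_w5 < L.Productiviteit_w5 & L.Productiviteit_w5 < L2.Productiviteit_w5 & ID==L.ID & ID==L2.ID
gen DalingProductiviteit3J = 0
replace DalingProductiviteit3J = . if DalingProductiviteit2J == .
replace DalingProductiviteit3J = 1 if DalingProductiviteit2J == 1 & L.DalingProductiviteit2J == 1

* First year of each firm: 0, not missing, because L.ID is missing there
list ID Jaar DalingProductiviteit2J DalingProductiviteit3J if Jaar == 2011, noobs
```

The listing shows 0 in 2011 for both firms, matching what is reported here.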

        Nothing in your code will cause the value to be missing in the first Jaar for each ID. So you need to add something like
        Code:
        bysort ID (Jaar): replace DalingProductiviteit2J = . if _n==1
        to create a missing value in the first Jaar for each ID.
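A sketch of this fix applied to the robustness dummies, using firm 4 from the second example and again assuming its rows are 2011-2020 in the listed order (`Jaar` and `obsnr` are constructed, not original). Order matters: the 3-year dummy is set to missing only where the 2-year dummy is already missing, so the `_n==1` line has to run before the 3-year dummy is built (or be repeated for it).

```stata
* Sketch: the fix from this reply, applied to the robustness dummies.
* Data: firm 4 of the second -dataex- example.
* ASSUMPTION: rows are the years 2011-2020 in the listed order.
clear
input long ID float(Bedrijfsresultaatperwn_w1)
4 233238.34
4    303409
4  242142.4
4 180557.23
4  392744.7
4  386430.1
4  354275.7
4 188446.25
4 129381.95
4  59840.48
end
gen long obsnr = _n
bysort ID (obsnr): gen int Jaar = 2010 + _n    // hypothetical year variable
xtset ID Jaar

* 2-year dummy, as posted (no by prefix needed once the data are xtset)
gen DalingBedrijfsresultaatperwn2J = 0
replace DalingBedrijfsresultaatperwn2J = . if Bedrijfsresultaatperwn_w1 == .
replace DalingBedrijfsresultaatperwn2J = 1 if Bedrijfsresultaatperwn_w1 < L.Bedrijfsresultaatperwn_w1 & L.Bedrijfsresultaatperwn_w1 < L2.Bedrijfsresultaatperwn_w1 & ID==L.ID & ID==L2.ID
* The fix: blank the first year of each firm BEFORE building the 3-year dummy
bysort ID (Jaar): replace DalingBedrijfsresultaatperwn2J = . if _n==1

* 3-year dummy: inherits the missing first year through the "== ." line
gen DalingBedrijfsresultaatperwn3J = 0
replace DalingBedrijfsresultaatperwn3J = . if DalingBedrijfsresultaatperwn2J == .
replace DalingBedrijfsresultaatperwn3J = 1 if DalingBedrijfsresultaatperwn2J == 1 & L.DalingBedrijfsresultaatperwn2J == 1

list Jaar Bedrijfsresultaatperwn_w1 DalingBedrijfsresultaatperwn2J DalingBedrijfsresultaatperwn3J, noobs
```

In an unbalanced panel with gaps, `_n==1` only blanks each firm's first row; a condition such as `missing(L.Bedrijfsresultaatperwn_w1)` would also blank rows that follow a gap, which may or may not be what is wanted.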

        • #5
          Thank you very much, William.
