
  • Generate a new variable that is the difference of values from an existing variable

    Dear Stata Users,

    I want to create a new variable, "Profit", based on an existing variable (Amount) that contains both "Revenue" and "Cost" values, distinguished by Code.

    Doing this manually is too laborious because there are more than 2 lakh observations.

    For example:
    Code: 1 = Revenue, 2 = Cost

    ID   Code   Amount (Rs)   Profit=Cost-Revenue
    12   1      1000          500
    12   2      1500          .
    13   1      500           500
    13   2      1000          .
    14   1      500           -400
    14   2      100           .
    15   2      500           .
    15   1      300           200

    Thank you very much

  • #2
    1 lakh = 100,000

    Millions of people know that, but perhaps not everyone on Statalist. (I had to look it up; I was remembering 1000, which was wrong.)

    We get the point, either way.

    More crucially, your data example works, just, but please note FAQ Advice #12.

    This works for your example, but depending on what other variables you have, you may benefit from reshape wide.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id coderevenue1cost2) int(amountrs profitcostrevenue)
    12 1 1000  500
    12 2 1500    .
    13 1  500  500
    13 2 1000    .
    14 1  500 -400
    14 2  100    .
    15 2  500    .
    15 1  300  200
    end
    
    bysort id (code) : gen wanted = amount[2] - amount[1] if _n == 1
    
    list, sepby(id)
    
         +----------------------------------------------+
         | id   codere~2   amountrs   profit~e   wanted |
         |----------------------------------------------|
      1. | 12          1       1000        500      500 |
      2. | 12          2       1500          .        . |
         |----------------------------------------------|
      3. | 13          1        500        500      500 |
      4. | 13          2       1000          .        . |
         |----------------------------------------------|
      5. | 14          1        500       -400     -400 |
      6. | 14          2        100          .        . |
         |----------------------------------------------|
      7. | 15          1        300        200      200 |
      8. | 15          2        500          .        . |
         +----------------------------------------------+
    More paranoid code would be

    Code:
    bysort id (code) : gen wanted = amount[2] - amount[1] if _n == 1 & code[1] == 1 & code[2] == 2
    Last edited by Nick Cox; 10 Jan 2025, 06:03.
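    A fuller sketch of the same defensive idea follows. It assumes variables named id, code (1 = Revenue, 2 = Cost) and amount, as in the code above, and that each id should have exactly one record of each code. The flag variable ok is a hypothetical name introduced here, not part of the original answer.

    ```stata
    * Sketch only: assumes variables id, code (1 = Revenue, 2 = Cost) and amount
    clear
    input id code amount
    12 1 1000
    12 2 1500
    13 1  500
    13 2 1000
    14 1  500
    14 2  100
    15 2  500
    15 1  300
    end

    * Stops with an error if any id has two records with the same code
    isid id code

    * Hypothetical flag: 1 if the id has exactly one revenue and one cost record
    bysort id (code) : gen byte ok = _N == 2 & code[1] == 1 & code[2] == 2
    tabulate ok

    * Cost minus revenue (as in the example data), only for well-formed ids
    bysort id (code) : gen wanted = amount[2] - amount[1] if _n == 1 & ok
    list, sepby(id)
    ```

    Here isid stops the do-file if an id-code pair is duplicated, and ids that fail the ok check get a missing wanted rather than a wrong number.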



    • #3
      The profit is typically the revenue minus the cost, so that is what I will use instead of the cost minus the revenue.


      Code:
      // load the example data
      clear all
      input id     code amount
      12     1     1000    
      12     2     1500    
      13     1     500    
      13     2     1000    
      14     1     500    
      14     2     100    
      15     2     500    
      15     1     300    
      end
      
      label define code_lb  1 "Revenue" ///
                            2 "Cost"
      label values code code_lb
      
      // reshape the dataset
      reshape wide amount, i(id) j(code)
      rename amount1 revenue    
      rename amount2 cost    
      
      // create the profit
      gen profit = revenue - cost
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
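      If the data need to stay in long layout, one alternative to reshape wide (not from this thread) is to spread each amount across its id with egen and cond(). This is a sketch assuming the same variable names id, code and amount; revenue, cost and profit are new variable names chosen for illustration.

      ```stata
      * Sketch only: assumes variables id, code (1 = Revenue, 2 = Cost) and amount
      clear
      input id code amount
      12 1 1000
      12 2 1500
      13 1  500
      13 2 1000
      14 1  500
      14 2  100
      15 2  500
      15 1  300
      end

      * max() ignores missing values, so each call picks up the one relevant amount;
      * the result stays missing if an id has no record with that code
      egen revenue = max(cond(code == 1, amount, .)), by(id)
      egen cost    = max(cond(code == 2, amount, .)), by(id)

      * Profit as revenue minus cost, repeated on every observation of each id
      gen profit = revenue - cost
      list, sepby(id)
      ```

      If an id had two revenue records, max() would silently keep the larger one, so a check such as isid id code is still worthwhile before relying on the result.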



      • #4
        I should have spotted the point about Cost and Revenue. I imagine that it's the right way round in the real data and this is just a fake example.



        • #5
          Probably true. This is the main reason why I like to rename variables to something meaningful even when it is not technically necessary. It is so much easier to spot the mistake in gen profit = cost - revenue than in gen profit = amount2 - amount1. I guess this is the second time today that we are talking about debugging...



          • #6
            I seem to remember using Minitab in the previous millennium, when you could call your columns (variables) C1, C2, and so on, but that was it. Minitab was moderately programmable otherwise; I remember implementing PCA before there was a dedicated command (or function, whichever it was).

            Apologies to ancient Minitab if I'm remembering incorrectly.



            • #7
              Nick Cox and Maarten Buis, I'm sorry for writing the profit formula incorrectly.
