Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting xlsx files to one single dta file after calculation

    Dear all,
    I have different excel files stored as “xlsx files”. These files have the same structure as below.
    The files are named an identifier (ISIN), followed by “_1”. There also files with identifiers and stored with _2 _3 _4, but I don’t have to consider them. All the files are in a specific pathfolder.
    What I need to do is the following.
    • Open all of them and create a new column which is the absolute difference between “close” and “open”
    • From this column I need to take:1. the minimum value excluding 0. So, actually I need to take the second minimum difference 2. Average (excluding when again the absolute difference is 0) 3. Median (excluding 0)
    • These new values, together with the ISINs should go to a new stata dataset.
    So, the final purpose is a dataset like the following
    ISIN Value Second minimum difference Mean Median
    XXXX
    YYYY
    ZZZ


    Click image for larger version

Name:	Structure.jpg
Views:	1
Size:	178.5 KB
ID:	1641848


  • #2
    Sorry, but I don't understand what you want here:

    From this column I need to take:1. the minimum value excluding 0. So, actually I need to take the second minimum difference
    Those two things are not the same unless for each ISIN there is one and only one observation where open = close. If some ISIN has no such days, then the first minimum is not zero. And if there are two or more such days, then even the second minimum is zero. Please clarify what you actually want here. Do you want the minimum excluding any zeroes (however many there happen to be, if any, ) or do you want the second minimum (which may sometimes itself be zero). In the code below, I assume you want all three statistics calculated by excluding any zeroes.

    Code:
    clear*
    
    filelist, pattern(*_1.xlsx) norecursive
    
    capture program drop one_isin
    program define one_isin
        local isin = subinstr(filename[1], "_1.xlsx", "", .)\
        local filename = filename[1]
        import excel using `filename', clear firstrow case(lower)
        gen delta = abs(open - close)
        drop if delta == 0
        summ delta, detail
        local minimum = r(min)
        local mean = r(mean)
        local median = r(p50)
        clear
        gen isin = `"`isin'"'
        gen minimum = `minimum'
        gen mean = `mean'
        gen median = `median'
        exit
    end
    
    runby one_isin, by(filename) status
    As I do not have a set up of files analogous to what you describe, this code is untested. So you may have to debug it a little, but it'll point you in the right direction. To use this code you will need to user-written programs. -runby- is written by Robert Picard and me. -filelist- is written by Robert Picard. Both are available at SSC.

    Comment


    • #3
      Dear Clyde Schechter
      Many thanks for your guidance and apologies for getting back late.
      Thanks for pointing out the clarifications. As you highlighted I would like to have all three statistics calculated by excluding any zeroes.

      I adopted the solution that you suggest to me. Nevertheless, I get an error message as the one below. I attached a screenshoot for the purpose of understanding what is going on.
      I tried to explore
      Code:
      help runby
      but I don't get what could be the driver.

      Do you any suggestions? I might be doing a silly mistake.
      I basically use all your codes just specifying the folder with
      Code:
      cd"path folder"

      at the beginning of the code (after clear all)

      Click image for larger version

Name:	error.png
Views:	1
Size:	13.2 KB
ID:	1642180

      Last edited by Marco Errico; 23 Dec 2021, 12:40.

      Comment


      • #4
        OK, one thing that -runby- does, by default, is suppress all output, including error messages, from the program it runs. So there is something wrong in program one_isin.

        The general approach to finding it is to add the -verbose- option to the -runby- command. Then you will get output, including error messages. (With 1,992 groups the output is likely to be enormous, and since apparently the error, whatever it is, applies to all 1,992 files, likely you can get all the helpful information if you first drop all but a small number of observations from the data set before the -runby-. The error messages you get this way will probably help you identify what is going wrong inside one_isin.

        However, in this case, I can probably spare you that trouble because I see an error already. The \ character at the end of the -local isin = subinstr(...- command is a typo and is causing an error. Remove that, and, if there are no other errors, it should be OK. I'm not sure how that \ character got there. It does not appear in my original code stored on my computer. I must have accidently hit the \ key at some point while I was reviewing what I wrote before posting it. Anyway, sorry about that. If you still get errors, try the -verbose- option described in the preceding paragraph to help diagnose the problem.

        Comment


        • #5
          Thank you for your time and patience, Clyde Schechter .
          I modified the codes by removing \ as you suggested and I see that the codes are working. Nevertheless, I get this output
          Click image for larger version

Name:	1png.png
Views:	1
Size:	16.1 KB
ID:	1642234





          Then I removed all the files, so just keeping one file and using the -verbose- option. Nevertheless it gives me this output which is puzzling to me
          Click image for larger version

Name:	2.png
Views:	1
Size:	26.4 KB
ID:	1642235




          I'm wondering what could be the problem. Do you think is the xlsx file causing problem? Maybe stata is not correctly reading the name of the variables. In this case I can try to create a loop to convert all of them in csv first.
          But it might be just a silly point the mine
          Last edited by Marco Errico; 24 Dec 2021, 03:05.

          Comment


          • #6
            Oh, I think I see it. In program -one_isin-:

            Code:
            program define one_isin
                local isin = subinstr(filename[1], "_1.xlsx", "", .)\
                local filename = filename[1]
                import excel using `filename', clear firstrow case(lower)
                gen delta = abs(open - close)
                drop if delta == 0
                summ delta, detail
                local minimum = r(min)
                local mean = r(mean)
                local median = r(p50)
                clear
                set obs 1
                gen isin = `"`isin'"'
                gen minimum = `minimum'
                gen mean = `mean'
                gen median = `median'
                exit
            end
            Sorry, I should have seen that sooner.

            Comment


            • #7
              Dear Clyde Schechter. No sorry at all. I
              I'm really grateful for the guidance that I've been receiving. Truly thank you.

              I've modified the command as you suggested but still get an error message as below. I assume is related to -runby- but is puzzling for me to understand what could be the driver.

              Click image for larger version

Name:	runby.png
Views:	1
Size:	12.7 KB
ID:	1642293




              I recognize that it might be difficult to work on it without having a sample. Therefore I upload a .rar file where there are some random files picked from my folder. Here is the link
              https://we.tl/t-U5EPTeGVI3

              I quickly went through the forum guidelines and policies, but not sure whether is allowed or not. In case it goes against any policies, may accept my apologies in advance. I've just done it for the purpose of better solving this issue.

              Again, truly thank you for your help and assistance.

              Comment


              • #8
                I notice that that unwanted \ character at the end of the -local isin = ...- command is back in what you show in #6. I think that just got copied over from the error in #2. Anyway, get rid of that \.


                If that doesn't solve the problem, please re-run adding the -verbose- option to the -runby- command so we can see where the error is coming from. The error is arising inside program one_isin, but without the -verbose- option, the error messages are suppressed. Again, so you do not get overwhelming amounts of output and since apparently every single iteration is provoking an error, before the -runby- command I suggest you reduce the data set to a small number of observations (each observation at that point corresponds to a single .xlsx file). It is a fairly safe bet that the same error is cropping up repeatedly.

                I notice that that unwanted \ character at the end of the -local isin = ...- command is back in what you show in #6. I think that just got copied over from the error in #2. Anyway, get rid of that /.

                The forum guidelines discourage attachments for a number of reasons. I appreciate your intent in providing example data. But, as a matter of safety, I do not download or open any attachments from people I don't know. I'm sorry, but we're going to have to try to figure this out without example data.

                Comment


                • #9
                  Thanks Clyde Schechter and sorry for getting back late.
                  The codes are working properly and indeed the error was the / character was the final problem.

                  Thanks again for your guidance, I will explore more runby command that can be really helpful in the future.

                  Comment

                  Working...
                  X