Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • data in columns

    Hello,

    I have excel data arranged in this way:

    aut81 ita86 can94
    varA varB varA
    1 2 7
    varB varC varC
    varC varD 12
    2 4 varB
    varD varE 9
    3 8 varF
    varE varF 2
    varF 10
    8

    and I would like to manipulate them to obtain this result.

    datasource varA varB varC varD varE varF
    aut81 1 2 3 8
    ita86 2 4 8 10
    can94 7 9 12 2


    Any help is really appreciated. Thank you.

  • #2
    The short answer is that you will probably be helped by using a dictionary. See more info an example (especially example 3) here: https://www.stata.com/support/faqs/d...d-format-data/
    I do have to say that deciphering what bit goes where is still a bit of a puzzle, even after seeing the resulting output.

    Do you have many different sheets that are in the same format? Or are there more observations in the same sheet?


    Also, in future posts, please use dataex for posting example data. Makes life a lot easier for people trying to answer your questions, and improves the chances they will take the effort. Read more about this in the FAQ: https://www.statalist.org/forums/help#stata

    Comment


    • #3
      Dear Jorrit Gosens, thank you very much for your reply, I will look at the link, and certainly will remember to use dataex in the future. At the moment, I attach the simple example in the excel spreadsheets, one as it is, one as I would like it to be. I have one single spreadsheet, each column (200+ columns in total) a country-year combination of values and variables. My Regards, Alberto
      Attached Files

      Comment


      • #4
        Okay. So the missing values make this extra complicated.
        Before you try to go any further with this, I'd suggest seeing if there is a way to get the dataset in a more useful format from the same source.

        Comment


        • #5
          Ok, and thank you.

          Comment


          • #6
            Actually, I just figured you could do this without a dictionary as well.
            Works by finding corresponding variable value by looking in observation immediately below the var name. execute in blocks if you want to see the idea at work.

            This works, provided that all your variables are numeric values, and that all variable values are in the row immediately below the variable name in your xls sheet. If there are further rules to how your data is laid out, this might not work.
            There might be more elegant ways to do this.




            Code:
            import excel "toyData.xlsx", sheet("data_input") firstrow allstring clear
            save masterplaydata, replace
            
            foreach var of varlist _all{
            replace `var'=`var'[_n+1]
            }
            destring *, replace force
            gen obsno=_n
            order obsno, first
            ren * value*
            ren valueobsno obsno
            reshape long value, i(obsno) j(ctryyr) string
            save usingplaydata, replace
            
            use masterplaydata
            gen obsno=_n
            order obsno, first
            ren * varname*
            ren varnameobsno obsno
            reshape long varname, i(obsno) j(ctryyr) string
            merge 1:1 obsno ctryyr using usingplaydata
            drop _merge
            destring varname, gen(helper) force
            drop if value==. & helper!=.
            drop helper
            drop if varname==""
            
            drop obsno
            reshape wide value, i(ctryyr) j(varname) string
            ren value* *

            Comment


            • #7
              This works just great. Can I thank you enough? Best, Alberto

              Comment


              • #8
                A combination of stack, collapse and xpose can also solve this issue a little bit more concisely.

                Code:
                import excel "toyData.xlsx", sheet("data_input") firstrow allstring clear
                ds *
                local varlist `r(varlist)'
                stack *, into(_v) wide
                gen _varname=_v[_n-1]
                destring `varlist' _v, force replace
                keep if _v==. &  _varname !=“”
                collapse (firstnm) `varlist', by(_varname)
                xpose, clear varname
                ren_varname datasource

                Comment


                • #9
                  Nice solution. I learned something from that.
                  Just as a note, there does seem to be a mistake in the 'keep if' line.
                  Below code works.



                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str4(aut81 ita86 can94)
                  "varA" "varB" "varA"
                  "1"    "2"    "7"  
                  "varB" "varC" "varC"
                  "varC" "varD" "12"  
                  "2"    "4"    "varB"
                  "varD" "varE" "9"  
                  "3"    "8"    "varF"
                  "varE" "varF" "2"  
                  "varF" "10"   ""    
                  "8"    ""     ""    
                  end
                  
                  ds *
                  local varlist `r(varlist)'
                  stack *, into(_v) wide
                  gen _varname=_v[_n-1]
                  destring `varlist' _v, force replace
                  keep if _v!=. &  _varname !=""
                  collapse (firstnm) `varlist', by(_varname)
                  xpose, clear varname
                  ren_varname datasource

                  Comment


                  • #10
                    Thank Jorrit for the correction. The typos arise with my typing ... in my Ipad without properly checking. Honestly, I have tested the code in my computer before happening to be "forced" using the Ipad to upload due to power problem.

                    My original thinking for this line is
                    Code:
                    drop if _v ==.
                    which should work, at least for the small sample provided by Alberto. Then I have changed it (in the ipad and ...wrongly) to this 'keep if' in a try of keeping all the Varnames, even those which might have all missing values. But this try makes no sense but is incorrect in typing.

                    And now, Jorrit correction is ... correct. But I also notice two small improvement, including an additional 'clear' in 'stack' command and a lack of 'space' mistake (of mine) in the last line, which has not been fixed. As a result, I would like to re-upload the code here for saving to any future reading. Thanks again and sorry for any inconvenience.

                    Code:
                    clear
                    input str4(aut81 ita86 can94)
                    "varA" "varB" "varA"
                    "1" "2" "7"
                    "varB" "varC" "varC"
                    "varC" "varD" "12"
                    "2" "4" "varB"
                    "varD" "varE" "9"
                    "3" "8" "varF"
                    "varE" "varF" "2"
                    "varF" "10" ""
                    "8" "" ""
                    end
                    ds *
                    local varlist `r(varlist)'
                    stack *, into(_v) wide clear
                    gen _varname=_v[_n-1]
                    destring `varlist' _v, force replace
                    keep if _v!=. & _varname !=""
                    collapse (firstnm) `varlist', by(_varname)
                    xpose, clear varname
                    ren _varname datasource

                    Comment


                    • #11
                      Dear Jorrit and Romalpa, thank you again very much to both of you. I have been learning quite a lot from both of you. Best, Alberto

                      Comment

                      Working...
                      X