Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long string variable

    Dear all,

    I am trying to import a csv file (separated by commas) in which one variable contains very long strings (>300 characters). Unfortunately, Stata seems to truncate the string variable when I import the data (i.e., certain terms or words are cut off), so I lose information. I have read that Stata can hold very long strings since version 13, but I am still facing the truncation problem in version 16. I'm sure I missed something very trivial, but any help would be welcome.

    For importing the data I use:

    Code:
    insheet using "mydata.csv", names comma
    Thanks,

    Marvin

  • #2
    if you look at the help file for "insheet" you will see that has been replace; see
    Code:
    help import delimited

    Comment


    • #3
      Hello Rich,

      Thanks for your answer. Maybe I missed it, but do you mean that the string variable is replaced during import?

      I enclosed an example to illustrate the problem. The last line contains the broken string after this code was executed:

      Code:
      clear all
      insheet using "my_data.csv", names comma // load CSV dataset
      split my_var, p(;) // separate string by delimiter
      gen counter = _n
      drop my_var
      reshape long my_var, i(counter) j(counter_2) // after reshaping the last values are cut off

      Best,

      Marvin
      Attached Files

      Comment


      • #4
        insheet in Stata 16 is explained thus

        insheet has been superseded by import delimited. insheet continues to work but, as of Stata 13, is no longer an official part of Stata. This is the original help file, which we will
        no longer update, so some links may no longer work.


        Comment


        • #5
          Replacing your insheet command with
          Code:
          import delimited using "my_data.csv", varnames(1) delimiters(",") // load CSV dataset
          seems to do what you want, in particular the last values are
          Code:
          . list in 100/l
          
               +-----------------------------------------------+
               | counter   counte~2                     my_var |
               |-----------------------------------------------|
          100. |       1        100   org_H9IhveXmfxoO28FGgWgy |
          101. |       1        101   org_woSDqy3UKfvlbmg8k16w |
          102. |       1        102   org_EXpKXIvGQ8UrH3WHH3ys |
          103. |       1        103   org_GXpnGHx2QBhp4zq9Z0Ll |
          104. |       1        104   org_YHcbLJzeq2EgQqAvstGs |
               |-----------------------------------------------|
          105. |       1        105   org_MjuzntgXH3v6w46T6xKF |
          106. |       1        106   org_fGMIcj4sAkYrf9Lo0fbr |
               +-----------------------------------------------+

          Comment


          • #6
            Dear Nick and William,

            Thank you very much for sharing this solution. With the delimited command the problem is resolved.

            With thanks & regards,

            Marvin

            Comment


            • #7
              Hi Team,

              I have a question on the same topic.
              I have 50 data sets in microsoft excel 97-2003 worksheet with string and numerical values, see example below -

              Click image for larger version

Name:	Screenshot 2022-11-07 154051.png
Views:	1
Size:	6.2 KB
ID:	1688395

              I am importing theses using the following command -

              import delimited using "monthly_arr/`x'", delimiters(",") encoding(utf8) case(preserve) stringcols(2) clear

              However, the numerical values which are truncated above such as in row 3863 show up as text as "2.6373E+11" instead of numeric value which is 263473340016. Is there a way to ensure that this is imported as 263473340016 without manually formatting the excel files.
              Last edited by Bhavya Sinha; 07 Nov 2022, 03:39.

              Comment

              Working...
              X