  • unexpected change to the dataset

    Hi,
    I am new to Statalist and I thought a lot before posting (I also read the FAQ to be sure).
    The problem I am having right now (I am new to Stata 16, having used Stata 15 until now) is that I am working on a dataset and, after I ask Stata for some basic statistics, it shows me the dataset with dots instead of the values of the variables (see picture), although it keeps calculating whatever I ask for.

    Here are the commands I used:

    use "/Users/mattia/Protesi d´anca/Protesi anca STATA.dta", clear

    encode InsertMaterial, gen(InsertMaterial2)

    by InsertMaterial, sort: sum DiffLegLenthmm

    It happens when I run the last command.

    I am now really puzzled because this has never happened to me before.
    As I mentioned, Stata keeps producing the statistics, but I can't inspect the dataset or see changes in it, which is very disturbing.
    Has anyone had the same problem?

    Thanks a lot!

    Mattia






  • #2
    What that suggests is that your dataset actually has at least 21 observations with missing values for many of your variables (blanks for string variables like GenderMF, and numeric missing values displayed as "."). You normally do not see them, but once you sort your dataset by InsertMaterial (a string variable, for which the empty string sorts first), the observations with missing values sort to the top of the dataset.

    I expect if you did
    Code:
    use "/Users/mattia/Protesi d´anca/Protesi anca STATA.dta", clear
    codebook GenderMF WeightKg
    sort InsertMaterial
    codebook GenderMF WeightKg
    you would find that both of the codebook commands tell you the same things about each of your variables, in particular, the same number of missing values.

    In short, nothing changed in your data; you're just seeing observations you've not seen before.



    • #3
      Well, yes, William Lisowski's observation is most certainly correct. But it does seem odd that there are 21 (or perhaps more) observations in the data set with missing values on every variable. (I assume that all the variables in the data set are shown in the screenshot, or if there are more, that they too are populated with missing values.) While there's nothing illegal about that, it does at least raise the question whether something has gone amiss in the data management that created this data set. Until I investigated that possibility, I would not trust any results generated from it.



      • #4
        To what Clyde Schechter wrote, I would add that if you were surprised by these observations (that is, if you did not previously realize they existed, rather than being surprised to see them at the top of your dataset), then you should spend more time with Stata's data management commands like codebook and misstable before you start using a new dataset. The Stata Data Management Reference Manual PDF (included in your Stata installation and accessible from Stata's Help menu) gives guidance on tools for exploring your data before you use it.
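
        As a rough illustration of that kind of first look, here is a minimal sketch. It assumes the dataset is already loaded and uses the variable names mentioned in this thread; what it reports will depend on your data.

        ```stata
        * One line per variable: observations, unique values, range
        codebook, compact

        * Counts of missing values; this reports on numeric variables only
        misstable summarize

        * missing() also works for string variables, where "" counts as missing
        count if missing(GenderMF)
        count if missing(InsertMaterial)
        ```

        If the counts from the last two commands are larger than expected, that is a cue to go back to how the dataset was created before analysing it.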



        • #5
          Great! Thanks a lot, William and Clyde!
          I checked the data again and, to no surprise, the point is that I imported an Excel spreadsheet without noticing that there were around 900 blank rows at the bottom of it... Thanks, guys!
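
          For anyone who runs into the same thing, one possible way to remove wholly empty observations after the import is sketched below. It assumes that an observation with no nonmissing value in any variable is safe to discard, and nonmiss is a made-up helper variable name, not something from the original dataset.

          ```stata
          * Count the nonmissing values in each observation across all variables;
          * strok lets egen include string variables in that count
          egen nonmiss = rownonmiss(_all), strok

          * How many observations are completely empty?
          count if nonmiss == 0

          * Drop them, then remove the helper variable
          drop if nonmiss == 0
          drop nonmiss
          ```

          Running the count before the drop gives a chance to check that the number matches the blank rows seen in the spreadsheet.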
