  • Reshape and re-organized data

    Hi again,

    I deeply appreciate your help with my last issue. Below is a snapshot of the data I have. The problems are:

    - There are two variables:
    LBCANT_Desc, which holds all the antibiotics
    Result_Value, which holds two things in the same column: the organism name and its reactions ("Resistant", "Intermediate" and "Sensitive").

    I want to reshape the data to have:

    - a column for each organism (mostly 9 organisms)
    - a column for each antibiotic
    - a column for each reaction to each antibiotic


    Is there a way to break it down and reshape it?


    Many thanks,
    Meshal



    [Attached screenshot of the data: B565956B-A8FE-4893-874B-040AB0FA8774.jpeg]

  • #2
    Visually reviewing your screenshot, I am drawn to assume the following:
    1. Groups of consecutive observations refer to a single laboratory isolate and tests of its sensitivity to a variety of antibiotics.
    2. Each such group begins with an observation in which LBCANT_Desc is "NULL", Result_Value is the name of the organism, and Test_Item is "CULTR."
    So try this:
    Code:
    //    TEST ASSUMPTIONS ABOUT DATA
    assert (LBCANT_Desc == "NULL") == !inlist(Result_Value, "Resistant", "Sensitive", "Intermediate")
    assert (LBCANT_Desc == "NULL") == (Test_Item == "CULTR")
    
    //    GROUP CONSECUTIVE OBSERVATIONS INTO GROUPS DESCRIBING A SINGLE ISOLATE
    gen isolate = sum(Test_Item == "CULTR")
    sort isolate, stable
    
    //    PREPARE AND RESHAPE
    drop Seq Test_Item // THESE VARY WITHIN ISOLATE
    rename Result_Value _
    reshape wide _, i(isolate) j(LBCANT_Desc) string
    rename _NULL organism
    rename _* *
    Now, while I'm sure you intended well by posting a screenshot of your data, had you read the Forum FAQ, as all Forum participants are asked to do before posting, you would be aware that screenshots of data are not very helpful. In particular, they cannot be imported into Stata, so it is not possible to use them to develop and test code. Consequently the above code is untested and may contain errors (even assuming my assumptions about the data are correct).

    If this code does not do what you need and you want further help, when you post back:
    1. Provide a clearer explanation of the organization of the data if my assumptions outlined above are incorrect.
    2. Show example data in the most helpful way: use the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
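For readers unfamiliar with -dataex-, a typical call might look like the sketch below. It assumes the data set is already in memory and uses the variable names mentioned in this post; the range 1/50 is an arbitrary illustration, so pick rows that actually show the problem.

```stata
* Sketch only: assumes the data are already loaded in memory.
* Variable names are those mentioned in this thread; the range 1/50 is arbitrary.
dataex Seq Test_Item LBCANT_Desc Result_Value in 1/50
```

The output is a block of text that can be pasted into a post between code delimiters.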



    • #3
      (I had prepared an answer similar to what Clyde said. For whatever use it might be to you, I have included it here even though it substantially duplicates what he said.)

      You will help yourself get help here if you post some example data using -dataex-, as described in the StataList FAQ. Screenshots are not generally helpful, and in using one, you reduce the chances that someone will help you. It would also help you get a good answer if you typed up an example of how you want your data to be organized after reshaping/reorganizing. Describing data precisely in words is always difficult to understand, even for people with a lot of experience doing that.


      It *appears* that each row for which LBCANT_Desc contains "NULL" signifies the beginning of a sequence of observations (rows) that all pertain to one organism. That organism applies to all observations until the next observation containing NULL occurs, at which point a new sequence starts (?). You might have explained some of that in your previous posting, but you would help yourself get help here if you can describe precisely what identification applies to each observation, and how those organism names are to be spread forward in your data set. Is it true that, in your data set, the links between related observations are carried by the *physical* order in the file? Or, are there some variables which can serve as sort keys that identify which observations are linked? I presume you know the answer to this last question, but it would help others if we were not left to guess about that. I am hoping (guessing) that the organism name and Episode_No uniquely identify related observations, but I'm not certain.

      So: It's possible someone will help you based on what you have presented here, but I recommend: 1) -dataex- list of your data as is; 2) A typed example of what you want your data to look like after reorganization; and 3) A description of how your variables link together related observations in your data set.



      • #4
        Apologies for the delayed response!

        Thank you for the clarification. I ran -dataex- as you requested.

        As for Mr. Clyde's code, it did not reshape; Stata said "values of variable LBCANT_Desc not unique within isolate".


        Code:
        input double LABNumber byte Seq str5 Test_Item_Code str13 LBCANT_Desc str44 Result_Value str7 Order_Code str23 LABOrdItm_Des long URN str11 Episode_Type_Desc double Collection_Date
        2200000415 2 "CULTR" "NULL" "Escherichia coli ESBL" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 3 "AM" "Ampicillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 5 "CPE" "Cefepime" "Resistant" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 7 "CAX" "Ceftriaxone" "Resistant" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 8 "CP" "Ciprofloxacin" "Resistant" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 9 "GM" "Gentamicin" "Resistant" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 11 "IMP" "Imipenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 12 "MER" "Meropenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000415 14 "AK" "Amikacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 194874 "Out Patient" 1979459760000.0002
        2200000427 2 "CULTR" "NULL" "Klebsiella pneumoniae" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 8 "CAX" "Ceftriaxone" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 9 "CPE" "Cefepime" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 10 "CP" "Ciprofloxacin" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 11 "GM" "Gentamicin" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 12 "CL" "Colistin" "Intermediate" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 13 "TGC" "Tigecycline" "Sensitive" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 14 "IMP" "Imipenem" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 15 "MER" "Meropenem" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 17 "ETP" "Ertapenem" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 19 "AK" "Amikacin" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000427 20 "AM" "Ampicillin" "Resistant" "L5010P" "Blood Culture Pediatric" 1457935 "Out Patient" 1979460779999.9998
        2200000435 2 "CULTR" "NULL" "Acinetobacter baumannii" "L5010AN" "Blood Culture Anaerobic" 791262 "Out Patient" 1979461260000.0002
        2200000435 3 "CAZ" "Ceftazidime" "Sensitive" "L5010AN" "Blood Culture Anaerobic" 791262 "Out Patient" 1979461260000.0002
        2200000435 5 "GM" "Gentamicin" "Sensitive" "L5010AN" "Blood Culture Anaerobic" 791262 "Out Patient" 1979461260000.0002
        2200000435 8 "MER" "Meropenem" "Sensitive" "L5010AN" "Blood Culture Anaerobic" 791262 "Out Patient" 1979461260000.0002
        2200000435 9 "IMP" "Imipenem" "Sensitive" "L5010AN" "Blood Culture Anaerobic" 791262 "Out Patient" 1979461260000.0002
        2200000435 10 "CP" "Ciprofloxacin" "Sensitive" "L5010AN" "Blood Culture Anaerobic" 791262 "Out Patient" 1979461260000.0002
        2200000435 11 "CPE" "Cefepime" "Sensitive" "L5010AN" "Blood Culture Anaerobic" 791262 "Out Patient" 1979461260000.0002
        2200000484 2 "CULTR" "NULL" "Escherichia coli ESBL" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 4 "AM" "Ampicillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 4 "AM" "Ampicillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 6 "CPE" "Cefepime" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 6 "CPE" "Cefepime" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 9 "CAZ" "Ceftazidime" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 9 "CAZ" "Ceftazidime" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 10 "CFT" "Cefotaxime" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 10 "CFT" "Cefotaxime" "Resistant" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 11 "AK" "Amikacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 11 "AK" "Amikacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 12 "ETP" "Ertapenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 12 "ETP" "Ertapenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 13 "CP" "Ciprofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 13 "CP" "Ciprofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 14 "IMP" "Imipenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 14 "IMP" "Imipenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200000484 15 "MER" "Meropenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979485200000.0002
        2200000484 15 "MER" "Meropenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 712554 "In Patient" 1979490720000
        2200001141 2 "CULTR" "NULL" "Staphylococcus aureus" "L5010AN" "Blood Culture Anaerobic" 193237 "In Patient" 1979558939999.9998
        2200001141 7 "CP" "Ciprofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 193237 "In Patient" 1979558939999.9998
        2200001141 9 "OX" "Oxacillin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 193237 "In Patient" 1979558939999.9998
        2200001141 10 "GM" "Gentamicin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 193237 "In Patient" 1979558939999.9998
        2200001905 2 "CULTR" "NULL" "Escherichia coli" "L5010AN" "Blood Culture Anaerobic" 800190 "Emergency" 1979671860000.0002
        2200001905 4 "CFT" "Cefotaxime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 800190 "Emergency" 1979664839999.9998
        2200001905 6 "CAZ" "Ceftazidime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 800190 "Emergency" 1979664839999.9998
        2200001905 8 "CP" "Ciprofloxacin" "Intermediate" "L5010AE" "Blood Culture Aerobic" 800190 "Emergency" 1979664839999.9998
        2200001905 11 "GM" "Gentamicin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 800190 "Emergency" 1979671860000.0002
        2200001905 13 "AM" "Ampicillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 800190 "Emergency" 1979664839999.9998
        2200002094 2 "CULTR" "NULL" "Pseudomonas aeruginosa" "L5010AN" "Blood Culture Anaerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 3 "AK" "Amikacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 3 "AK" "Amikacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 4 "MER" "Meropenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 4 "MER" "Meropenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 5 "IMP" "Imipenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 5 "IMP" "Imipenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 7 "GM" "Gentamicin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 7 "GM" "Gentamicin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 9 "CPE" "Cefepime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 9 "CPE" "Cefepime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 10 "CAZ" "Ceftazidime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 10 "CAZ" "Ceftazidime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 15 "OX" "Oxacillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 15 "OX" "Oxacillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 21 "GM" "Gentamicin" "Resistant" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 21 "GM" "Gentamicin" "Resistant" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200002094 22 "CP" "Ciprofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979710680000.0002
        2200002094 22 "CP" "Ciprofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 370351 "In Patient" 1979712839999.9998
        2200008716 2 "CULTR" "NULL" "Klebsiella pneumoniae" "L5010AN" "Blood Culture Anaerobic" 1165732 "In Patient" 1980025680000
        2200008716 4 "AK" "Amikacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 4 "AK" "Amikacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 5 "AM" "Ampicillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 5 "AM" "Ampicillin" "Resistant" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 8 "CAZ" "Ceftazidime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 8 "CAZ" "Ceftazidime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 10 "CFT" "Cefotaxime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 10 "CFT" "Cefotaxime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 12 "CFX" "Cefoxitin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 12 "CFX" "Cefoxitin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 14 "CL" "Colistin" "Intermediate" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 14 "CL" "Colistin" "Intermediate" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 15 "CP" "Ciprofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 15 "CP" "Ciprofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 16 "CPE" "Cefepime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 16 "CPE" "Cefepime" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 18 "ETP" "Ertapenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 18 "ETP" "Ertapenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 20 "GM" "Gentamicin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 20 "GM" "Gentamicin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 21 "IMP" "Imipenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        2200008716 21 "IMP" "Imipenem" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1980025680000
        2200008716 22 "LVX" "Levofloxacin" "Sensitive" "L5010AE" "Blood Culture Aerobic" 1165732 "In Patient" 1.9800228e+12
        end
        format %tcnn/dd/ccYY_hh:MM Collection_Date


        For Mr. Mike: regarding "I am hoping (guessing) that the organism name and Episode_No uniquely identify related observations", yes, they identify related observations.



        I am trying to reshape the data as follows, with one column each for:

        seq    organism (new variable)    LBCANT_Desc_1    Result_Value_1    LBCANT_Desc_2    Result_Value_2

        Row 1  Escherichia coli ESBL      Ampicillin       Resistant         Cefepime         Resistant

        Row 2  Klebsiella pneumoniae      Amikacin         Resistant         Ampicillin       Resistant




        Many Thanks,
        Meshal



        • #5
          OK, I don't understand how this data is organized. My original assumption that groups of consecutive observations beginning with one that has Test_Item_Code == "CULTR" form a single isolate appears to be incorrect, because within such groups there can be different collection dates and the same antibiotic can be tested more than once. Worse still, even within a group defined by a single LABNumber and Collection_Date, the same antibiotic can be tested twice with conflicting results:
          Code:
               LABNumber   Collection_D~e   LBCANT_D~c   Result_~e  
              2200002094   9/25/2022 7:38   Gentamicin   Resistant  
              2200002094   9/25/2022 7:38   Gentamicin   Sensitive  
              2200002094   9/25/2022 8:13   Gentamicin   Sensitive  
              2200002094   9/25/2022 8:13   Gentamicin   Resistant
          So I can't really proceed without understanding what constitutes a single isolate here, and until you fix the errors that led to the contradictory results shown above. There may be more such in your full data set. To find them:
          Code:
          by LABNumber Collection_Date LBCANT_Desc (Result_Value), sort: ///
              gen byte problem = Result_Value[1] != Result_Value[_N]
          browse if problem



          • #6
            Hi Mr. Clyde

            I have attached a sample of the data as a .dta file, created after I ran your latest code to check for contradictions. I am still somewhat confused about whether to remove or keep the dates. Any suggestions, please?


            Thanks


            • #7
              I'd like to help, but, like many others, for security reasons, I do not download or open attachments from people I do not know. Please post example data using -dataex-, as you have done before.

              You have two different problems in this data set. One is observations that are exact duplicates of each other (except for the seq variable, and, sometimes, the timestamp). For those, it is satisfactory to remove one from each pair of duplicates. No information will be lost in doing that. However, I do strongly recommend you review the data management that led up to the creation of the data set, because the presence of exact duplicate observations is often the result of an error, and if that is your situation, you should find the error, and then fix that error and any others you find in the course of investigating it.
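A minimal sketch of dropping such surplus copies with -duplicates-, run on a three-row toy data set modelled on the example posted in #4. Which variables should define "the same test result" is an assumption here (LABNumber, LBCANT_Desc and Result_Value); substitute whatever key is appropriate for the real data.

```stata
* Toy data modelled on the example in #4; not the poster's real data set.
clear
input double LABNumber byte Seq str13 LBCANT_Desc str12 Result_Value
2200000484 4 "Ampicillin" "Resistant"
2200000484 4 "Ampicillin" "Resistant"
2200000484 6 "Cefepime"   "Resistant"
end

* Report how many surplus copies exist on the assumed key variables.
duplicates report LABNumber LBCANT_Desc Result_Value

* Keep the first observation in each group of duplicates.
* -force- is required whenever a varlist is specified.
duplicates drop LABNumber LBCANT_Desc Result_Value, force
```

Because Result_Value is part of the key, contradictory pairs (same antibiotic, different result) are left untouched by this step and still have to be resolved by hand.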

              The other problem is that you have some observations that are contradictory duplicates. That is, they refer to the same isolate and the same antibiotic, but one of them says it's sensitive and the other says it's resistant. Clearly these can't both be right. The code in #5 shows you how to find all of them in your full data set. So you need to find out which of the observations is correct, and then delete the other one. Again, the fact that you have such observations is often the result of errors in the data management that created the set, so, in addition to solving the immediate problem, you should check all the processes that led to the creation of your data set to try to find the error(s) that led to this situation in the first place, and fix it (them).

              Added: The other concern I have about the data is that the duplicate observations often have a different value for the timestamp. This would lead me to think that they come from different isolates (and in that case, even the contradictory sensitivity results might be correct). But they are still listed under the same observation having Test_Item_Code == "CULTR", which would suggest that they are the same isolate. In that case the contradictions on sensitivity are not acceptable, nor are the contradictions on the timestamp. So in order to proceed, we also need to figure out what is going on here.
              Last edited by Clyde Schechter; 31 Jan 2024, 13:33.



              • #8
                Hi Mr. Clyde,

                My sincere apologies for the delayed response!


                I tried to fix it, but I am still facing the same issue.

                - Below is the text from the Stata output window after I removed the observations flagged as problems, to see whether the data would reshape wide:


                . reshape error
                (j = Amikacin Ampicillin Azithromycin Cefepime Cefotaxime Cefoxitin Ceftazidime Ceftriaxone Ciprofloxacin Colistin Ertapenem Gentamicin Imipenem Levofloxacin Meropenem NULL Oxacil
                > lin Spectinomycin Tigecycline)

                i (isolate) indicates the top-level grouping such as subject id.
                j (LBCANT_Desc) indicates the subgrouping such as time.
                The data are in the long form; j should be unique within i.

                There are multiple observations on the same LBCANT_Desc within isolate.

                The following 129 of 511 observations have repeated LBCANT_Desc values:

                     +-------------------------+
                     | isolate     LBCANT_Desc |
                     |-------------------------|
                 28. |       3        Amikacin |
                 29. |       3        Amikacin |
                 30. |       3      Ampicillin |
                 31. |       3      Ampicillin |
                 32. |       3        Cefepime |
                     |-------------------------|
                 33. |       3        Cefepime |
                 34. |       3      Cefotaxime |
                 35. |       3      Cefotaxime |
                 36. |       3     Ceftazidime |
                 37. |       3     Ceftazidime |
                     |-------------------------|
                 38. |       3   Ciprofloxacin |
                 39. |       3   Ciprofloxacin |
                 40. |       3       Ertapenem |
                 41. |       3       Ertapenem |
                 42. |       3        Imipenem |
                     |-------------------------|
                 43. |       3        Imipenem |
                 44. |       3       Meropenem |
                 45. |       3       Meropenem |
                 57. |       6        Amikacin |
                 58. |       6        Amikacin |
                     |-------------------------|
                 59. |       6        Cefepime |
                 60. |       6        Cefepime |
                 61. |       6     Ceftazidime |
                 62. |       6     Ceftazidime |
                 63. |       6   Ciprofloxacin |
                     |-------------------------|
                 64. |       6   Ciprofloxacin |
                 65. |       6        Imipenem |
                 66. |       6        Imipenem |
                 67. |       6       Meropenem |
                 68. |       6       Meropenem |
                     |-------------------------|
                 71. |       7        Amikacin |
                 72. |       7        Amikacin |
                 73. |       7      Ampicillin |
                 74. |       7      Ampicillin |
                 75. |       7        Cefepime |
                     |-------------------------|
                 76. |       7        Cefepime |
                 77. |       7      Cefotaxime |
                 78. |       7      Cefotaxime |
                 79. |       7       Cefoxitin |
                 80. |       7       Cefoxitin |
                     |-------------------------|
                 81. |       7     Ceftazidime |
                 82. |       7     Ceftazidime |
                 83. |       7   Ciprofloxacin |
                 84. |       7   Ciprofloxacin |
                 85. |       7        Colistin |
                     |-------------------------|
                 86. |       7        Colistin |
                 87. |       7       Ertapenem |
                 88. |       7       Ertapenem |
                 89. |       7      Gentamicin |
                 90. |       7      Gentamicin |
                     |-------------------------|
                 91. |       7        Imipenem |
                 92. |       7        Imipenem |
                 93. |       7    Levofloxacin |
                 94. |       7    Levofloxacin |
                 95. |       7       Meropenem |
                     |-------------------------|
                 96. |       7       Meropenem |
                148. |      12      Ampicillin |
                149. |      12      Ampicillin |
                150. |      12        Cefepime |
                151. |      12        Cefepime |
                     |-------------------------|
                152. |      12     Ceftazidime


                Many thanks for your help,

                Meshal



                • #9
                  Well, the problem is that we have not yet succeeded at defining an isolate. An isolate would be a single organism identified in a single specimen. And crucially, any isolate would have only one test for any given antibiotic. I cannot from your example data perceive a group of variables that serve to identify an isolate. I initially thought that all of the observations that appear between two observations for which Test_Item_Code == "CULTR" would do that, but in the full data set that proves to be incorrect. None of the information provided in subsequent posts sheds any new light on this issue.

                  I think it is unlikely that further discussion of this problem here will lead to an answer. I think O.P. needs to consult the microbiology lab that created the data and ask them to explain how to identify single isolates within this data. Once that is definitively resolved, -reshape- will easily work when the -i()- variables jointly identify isolates. Or perhaps the lab will explain that approaching this in terms of isolates is misguided, and they do in fact repeatedly test the same isolate against the same antibiotics, and that the results can be contradictory. However, in that case, I do not see any meaningful way to reorganize this data into wide layout.
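To illustrate what "the -i()- variables jointly identify isolates" would look like in practice, here is a sketch on toy data. The variable isolate_id is hypothetical: it stands in for whatever identifier the lab eventually supplies, and the four rows are made up from organism and antibiotic names seen in this thread.

```stata
* Toy data; isolate_id is a hypothetical identifier, not a variable in the real data.
clear
input int isolate_id str24 organism str13 LBCANT_Desc str12 Result_Value
1 "Escherichia coli ESBL" "Ampicillin" "Resistant"
1 "Escherichia coli ESBL" "Cefepime"   "Resistant"
2 "Klebsiella pneumoniae" "Amikacin"   "Resistant"
2 "Klebsiella pneumoniae" "Ampicillin" "Resistant"
end

* Stops with an error unless each isolate has at most one row per antibiotic.
isid isolate_id LBCANT_Desc

* One row per isolate, one string column per antibiotic.
rename Result_Value _
reshape wide _, i(isolate_id) j(LBCANT_Desc) string
rename _* *
```

Running -isid- first makes the precondition explicit: if it fails on the real data, -reshape- would fail for the same reason, and the isolate definition still needs work.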
