  • #16
    Without example data that reproduces the problem, I can't give you specific advice. What I can say, at a general level, is that somehow, what you think are "En désaccord" and "En total désaccord" in your data set, really aren't. One possibility is that those responses are padded with leading or trailing blanks, which your eye does not see, but Stata does. If that is the problem
    Code:
    replace `var' = trim(itrim(`var'))
    before the -encode- will resolve the problem.

    If it is not a matter of blanks, there may be "non-printing" characters embedded in Educ_16, which, again, your eye does not see, but Stata does. Those are more difficult to deal with, as there is no simple cleanup function like trim() to remove them. For a start you can run -chartab- (by Robert Picard, available from SSC) to identify all the characters contained in Educ_16. You will then have to use -subinstr()- or -usubinstr()- to remove them.
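    As a rough sketch of that workflow: the choice of the non-breaking space (U+00A0) below is only an assumption for illustration, and the character to remove is whatever -chartab- actually reports for your data.

    ```stata
    * chartab is community-contributed: ssc install chartab
    chartab Educ_16

    * Assumption for illustration: the offending character is a non-breaking space (U+00A0)
    replace Educ_16 = usubinstr(Educ_16, uchar(160), " ", .)

    * then collapse repeated blanks and trim both ends
    replace Educ_16 = trim(itrim(Educ_16))
    ```

    The same three steps can be wrapped in a -foreach- loop over Educ_1-Educ_18 once the offending character is known.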



    • #17
      Many thanks John.
      Surprisingly, the options that had disappeared are repeated, as you can see here:
      educat:
      1 En total désaccord
      2 En désaccord
      3 Neutre
      4 D'accord
      5 Tout à fait d’accord
      6 D'accord
      7 En total désaccord
      8 En désaccord
      What is strange is that when I try to recode them, I get the following message:
      too few variables specified
      This message appears when I use:
      Code:
      foreach var of varlist Educ_1-Educ_18 {
          recode (7=1) (8=2) (6=4)
      }
      Really strange!
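      A note on the error, as a hedged sketch: -recode- expects the variable list before the rules, and the loop above passes no variable to it. Assuming the intent is to recode each Educ_ variable in place, the loop might look like this (the mapping is copied from the post; whether it is the right one depends on the actual value label):

      ```stata
      foreach var of varlist Educ_1-Educ_18 {
          * name the variable to recode before the rules
          recode `var' (7=1) (8=2) (6=4)
      }
      ```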



      • #18
        Dear Clyde,
        Many thanks. I've tried trimming, without success. I will look for -chartab- by Robert Picard as suggested. In the meantime, I have tried to mimic the data structure:

        * id holds the row number; str25 is wide enough for the longest response
        clear
        input byte id str25(Educ_1 Educ_2 Educ_3)
        1 "En total désaccord" "D'accord" "D'accord"
        2 "En désaccord" "En désaccord" "Neutre"
        3 "D'accord" "En désaccord" "D'accord"
        4 "En désaccord" "En total désaccord" "D'accord"
        5 "D'accord" "D'accord" "D'accord"
        6 "D'accord" "En total désaccord" "En total désaccord"
        7 "En total désaccord" "En désaccord" "En désaccord"
        8 "Neutre" "Neutre" "En désaccord"
        9 "Neutre" "En total désaccord" "En total désaccord"
        10 "Neutre" "En total désaccord" "En total désaccord"
        11 "Neutre" "Neutre" "Neutre"
        end

        I will look and see how it can help.
        Many thanks



        • #19
          I do not see anything wrong here. From your post in #14


          . tab Educ_16

             Ensei. compétents |      Freq.     Percent        Cum.
          ---------------------+-----------------------------------
                        Neutre |         33       34.74       34.74
                      D'accord |         46       48.42       83.16
          Tout à fait d’accord |          2        2.11       85.26
            En total désaccord |          5        5.26       90.53
                  En désaccord |          9        9.47      100.00
          ---------------------+-----------------------------------
                         Total |         95      100.00

          . tab Educ_16, nolab

               Ensei. |
           compétents |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    3 |         33       34.74       34.74
                    4 |         46       48.42       83.16
                    5 |          2        2.11       85.26
                    6 |          5        5.26       90.53
                    7 |          9        9.47      100.00
          ------------+-----------------------------------
                Total |         95      100.00
          The variable Educ_16 does not contain the values 1 and 2. In addition, here is what label list shows


          educat:
          1 En total désaccord
          2 En désaccord
          3 Neutre
          4 D'accord
          5 Tout à fait d’accord
          6 D'accord
          7 En total désaccord
          8 En désaccord
          There is no rule that the same value label text cannot be used for more than one value. The question, therefore, is what you want the label to look like, given that the values you are labeling are in the range 1-8.



          • #20
            Dear Andrew, many thanks. Sorry that what I posted was not as clear as required. In fact, on the tablet used to collect the data, I had just 5 options, which are also the first 5 in the list above. This is what showed me that something was wrong. I checked and did not find a cell containing "6", "7" or "8" anywhere, but these values appear when I encode the variable. I think that what Clyde suggested (post #16) is likely to be the real problem. I've read about and tried "chartab" by Robert Picard, but I have not made much progress so far.



            • #21
              The good thing is that when I code all the variables separately (as opposed to what I reported in post #17), everything works perfectly. However, this let me notice another strange thing: I have 1680 observations in total, but Stata uses only 95 of them for the operations I want to run.

              Code:
              tab enquetecible

                              Enquêté cîble |      Freq.     Percent        Cum.
              ------------------------------+-----------------------------------
                                     Ménage |      1,259       74.90       74.90
                        Unité de production |        327       19.45       94.35
              Ménage et Unité de production |         95        5.65      100.00
              ------------------------------+-----------------------------------
                                      Total |      1,681      100.00


              Only the observations corresponding to the last option are used for these operations. The dataset is imported from an Excel file with the following command:
              Code:
              import excel "BD_all1_versions_25.01.2020.xlsx", sheet("perception_qlty") firstrow clear
              Additional commands are :
              Code:
              rename DATEDELENQUÊTE date_svy 
              rename    Enquêtécible enquete 
              rename NOMDELENQUÊTÉ name_enqt
              
              gen enquetecible=0
              replace enquetecible=1 if enquete=="1. Ménage"
              replace enquetecible=2 if enquete=="2. Unité de production"
              replace enquetecible=3 if enquete=="3. Ménage et Unité de production"
              label define enqueteciblecode 1 "Ménage"  2 "Unité de production" 3 "Ménage et Unité de production" 
              label value enquetecible enqueteciblecode
              label var enquetecible "Enquêté cîble"
              drop  if enquetecible==0 /* one observation (=0) whose origin I do not know */
              I have no idea what the origin of this misbehavior of the data could be. I attach the dataset in Excel in case more detail is needed.
              Many thanks in advance.



              • #22
                The good thing is that when I RECODE all the variables separately (as opposed to what I reported in post #17), everything works perfectly. However, this let me notice another strange thing: I have 1680 observations in total, but Stata uses only 95 of them for the operations I want to run.

                Many thanks in advance.



                • #23
                  I would rather fix the problem than to use recode later on. Here is one way which makes the labels consistent beforehand.

                  Code:
                  foreach var of varlist Educ_1-Educ_18 {
                  replace `var'=  "total" if ustrregexm(lower(`var'), "total")
                  replace `var'=  "En  désaccord" if ustrregexm(lower(`var'), "en")
                  replace `var'=  "Neutre" if ustrregexm(lower(`var'), "neut")
                  replace `var'=  "D'accord" if ustrregexm(lower(`var'), "^d")
                  replace `var'=  "Tout à fait d’accord" if ustrregexm(lower(`var'), "tout")              
                  replace `var'= "En  total désaccord"  if ustrregexm(lower(`var'), "total")
                  }
                  label define educat  1"En  total désaccord"    2"En  désaccord"  3"Neutre" 4"D'accord" 5"Tout à fait d’accord"
                  foreach var of varlist Educ_1-Educ_18 {
                      encode `var', gen(`var'_)  label(educat)
                      drop `var'
                      rename `var'_ `var'        
                  }
                  I have in total 1680 observations but only 95 are being used by STATA for the different operations I want to run.
                  This is usually due to missing values in other variables that you use. Stata uses listwise deletion of missing values. Therefore, if an observation of a particular variable is missing, Stata deletes the whole observation. This means that you have in total 95 complete cases where no variable is missing for the particular task that you were undertaking. If you search for "multiple imputation", you will see a way to deal with missing values. Finally, to see the sample after a regression

                  Code:
                  regress ....
                  gen sample = e(sample)
                  browse if sample
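
                  Before any regression, a quick check of where the missing values sit can also help. The sketch below assumes the variable names used earlier in this thread (Educ_1 and enquetecible); has_educ1 is a hypothetical helper variable introduced only for illustration.

                  ```stata
                  * missing() works for both string and numeric variables
                  count if missing(Educ_1)

                  * flag observations that have an answer to Educ_1
                  generate byte has_educ1 = !missing(Educ_1)

                  * cross-tabulate respondent type against that flag
                  tabulate enquetecible has_educ1, missing
                  ```

                  If has_educ1 is 1 only within one category of enquetecible, the reduced sample would come from the questions not having been answered by the other respondents, rather than from anything Stata is doing.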



                  • #24
                    In #23 there are multiple adjacent spaces in some strings. Following #16 I would clean up all the string variables with trim(itrim()) and also check for non-standard characters and from the other end only use labels with single spaces.

                    Code:
                    tab1 Educ_1-Educ_18
                    would be a further check.



                    • #25
                      Dear Andrew,
                      Many thanks. This code
                      Code:
                      foreach var of varlist Educ_1-Educ_18 {
                          replace `var'=  "total" if ustrregexm(lower(`var'), "total")
                          replace `var'=  "En  désaccord" if ustrregexm(lower(`var'), "en")
                          replace `var'=  "Neutre" if ustrregexm(lower(`var'), "neut")
                          replace `var'=  "D'accord" if ustrregexm(lower(`var'), "^d")
                          replace `var'=  "Tout à fait d’accord" if ustrregexm(lower(`var'), "tout")
                          replace `var'= "En  total désaccord"  if ustrregexm(lower(`var'), "total")
                      }
                      label define educat  1"En  total désaccord"    2"En  désaccord"  3"Neutre" 4"D'accord" 5"Tout à fait d’accord"
                      foreach var of varlist Educ_1-Educ_18 {
                          encode `var', gen(`var'_)  label(educat)
                          drop `var'
                          rename `var'_ `var'
                      }
                      works perfectly. However, concerning the observations that are disappearing, I attached the whole dataset to show the real problem I am facing. The problem appears before I reach the regression stage: for option "1" (Ménage), most observations have no missing values, yet they disappear. I even tried running some of the operations conditional on options "1" and "2", and even then the operations were not carried out on more than the 95 observations. Many thanks,



                      • #26
                        Many thanks, Nick. I will check that also. The solution suggested from the code provided by Andrew in post #23 provides a solution to the problem I had. I will though try the solution you have also suggested.

                        On another issue, Andrew has replied as follows
                        This is usually due to missing values in other variables that you use. Stata uses listwise deletion of missing values. Therefore, if an observation of a particular variable is missing, Stata deletes the whole observation. This means that you have in total 95 complete cases where no variable is missing for the particular task that you were undertaking. If you search for "multiple imputation", you will see a way to deal with missing values. Finally, to see the sample after a regression
                        to my request.

                        I've tried to check whether missing values may be the problem and don't really think they are. The problem appears before I reach the regression stage: for option "1" (Ménage), most observations have no missing values, yet they disappear. I even tried running some of the operations conditional on options "1" and "2", and even then the operations were not carried out on more than the 95 observations. I attached the whole dataset to show the real problem I am facing.

                        Many thanks.
                        Last edited by Kamala Kaghoma; 16 Mar 2020, 09:16.



                        • #27
                          To Nick's point, your value label has spaces which I did not notice as I was copying and pasting, (e.g., "En total désaccord" has 2 spaces between "En" and "total"). This may explain the problem with the initial encode, and it's much better to address this than replacing the labels as I do.

                          Only the observations corresponding to the last option are being used for the different operations.
                          Can you provide code that leads you to conclude that some observations are ignored?
                          Last edited by Andrew Musau; 16 Mar 2020, 10:03.



                          • #28
                            Dear Andrew, many thanks for your reaction.
                            To Nick's point, your value label has spaces which I did not notice as I was copying and pasting, (e.g., "En total désaccord" has 2 spaces between "En" and "total"). This may explain the problem with the initial encode, and it's much better to address this than replacing the labels as I do.
                            This has already been taken care of. I trimmed all the variables before running the suggested commands.

                            As for the second question, as I mentioned in post#21, tabulating

                            Code:
                              tab enquetecible
                            shows the three options I have in the dataset: 1 corresponds to "Ménage" (household), 2 to "Unité de production" (production unit) and 3 to "Ménage et Unité de production" (household and production unit), for a total of 1681 observations. Option 3 covers the case where the whole questionnaire was administered to an individual who represents both her household and a production unit. In the first two options the respondent represents either a household, and so answered only the household component of the questionnaire, or a production unit, and so answered only the producer's component. However, even when I tabulate, for instance, the same variable as in post #21, I get the same results as shown in that post, instead of results for the 1354 (95+1259) observations to which the analysis should be restricted after I keep only part of the observations:
                            Code:
                            preserve 
                            
                            keep if enquetecible==1 | enquetecible==3
                            ******************************
                            Thanks



                            • #29
                              I am not getting the same result running your code. Below, I rewrite it to save some lines

                              Code:
                              import excel "BD_all1_versions_25.01.2020.xlsx", sheet("perception_qlty") firstrow clear
                              rename (DATEDELENQUÊTE Enquêtécible NOMDELENQUÊTÉ) (date_svy enquete name_enqt)
                              tab enquete
                              gen enquetecible= real(substr(enquete, 1, 1))
                              tab enquetecible
                              preserve
                              keep if inlist(enquetecible,1, 3)
                              tab enquetecible
                              Results:

                              Code:
                              . tab enquete
                              
                                                  Enquêté cible  |      Freq.     Percent        Cum.
                              -----------------------------------+-----------------------------------
                                                       1. Ménage |      1,259       74.90       74.90
                                          2. Unité de production |        327       19.45       94.35
                                3. Ménage et Unité de production |         95        5.65      100.00
                              -----------------------------------+-----------------------------------
                                                           Total |      1,681      100.00
                              
                              .
                              . gen enquetecible= real(substr(enquete, 1, 1))
                              (1 missing value generated)
                              
                              .
                              . tab enquetecible
                              
                              enquetecibl |
                                        e |      Freq.     Percent        Cum.
                              ------------+-----------------------------------
                                        1 |      1,259       74.90       74.90
                                        2 |        327       19.45       94.35
                                        3 |         95        5.65      100.00
                              ------------+-----------------------------------
                                    Total |      1,681      100.00
                              
                              .
                              . preserve
                              
                              .
                              . keep if inlist(enquetecible,1, 3)
                              (328 observations deleted)
                              
                              .
                              . tab enquetecible
                              
                              enquetecibl |
                                        e |      Freq.     Percent        Cum.
                              ------------+-----------------------------------
                                        1 |      1,259       92.98       92.98
                                        3 |         95        7.02      100.00
                              ------------+-----------------------------------
                                    Total |      1,354      100.00
                              I get the same result running your code in #21. Note that I am using Stata 16, but I don't see a reason why the version should matter here.



                              • #30
                                Dear Andrew,
                                This is quite intriguing. I agree that the version of Stata should not have much to do with it. I am in fact using Stata 14. I've just attached part of my do-file. Maybe you can help me spot something in it that I am missing. Below is the output of
                                Code:
                                tab Educ_1
                                and
                                Code:
                                tab enquetecible
                                after running
                                Code:
                                preserve
                                and before I
                                Code:
                                restore
                                . I can't really understand what is wrong.

                                Code:
                                . **********************************************/ 
                                end of do-file
                                
                                tab Educ_1
                                
                                 Ecoles: organisée & |
                                         bien gérées |      Freq.     Percent        Cum.
                                ---------------------+-----------------------------------
                                 En  total désaccord |          9        9.47        9.47
                                       En  désaccord |         35       36.84       46.32
                                              Neutre |         19       20.00       66.32
                                            D'accord |         28       29.47       95.79
                                Tout à fait d’accord |          4        4.21      100.00
                                ---------------------+-----------------------------------
                                               Total |         95      100.00
                                
                                . tab enquetecible
                                
                                                Enquêté cîble |      Freq.     Percent        Cum.
                                ------------------------------+-----------------------------------
                                                       Ménage |      1,258       74.88       74.88
                                          Unité de production |        327       19.46       94.35
                                Ménage et Unité de production |         95        5.65      100.00
                                ------------------------------+-----------------------------------
                                                        Total |      1,680      100.00
                                Many thanks in advance for your help.
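
                                One thing that may be worth checking, offered as a sketch and not a diagnosis: data saved with -preserve- inside a do-file are restored automatically when the do-file ends, so a -tab- typed after "end of do-file" sees the full dataset again. Running the tabulations inside the same do-file, between -preserve- and -restore-, would show what the restricted sample looks like. The variable names are those used in this thread.

                                ```stata
                                preserve
                                keep if inlist(enquetecible, 1, 3)

                                * number of observations kept
                                count
                                tabulate enquetecible

                                * how many of the kept observations have an answer to Educ_1
                                count if !missing(Educ_1)
                                restore
                                ```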