  • Stata command / syntax

    In a study consisting of treated, control, and pure control groups (coded in the variable t), the education of the household head (educ_hh) is categorized as 0 = illiterate, 1 = primary, 2 = secondary, and 3 = tertiary. I would like to check whether there is a significant difference in each education category across the three groups. For example, I want to check whether the share of illiterate household heads differs significantly across the groups. Which Stata command is appropriate?


  • #2
    The education categories are hard to investigate in isolation: for example, if the groups differ with respect to the share of illiterates, then they must also differ with respect to at least one other education category, because the proportions have to add up to one. So usually we do one test of whether the distribution of education is the same across the three groups, and use a simple cross-tabulation to inspect the pattern.

    Code:
    dtable i.educ_hh, by(t, tests)
    or

    Code:
    tab educ_hh t, col chi2
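A minimal, self-contained sketch of that single omnibus test, run on synthetic data rather than the data from this thread (the 0/1/2 coding of t and the sample size of 3,000 are assumptions). Adding the `V` option to `tabulate` also reports Cramér's V, a 0 to 1 measure of association that can help judge whether a significant chi-square reflects a sizeable difference or a small one in a large sample:

```stata
* Synthetic data: t and educ_hh are drawn independently, so any
* association in this sample is due to chance alone
clear
set seed 12345
set obs 3000
generate byte t = runiformint(0, 2)        // assumed coding: 0, 1, 2 = the three groups
generate byte educ_hh = runiformint(0, 3)  // 0 illiterate, 1 primary, 2 secondary, 3 tertiary
label define educ 0 "Illiterate" 1 "Primary" 2 "Secondary" 3 "Tertiary"
label values educ_hh educ

* One test of whether the distribution of education is the same across the
* three groups; column percentages show the pattern, Cramer's V its strength
tabulate educ_hh t, column chi2 V
display "Pearson chi2 p-value = " %6.4f r(p) "   Cramer's V = " %6.4f r(CramersV)
```

With the real data, only the last two commands are needed, run on educ_hh and t as they are.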
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thanks, Maarten, for your informative response and for the code you shared. The chi2 test shows an overall significant difference, but when I tabulate and inspect the pattern across the groups, the differences are very small, as shown in the attached result. Since I am working on an RCT, I need to demonstrate that randomization worked, so this significant result seems to contradict the observed pattern. This is my dilemma. How can I navigate this scenario?


      • #4
        There is a pattern in that table: your controls are somewhat better educated than your treated (2 to 3 percentage points fewer illiterates and 2 to 3 percentage points more college educated). Especially the fact that the same pattern occurs in both control groups makes me worry that something went wrong with your randomization. This is something you need to report and investigate.


        • #5
          Well noted, sir. My understanding, however, is that since this is an experiment in a typical field setting with rural farmers, and since we performed clustered randomization of groups of farmers, it may not be possible to achieve perfect balance between the groups on the control variables. I hope to argue that although there are some observed differences, the overall pattern seems similar across the groups. Am I right?
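Because randomization was done by cluster, households in the same cluster are not independent, while the Pearson chi-square test treats them as independent draws; with positively correlated households that tends to overstate significance. A hedged sketch of a cluster-aware check for one category (illiterate, the example from the original question), on synthetic data with a hypothetical cluster identifier `village` that does not come from this thread:

```stata
* Synthetic clustered design: 60 hypothetical villages of 10 households each,
* with whole villages assigned to one of the three groups
clear
set seed 2024
set obs 60
generate int village = _n                 // hypothetical cluster identifier
generate byte t = mod(_n, 3)              // assumed coding of the three groups
generate double u = rnormal(0, 0.5)       // village-level shock shared by its households
expand 10
* Household education depends on the village shock, not on t
generate byte educ_hh = min(3, max(0, round(1.5 + u + rnormal(0, 1))))
generate byte illiterate = (educ_hh == 0)

* Compare the share of illiterate household heads across groups, with
* standard errors that allow correlation within villages
regress illiterate i.t, vce(cluster village)
testparm i.t
display "Cluster-robust p-value, H0: equal share illiterate = " %6.4f r(p)
```

The same two commands can be repeated with an indicator for each education category; the omnibus cross-tabulation remains the primary test.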



          • #6
            When you analyze the data from your experiment, what you see is a difference in means between the treated and control groups. At this point you don't know whether that difference is due to the treatment or due to the somewhat lower average education in the treated group. You can control for education, but the more concerning part is that this indicates that the randomization did not work as well as you hoped, and you don't know which other variables were also affected.
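Controlling for education amounts to adding it to the outcome model. A minimal sketch on synthetic data, assuming a hypothetical outcome `y` and a hypothetical cluster identifier `village`; comparing the treatment coefficients with and without `i.educ_hh` shows how much of the raw group difference education accounts for. It cannot adjust for imbalance in variables that were not measured:

```stata
* Synthetic data: treated villages happen to be less educated, and the
* outcome y depends on education only (the true treatment effect is zero)
clear
set seed 777
set obs 60
generate int village = _n                 // hypothetical cluster identifier
generate byte t = mod(_n, 3)              // assumed coding: 2 = treated
expand 10
generate byte educ_hh = runiformint(0, 3)
replace educ_hh = max(0, educ_hh - 1) if t == 2 & runiform() < 0.3
generate double y = 10 + 2*educ_hh + rnormal(0, 3)   // hypothetical outcome

* Raw group differences versus differences holding education constant,
* with standard errors clustered at the level of randomization
quietly regress y i.t, vce(cluster village)
estimates store raw
quietly regress y i.t i.educ_hh, vce(cluster village)
estimates store adjusted
estimates table raw adjusted, keep(1.t 2.t) b(%9.3f) se(%9.3f)
```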


            • #7
              Thanks so much, Maarten, for your input. I will take your concerns into consideration.



              • #8
                Hello Peter
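                I am not quite sure. However, under a normal approximation it can be calculated as Power = Φ(|effect_size| / standard_error - z), where Φ is the standard normal CDF and z = 1.96 for a two-sided test at the 5% level (the probability of rejecting in the opposite tail is usually negligible).
                Here |effect_size| / standard_error is the Z-score of the estimate.
                A higher power value implies a greater likelihood of detecting a true effect of your skill variables on income.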
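For reference, the usual normal approximation to the power of a two-sided Wald test at level α is power ≈ Φ(|b|/SE - z_(1-α/2)) + Φ(-|b|/SE - z_(1-α/2)), which can be computed directly in Stata. A minimal sketch; the standard error of 5 is a hypothetical placeholder, since none is given in the post. Plugging in the estimated coefficient itself gives so-called observed power, which is a one-to-one function of the p-value and adds little information; an effect size chosen on substantive grounds is usually more informative.

```stata
* Normal approximation to the power of a two-sided Wald test at the 5% level
local b  = 16.543               // coefficient from the post
local se = 5                    // hypothetical standard error; take _se[...] from your qreg output
local z  = abs(`b') / `se'
local crit = invnormal(0.975)   // about 1.96
scalar power = normal(`z' - `crit') + normal(-`z' - `crit')
display "Approximate power = " %5.3f scalar(power)
```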

                **************************
                I have a similar question regarding the effect size and statistical power for a dataset that is not normally distributed. I have estimated a quantile regression model, and I want to check its power to confirm that my result reflects a true effect. I have tried the following code in Stata, but I failed to obtain the final power value.
                My objective is to estimate the effect of the commercialization index on income.

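                * Set parameters
                local effect_size 16.543   // coefficient of the commercialization index at the 50th quantile
                local sample_size 610      // sample size
                local quantile 0.5         // median (50th quantile)
                local n_simulations 1000   // number of simulated data sets
                local alpha 0.05           // significance level
                local noise_sd 1           // SD of the error term (assumed; with SD 1 power is essentially 100%, so use the residual spread of your data)

                * Set the seed once, before the loop, so the whole run is reproducible
                set seed 12345

                * Initialize counter for significant results
                local significant_count 0

                * Loop for simulations. Local macros must be dereferenced with a left quote
                * (the backtick key) and a right quote (apostrophe), not two apostrophes
                forvalues i = 1/`n_simulations' {
                    quietly {
                        * Start each replication from an empty data set (this clears the data in memory)
                        clear
                        set obs `sample_size'

                        * Generate predictor variable from a standard normal distribution
                        generate predictor = rnormal(0, 1)

                        * Generate response variable with the assumed effect size plus noise
                        generate response = `effect_size' * predictor + rnormal(0, `noise_sd')

                        * Fit quantile regression model
                        qreg response predictor, quantile(`quantile')

                        * Two-sided p-value of the coefficient, which is what must be compared with alpha
                        local p = 2 * ttail(e(df_r), abs(_b[predictor] / _se[predictor]))
                    }
                    * Increment counter if significant
                    if `p' < `alpha' local ++significant_count
                }

                * Calculate power as the percentage of significant replications
                local power = 100 * `significant_count' / `n_simulations'

                * Output power estimate
                display "Power: " `power' "%"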

                Kindly help me find out where I made an error in this command.
