
  • How to calculate the amount of phenotype variance explained by a few SNPs?

    Hi
    I'm working with a validation dataset, testing 7 previously reported SNP associations. A few of these SNPs are validated as associated with a binary phenotype of interest.
    Now I want to calculate the amount of phenotype variance that is explained by these SNPs.
    I would like to calculate the amount of phenotype variance explained by
    a) all 7 SNPs in the validation cohort
    b) the few SNPs that are significant in the validation cohort only
    I'm particularly interested in knowing how to carry out this calculation in "Stata" software.
    Secondly, the SNPs are also associated with a similar phenotype that is measured on a numeric scale. Does this change the method?
    I have read the article by Wray et al., "Pitfalls of predicting complex traits from SNPs"; however, I'm still not sure how to calculate this heritability.
    I hope someone can guide me?
    /cheers
    PS: I posted the same question in another forum:
    How to calculate the amount of phenotype variance explained by a few SNPs? - ResearchGate. Available from: https://www.researchgate.net/post/Ho..._by_a_few_SNPs [accessed Apr 26, 2016].

  • #2
    Dear Jacob, does your SNP stand for Scottish National Party? That is what the first Google search result was about. Please read the FAQ section, which says that you are expected to clarify your questions and spell out any acronyms/abbreviations, as people here come from different disciplines and may understand something completely different from what you meant. A snippet of your data would also help everyone to help you. Please use -dataex- (if you don't have it installed, type ssc install dataex) and provide a small subset of your data; for example, the command for the first 30 rows of your data is dataex in 1/30 . Then copy and paste that data example here using the code delimiters (the [code] delimiters will be at the top and end of the data example produced by -dataex-), with clarified terms, so that people understand your problem and can help you solve it.

    It seems like you are after a regression model and the variance in an outcome variable (the 'phenotypic variable'?) explained by the SNPs. Does your data have a nested structure? In that case, calculation of variance explained is not straightforward. Unless we see some data and a clearer question, I am afraid it is unlikely that you will get a useful reply.

    Roman
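The -dataex- steps described above could look like the following minimal sketch. It uses Stata's bundled auto dataset as a stand-in for the real data, and the variable names (make, price, mpg) belong to that example dataset, not to the poster's file.

```stata
* Install dataex from SSC only if it is not already available
capture which dataex
if _rc ssc install dataex

* Stand-in data: the auto dataset shipped with Stata (an assumption for illustration)
sysuse auto, clear

* Write the first 30 observations of a few variables as a pasteable example
dataex make price mpg in 1/30
```

The output already includes the [CODE] delimiters, so it can be pasted into a post as is.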



    • #3
      Hi Roman
      Thanks for your very constructive input to my post. I have copy-pasted a dataex below, that is similar to my real one.
      First, SNPs are single-nucleotide polymorphisms, i.e. genetic variation at certain points of the DNA string. I analysed the association between outcome/phenotype and SNPs, assuming an additive genetic effect. Therefore, a single SNP variable can take the value 0 (no variation), 1 (heterozygous for the SNP), or 2 (homozygous).
      I used a multivariable logistic regression model:

      outcome = B1*snpx + B2*var1 + B3*var2 + B4*var3 etc.

      So my question: how much of the variance in outcome does SNPx explain? … how much does snp1-snp4 explain?
      I have an odds ratio for snpx, but can this value answer my question?

      I'm not really sure what you mean by "nesting structure of data"?

      Kind regards, Jacob
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float id long outc_any_none float(outc_numeric snp1 snp2 snp3 snp4) double(var1 var2) byte var3
      2182 0  .1 . . 2 3 5.9 2.375 1
      2195 1 1.6 . . . . 4.2   .75 1
      2140 1 1.6 . . 1 3 5.3  2.25 0
      2104 1 1.2 . . 3 2 6.7   2.5 0
       912 0 -.5 . . 2 1 5.7     . 0
      end
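One way to put a number on what the SNPs add in the logistic model is to compare McFadden's pseudo-R2 with and without them. The sketch below is only an illustration under stated assumptions: the data are simulated, the effect sizes and allele frequencies are hypothetical, and the variable names simply mirror the example above. Pseudo-R2 is not a variance-explained measure in the linear-model sense, and this is not the liability-scale calculation discussed by Wray et al.

```stata
clear
set seed 12345
set obs 1000

* Simulated additive genotypes coded 0/1/2 (hypothetical allele frequency 0.3)
forvalues j = 1/4 {
    generate byte snp`j' = rbinomial(2, 0.3)
}
generate double var1 = rnormal(5, 1)
generate byte var3 = runiform() < 0.5

* Hypothetical log-odds effects, chosen for illustration only
generate byte outc = runiform() < invlogit(-2 + 0.4*snp1 + 0.2*snp2 + 0.3*var1 - 0.2*var3)

* Full model: SNPs plus covariates
logit outc snp1 snp2 snp3 snp4 var1 var3
scalar r2_full = e(r2_p)

* Covariates only, restricted to the same observations as the full model
logit outc var1 var3 if e(sample)
scalar r2_cov = e(r2_p)

display "Pseudo-R2 gain from adding the SNPs: " %6.4f (scalar(r2_full) - scalar(r2_cov))
```

Restricting the second model with if e(sample) matters in real data, where missing genotypes would otherwise make the two models use different observations.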



      • #4
        You said you have an odds ratio for snpx, which suggests your model is a logistic regression model. In that case, your model does not have an error term of the kind a linear model with a continuous outcome has. Therefore, I am afraid you are not likely to be able to interpret the results in terms of variance explained. That is possible in a regression with a continuous outcome, though the approaches are not free from criticism. That is a different issue, however; for your logistic model, odds ratios are your best friend. Further, you can compare the odds ratios of two covariates in your model to see whether the effect of one is significantly greater than the other. To do this, use Stata's 'lincom' command. Type - help lincom -
        Roman
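The -lincom- comparison mentioned above might be sketched as follows. The data are simulated and the effect sizes are hypothetical; snp1, snp2, var1 and outc are placeholder names in the spirit of the example data.

```stata
clear
set seed 2016
set obs 2000

* Simulated additive genotypes coded 0/1/2 (hypothetical allele frequencies)
generate byte snp1 = rbinomial(2, 0.3)
generate byte snp2 = rbinomial(2, 0.4)
generate double var1 = rnormal(5, 1)

* Hypothetical log-odds effects, for illustration only
generate byte outc = runiform() < invlogit(-2 + 0.5*snp1 + 0.1*snp2 + 0.2*var1)

logit outc snp1 snp2 var1

* Test whether the two SNP coefficients differ;
* the or option reports the difference as a ratio of odds ratios
lincom snp1 - snp2, or
```

A ratio whose confidence interval excludes 1 would suggest the per-allele effects of the two SNPs differ.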



        • #5
          Thanks Roman.
          Actually, I do have a continuous outcome too, as the categorical outcome is based on values of disease activity (continuous scale)!
          How can I estimate variance explained using this outcome?
          Thankful for any help.
          Kind regards/ Jacob
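For the continuous outcome, one common approach is to compare R-squared from linear regressions with and without the SNPs. The following is a minimal sketch under stated assumptions: the data are simulated, the effect sizes are hypothetical, and snp1 and snp2 stand in for whichever SNPs turn out to be significant in the validation cohort.

```stata
clear
set seed 426
set obs 1000

* Simulated additive genotypes for 7 SNPs, coded 0/1/2 (hypothetical frequency 0.3)
forvalues j = 1/7 {
    generate byte snp`j' = rbinomial(2, 0.3)
}
generate double var1 = rnormal(5, 1)

* Hypothetical effects on the continuous outcome, for illustration only
generate double y = 0.4*snp1 + 0.3*snp2 + 0.2*var1 + rnormal()

* (a) All 7 SNPs plus the covariate
regress y snp1-snp7 var1
scalar r2_all = e(r2)
generate byte insample = e(sample)

* (b) Only the SNPs assumed to be significant, on the same observations
regress y snp1 snp2 var1 if insample
scalar r2_sub = e(r2)

* Covariate-only baseline, on the same observations
regress y var1 if insample
scalar r2_cov = e(r2)

display "R2 gained from all 7 SNPs:     " %6.4f (scalar(r2_all) - scalar(r2_cov))
display "R2 gained from the SNP subset: " %6.4f (scalar(r2_sub) - scalar(r2_cov))
```

The differences are the share of outcome variance attributable to the SNPs over and above the covariates. Keeping all three models on the same observations (insample) avoids differences caused by missing genotypes.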
