Title: | Correcting the Coverage of Credible Sets from Bayesian Genetic Fine Mapping |
---|---|
Description: | Using a computationally efficient method, the package can be used to find the corrected coverage estimate of a credible set of putative causal variants from Bayesian genetic fine-mapping. The package can also be used to obtain a corrected credible set if required; that is, the smallest set of variants required such that the corrected coverage estimate of the resultant credible set is within some user defined accuracy of the desired coverage. Maller et al. (2012) <doi:10.1038/ng.2435>, Wakefield (2009) <doi:10.1002/gepi.20359>, Fortune and Wallace (2018) <doi:10.1093/bioinformatics/bty898>. |
Authors: | Anna Hutchinson [aut, cre], Chris Wallace [aut], Kevin Kunzmann [ctb] |
Maintainer: | Anna Hutchinson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.0 |
Built: | 2025-01-06 04:48:35 UTC |
Source: | https://github.com/annahutch/corrcoverage |
Does not include posterior probabilities for null model
.zj_abf(Zj, int.Sigma, int.nrep, int.ERR, int.r)
.zj_abf(Zj, int.Sigma, int.nrep, int.ERR, int.r)
Zj |
joint z vector |
int.Sigma |
internal sigma |
int.nrep |
internal nrep |
int.ERR |
internal error matrix |
int.r |
internal r |
Matrix of simulated ABFs, one simulation per row
Internal function: Simulate nrep posterior probabilities of causality from joint Z-score vector
.zj_pp(Zj, int.Sigma, int.nrep, int.ERR, int.r)
.zj_pp(Zj, int.Sigma, int.nrep, int.ERR, int.r)
Zj |
joint z vector |
int.Sigma |
internal sigma |
int.nrep |
internal nrep |
int.ERR |
internal error matrix |
int.r |
internal r |
Does not include posterior probabilities for null model
Matrix of simulated posterior probabilties of causality, one simulation per row
Wakefield's log asymptotic Bayes factor (lABF) with prior standard deviation of effect size as a parameter
approx.bf.p(pvals, f, type, N, s, W = 0.2)
approx.bf.p(pvals, f, type, N, s, W = 0.2)
pvals |
P-values |
f |
Minor allele frequencies |
type |
Type of experiment ('quant' or 'cc') |
N |
Total sample size |
s |
Proportion of cases (N1/N0+N1), ignored if type=='quant' |
W |
Prior for the standard deviation of the effect size parameter beta (W=0.2 default) |
([Wakefield et al. 2009](https://onlinelibrary.wiley.com/doi/abs/10.1002/gepi.20359) This function converts p-values to log ABFs, also reporting intermediate calculations
data.frame containing lABF and intermediate calculations
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) p_values <- 2 * pnorm( - abs ( z_scores ) ) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) approx.bf.p(pvals = p_values, f = maf, type = "cc", N = N0+N1, s = N1/(N0+N1))
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) p_values <- 2 * pnorm( - abs ( z_scores ) ) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) approx.bf.p(pvals = p_values, f = maf, type = "cc", N = N0+N1, s = N1/(N0+N1))
Calculate ABFs from Z scores
bf_func(z, V, W = 0.2)
bf_func(z, V, W = 0.2)
z |
Vector of Z-scores |
V |
Variance of the estimated effect size |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
Note, z and V should both be vectors or both be matrices
ABFs
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) varbeta = Var.data.cc(f = maf, N = N0 + N1, s = 0.5) bf_func(z_scores, V = varbeta)
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) varbeta = Var.data.cc(f = maf, N = N0 + N1, s = 0.5) bf_func(z_scores, V = varbeta)
Correlation matrix of SNPs
cor2(x)
cor2(x)
x |
Phased haplotype matrix, rows as samples and columns as SNPs |
Quick function to find a correlation matrix
Correlation matrix
Chris Wallace
Corrected coverage estimate using Z-scores and mafs
corrcov(z, f, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, pp0min = 0.001)
corrcov(z, f, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, pp0min = 0.001)
z |
Marginal Z-scores |
f |
Minor allele frequencies |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
SNP correlation matrix |
thr |
Minimum threshold for fine-mapping experiment |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
nrep |
The number of simulated posterior probability systems to consider for the corrected coverage estimate (default 1000) |
pp0min |
Only average over SNPs with pp0 > pp0min |
This function only requires the marginal summary statistics from GWAS
Corrected coverage estimate
Anna Hutchinson
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) corrcov(z = z_scores, f = maf, N0, N1, Sigma = LD, thr = 0.95)
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) corrcov(z = z_scores, f = maf, N0, N1, Sigma = LD, thr = 0.95)
Corrected coverage estimate using estimated effect sizes and their standard errors
corrcov_bhat(bhat, V, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, pp0min = 0.001)
corrcov_bhat(bhat, V, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, pp0min = 0.001)
bhat |
Estimated effect sizes from single-SNP logistic regressions |
V |
Variance of estimated effect sizes |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
SNP correlation matrix |
thr |
Minimum threshold for fine-mapping experiment |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
nrep |
The number of simulated posterior probability systems to consider for the corrected coverage estimate (default 1000) |
pp0min |
Only average over SNPs with pp0 > pp0min |
This function only requires the marginal summary statistics from GWAS
Corrected coverage estimate
Anna Hutchinson
set.seed(1) nsnps <- 100 N0 <- 1000 # number of controls N1 <- 1000 # number of cases ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps, 0, 0.2) # log OR corrcov_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD, thr = 0.95)
set.seed(1) nsnps <- 100 N0 <- 1000 # number of controls N1 <- 1000 # number of cases ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps, 0, 0.2) # log OR corrcov_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD, thr = 0.95)
Obtain confidence interval for corrected coverage estimate using Z-scores and mafs
corrcov_CI(z, f, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, CI = 0.95, pp0min = 0.001)
corrcov_CI(z, f, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, CI = 0.95, pp0min = 0.001)
z |
Marginal Z-scores |
f |
Minor allele frequencies |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
SNP correlation matrix |
thr |
Minimum threshold for fine-mapping experiment |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
nrep |
The number of simulated posterior probability systems to consider for the corrected coverage estimate (nrep = 1000 default) |
CI |
The size of the confidence interval (as a decimal) |
pp0min |
Only average over SNPs with pp0 > pp0min |
CI for corrected coverage estimate
Anna Hutchinson
# this is a long running example set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) corrcov_CI(z = z_scores, f = maf, N0, N1, Sigma = LD, thr = 0.95)
# this is a long running example set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) corrcov_CI(z = z_scores, f = maf, N0, N1, Sigma = LD, thr = 0.95)
Obtain confidence interval for corrected coverage estimate using estimated effect sizes and their standard errors
corrcov_CI_bhat(bhat, V, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, CI = 0.95, pp0min = 0.001)
corrcov_CI_bhat(bhat, V, N0, N1, Sigma, thr, W = 0.2, nrep = 1000, CI = 0.95, pp0min = 0.001)
bhat |
Estimated effect sizes from single-SNP logistic regressions |
V |
Variance of estimated effect sizes |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
SNP correlation matrix |
thr |
Minimum threshold for fine-mapping experiment |
W |
Prior for the standard deviation of the effect size parameter beta |
nrep |
The number of simulated posterior probability systems to consider for the corrected coverage estimate (nrep = 1000 default) |
CI |
The size of the confidence interval (as a decimal) |
pp0min |
Only average over SNPs with pp0 > pp0min |
CI for corrected coverage estimate
Anna Hutchinson
# this is a long running example set.seed(1) nsnps <- 100 N0 <- 5000 # number of controls N1 <- 5000 # number of cases ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log OR corrcov_CI_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD)
# this is a long running example set.seed(1) nsnps <- 100 N0 <- 5000 # number of controls N1 <- 5000 # number of cases ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log OR corrcov_CI_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD)
Obtain corrected coverage estimate using Z-scores and mafs (limiting simulations used for estimation to those with correct nvar)
corrcov_nvar(z, f, N0, N1, Sigma, nvar, thr, W = 0.2, nrep = 10000, pp0min = 0.001)
corrcov_nvar(z, f, N0, N1, Sigma, nvar, thr, W = 0.2, nrep = 10000, pp0min = 0.001)
z |
Marginal Z-scores |
f |
Minor allele frequencies |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
SNP correlation matrix |
nvar |
The number of variants that simulated credible sets used for estimation should contain |
thr |
Minimum threshold for fine-mapping experiment |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
nrep |
The number of simulated posterior probability systems to consider for the corrected coverage estimate (nrep = 10000 default due to trimming) |
pp0min |
Only average over SNPs with pp0 > pp0min |
This function requires the marginal summary statistics from GWAS and an nvar value. It should only be used when nvar is very low (<3) and there is some evidence to suggest that only simulated credible sets with this nvar value should be used to derive the corrected coverage estimate.
Corrected coverage estimate
Anna Hutchinson
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) corrcov_nvar(z = z_scores, f = maf, N0, N1, Sigma = LD, thr = 0.95, nvar = 1, nrep = 100) # note that nrep should be at least the default value (nrep = 10000) but is # lower here for speed of computation
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) corrcov_nvar(z = z_scores, f = maf, N0, N1, Sigma = LD, thr = 0.95, nvar = 1, nrep = 100) # note that nrep should be at least the default value (nrep = 10000) but is # lower here for speed of computation
Obtain corrected coverage estimate using estimated effect sizes and their standard errors (limiting simulations used for estimation to those with correct nvar)
corrcov_nvar_bhat(bhat, V, N0, N1, Sigma, nvar, thr, W = 0.2, nrep = 10000, pp0min = 0.001)
corrcov_nvar_bhat(bhat, V, N0, N1, Sigma, nvar, thr, W = 0.2, nrep = 10000, pp0min = 0.001)
bhat |
Estimated effect sizes from single-SNP logistic regressions |
V |
Variance of estimated effect sizes |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
SNP correlation matrix |
nvar |
The number of variants that simulated credible sets used for estimation should contain |
thr |
Minimum threshold for fine-mapping experiment |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
nrep |
The number of simulated posterior probability systems to consider for the corrected coverage estimate (nrep = 10000 default due to trimming) |
pp0min |
Only average over SNPs with pp0 > pp0min |
This function requires the marginal summary statistics from GWAS and an nvar value. It should only be used when nvar is very low ($<3$) and there is some evidence to suggest that only simulated credible sets with this nvar value should be used to derive the corrected coverage estimate.
Corrected coverage estimate
Anna Hutchinson
set.seed(1) nsnps <- 100 N0 <- 5000 # number of controls N1 <- 5000 # number of cases ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log OR corrcov_nvar_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD, thr = 0.95, nvar = 1, nrep = 1000) # note that nrep should be at least the default value (nrep = 10000) but is # lower here for speed of computation
set.seed(1) nsnps <- 100 N0 <- 5000 # number of controls N1 <- 5000 # number of cases ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log OR corrcov_nvar_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD, thr = 0.95, nvar = 1, nrep = 1000) # note that nrep should be at least the default value (nrep = 10000) but is # lower here for speed of computation
Corrected coverage estimate of the causal variant in the credible set
corrected_cov(pp0, mu, V, Sigma, thr, W = 0.2, nrep = 1000, pp0min = 0.001)
corrected_cov(pp0, mu, V, Sigma, thr, W = 0.2, nrep = 1000, pp0min = 0.001)
pp0 |
Posterior probabilities of SNPs |
mu |
The true effect at the CV (estimate using corrcoverage::est_mu function) |
V |
Variance of the estimated effect size (can be obtained using coloc::Var.beta.cc function) |
Sigma |
SNP correlation matrix |
thr |
Minimum threshold for fine-mapping experiment |
W |
Prior for the standard deviation of the effect size parameter, beta (W=0.2 default) |
nrep |
Number of posterior probability systems to simulate for each variant considered causal (nrep = 1000 default) |
pp0min |
Only average over SNPs with pp0 > pp0min |
Requires an estimate of the true effect at the CV (e.g. use maximum absolute z-score or output from corrcoverage::est_mu function)
Corrected coverage estimate
Anna Hutchinson
set.seed(1) nsnps <- 100 N0 <- 5000 N1 <- 5000 ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) ## generate V (variance of estimated effect sizes) varbeta <- Var.data.cc(f = maf, N = 5000, s = 0.5) pp <- rnorm(nsnps, 0.2, 0.05) pp <- pp/sum(pp) corrected_cov(pp0 = pp, mu = 4, V = varbeta, Sigma = LD, thr = 0.95, nrep = 100)
set.seed(1) nsnps <- 100 N0 <- 5000 N1 <- 5000 ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) ## generate V (variance of estimated effect sizes) varbeta <- Var.data.cc(f = maf, N = 5000, s = 0.5) pp <- rnorm(nsnps, 0.2, 0.05) pp <- pp/sum(pp) corrected_cov(pp0 = pp, mu = 4, V = varbeta, Sigma = LD, thr = 0.95, nrep = 100)
Corrected credible set using Z-scores and MAFs
corrected_cs(z, f, N0, N1, Sigma, W = 0.2, lower = 0, upper = 1, desired.cov, acc = 0.005, max.iter = 20, pp0min = 0.001)
corrected_cs(z, f, N0, N1, Sigma, W = 0.2, lower = 0, upper = 1, desired.cov, acc = 0.005, max.iter = 20, pp0min = 0.001)
z |
Z-scores |
f |
Minor allele frequencies |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
Correlation matrix of SNPs |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
lower |
Lower threshold (default = 0) |
upper |
Upper threshold (default = 1) |
desired.cov |
The desired coverage of the causal variant in the credible set |
acc |
Accuracy of corrected coverage to desired coverage (default = 0.005) |
max.iter |
Maximum iterations (default = 20) |
pp0min |
Only average over SNPs with pp0 > pp0min |
List of variants in credible set, required threshold, the corrected coverage and the size of the credible set
Anna Hutchinson
# this is a long running example # In this example, the function is used to find a corrected 95% credible set # using Z-scores and MAFs, that is the smallest set of variants # required such that the resultant credible set has coverage close to (/within # some accuracy of) the "desired coverage" (here set to 0.95). Max.iter parameter # defines the maximum number of iterations to try in the root bisection algorithm, # this should be increased to ensure convergence to the desired coverage, but is set # to 1 here for speed (and thus the resultant credible set will not be accurate). set.seed(2) nsnps = 200 N0 = 1000 N1 = 1000 z_scores <- rnorm(nsnps, 0, 1) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) names(z_scores) <- seq(1,length(z_scores)) corrected_cs(z = z_scores, f = maf, N0, N1, Sigma = LD, desired.cov = 0.9, max.iter = 1) # max.iter set low for speed, should be set to at least # the default to ensure convergence to desired coverage
# this is a long running example # In this example, the function is used to find a corrected 95% credible set # using Z-scores and MAFs, that is the smallest set of variants # required such that the resultant credible set has coverage close to (/within # some accuracy of) the "desired coverage" (here set to 0.95). Max.iter parameter # defines the maximum number of iterations to try in the root bisection algorithm, # this should be increased to ensure convergence to the desired coverage, but is set # to 1 here for speed (and thus the resultant credible set will not be accurate). set.seed(2) nsnps = 200 N0 = 1000 N1 = 1000 z_scores <- rnorm(nsnps, 0, 1) # simulate a vector of Z-scores ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) names(z_scores) <- seq(1,length(z_scores)) corrected_cs(z = z_scores, f = maf, N0, N1, Sigma = LD, desired.cov = 0.9, max.iter = 1) # max.iter set low for speed, should be set to at least # the default to ensure convergence to desired coverage
Corrected credible set using estimated effect sizes and their standard errors
corrected_cs_bhat(bhat, V, N0, N1, Sigma, W = 0.2, lower = 0, upper = 1, desired.cov, acc = 0.005, max.iter = 20, pp0min = 0.001)
corrected_cs_bhat(bhat, V, N0, N1, Sigma, W = 0.2, lower = 0, upper = 1, desired.cov, acc = 0.005, max.iter = 20, pp0min = 0.001)
bhat |
Estimated effect sizes |
V |
Prior variance of estimated effect sizes |
N0 |
Number of controls |
N1 |
Number of cases |
Sigma |
Correlation matrix of SNPs |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
lower |
Lower threshold (default = 0) |
upper |
Upper threshold (default = 1) |
desired.cov |
The desired coverage of the causal variant in the credible set |
acc |
Accuracy of corrected coverage to desired coverage (default = 0.005) |
max.iter |
Maximum iterations (default = 20) |
pp0min |
Only average over SNPs with pp0 > pp0min |
List of variants in credible set, required threshold, the corrected coverage and the size of the credible set
Anna Hutchinson
# this is a long running example # In this example, the function is used to find a corrected 95% credible set # using bhats and their standard errors, that is the smallest set of variants # required such that the resultant credible set has coverage close to (/within # some accuracy of) the "desired coverage" (here set to 0.95). Max.iter parameter # defines the maximum number of iterations to try in the root bisection algorithm, # this should be increased to ensure convergence to the desired coverage, but is set # to 1 here for speed (and thus the resultant credible set will not be accurate). set.seed(18) nsnps <- 100 N0 <- 500 # number of controls N1 <- 500 # number of cases # simulate fake haplotypes to obtain MAFs and LD matrix ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log OR names(bhats) <- seq(1,length(bhats)) corrected_cs_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD, desired.cov = 0.9, max.iter = 1) # max.iter set low for speed, should be set to at # least the default to ensure convergence to desired coverage
# this is a long running example # In this example, the function is used to find a corrected 95% credible set # using bhats and their standard errors, that is the smallest set of variants # required such that the resultant credible set has coverage close to (/within # some accuracy of) the "desired coverage" (here set to 0.95). Max.iter parameter # defines the maximum number of iterations to try in the root bisection algorithm, # this should be increased to ensure convergence to the desired coverage, but is set # to 1 here for speed (and thus the resultant credible set will not be accurate). set.seed(18) nsnps <- 100 N0 <- 500 # number of controls N1 <- 500 # number of cases # simulate fake haplotypes to obtain MAFs and LD matrix ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log OR names(bhats) <- seq(1,length(bhats)) corrected_cs_bhat(bhat = bhats, V = varbeta, N0, N1, Sigma = LD, desired.cov = 0.9, max.iter = 1) # max.iter set low for speed, should be set to at # least the default to ensure convergence to desired coverage
Credible set of putative causal variants
credset(pp, CV, thr)
credset(pp, CV, thr)
pp |
Vector of posterior probabilities of causality |
CV |
Optional parameter: Index of CV |
thr |
Minimum threshold for credible set size |
If the CV parameter is supplied (index of causal variant) then the output includes a binary indicator of whether the CV is contained in the set
list of the variants in the credible set, the claimed.cov (cumulative sum of the posterior probabilities of the variants forming the credible set), binary covered indicator (1 if CV is contained in the credible set) and nvar (number of variants in the set)
Anna Hutchinson
set.seed(1) nsnps <- 100 pp <- rnorm(nsnps, 0.3, 0.05) pp <- pp/sum(pp) credset(pp, thr = 0.9) iCV <- 71 credset(pp, CV = iCV, thr = 0.9)
set.seed(1) nsnps <- 100 pp <- rnorm(nsnps, 0.3, 0.05) pp <- pp/sum(pp) credset(pp, thr = 0.9) iCV <- 71 credset(pp, CV = iCV, thr = 0.9)
Quick credset function for matrix of posterior probabilities (using RCpp)
credsetC(pp, CV, thr)
credsetC(pp, CV, thr)
pp |
Matrix of posterior probabilities of causality (one row per system) |
CV |
Vector of CV indices (one per system/row) |
thr |
Minimum threshold for credible set size |
Data.frame of claimed coverage (sum of posterior probabilities of variants in the set), binary covered indicator and number of variants (nvar).
set.seed(1) nsnps <- 100 # simulate matrix of posterior probabilities # 1 simulation per row pp <- matrix(rnorm(nsnps*100, 0.3, 0.05), ncol = nsnps) pp <- pp/rowSums(pp) iCV <- rep(71, times = dim(pp)[1]) credsetC(pp, CV = iCV, thr = 0.9)
set.seed(1) nsnps <- 100 # simulate matrix of posterior probabilities # 1 simulation per row pp <- matrix(rnorm(nsnps*100, 0.3, 0.05), ncol = nsnps) pp <- pp/rowSums(pp) iCV <- rep(71, times = dim(pp)[1]) credsetC(pp, CV = iCV, thr = 0.9)
Obtain credible sets from a matrix of posterior probabilities
credsetmat(pp, iCV, threshold)
credsetmat(pp, iCV, threshold)
pp |
Matrix of posterior probabilities (one row for each simulation) |
iCV |
A vector of the indices of the CV |
threshold |
The threshold to use to generate the credible set |
Estimate the true effect at the causal variant using Z-scores and MAFs
est_mu(z, f, N0, N1, W = 0.2)
est_mu(z, f, N0, N1, W = 0.2)
z |
Vector of marginal Z-scores |
f |
Minor allele frequencies |
N0 |
Number of controls |
N1 |
Number of cases |
W |
Prior for the standard deviation of the effect size parameter, beta, default 0.2 |
Estimate of the true effect at the causal variant
Anna Hutchinson
nsnps <- 100 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores N0 <- 5000 # number of controls N1 <- 5000 # number of cases maf <- runif(nsnps, 0.05, 0.5) est_mu(z = z_scores, f = maf, N0 = N0, N1 = N1)
nsnps <- 100 z_scores <- rnorm(nsnps, 0, 3) # simulate a vector of Z-scores N0 <- 5000 # number of controls N1 <- 5000 # number of cases maf <- runif(nsnps, 0.05, 0.5) est_mu(z = z_scores, f = maf, N0 = N0, N1 = N1)
Estimate the true effect at the causal variant using estimated effect sizes and their standard errors
est_mu_bhat(bhat, V, N0, N1, p1 = 1e-04, W = 0.2)
est_mu_bhat(bhat, V, N0, N1, p1 = 1e-04, W = 0.2)
bhat |
Vector of estimated effect sizes |
V |
Prior variance for estimated effect sizes |
N0 |
Number of controls |
N1 |
Number of cases |
p1 |
Prior probability a SNP is associated with the trait, default 1e-4 |
W |
Prior for the standard deviation of the effect size parameter, beta |
Estimate of the true effect at the causal variant
Anna Hutchinson
nsnps <- 100 N0 <- 5000 # number of controls N1 <- 5000 # number of cases maf <- runif(nsnps, 0.05, 0.3) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log(OR) est_mu_bhat(bhat = bhats, V = varbeta, N0 = N0, N1 = N1)
nsnps <- 100 N0 <- 5000 # number of controls N1 <- 5000 # number of cases maf <- runif(nsnps, 0.05, 0.3) varbeta <- Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1)) bhats = rnorm(nsnps,0,0.2) # log(OR) est_mu_bhat(bhat = bhats, V = varbeta, N0 = N0, N1 = N1)
Internal function, logsum
logsum(x)
logsum(x)
x |
numeric vector |
This function calculates the log of the sum of the exponentiated logs taking out the max, i.e. insuring that the sum is not Inf
max(x) + log(sum(exp(x - max(x))))
Chris Wallace
matrix-ified version of logsum to avoid needing apply()
logsum_matrix(x)
logsum_matrix(x)
x |
numeric matrix |
rowwise sums
Chris Wallace
Posterior probabilities of causality from marginal Z-scores
ppfunc(z, V, W = 0.2)
ppfunc(z, V, W = 0.2)
z |
Vector of marginal Z-scores |
V |
Variance of the estimated effect size (can be obtained using Var.beta.cc function) |
W |
Prior for the standard deviation of the effect size parameter, beta (W = 0.2 default) |
This function converts Z-scores to posterior probabilities of causality i.e. not including the null model of no genetic effects, so that the sum of the posterior probabilities for all variants is 1
Vector of posterior probabilities
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0+N1, s = N1/(N0+N1)) res <- ppfunc(z = z_scores, V = varbeta) sum(res) res
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0+N1, s = N1/(N0+N1)) res <- ppfunc(z = z_scores, V = varbeta) sum(res) res
Posterior probabilities of causality from matrix of marginal Z-scores (1 simulation per row)
ppfunc.mat(zstar, V, W = 0.2)
ppfunc.mat(zstar, V, W = 0.2)
zstar |
Matrix of marginal z-scores, one replicate per row |
V |
Variance of the estimated effect size, one element per column of zstar |
W |
Prior for the standard deviation of the effect size parameter, beta |
This function converts a matrix of Z-scores (one row per simulation) to posterior probabilities of causality, not including the null model of no genetic effects, so that the sum of the posterior probabilities for each simulation (each row) is 1.
Matrix of posterior probabilities of causality
Chris Wallace
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0+N1, s = N1/(N0+N1)) # simulate matrix of Z scores # 1 simulation per row z_scores <- matrix(rnorm(nsnps*100, 0, 3), ncol = nsnps) # each row is a vector of simulated PPs res <- ppfunc.mat(zstar = z_scores, V = varbeta) rowSums(res)
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) varbeta <- Var.data.cc(f = maf, N = N0+N1, s = N1/(N0+N1)) # simulate matrix of Z scores # 1 simulation per row z_scores <- matrix(rnorm(nsnps*100, 0, 3), ncol = nsnps) # each row is a vector of simulated PPs res <- ppfunc.mat(zstar = z_scores, V = varbeta) rowSums(res)
Proportion of simulated credible sets containing the causal variant
prop_cov(x)
prop_cov(x)
x |
data.frame with a binary 'covered' column |
Proportion of x with x = 1
Anna Hutchinson
Posterior probabilities of causality from P-values
pvals_pp(pvals, f, type, N, s, W = 0.2, p1 = 1e-04)
pvals_pp(pvals, f, type, N, s, W = 0.2, p1 = 1e-04)
pvals |
P-values of SNPs |
f |
Minor allele frequencies |
type |
Type of experiment ('quant' or 'cc') |
N |
Total sample size |
s |
Proportion of cases (N1/N0+N1), ignored if type=='quant' |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
p1 |
Prior probability a SNP is associated with the trait (default 1e-4) |
This function converts p-values to posterior probabilities of causality, including the null model of no genetic effect
Posterior probabilities of null model (no genetic effect) and causality for each SNP
Anna Hutchinson
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) p_values <- 2 * pnorm( - abs ( z_scores ) ) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) res <- pvals_pp(pvals = p_values, f = maf, type = "cc", N = N0+N1, s = N1/(N0+N1)) sum(res) res
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) p_values <- 2 * pnorm( - abs ( z_scores ) ) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) res <- pvals_pp(pvals = p_values, f = maf, type = "cc", N = N0+N1, s = N1/(N0+N1)) sum(res) res
Variance of the estimated effect size for case-control data
Var.data.cc(f, N, s)
Var.data.cc(f, N, s)
f |
Minor allele frequencies |
N |
Total sample size (N0+N1) |
s |
Proportion of cases (N1/N0+N1) |
Variance of estimated effect size , V.
Chris Wallace
maf = runif(100, 0.05, 0.5) N0 = 5000 # number of controls N1 = 5000 # number of cases Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1))
maf = runif(100, 0.05, 0.5) N0 = 5000 # number of controls N1 = 5000 # number of cases Var.data.cc(f = maf, N = N0 + N1, s = N1/(N0+N1))
Simulate marginal z-scores () from the joint z-scores (
) using
and
z_sim(Zj, Sigma, nrep)
z_sim(Zj, Sigma, nrep)
Zj |
Vector of joint Z-scores (a vector of 0s except at the CV) |
Sigma |
SNP correlation matrix |
nrep |
Number of Z-score systems to simulate |
Matrix of simulated posterior probabilties, one simulation per row
Anna Hutchinson
set.seed(1) nsnps <- 100 # derive joint Z score vector Zj <- rep(0, nsnps) iCV <- 4 # index of CV mu <- 5 # true effect at CV Zj[iCV] <- mu ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) res <- z_sim(Zj, Sigma = LD, nrep = 100) res[c(1:5), c(1:5)]
set.seed(1) nsnps <- 100 # derive joint Z score vector Zj <- rep(0, nsnps) iCV <- 4 # index of CV mu <- 5 # true effect at CV Zj[iCV] <- mu ## generate example LD matrix library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) res <- z_sim(Zj, Sigma = LD, nrep = 100) res[c(1:5), c(1:5)]
Posterior probabilities of causality from marginal Z-scores with prior SD as a parameter
z0_pp(z, f, type, N, s, W = 0.2, p1 = 1e-04)
z0_pp(z, f, type, N, s, W = 0.2, p1 = 1e-04)
z |
Marginal Z-scores of SNPs |
f |
Minor allele frequencies |
type |
Type of experiment ('quant' or 'cc') |
N |
Total sample size |
s |
Proportion of cases (N1/N0+N1), ignored if type=='quant' |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
p1 |
Prior probability a SNP is associated with the trait (default 1e-4) |
Converts Z-scores to posterior probabilities of causality, including the null model of no genetic effects
Posterior probabilities of null model (no genetic effect) and causality for each SNP
Anna Hutchinson
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) res <- z0_pp(z = z_scores, f = maf, type = "cc", N = N0+N1, s = N1/(N0+N1)) sum(res) res
set.seed(1) nsnps = 100 N0 = 5000 N1 = 5000 z_scores <- rnorm(nsnps, 0, 3) ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) maf <- colMeans(X) res <- z0_pp(z = z_scores, f = maf, type = "cc", N = N0+N1, s = N1/(N0+N1)) sum(res) res
Simulate nrep marginal Z-scores from joint Z-scores and convert these to posterior probabilities of causality
zj_pp(Zj, V, nrep = 1000, W = 0.2, Sigma)
zj_pp(Zj, V, nrep = 1000, W = 0.2, Sigma)
Zj |
Vector of joint Z-scores (0s except at CV) |
V |
Variance of the estimated effect size (can be obtained using Var.beta.cc function) |
nrep |
Number of posterior probability systems to simulate (default 1000) |
W |
Prior for the standard deviation of the effect size parameter, beta (default 0.2) |
Sigma |
SNP correlation matrix |
Does not include posterior probabilities for null model
Matrix of simulated posterior probabilties, one simulation per row
Anna Hutchinson
set.seed(1) nsnps <- 100 Zj <- rep(0, nsnps) iCV <- 4 # index of CV mu <- 5 # true effect at CV Zj[iCV] <- mu ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) ## generate V (variance of estimated effect sizes) varbeta <- Var.data.cc(f = maf, N = 5000, s = 0.5) res <- zj_pp(Zj, V = varbeta, nrep = 5, W = 0.2, Sigma = LD) res[c(1:5), c(1:5)]
set.seed(1) nsnps <- 100 Zj <- rep(0, nsnps) iCV <- 4 # index of CV mu <- 5 # true effect at CV Zj[iCV] <- mu ## generate example LD matrix and MAFs library(mvtnorm) nsamples = 1000 simx <- function(nsnps, nsamples, S, maf=0.1) { mu <- rep(0,nsnps) rawvars <- rmvnorm(n=nsamples, mean=mu, sigma=S) pvars <- pnorm(rawvars) x <- qbinom(1-pvars, 1, maf) } S <- (1 - (abs(outer(1:nsnps,1:nsnps,`-`))/nsnps))^4 X <- simx(nsnps,nsamples,S) LD <- cor2(X) maf <- colMeans(X) ## generate V (variance of estimated effect sizes) varbeta <- Var.data.cc(f = maf, N = 5000, s = 0.5) res <- zj_pp(Zj, V = varbeta, nrep = 5, W = 0.2, Sigma = LD) res[c(1:5), c(1:5)]
Obtain pp from a matrix of Zj and ERR
zj_pp_arma(Zj, sigma, nrep, ERR, r)
zj_pp_arma(Zj, sigma, nrep, ERR, r)
Zj |
vector of joint Z scores |
sigma |
SNP correlation matrix |
nrep |
Number of simulations |
ERR |
Error matrix |
r |
Shrinkage values |
pp Matrix of posterior probabilities (one row for each simulation)