Title: | Flexible cFDR |
---|---|
Description: | Provides functions to implement the Flexible cFDR (Hutchinson et al. (2021) <doi:10.1371/journal.pgen.1009853>) and Binary cFDR (Hutchinson et al. (2021) <doi:10.1101/2021.10.21.465274>) methodologies to leverage auxiliary data from arbitrary distributions, for example functional genomic data, with GWAS p-values to generate re-weighted p-values. |
Authors: | Anna Hutchinson [aut, cre] , Chris Wallace [aut], Thomas Willis [ctb, aut], James Liley [ctb] |
Maintainer: | Anna Hutchinson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2025-01-03 05:12:36 UTC |
Source: | https://github.com/annahutch/fcfdr |
Perform cFDR leveraging binary auxiliary covariates
binary_cfdr(p, q, group)
binary_cfdr(p, q, group)
p |
p-values for principal trait (vector of length n) |
q |
binary auxiliary data values (vector of length n) |
group |
group membership of each SNP for leave-one-out procedure (vector of length n) (e.g. chromosome number or LD block) |
data.frame of p, q and v values
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the parameters_in_locfdr() function to extract the parameters estimated by # the locfdr function. # generate p set.seed(2) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q q <- rbinom(n, 1, 0.1) group <- c(rep("A", n/2), rep("B", n/2)) binary_cfdr(p, q, group)
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the parameters_in_locfdr() function to extract the parameters estimated by # the locfdr function. # generate p set.seed(2) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q q <- rbinom(n, 1, 0.1) group <- c(rep("A", n/2), rep("B", n/2)) binary_cfdr(p, q, group)
Violin plot of p-values for quantiles of q
corr_plot(p, q, ylim = c(0, 1.5))
corr_plot(p, q, ylim = c(0, 1.5))
p |
p values for principal trait (vector of length n) |
q |
auxiliary data values (vector of length n) |
ylim |
y-axis limits (-log10) |
Can be used to investigate the relationship between p and q
If this shows a non-monotonic relationship then the cFDR framework should not be used
(because e.g. cFDR cannot simultaneously shrink v-values for high p and low p)
ggplot object
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the corr_plot() function to visualise the relationship between p and q. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) corr_plot(p, q)
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the corr_plot() function to visualise the relationship between p and q. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) corr_plot(p, q)
Performs Flexible cFDR for continuous auxiliary covariates
flexible_cfdr( p, q, indep_index, res_p = 300, res_q = 500, nxbin = 1000, gridp = 50, splinecorr = TRUE, dist_thr = 0.5, locfdr_df = 10, plot = TRUE, maf = NULL, check_indep_cor = TRUE, enforce_p_q_cor = TRUE )
flexible_cfdr( p, q, indep_index, res_p = 300, res_q = 500, nxbin = 1000, gridp = 50, splinecorr = TRUE, dist_thr = 0.5, locfdr_df = 10, plot = TRUE, maf = NULL, check_indep_cor = TRUE, enforce_p_q_cor = TRUE )
p |
p-values for principal trait (vector of length n) |
q |
continuous auxiliary data values (vector of length n) |
indep_index |
indices of independent SNPs |
res_p |
number of grid points in x-direction ( |
res_q |
number of grid points in y-direction ( |
nxbin |
number of bins in x-direction ( |
gridp |
number of data points required in a KDE grid point for left-censoring |
splinecorr |
logical value for whether spline correction should be implemented |
dist_thr |
distance threshold for spline correction |
locfdr_df |
|
plot |
logical value for whether to produce plots to assess KDE fit |
maf |
minor allele frequencies for SNPs to which |
check_indep_cor |
check that sign of the correlation between |
enforce_p_q_cor |
if |
If maf
is specified, then the independent SNPs will be down-sampled to match the minor allele frequency distribution.
List of length two: (1) data.frame of p-values, q-values and v-values (2) data.frame of auxiliary data (q_low used for left censoring, how many data-points were left censored and/or spline corrected)
# this is a long running example # In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the flexible_cfdr() function to generate v-values using default parameter values. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n flexible_cfdr(p, q, indep_index = 1:n_indep)
# this is a long running example # In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the flexible_cfdr() function to generate v-values using default parameter values. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n flexible_cfdr(p, q, indep_index = 1:n_indep)
Plot -log10(p) against -log10(v) and colour by q
log10pv_plot(p, q, v, axis_lim = c(0, 20))
log10pv_plot(p, q, v, axis_lim = c(0, 20))
p |
p values for principal trait (vector of length n) |
q |
auxiliary data values (vector of length n) |
v |
v values from cFDR |
axis_lim |
Optional axis limits |
Can be used to visualise the results from Flexible cFDR
ggplot object
# this is a long running example # In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the flexible_cfdr() function to generate v-values and then the log10pv_plot() function # to visualise the results. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n res <- flexible_cfdr(p, q, indep_index = 1:n_indep) log10pv_plot(p = res[[1]]$p, q = res[[1]]$q, v = res[[1]]$v)
# this is a long running example # In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the flexible_cfdr() function to generate v-values and then the log10pv_plot() function # to visualise the results. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n res <- flexible_cfdr(p, q, indep_index = 1:n_indep) log10pv_plot(p = res[[1]]$p, q = res[[1]]$q, v = res[[1]]$v)
Matches MAF distribution of independent set of SNPs to MAF distribution of whole set of SNPs to avoid MAF-based confounding.
match_ind_maf(maf, indep_index)
match_ind_maf(maf, indep_index)
maf |
minor allele frequencies of (all) SNPs |
indep_index |
indices of independent SNPs |
Must supply maf values from the whole data set, not just the independent SNPs.
indices of independent SNP in chosen in sample
parameters_in_locfdr
parameters_in_locfdr( p, q, indep_index, res_p = 300, res_q = 500, maf = NULL, check_indep_cor = TRUE, enforce_p_q_cor = TRUE )
parameters_in_locfdr( p, q, indep_index, res_p = 300, res_q = 500, maf = NULL, check_indep_cor = TRUE, enforce_p_q_cor = TRUE )
p |
p values for principal trait (vector of length n) |
q |
continuous auxiliary data values (vector of length n) |
indep_index |
indices of independent SNPs |
res_p |
resolution for p |
res_q |
resolution for q |
maf |
minor allele frequencies for SNPs to which |
check_indep_cor |
check that sign of the correlation between |
enforce_p_q_cor |
if |
list of values used as input into locfdr::locfdr
function intrinsically in flexible_cfdr
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the parameters_in_locfdr() function to extract the parameters estimated by # the locfdr function. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n parameters_in_locfdr(p, q, indep_index = 1:n_indep)
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the parameters_in_locfdr() function to extract the parameters estimated by # the locfdr function. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n parameters_in_locfdr(p, q, indep_index = 1:n_indep)
Plot p against v and colour by q
pv_plot(p, q, v, axis_lim = c(0, 1))
pv_plot(p, q, v, axis_lim = c(0, 1))
p |
p values for principal trait (vector of length n) |
q |
auxiliary data values (vector of length n) |
v |
v values from cFDR |
axis_lim |
Optional axis limits |
Can be used to visualise the results from Flexible cFDR
ggplot object
# this is a long running example # In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the flexible_cfdr() function to generate v-values and then the pv_plot() function # to visualise the results. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n res <- flexible_cfdr(p, q, indep_index = 1:n_indep) pv_plot(p = res[[1]]$p, q = res[[1]]$q, v = res[[1]]$v)
# this is a long running example # In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing functional genomic data). # We use the flexible_cfdr() function to generate v-values and then the pv_plot() function # to visualise the results. # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q mixture_comp1 <- function(x) rnorm(x, mean = -0.5, sd = 0.5) mixture_comp2 <- function(x) rnorm(x, mean = 2, sd = 1) q <- c(mixture_comp1(n1p), mixture_comp2(n-n1p)) n_indep <- n res <- flexible_cfdr(p, q, indep_index = 1:n_indep) pv_plot(p = res[[1]]$p, q = res[[1]]$q, v = res[[1]]$v)
Stratified Q-Q plot.
stratified_qqplot( data_frame, prin_value_label, cond_value_label = NULL, thresholds = c(1, 0.1, 0.01, 0.001, 1e-04) )
stratified_qqplot( data_frame, prin_value_label, cond_value_label = NULL, thresholds = c(1, 0.1, 0.01, 0.001, 1e-04) )
data_frame |
|
prin_value_label |
label of principal p-value column in |
cond_value_label |
label of conditional trait column in |
thresholds |
threshold values to define strata |
Can be used to investigate the relationship between p and q
Note that this function does not do the heavy lifting of styling the plot's aesthetics.
ggplot object
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing GWAS p-values for a related trait). # We use the stratified_qqplot() function to examine the relationship between p and q # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q zq <- c(rnorm(n1p, sd=4), rnorm(n-n1p, sd=1.2)) q <- 2*pnorm(-abs(zq)) df <- data.frame(p, q) stratified_qqplot(data_frame = df, prin_value_label = "p", cond_value_label = "q")
# In this example, we generate some p-values (representing GWAS p-values) # and some arbitrary auxiliary data values (e.g. representing GWAS p-values for a related trait). # We use the stratified_qqplot() function to examine the relationship between p and q # generate p set.seed(1) n <- 1000 n1p <- 50 zp <- c(rnorm(n1p, sd=5), rnorm(n-n1p, sd=1)) p <- 2*pnorm(-abs(zp)) # generate q zq <- c(rnorm(n1p, sd=4), rnorm(n-n1p, sd=1.2)) q <- 2*pnorm(-abs(zq)) df <- data.frame(p, q) stratified_qqplot(data_frame = df, prin_value_label = "p", cond_value_label = "q")
A data.frame containing the rsID, chromosome (CHR19) and base pair position (BP19) in hg19, reference allele (REF), alternative allele (ALLT), type 1 diabetes GWAS p-value (T1D_pval), minor allele frequency (MAF), LDAK weight (LDAK_weight), rheumatoid arthritis GWAS p-value (RA_pval), binary regulatory factor binding site overlap (DGF), average H3K27ac fold change value in T1D-relevant cell types (H3K27ac) for 113,543 SNPs in the T1D GWAS (https://www.nature.com/articles/ng.3245)
T1D_application_data
T1D_application_data
A data frame with 113543 rows and 11 variables:
Minor allele frequencies estimated from the CEU sub-population samples in the 1000 Genomes Project Phase 3 data set. Missing values were replaced by drawing samples from the empirical distribution of MAFs