
Adjusts the minor allele frequency (MAF) of each segment for systematic bias using a panel of normals (PoN) reference. For each segment, the function compares the segment MAF to the PoN MAF distribution and applies a logit-based correction if the segment MAF is not significantly different from the PoN. If the segment MAF is significantly different, the original value is retained.

Usage

CorrectBias(tmp_seg, pon_ref, tmp_maf)

Arguments

tmp_seg

Data frame or data.table. Segmented data to be corrected. Must include columns: Chromosome, Start, End, Sample, Num_Probes, Segment_Mean, gatk_SM_raw, gatk_count, gatk_baselinecov, gatk_gender, pipeline_gender, MAF, MAF_Probes, MAF_gmm_G, MAF_gmm_weight, size.

pon_ref

Data frame or data.table. Panel of normals reference. Must include columns: Chromosome, Start, End, pon_mafs (a comma-separated string of PoN MAF values for that interval).
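Because pon_mafs stores a whole distribution in one string, it has to be split into numbers before it can be summarized or tested. The sketch below shows one way to do that in base R. The one-row pon_ref and its values are made up for illustration, and the helper names pon_values and pon_median are hypothetical; the function's internal parsing may differ.

```r
# Hypothetical one-row PoN reference; the values are invented for illustration.
pon_ref <- data.frame(
  Chromosome = "1",
  Start = 1000,
  End = 5000,
  pon_mafs = "0.44,0.45,0.46,0.45",
  stringsAsFactors = FALSE
)

# Split the comma-separated string and convert the pieces to numeric.
pon_values <- as.numeric(strsplit(pon_ref$pon_mafs[1], ",", fixed = TRUE)[[1]])

# Panel median, the quantity the Details section describes as the centering value.
pon_median <- median(pon_values)
```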

tmp_maf

Data frame or data.table. Per-SNP MAF data. Must include columns: Pos, maf.

Value

A data frame with bias-corrected MAF values for each segment. The MAF column is updated with the corrected value, and columns gmm_mean_corr, each_mafs, and pon_mafs are removed.

Details

For each segment, the function performs an interval join with the PoN reference to obtain the PoN MAF distribution. If the segment MAF is not significantly different from the PoN distribution (by Wilcoxon test or threshold), a logit-based correction is applied; otherwise, the original segment MAF is retained. The correction is centered on the panel median MAF, and corrected values are clamped to the interval [0, 0.5].
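The decision rule described above can be sketched for a single segment as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name correct_maf_sketch, the alpha cutoff of 0.05, the choice to run the Wilcoxon test on per-SNP MAFs versus PoN MAFs, and the exact form of the logit shift (moving the segment MAF on the logit scale so that the panel median maps to 0.5) are all assumptions made here for illustration.

```r
# Hypothetical helper: a sketch of "test against PoN, then correct or keep".
# The logit shift used here is an assumed form, not taken from the package.
correct_maf_sketch <- function(seg_maf, snp_mafs, pon_mafs_string, alpha = 0.05) {
  # Parse the comma-separated PoN MAF string into a numeric vector.
  pon <- as.numeric(strsplit(pon_mafs_string, ",", fixed = TRUE)[[1]])

  # Two-sample Wilcoxon rank-sum test; tie warnings are suppressed.
  p <- suppressWarnings(stats::wilcox.test(snp_mafs, pon)$p.value)

  # Significantly different from the PoN (or test undefined): keep the original.
  if (is.na(p) || p < alpha) {
    return(seg_maf)
  }

  # Assumed correction: shift on the logit scale, centered on the panel median.
  shifted <- stats::plogis(stats::qlogis(seg_maf) - stats::qlogis(stats::median(pon)))

  # Clamp the corrected value to [0, 0.5].
  min(max(shifted, 0), 0.5)
}

pon_string <- "0.44,0.45,0.46,0.45,0.44"

# Segment whose SNP MAFs look like the panel: the correction is applied.
balanced <- correct_maf_sketch(0.45, c(0.44, 0.45, 0.46, 0.45, 0.44), pon_string)

# Segment whose SNP MAFs are far below the panel: the original MAF is kept.
imbalanced <- correct_maf_sketch(
  0.10, c(0.10, 0.11, 0.12, 0.09, 0.10, 0.11, 0.12, 0.10), pon_string
)
```

Under this assumed form, a segment that matches the panel is pulled toward 0.5, while a segment that the test flags as different passes through unchanged, which is the behavior the paragraph above describes.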