Refines breakpoints within a segment using minor allele frequency (MAF) data. If enough informative MAF sites are present, the segment is binned and can be split into finer regions using either stepwise merging or CBS (circular binary segmentation). Optionally, PON-based bias correction is applied to the resulting segments.

Usage

SearchBreakpoint(
  seg_row,
  maf,
  pon_ref,
  gender,
  mergeai = 0.15,
  snpmin = 3,
  maxgap = 1e+06,
  snpnum = 20,
  maxbinsize = 1e+06,
  minbinsize = 5e+05,
  minsnpcov = 20,
  segmethod = "cbs",
  cbssmooth = "no"
)

Arguments

seg_row

Data frame row (list or tibble row) representing a single segment. Must have columns: Sample, Chromosome, Start, End, Num_Probes, Segment_Mean, Segment_Mean_raw, Count, Baseline_cov, gatk_gender, pipeline_gender, size.

maf

Data frame or tibble containing MAF data. Must include columns: Chromosome, Pos, maf.

pon_ref

Data frame. Panel-of-normals (PON) reference used for bias correction (required for the bias correction step).

gender

Character. If "female", the X chromosome will also be processed.

mergeai

Numeric. Threshold for the difference in MAF (gmm_mean) between adjacent segments to allow merging under "merge" mode segmentation.

snpmin

Numeric. Minimum SNP count required for a segment to be considered as a separate segment under "merge" mode segmentation.

maxgap

Numeric. Maximum allowed gap between SNPs within a bin.

snpnum

Integer. Target number of SNPs per bin.

maxbinsize

Numeric. Maximum allowed bin size (bp).

minbinsize

Numeric. Minimum allowed bin size (bp). The minimum segment size under "merge" mode is 2*minbinsize.

minsnpcov

Integer. Minimum coverage of SNP sites to be included.

segmethod

Character. Segmentation method to use: if "merge", perform stepwise merging; if "cbs", perform CBS (circular binary segmentation).

cbssmooth

Character. If using the "cbs" segmentation method, set to "yes" to apply smoothing before segmentation, or "no" to skip smoothing.
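
As a minimal sketch of preparing inputs, the snippet below builds a single-segment row and a MAF table carrying the column names documented above, then checks that none are missing. All values (sample name, coordinates, MAF values) are made up for illustration, and the commented call at the end assumes the package is loaded and that a PON table is available as `pon_ref`.

```r
# Hypothetical single-segment row with the documented columns (values are illustrative).
seg_row <- data.frame(
  Sample = "sample01",
  Chromosome = "1",
  Start = 1000000,
  End = 9000000,
  Num_Probes = 400,
  Segment_Mean = -0.45,
  Segment_Mean_raw = -0.50,
  Count = 400,
  Baseline_cov = 100,
  gatk_gender = "female",
  pipeline_gender = "female",
  size = 8000000,
  stringsAsFactors = FALSE
)

# Hypothetical MAF table: 200 evenly spaced sites inside the segment.
set.seed(1)
maf <- data.frame(
  Chromosome = "1",
  Pos = round(seq(1000000, 9000000, length.out = 200)),
  maf = round(runif(200, min = 0.3, max = 0.5), 3),
  stringsAsFactors = FALSE
)

# Check the documented column requirements before calling the function.
required_seg <- c("Sample", "Chromosome", "Start", "End", "Num_Probes",
                  "Segment_Mean", "Segment_Mean_raw", "Count", "Baseline_cov",
                  "gatk_gender", "pipeline_gender", "size")
required_maf <- c("Chromosome", "Pos", "maf")
missing_seg <- setdiff(required_seg, names(seg_row))
missing_maf <- setdiff(required_maf, names(maf))
stopifnot(length(missing_seg) == 0, length(missing_maf) == 0)

# With the package loaded and `pon_ref` available, the call would look like:
# refined <- SearchBreakpoint(seg_row, maf, pon_ref, gender = "female",
#                             segmethod = "merge")
```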

Value

A data frame with the refined segment(s), including updated breakpoints, MAF metrics, and a BreakpointSource column indicating whether breakpoints were post-processed or from GATK.

Details

The function first bins the MAF data within the segment. If segmethod = "merge", adjacent bins are merged stepwise based on their MAF difference and SNP count. If segmethod = "cbs", CBS segmentation is performed on the binned MAF values, with optional smoothing. After segmentation, bias correction using the panel of normals (PON) can be applied. The function returns the refined segments with updated metrics and a BreakpointSource label.
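
The stepwise merging idea can be illustrated with a small base R sketch. This is not the package's implementation: the helper `merge_adjacent_bins()` and the column names `maf_mean` and `n_snp` are hypothetical, and the merge rule (join the closest adjacent pair while their MAF difference is below `mergeai`, or while one of them has fewer than `snpmin` SNPs) is only one plausible reading of the description above.

```r
# Hypothetical helper: greedily merge adjacent bins with similar mean MAF.
merge_adjacent_bins <- function(bins, mergeai = 0.15, snpmin = 3) {
  while (nrow(bins) > 1) {
    left <- seq_len(nrow(bins) - 1)
    # Absolute MAF difference between each bin and its right-hand neighbour.
    diffs <- abs(bins$maf_mean[left + 1] - bins$maf_mean[left])
    # A pair is also mergeable if either bin has too few SNPs to stand alone.
    small <- pmin(bins$n_snp[left], bins$n_snp[left + 1]) < snpmin
    mergeable <- which(diffs < mergeai | small)
    if (length(mergeable) == 0) break
    # Merge the most similar mergeable pair first.
    i <- mergeable[which.min(diffs[mergeable])]
    total <- bins$n_snp[i] + bins$n_snp[i + 1]
    # SNP-weighted mean MAF of the merged bin.
    bins$maf_mean[i] <- (bins$maf_mean[i] * bins$n_snp[i] +
                           bins$maf_mean[i + 1] * bins$n_snp[i + 1]) / total
    bins$End[i] <- bins$End[i + 1]
    bins$n_snp[i] <- total
    bins <- bins[-(i + 1), , drop = FALSE]
  }
  rownames(bins) <- NULL
  bins
}

# Illustrative bins: two balanced bins followed by three imbalanced ones.
bins <- data.frame(
  Start = c(1e6, 2e6, 3e6, 4e6, 5e6),
  End = c(2e6, 3e6, 4e6, 5e6, 6e6) - 1,
  maf_mean = c(0.48, 0.46, 0.20, 0.22, 0.21),
  n_snp = c(20, 20, 20, 20, 20)
)
refined <- merge_adjacent_bins(bins, mergeai = 0.15, snpmin = 3)
print(refined)
```

With these made-up bins, the five input bins collapse into two regions with a single breakpoint at 3 Mb, where the mean MAF drops by more than `mergeai`.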