Merge overlapping or nearby intervals. This operation is similar to bedtools merge.

merge_bed(x, max_dist = 0, operation = NULL)

Arguments

x

A GRanges object.

max_dist

Maximum distance between two features for them to be merged. Default is 0, i.e., only overlapping and book-ended features are merged.
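The merging rule controlled by max_dist can be sketched in base R as below. This is a minimal illustration, not bedtorch's implementation: merge_intervals is a hypothetical helper, and it assumes GRanges-style 1-based closed coordinates, so two features are book-ended when the next start is exactly one past the current end (a gap of 0). Features are sorted by chromosome and start, and each one either extends the running group or starts a new one.

```r
# Hypothetical helper, not part of bedtorch. Assumes 1-based closed
# coordinates (as in GRanges), so book-ended features have a gap of 0.
merge_intervals <- function(chrom, start, end, max_dist = 0) {
  chrom <- as.character(chrom)
  # Sort features by chromosome, then start, then end
  ord <- order(chrom, start, end)
  chrom <- chrom[ord]
  start <- start[ord]
  end <- end[ord]

  group <- integer(length(start))
  g <- 0L
  cur_chrom <- NA_character_
  cur_end <- -Inf
  for (i in seq_along(start)) {
    # Number of uncovered bases between the current group and this feature:
    # negative if they overlap, 0 if they are book-ended
    gap <- start[i] - cur_end - 1
    if (!identical(chrom[i], cur_chrom) || gap > max_dist) {
      # Start a new merged group
      g <- g + 1L
      cur_chrom <- chrom[i]
      cur_end <- end[i]
    } else {
      # Extend the current group
      cur_end <- max(cur_end, end[i])
    }
    group[i] <- g
  }

  # One row per merged group: its chromosome, smallest start, largest end
  data.frame(
    chrom = chrom[!duplicated(group)],
    start = as.vector(tapply(start, group, min)),
    end = as.vector(tapply(end, group, max)),
    stringsAsFactors = FALSE
  )
}

merge_intervals(c("21", "21", "21"), c(4, 11, 27), c(6, 25, 35))
merge_intervals(c("21", "21", "21"), c(4, 11, 27), c(6, 25, 35), max_dist = 10)
```

With the default max_dist = 0 the three features stay separate (4-6, 11-25, 27-35), because the gaps between them are 4 and 1 bases; with max_dist = 10 they collapse into a single interval, 21:4-35.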

operation

Specifies the operations to apply to the merged intervals. Default is NULL, i.e., no operation is applied and only the first three columns (chrom, start, end) are returned. Must be a list in the following format: list(col1_save = list(on = col1, func = func1), col2_save = list(on = col2, func = func2), ...), where col1 and col2 are the names of the columns to which the merge functions are applied, col1_save and col2_save are the names of the columns in which the results are saved, and func1 and func2 are univariate merge functions. For example, list(max_score = list(on = "score", func = max)) means that, for each group of intervals merged into one, the maximum of the score column is taken and saved to max_score. This is similar to the -c and -o arguments of bedtools merge.
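Conceptually, each entry of operation reduces one metadata column over every group of merged intervals. The sketch below illustrates this in base R. apply_operations and its group argument (the index of the merged interval each input feature belongs to) are hypothetical and not part of bedtorch, and for simplicity it only accepts functions that return a single number.

```r
# Hypothetical helper, not part of bedtorch: apply an `operation` list of the
# form list(save_col = list(on = "col", func = f), ...) to each merged group.
apply_operations <- function(df, group, operation) {
  out <- data.frame(group = sort(unique(group)))
  for (save_col in names(operation)) {
    op <- operation[[save_col]]
    # split() orders groups by sorted value, matching out$group
    vals <- split(df[[op$on]], group)
    # Each func must reduce a vector to a single numeric value
    out[[save_col]] <- vapply(vals, op$func, numeric(1), USE.NAMES = FALSE)
  }
  out
}

df <- data.frame(score = c(1, 2, 3, 10))
group <- c(1, 1, 1, 2)  # features 1-3 were merged into one interval
apply_operations(df, group, list(
  max_score = list(on = "score", func = max),
  mean_score = list(on = "score", func = mean)
))
```

Here the first merged interval gets max_score = 3 and mean_score = 2, and the second, which contains a single feature, gets 10 for both.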

Value

A GRanges object containing merged intervals.

References

Manual page of bedtools merge: https://bedtools.readthedocs.io/en/latest/content/tools/merge.html

Examples

bedtbl <- read_bed(system.file("extdata", "example_merge.bed", package = "bedtorch"))
merged <- merge_bed(bedtbl)
head(merged)
#> GRanges object with 6 ranges and 0 metadata columns:
#>       seqnames    ranges strand
#>          <Rle> <IRanges>  <Rle>
#>   [1]       21       4-6      *
#>   [2]       21     11-25      *
#>   [3]       21     27-35      *
#>   [4]       21     37-50      *
#>   [5]       21     54-56      *
#>   [6]       22     28-41      *
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

merged <- merge_bed(bedtbl, max_dist = 10, 
          operation = list(score1 = list(on = "score", func = mean),
                           score2 = list(on = "score", func = sum)))
head(merged)
#> GRanges object with 2 ranges and 2 metadata columns:
#>       seqnames    ranges strand |    score1    score2
#>          <Rle> <IRanges>  <Rle> | <numeric> <integer>
#>   [1]       21      4-56      * |       5.3        53
#>   [2]       22     28-72      * |       6.3        63
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths