This function loads the input file as a GenomicRanges object (the default) or a data.table object, depending on use_gr. The file can be either local or remote, and either plain text or gzip-compressed. The function also supports range-loading: provide a genomic range in the following syntax: "chr1:1-100".

read_bed(
  input = NULL,
  file_path = NULL,
  cmd = NULL,
  range = NULL,
  genome = NULL,
  use_gr = TRUE,
  ...
)

Arguments

file_path

Path to the data file. It can be either a local file, or a remote URL.

range

A character vector of genomic ranges. Each range must follow standard genomic range notation, e.g. chr1:1001-2000.

genome

The reference genome for the BED file. genome can be a valid genome name in GenomeInfoDb::Seqinfo(), e.g. GRCh37; hs37-1kg, a genome shipped with this package; or a custom chromosome size file (local or remote). A good source of such files is https://github.com/igvteam/igv/tree/master/genomes/sizes.

use_gr

If TRUE, read the data as a GenomicRanges object; otherwise, as a data.table object. Generally, we recommend using GenomicRanges.

...

Other arguments to be passed to data.table::fread().

compression

The compression type. If detect, this function will try to guess it from file_path.

tabix_index

A character value indicating the location of the tabix index file. Can be either local or remote. If NULL, it will be derived from file_path.

download_index

Whether to download (cache) the tabix index in the current directory.

sep

The separator between columns. By default, BED files are tab-delimited, and sep should be \t. However, sometimes you will encounter non-standard table files. In such cases, you need to specify the separator. If auto, read_bed will try to guess the separator. For more details, refer to data.table::fread().
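
To make the notation expected by range concrete, here is a minimal base R sketch that splits a string such as chr1:1001-2000 into chromosome, start, and end. The helper name parse_range is hypothetical and is not part of bedtorch; it only illustrates the format and is not how read_bed necessarily parses its input.

```r
# Hypothetical helper (not part of bedtorch): split "chrom:start-end"
# into its three components using base R only.
parse_range <- function(range) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  # regmatches() returns the full match plus the three capture groups,
  # or character(0) when the string does not match the pattern.
  m <- regmatches(range, regexec(pattern, range))[[1]]
  if (length(m) != 4) {
    stop("Invalid genomic range: ", range)
  }
  start <- as.integer(m[3])
  end <- as.integer(m[4])
  if (start > end) {
    stop("Range start must not exceed end: ", range)
  }
  list(chrom = m[2], start = start, end = end)
}

parse_range("chr1:1001-2000")
```

A string without the colon or the dash, such as "chr1", fails the pattern and raises an error in this sketch.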

Details

Note: for loading remote data files, this function currently depends on tabix.c 0.2.5, which does not support the HTTPS protocol. As a next step, I plan to switch to htslib, after which this function will be able to load remote data files through HTTPS.
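
Given this limitation, a caller may want to check the URL scheme before requesting a range from a remote file. The following base R sketch shows one way to do that; uses_https is a hypothetical helper and is not part of bedtorch.

```r
# Hypothetical helper (not part of bedtorch): TRUE when the path is an HTTPS URL.
uses_https <- function(file_path) {
  grepl("^https://", file_path, ignore.case = TRUE)
}

uses_https("https://example.org/data.bed.gz")  # HTTPS URL
uses_https("http://example.org/data.bed.gz")   # plain HTTP URL
uses_https("data/example.bed.gz")              # local path
```

When the check returns TRUE, one possible workaround is to download the data file and its index first, for example with utils::download.file(), and pass the local path to read_bed.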

Examples

bedtbl <- read_bed(system.file("extdata", "example_merge.bed", package = "bedtorch"))
head(bedtbl)
#> GRanges object with 6 ranges and 1 metadata column:
#>       seqnames    ranges strand |     score
#>          <Rle> <IRanges>  <Rle> | <integer>
#>   [1]       21       4-6      * |         8
#>   [2]       21     11-16      * |         5
#>   [3]       21     13-17      * |         4
#>   [4]       21     16-22      * |         5
#>   [5]       21     23-25      * |         7
#>   [6]       21     27-30      * |         7
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

# Basic usage
bedtbl <- read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
                  range = "1:3001-4000")
#> Error: tabix is required.
head(bedtbl)
#> GRanges object with 6 ranges and 1 metadata column:
#>       seqnames    ranges strand |     score
#>          <Rle> <IRanges>  <Rle> | <integer>
#>   [1]       21       4-6      * |         8
#>   [2]       21     11-16      * |         5
#>   [3]       21     13-17      * |         4
#>   [4]       21     16-22      * |         5
#>   [5]       21     23-25      * |         7
#>   [6]       21     27-30      * |         7
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths

# Specify the reference genome
head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
              range = "1:3001-4000",
              genome = "hs37-1kg"))
#> Error: tabix is required.

head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
              range = "1:3001-4000",
              genome = "GRCh37"))
#> Error: tabix is required.

head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"),
              range = "1:3001-4000",
              genome = "https://raw.githubusercontent.com/igvteam/igv/master/genomes/sizes/1kg_v37.chrom.sizes"))
#> Error: Unknown genome: https://raw.githubusercontent.com/igvteam/igv/master/genomes/sizes/1kg_v37.chrom.sizes

# Load remote BGZIP files with tabix index specified
head(read_bed("https://git.io/JYATB", range = "22:20000001-30000001", tabix_index = "https://git.io/JYAkT"))
#> Error: Range seeking is only supported for bgzip data files.