This function loads the input file as a data.table
object. The file can be
either local or remote, and can be either plain text or gzip-compressed.
Furthermore, this function supports range-loading by providing a genomic
range in the following syntax: "chr1:1-100".
read_bed( input = NULL, file_path = NULL, cmd = NULL, range = NULL, genome = NULL, use_gr = TRUE, ... )
file_path | Path to the data file. It can be either a local file, or a remote URL. |
---|---|
range | A genomic range character vector. Must follow standard genomic range notation format, e.g. chr1:1001-2000 |
genome | Specify the reference genome for the BED file. |
use_gr | If |
... | Other arguments to be passed to |
compression | Indicate the compression type. If |
tabix_index | A character value indicating the location of the tabix
index file. Can be either local or remote. If |
download_index | Whether to download (cache) the tabix index at current directory. |
sep | The separator between columns. By default, BED files are
tab-delimited, and |
Note: for loading remote data files, currently this function depends on tabix.c 0.2.5, which doesn't not support HTTPS protocol. In the next step, I plan to turn to htslib, and the this function can load remote data files through HTTPS.
bedtbl <- read_bed(system.file("extdata", "example_merge.bed", package = "bedtorch")) head(bedtbl) #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | score #> <Rle> <IRanges> <Rle> | <integer> #> [1] 21 4-6 * | 8 #> [2] 21 11-16 * | 5 #> [3] 21 13-17 * | 4 #> [4] 21 16-22 * | 5 #> [5] 21 23-25 * | 7 #> [6] 21 27-30 * | 7 #> ------- #> seqinfo: 2 sequences from an unspecified genome; no seqlengths # Basic usage bedtbl <- read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"), range = "1:3001-4000") #> Error: tabix is required. head(bedtbl) #> GRanges object with 6 ranges and 1 metadata column: #> seqnames ranges strand | score #> <Rle> <IRanges> <Rle> | <integer> #> [1] 21 4-6 * | 8 #> [2] 21 11-16 * | 5 #> [3] 21 13-17 * | 4 #> [4] 21 16-22 * | 5 #> [5] 21 23-25 * | 7 #> [6] 21 27-30 * | 7 #> ------- #> seqinfo: 2 sequences from an unspecified genome; no seqlengths # Specify the reference genome head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"), range = "1:3001-4000", genome = "hs37-1kg")) #> Error: tabix is required. head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"), range = "1:3001-4000", genome = "GRCh37")) #> Error: tabix is required. head(read_bed(system.file("extdata", "example2.bed.gz", package = "bedtorch"), range = "1:3001-4000", genome = "https://raw.githubusercontent.com/igvteam/igv/master/genomes/sizes/1kg_v37.chrom.sizes")) #> Error: Unknown genome: https://raw.githubusercontent.com/igvteam/igv/master/genomes/sizes/1kg_v37.chrom.sizes # Load remote BGZIP files with tabix index specified head(read_bed("https://git.io/JYATB", range = "22:20000001-30000001", tabix_index = "https://git.io/JYAkT")) #> Error: Range seeking is only supported for bgzip data files.