Package 'ensemblr' reference manual

Title:	R Client for the Ensembl REST API
Description:	R Client for the Ensembl REST API.
Authors:	Ramiro Magno [aut, cre], Dany Mukesha [aut], Isabel Duarte [aut], Ana-Teresa Maia [aut], CINTESIS [fnd], Pattern Institute [cph, fnd]
Maintainer:	Ramiro Magno <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.0
Built:	2025-02-16 03:30:42 UTC
Source:	https://github.com/ramiromagno/ensemblr

Create genomic range strings

Description

This function converts three vectors: chr, start, and end to strings of the form {chr}:{start}..{end}.

Usage

genomic_range(chr, start, end, starting_position_index = 1L)

Arguments

`chr`	A character vector of chromosome names.
`start`	An integer vector of start positions.
`end`	An integer vector of end positions.
`starting_position_index`	Use this argument to indicate if the positions are 0-based (`0L`) or 1-based (`1L`). This value is used to check if positions are equal or above this number.

Value

Returns a character vector whose strings are genomic ranges of the form {chr}:{start}..{end}.

Examples

genomic_range("1", 10000L, 20000L) # Returns "1:10000..20000"
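
The string format described above can be approximated in base R. The sketch below is not the package's implementation: `make_range()` is a hypothetical helper, and it assumes the position check is a plain comparison against `starting_position_index`.

```r
# Hypothetical helper, not part of ensemblr: builds "{chr}:{start}..{end}" strings.
make_range <- function(chr, start, end, starting_position_index = 1L) {
  # Assumed check: every position must be at or above the index base.
  stopifnot(all(start >= starting_position_index),
            all(end >= starting_position_index))
  # sprintf() is vectorised, so equal-length vectors are combined element-wise.
  sprintf("%s:%d..%d", chr, as.integer(start), as.integer(end))
}

make_range(c("1", "X"), c(10000L, 1L), c(20000L, 500L))
```

With the default 1-based setting, a start position of `0L` fails the check in this sketch; passing `starting_position_index = 0L` would accept it.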

Get analyses behind Ensembl databases

Description

This function retrieves a table of analyses involved in the generation of data for the different Ensembl databases.

Usage

get_analyses(
  species_name,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.
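
The `species_name` convention (lowercase scientific name, spaces replaced by underscores) can be produced from a regular scientific name with base R. This is a minimal sketch; `to_species_name()` is a hypothetical helper, not a function exported by the package.

```r
# Hypothetical helper, not part of ensemblr.
to_species_name <- function(scientific_name) {
  # Lowercase everything, then replace each run of whitespace with one underscore.
  gsub("\\s+", "_", tolower(trimws(scientific_name)))
}

to_species_name(c("Homo sapiens", "Ovis aries", "Capra hircus"))
```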

Value

A tibble of 3 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
database: Ensembl database. Typically one of 'core', 'rnaseq', 'cdna', 'funcgen' and 'otherfeatures'.
analysis: Analysis.

Get details about the genome assembly of a species

Description

This function retrieves details about the genome assembly of each queried species.

Usage

get_assemblies(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble, each row being a genome assembly, of 10 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
assembly_name: Assembly name.
assembly_date: Assembly date.
genebuild_method: Annotation method.
golden_path_length: Golden path length.
genebuild_initial_release_date: Genebuild release date.
default_coord_system_version: Default coordinate system version.
assembly_accession: Assembly accession.
genebuild_start_date: Genebuild start date.
genebuild_last_geneset_update: Genebuild last geneset update.

Examples

# Get details about the human assembly
get_assemblies()

# Get details about the Mouse and the Fruit Fly genomes
get_assemblies(c('mus_musculus', 'drosophila_melanogaster'))

Get cytogenetic bands by species

Description

This function retrieves cytogenetic bands. If no cytogenetic information is available for a queried species, that species is omitted from the returned value.

Usage

get_cytogenetic_bands(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble, each row being a cytogenetic band, of 8 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
assembly_name: Assembly name.
cytogenetic_band: Name of the cytogenetic band.
chromosome: Chromosome name.
start: Genomic start position of the cytogenetic band. Starts at 1.
end: Genomic end position of the cytogenetic band. End position is included in the band interval.
stain: Giemsa stain results: Giemsa negative, 'gneg'; Giemsa positive, of increasing intensities, 'gpos25', 'gpos50', 'gpos75', and 'gpos100'; centromeric region, 'acen'; heterochromatin, either pericentric or telomeric, 'gvar'; and short arm of acrocentric chromosomes are coded as 'stalk'.
strand: Strand.
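
Because `start` is 1-based and `end` is included in the band interval, the length of a band is `end - start + 1`. The sketch below illustrates this on a toy data frame; the band names and coordinates are made up for the example and are not real band positions.

```r
# Made-up coordinates for illustration only; not real band positions.
bands <- data.frame(
  cytogenetic_band = c("a1", "a2"),
  start = c(1L, 1001L),
  end = c(1000L, 2500L)
)
# Both ends are inclusive, so the length is end - start + 1.
bands$length <- bands$end - bands$start + 1L
bands
```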

Examples

# Get cytogenetic bands for the human genome (default)
get_cytogenetic_bands()

# Get cytogenetic bands for Mus musculus
get_cytogenetic_bands('mus_musculus')

Retrieve the data release version(s) available on the Ensembl REST server.

Description

Retrieve the data release version(s) available on the Ensembl REST server.

Usage

get_data_versions(verbose = FALSE, warnings = TRUE)

Arguments

`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.

Value

An integer vector of release version(s).

Retrieve Ensembl divisions

Description

This function retrieves Ensembl divisions. Ensembl data is split into separate databases that are loosely based on taxonomic divisions or sub-groups.

Usage

get_divisions(verbose = FALSE, warnings = TRUE)

Arguments

`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.

Value

A character vector of Ensembl divisions.

Examples

# Retrieve a character vector of Ensembl divisions
get_divisions()

Get Ensembl Genomes version

Description

Returns the Ensembl Genomes version of the databases backing this service.

Usage

get_ensembl_genomes_version(verbose = FALSE, warnings = TRUE)

Arguments

`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.

Value

An integer value: the Ensembl Genomes version.

Examples

get_ensembl_genomes_version()

Get details about an Ensembl identifier

Description

This function retrieves information about one or more Ensembl identifiers. Ensembl identifiers for which information is available are: genes, exons, transcripts and proteins.

Usage

get_id(id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)

Arguments

`id`	A character vector of Ensembl identifiers. Ensembl identifiers have the form ENS[species prefix][feature type prefix][a unique eleven digit number]. `id` should not contain NAs. Please note that while `'ENSG00000157764'` is a valid identifier as a query, `'ENSG00000157764.13'` is not.
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble of 9 variables:

id: Ensembl identifier.
id_latest: Ensembl identifier including the version suffix.
type: Entity type: gene ('Gene'), exon ('Exon'), transcript ('Transcript'), and protein ('Translation').
id_version: Ensembl identifier version; indicates how many times that entity has changed during its time in Ensembl.
release: Ensembl release version.
is_current: Whether this is the latest identifier for the represented entity.
genome_assembly_name: Code name of the genome assembly.
peptide: TODO
possible_replacement: TODO

Examples

get_id(c('ENSDARE00000830915', 'ENSG00000248378', 'ENSDART00000033574', 'ENSP00000000233'))
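
Since versioned identifiers such as `'ENSG00000157764.13'` are not valid queries, it can help to strip the version suffix before calling `get_id()`. The following is a minimal base R sketch; `strip_id_version()` is a hypothetical helper and assumes the version is always a trailing dot followed by digits.

```r
# Hypothetical helper, not part of ensemblr.
strip_id_version <- function(id) {
  # Remove a trailing ".<digits>" version suffix, if present.
  sub("\\.[0-9]+$", "", id)
}

strip_id_version(c("ENSG00000157764.13", "ENSG00000157764"))
```

Identifiers without a suffix pass through unchanged, so the helper can be applied to a mixed vector.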

Get individuals for a population

Description

This function retrieves individual-level information. The data is returned as a tibble where each row is an individual of a given species and the columns are metadata about each individual. See below under section Value for details about each column. Use the function get_populations() to discover the available populations for a species.

Usage

get_individuals(
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`population`	Population name. Find the available populations for a given species with `get_populations`.
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble of 5 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
population: Population.
description: Description of the population.
individual: Individual identifier.
gender: Individual gender.

Ensembl REST API endpoints

get_individuals() makes GET requests to /info/variation/populations/:species/:population_name.

Examples

# Get human individuals for population "1000GENOMES:phase_3:CEU" (default)
get_individuals()

# Get Finnish individuals ("1000GENOMES:phase_3:FIN")
get_individuals(population = '1000GENOMES:phase_3:FIN')

Get the karyotype of a species

Description

This function retrieves the set of chromosomes of a species.

Usage

get_karyotypes(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble, each row being a chromosome, of 4 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
coord_system: Coordinate system type.
chromosome: Chromosome name.
length: Genomic length of the chromosome in base pairs.

Examples

# Get the karyotype of Caenorhabditis elegans
get_karyotypes('caenorhabditis_elegans')

# Get the karyotype of the Giant panda
get_karyotypes('ailuropoda_melanoleuca')

Get linkage disequilibrium data for variants

Description

Gets linkage disequilibrium data for variants from the Ensembl REST API. There are four ways to query:

Genomic window centred on variants: get_ld_variants_by_window(variant_id, genomic_window_size, ...)
Pairs of variants: get_ld_variants_by_pair(variant_id1, variant_id2, ...)
Genomic range: get_ld_variants_by_range(genomic_range, ...)
All pair combinations of variants: get_ld_variants_by_pair_combn(variant_id, ...)

Usage

get_ld_variants_by_window(
  variant_id,
  genomic_window_size = 500L,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair(
  variant_id1,
  variant_id2,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_range(
  genomic_range,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair_combn(
  variant_id,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`variant_id`	Variant identifiers, e.g., `'rs123'`. This argument is to be used with either function `get_ld_variants_by_window()` or `get_ld_variants_by_pair_combn()`. In the case of `get_ld_variants_by_pair_combn()` all pairwise combinations of elements of `variant_id` are used to define pairs of variants for querying. Note that this argument is not the same as `variant_id1` or `variant_id2`, to be used with function `get_ld_variants_by_pair`.
`genomic_window_size`	An integer vector specifying the genomic window size in kilobases (kb) around the variant indicated in `variant_id`. This argument is to be used with function `get_ld_variants_by_window()`. At the moment, the Ensembl REST API does not allow values greater than 500 kb. A window size of 500 means looking 250 kb upstream and 250 kb downstream of the variant passed as `variant_id`. The minimum value for this argument is `1L`, not `0L`.
`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`population`	Population for which to compute linkage disequilibrium. See `get_populations` on how to find available populations for a species.
`d_prime`	$D'$ is a measure of linkage disequilibrium. `d_prime` defines a cut-off threshold: only variants whose $D' \ge$ `d_prime` are returned.
`r_squared`	$r^2$ is a measure of linkage disequilibrium. `r_squared` defines a cut-off threshold: only variants whose $r^2 \ge$ `r_squared` are returned. The lower bound for `r_squared` is `0.05`, not `0`; the upper bound is `1`.
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.
`variant_id1`	The first variant of a pair of variants. Used with `variant_id2`. Note that this argument is not the same as `variant_id`. This argument is to be used with function `get_ld_variants_by_pair()`.
`variant_id2`	The second variant of a pair of variants. Used with `variant_id1`. Note that this argument is not the same as `variant_id`. This argument is to be used with function `get_ld_variants_by_pair()`.
`genomic_range`	Genomic range formatted as a string `"chr:start..end"`, e.g., `"X:1..10000"`. Check function `genomic_range` to easily create these ranges from vectors of start and end positions. This argument is to be used with function `get_ld_variants_by_range()`.
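
The window arithmetic described for `genomic_window_size` can be sketched as follows. `window_bounds()` is a hypothetical helper, not part of the package; it assumes 1 kb = 1000 bp and a window that is symmetric around the variant position, and it does not clip at chromosome ends. The position used is an arbitrary example value.

```r
# Hypothetical helper, not part of ensemblr.
window_bounds <- function(position, genomic_window_size = 500L) {
  # Limits documented above: between 1 and 500 kb.
  stopifnot(genomic_window_size >= 1L, genomic_window_size <= 500L)
  half_bp <- genomic_window_size * 1000L / 2  # kb -> bp, then split in two
  c(start = position - half_bp, end = position + half_bp)
}

# A 1 kb window: 500 bp on each side of an arbitrary position.
window_bounds(1000000L, genomic_window_size = 1L)
```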

Value

A tibble of 6 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
population: Population for which to compute linkage disequilibrium.
variant_id1: First variant identifier.
variant_id2: Second variant identifier.
d_prime: $D'$ between the two variants.
r_squared: $r^2$ between the two variants.

Examples

# Retrieve variants in LD by a window size of 1kb:
# 1kb: 500 bp upstream and 500 bp downstream of variant.
get_ld_variants_by_window('rs123', genomic_window_size = 1L)

# Retrieve LD measures for pairs of variants:
get_ld_variants_by_pair(
  variant_id1 = c('rs123', 'rs35439278'),
  variant_id2 = c('rs122', 'rs35174522')
)

# Retrieve variants in LD within a genomic range
get_ld_variants_by_range('7:100000..100500')

# Retrieve all pair combinations of variants in LD
get_ld_variants_by_pair_combn(c('rs6978506', 'rs12718102', 'rs13307200'))
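
To preview which pairs `get_ld_variants_by_pair_combn()` would cover, the pairwise combinations can be enumerated with `utils::combn()`. This is only an illustration of the idea; whether the package builds its pairs this way, and in this order, is an assumption.

```r
variant_id <- c("rs6978506", "rs12718102", "rs13307200")
# combn() returns a matrix with one column per unordered pair.
pairs <- utils::combn(variant_id, 2)
data.frame(variant_id1 = pairs[1, ], variant_id2 = pairs[2, ])
```

Three identifiers yield three pairs; in general, n identifiers yield n(n - 1)/2 pairs, so the number of queries grows quadratically.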

Get populations for a species

Description

This function retrieves population-level information. The data is returned as a tibble where each row is a population of a given species and the columns are metadata about each population. See below under section Value for details about each column. Use the parameter ld_only to restrict the populations returned to those with linkage disequilibrium information.

Usage

get_populations(
  species_name = "homo_sapiens",
  ld_only = TRUE,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`ld_only`	Whether to restrict populations returned to only populations with linkage disequilibrium data.
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble of 4 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
population: Population.
description: Description of the population.
cohort_size: Cohort sample size.

Ensembl REST API endpoints

get_populations() makes GET requests to /info/variation/populations/:species.

Examples

# Get all human populations with linkage disequilibrium data
get_populations(species_name = 'homo_sapiens', ld_only = TRUE)

# Get all human populations
get_populations(species_name = 'homo_sapiens', ld_only = FALSE)

Retrieve the current version of the Ensembl REST API

Description

Retrieve the current version of the Ensembl REST API

Usage

get_rest_version(verbose = FALSE, warnings = TRUE)

Arguments

`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.

Value

A scalar character vector with the Ensembl REST API version.

Retrieve the Perl API version

Description

Retrieve the Perl API version

Usage

get_software_version(verbose = FALSE, warnings = TRUE)

Arguments

`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.

Value

A scalar integer vector with the Perl API version.

Get Ensembl species

Description

This function retrieves species-level information. The data is returned as a tibble where each row is a species and the columns are metadata about each species. See below under section Value for details about each column.

Usage

get_species(
  division = get_divisions(),
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`division`	Ensembl division, e.g., `"EnsemblVertebrates"` or `"EnsemblBacteria"`, or a combination of several divisions. Check function `get_divisions` to get available Ensembl divisions.
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble of 12 variables:

division: Ensembl division: "EnsemblVertebrates", "EnsemblMetazoa", "EnsemblPlants", "EnsemblProtists", "EnsemblFungi" or "EnsemblBacteria".
taxon_id: NCBI taxon identifier.
species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
species_display_name: Species display name: the name used for display on Ensembl website.
species_common_name: Species common name.
release: Ensembl release version.
genome_assembly_name: Code name of the genome assembly.
genbank_assembly_accession: Genbank assembly accession identifier.
strain: Species strain.
strain_collection: Species strain collection.
species_aliases: Other names or acronyms used to refer to the species. Note that this column is of the list type.
groups: Ensembl databases for which data exists for this species. Note that this column is of the list type.
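
Because `species_aliases` and `groups` are list columns, each row holds a character vector rather than a single value. The sketch below shows one way to filter on such a column in base R. It uses a toy data frame as a stand-in for the returned tibble; the species names and group memberships are placeholders, not real Ensembl data.

```r
# Toy stand-in for the get_species() result; values are illustrative only.
species <- data.frame(species_name = c("species_a", "species_b"))
species$groups <- list(c("core", "funcgen"), c("core"))
# vapply() tests each list element and returns one logical per row.
has_funcgen <- vapply(species$groups, function(g) "funcgen" %in% g, logical(1))
species$species_name[has_funcgen]
```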

Get toplevel sequences details

Description

This function retrieves a few extra details about a toplevel sequence. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences will be chromosomes and any unlocalised or unplaced scaffolds.

Usage

get_toplevel_sequence_info(
  species_name = "homo_sapiens",
  toplevel_sequence = c(1:22, "X", "Y", "MT"),
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`toplevel_sequence`	A toplevel sequence name, e.g. chromosome names such as `"1"`, `"X"`, or `"Y"`, or a non-chromosome sequence, e.g., a scaffold such as `"KI270757.1"`.
`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble, each row being a toplevel sequence, of 8 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
toplevel_sequence: Name of the toplevel sequence.
is_chromosome: A logical indicating whether the toplevel sequence is a chromosome (TRUE) or not (FALSE).
coord_system: Coordinate system type.
assembly_exception_type: Assembly exception type.
is_circular: A logical indicating whether the toplevel sequence is a circular sequence (TRUE) or not (FALSE).
assembly_name: Assembly name.
length: Genomic length of the toplevel sequence in base pairs.

Examples

# Get details about human chromosomes (default)
get_toplevel_sequence_info()

# Get details about a scaffold
# (To find available toplevel sequences to query use the function
# `get_toplevel_sequences()`)
get_toplevel_sequence_info(species_name = 'homo_sapiens', toplevel_sequence = 'KI270757.1')

Get toplevel sequences by species

Description

This function retrieves toplevel sequences. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences will be chromosomes and any unlocalised or unplaced scaffolds.

Usage

get_toplevel_sequences(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble, each row being a toplevel sequence, of 4 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
coord_system: Coordinate system type.
toplevel_sequence: Name of the toplevel sequence.
length: Genomic length of the toplevel sequence in base pairs.

Examples

# Get toplevel sequences for the human genome (default)
get_toplevel_sequences()

# Get toplevel sequences for Caenorhabditis elegans
get_toplevel_sequences('caenorhabditis_elegans')

Retrieve variant consequences

Description

This function retrieves variant consequence types. For more details check Ensembl Variation - Calculated variant consequences.

Usage

get_variant_consequences(verbose = FALSE, warnings = TRUE)

Arguments

`verbose`	Whether to be chatty about the underlying requests.
`warnings`	Whether to print warnings.

Details

A rule-based approach is used to predict the effects that each allele of a variant may have on each transcript. These effects are variant consequences, which are catalogued as consequence terms defined by the Sequence Ontology.

See below a diagram showing the location of each display term relative to the transcript structure:

Figure: consequences-fs8.png

Value

A tibble, each row being a variant consequence, of four variables:

SO_accession: Sequence Ontology accession, e.g., 'SO:0001626'.
SO_term: Sequence Ontology term, e.g., 'incomplete_terminal_codon_variant'.
label: Display term.
description: Sequence Ontology description.

Ensembl REST API endpoints

get_variant_consequences() makes GET requests to /info/variation/consequence_types.

Examples

# Retrieve variant consequence types
get_variant_consequences()

Retrieve variant sources

Description

This function retrieves variant sources, i.e. a list of databases used by Ensembl from which variant information is retrieved.

Usage

get_variation_sources(
  species_name = "human",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble, each row being a variant database, of 8 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name but formatted without capitalisation and spacing converted with an underscore, e.g., 'homo_sapiens'.
db_name: Database name.
type: Database type, e.g., chip (genotyping chip) or lsdb (locus-specific database).
version: Database version.
somatic_status: Somatic status.
description: Database description.
url: Database's URL.
data_types: Data types to be found at database.

Ensembl REST API endpoints

get_variation_sources() makes GET requests to /info/variation/:species.

Examples

# Retrieve variant sources for human (default)
get_variation_sources()

# Retrieve variant sources for mouse
get_variation_sources(species_name = 'mus_musculus')

Retrieve Ensembl REST versions

Description

This function gets the versions of the different entities involved in the REST API requests. When accessing the Ensembl REST API, you are actually accessing three interconnected entities:

Ensembl databases (data).
Perl API (software).
REST API (rest).

Usage

get_versioning(verbose = FALSE, warnings = TRUE)

Arguments

`verbose`	Whether to be chatty.
`warnings`	Whether to print warnings.

Value

A named list of three elements: data, software and rest.

Examples

# Get the versions of the different entities involved in the REST API
# requests.
get_versioning()

Get cross-references by Ensembl ID

Description

This function retrieves cross-references to external databases by Ensembl identifier. The data is returned as a tibble where each row is a cross reference related to the provided Ensembl identifier. See below under section Value for details about each column.

Usage

get_xrefs_by_ensembl_id(
  species_name,
  ensembl_id,
  all_levels = FALSE,
  ensembl_db = "core",
  external_db = "",
  feature = "",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`ensembl_id`	An Ensembl stable identifier, e.g. `"ENSG00000248378"`.
`all_levels`	A `logical` vector. Set to `TRUE` to find all genetic features linked to the stable ID and fetch all external references for them. Specifying this on a gene will also return values from its transcripts and translations.
`ensembl_db`	Restrict the search to an Ensembl database: typically one of `'core'`, `'rnaseq'`, `'cdna'`, `'funcgen'` and `'otherfeatures'`.
`external_db`	External database to be filtered by. By default no filtering is applied.
`feature`	Restrict search to a feature type: gene (`'gene'`), exon (`'exon'`), transcript (`'transcript'`), and protein (`'translation'`).
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble of 12 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.
ensembl_id: An Ensembl stable identifier, e.g. "ENSG00000248378".
ensembl_db: Ensembl database.
primary_id: Primary identification in external database.
display_id: Display identification in external database.
external_db_name: External database name.
external_db_display_name: External database display name.
version: TODO
info_type: There are two types of external cross references (XRef): direct ('DIRECT') or dependent ('DEPENDENT'). A direct cross reference is one that can be directly linked to a gene, transcript or translation object in Ensembl Genomes by synonymy or sequence similarity. A dependent cross reference is one that is transitively linked to the object via the direct cross reference. The value can also be 'UNMAPPED' for unmapped cross references, or 'PROJECTION' for TODO.
info_text: TODO
synonyms: Other names or acronyms used to refer to the external database entry. Note that this column is of the list type.
description: Brief description of the external database entry.

Ensembl REST API endpoints

get_xrefs_by_ensembl_id() makes GET requests to /xrefs/id/:id.
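
To illustrate how the arguments might translate into a request against /xrefs/id/:id, the sketch below assembles a URL in base R. It assumes that `ensembl_db`, `external_db` and `feature` correspond to the endpoint's `db_type`, `external_db` and `object_type` query parameters; that mapping, and the helper `build_xrefs_id_url()` itself, are assumptions for illustration and not ensemblr's actual implementation. No request is sent.

```r
# Hypothetical helper (not part of ensemblr): build the request URL only.
build_xrefs_id_url <- function(ensembl_id,
                               species_name = "",
                               all_levels = FALSE,
                               ensembl_db = "core",
                               external_db = "",
                               feature = "",
                               server = "https://rest.ensembl.org") {
  # Assumed mapping from function arguments to REST query parameters.
  query <- c(
    species = species_name,
    all_levels = if (isTRUE(all_levels)) "1" else "0",
    db_type = ensembl_db,
    external_db = external_db,
    object_type = feature
  )
  # Empty strings mean "no filtering", so those parameters are dropped.
  query <- query[nzchar(query)]
  # Percent-encode each value before joining into a query string.
  values <- vapply(query, utils::URLencode, character(1), reserved = TRUE)
  query_string <- paste0(names(query), "=", values, collapse = "&")
  paste0(server, "/xrefs/id/", ensembl_id, "?", query_string)
}

url <- build_xrefs_id_url("ENSG00000248378", species_name = "homo_sapiens")
print(url)
```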

Examples

get_xrefs_by_ensembl_id('human', 'ENSG00000248378')

get_xrefs_by_ensembl_id('human', 'ENSG00000248378', all_levels = TRUE)

Get cross references by gene symbol or name

Description

This function retrieves cross references by symbol or display name of a gene. The data is returned as a tibble where each row is a cross reference related to the provided symbol or display name of a gene. See below under section Value for details about each column.

Usage

get_xrefs_by_gene(
  species_name,
  gene,
  ensembl_db = "core",
  external_db = "",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

`species_name`	The species name, i.e., the scientific name, all letters lowercase and space replaced by underscore. Examples: `'homo_sapiens'` (human), `'ovis_aries'` (Domestic sheep) or `'capra_hircus'` (Goat).
`gene`	Symbol or display name of a gene, e.g., `'ACTB'` or `'BRCA2'`.
`ensembl_db`	Restrict the search to a database other than the default. Ensembl's default database is `'core'`.
`external_db`	Filter by external database, e.g. `'HGNC'`. An empty string indicates no filtering.
`verbose`	Whether to be verbose about the http requests and respective responses' status.
`warnings`	Whether to show warnings.
`progress_bar`	Whether to show a progress bar.

Value

A tibble of 12 variables:

species_name: Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.
gene: Gene symbol.
ensembl_db: Ensembl database.
primary_id: Primary identification in external database.
display_id: Display identification in external database.
external_db_name: External database name.
external_db_display_name: External database display name.
version: TODO
info_type: There are two types of external cross references (XRef): direct ('DIRECT') or dependent ('DEPENDENT'). A direct cross reference is one that can be directly linked to a gene, transcript or translation object in Ensembl Genomes by synonymy or sequence similarity. A dependent cross reference is one that is transitively linked to the object via the direct cross reference. The value can also be 'UNMAPPED' for unmapped cross references, or 'PROJECTION' for TODO.
info_text: TODO
synonyms: Other names or acronyms used to refer to the gene. Note that this column is of the list type.
description: Brief description of the external database entry.
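
Because `synonyms` is a list column, each row holds a character vector of zero or more names rather than a single string. The sketch below shows one way to count and flatten such a column in base R. The data frame is a stand-in built by hand with placeholder values; it is not real output of get_xrefs_by_gene().

```r
# Stand-in for a result of get_xrefs_by_gene(); all values are placeholders.
xrefs <- data.frame(
  display_id = c("ENTRY_1", "ENTRY_2", "ENTRY_3"),
  stringsAsFactors = FALSE
)
xrefs$synonyms <- list(c("SYN_A", "SYN_B"), character(0), "SYN_C")

# Number of synonyms per cross reference.
n_synonyms <- lengths(xrefs$synonyms)

# Collapse each element into a single string, e.g. for printing or export.
synonyms_flat <- vapply(xrefs$synonyms, paste, character(1), collapse = "; ")

print(n_synonyms)
print(synonyms_flat)
```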

Ensembl REST API endpoints

get_xrefs_by_gene() makes GET requests to /xrefs/name/:species/:name.

Examples

# Get cross references that relate to gene BRCA2
get_xrefs_by_gene(species_name = 'human', gene = 'BRCA2')

Is the Ensembl REST API server reachable?

Description

Check whether the Ensembl server hosting the REST API service is reachable. This function attempts to connect to https://rest.ensembl.org, returning TRUE on success, and FALSE otherwise. Set verbose = TRUE for a step-by-step description of the connection attempt.

Usage

is_ensembl_reachable(url = ensembl_server(), port = 443L, verbose = FALSE)

Arguments

`url`	Ensembl REST API server URL. Default is https://rest.ensembl.org. You should not need to change this parameter.
`port`	Network port on which to ping the server. You should not need to change this parameter.
`verbose`	Whether to be verbose (`TRUE`) or not (`FALSE`).

Value

A logical value: TRUE if the Ensembl server is reachable, FALSE otherwise.
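
For readers who want a comparable TRUE/FALSE check without the package, the sketch below tries to open a TCP connection to a host and port using base R only. `can_open_port()` is a hypothetical stand-in; it is not how is_ensembl_reachable() is implemented and it does not reproduce the step-by-step diagnostics of verbose = TRUE.

```r
# Hypothetical stand-in (not ensemblr's implementation): TRUE if a TCP
# connection to host:port can be opened within `timeout` seconds.
can_open_port <- function(host = "rest.ensembl.org", port = 443L, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    # Any failure to connect (refused, unresolved host, timeout) maps to NULL.
    error = function(e) NULL
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

# Usage (requires network access): can_open_port()
```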

Examples

# Check if the Ensembl Server is reachable
is_ensembl_reachable() # Returns TRUE or FALSE.

# Check if the Ensembl Server is reachable
# and show exactly at which step it is failing (if that is the case)
is_ensembl_reachable(verbose = TRUE)

Ensembl REST API Endpoints.

Description

A dataset containing the Ensembl REST API endpoints, as listed in https://rest.ensembl.org/.

Usage

rest_api_endpoints

Format

A data frame with 118 rows and 4 variables:

section: Section.
endpoint: Ensembl REST API endpoint.
description: A short description of the resource.
last_update_date: Time stamp of last time this dataset was downloaded from Ensembl.
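
A data frame of this shape can be filtered with ordinary base R subsetting. The sketch below uses a small hand-built stand-in with only the `section` and `endpoint` columns; the endpoints are those named in this manual, while the section labels and the exact formatting of the `endpoint` strings are assumptions and may differ from the packaged dataset.

```r
# Stand-in for rest_api_endpoints (two of its four columns only).
# Section labels and endpoint formatting are illustrative assumptions.
endpoints <- data.frame(
  section = c("Cross References", "Cross References", "Information"),
  endpoint = c("/xrefs/id/:id",
               "/xrefs/name/:species/:name",
               "/info/variation/:species"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose endpoint path starts with /xrefs/.
xrefs_endpoints <- endpoints[grepl("^/xrefs/", endpoints$endpoint), ]
print(xrefs_endpoints$endpoint)
```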

Source

https://rest.ensembl.org/

Package 'ensemblr'

Help Index

Create genomic range strings
Get analyses behind Ensembl databases
Get details about the genome assembly of a species
Get cytogenetic bands by species
Retrieve the data release version(s) available on the Ensembl REST server.
Retrieve Ensembl divisions
Get Ensembl Genomes version
Get details about an Ensembl identifier
Get individuals for a population
Get the karyotype of a species
Get linkage disequilibrium data for variants
Get populations for a species
Retrieve the current version of the Ensembl REST API
Retrieve the Perl API version