Title: | 'REST' 'API' Client for the 'NHGRI'-'EBI' 'GWAS' Catalog |
---|---|
Description: | 'GWAS' R 'API' Data Download. This package provides easy access to the 'NHGRI'-'EBI' 'GWAS' Catalog data by accessing the 'REST' 'API' <https://www.ebi.ac.uk/gwas/rest/docs/api/>. |
Authors: | Ramiro Magno [aut, cre] , Ana-Teresa Maia [aut] , CINTESIS [cph, fnd], Pattern Institute [cph, fnd] |
Maintainer: | Ramiro Magno <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.99.17 |
Built: | 2024-11-14 04:44:39 UTC |
Source: | https://github.com/ramiromagno/gwasrapidd |
See magrittr::%>% for details.
The same as the rhs.
c(1,2,3) %>% mean()
Map an association accession identifier to a study accession identifier.
association_to_study(association_id, verbose = FALSE, warnings = TRUE)
association_id | A character vector of association accession identifiers. |
verbose | Whether the function should be verbose about the different queries or not. |
warnings | Whether to print warnings. |
A data frame with two identifier columns: the first column is the association identifier and the second column is the study identifier.
## Not run:
# Map GWAS association identifiers to study identifiers
association_to_study(c('24300097', '24299759'))
## End(Not run)
Map an association accession identifier to an EFO trait id.
association_to_trait(association_id, verbose = FALSE, warnings = TRUE)
association_id | A character vector of association accession identifiers. |
verbose | Whether the function should be verbose about the different queries or not. |
warnings | Whether to print warnings. |
A data frame with two identifier columns: the first column is the association identifier and the second column is the EFO trait identifier.
## Not run:
# Map GWAS association identifiers to EFO trait identifiers
association_to_trait(c('24300097', '24299759'))
## End(Not run)
Map an association accession identifier to a variant identifier.
association_to_variant(association_id, verbose = FALSE, warnings = TRUE)
association_id | A character vector of association accession identifiers. |
verbose | Whether the function should be verbose about the different queries or not. |
warnings | Whether to print warnings. |
A data frame with two identifier columns: the first column is the association identifier and the second column is the variant identifier.
## Not run:
# Map GWAS association identifiers to variant identifiers
association_to_variant(c('24300097', '24299759'))
## End(Not run)
The association object consists of six slots, each a table (tibble), that combined form a relational database of a subset of GWAS Catalog associations. Each association is an observation (row) in the associations table, the main table. All tables have the column association_id as primary key.
associations
A tibble listing associations. Columns:
GWAS Catalog association accession identifier, e.g., "20250".
Reported p-value for strongest variant risk or effect allele.
Information describing context of p-value.
Mantissa of p-value.
Exponent of p-value.
Whether the association is for a multi-SNP haplotype.
Whether the association is for a SNP-SNP interaction.
Whether the SNP has previously been reported. Either 'known' or 'novel'.
Reported risk/effect allele frequency associated with strongest SNP in controls.
Standard error of the effect size.
Reported 95% confidence interval associated with strongest SNP risk allele, along with unit in the case of beta coefficients. If 95% CIs have not been reported, these are estimated using the standard error, when available.
Reported odds ratio (OR) associated with strongest SNP risk allele. Note that all ORs included in the Catalog are >1.
Beta coefficient associated with strongest SNP risk allele.
Beta coefficient unit.
Beta coefficient direction, either 'decrease' or 'increase'.
Additional beta coefficient comment.
Last time this association was mapped to Ensembl.
Last time this association was updated.
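The mantissa and exponent columns presumably store the p-value in scientific notation. The following is a minimal sketch under that assumption, using made-up values and hypothetical variable names (`pvalue_mantissa`, `pvalue_exponent`) rather than real Catalog data:

```r
# Toy values, not taken from the GWAS Catalog.
pvalue_mantissa <- c(3, 5, 1)
pvalue_exponent <- c(-8, -12, -6)

# Assumed relationship: pvalue = mantissa * 10^exponent.
pvalue <- pvalue_mantissa * 10^pvalue_exponent

# Working on the log10 scale avoids underflow for very small p-values.
minus_log10_p <- -(log10(pvalue_mantissa) + pvalue_exponent)

print(pvalue)
print(minus_log10_p)
```

Keeping the two parts separate can be useful when a p-value is too small to be represented as a double.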
loci
A tibble listing loci. Columns:
GWAS Catalog association accession identifier, e.g., "20250".
A locus identifier referring to a single variant locus or to a multi-loci entity such as a multi-SNP haplotype.
Number of variants per locus. Most loci are single-SNP loci, i.e., there is a one to one relationship between a variant and a locus_id (haplotype_snp_count is NA). There are however cases of associations involving multiple loci at once, such as SNP-SNP interactions and multi-SNP haplotypes. This is signalled in the columns multiple_snp_haplotype and snp_interaction with value TRUE.
Description of the locus identifier, e.g., 'Single variant', 'SNP x SNP interaction', or '3-SNP Haplotype'.
risk_alleles
A tibble listing risk alleles. Columns:
GWAS Catalog association accession identifier, e.g., "20250".
A locus identifier referring to a single variant locus or to a multi-loci entity such as a multi-SNP haplotype.
Variant identifier, e.g., 'rs1333048'.
Risk allele or effect allele.
Reported risk/effect allele frequency associated with strongest SNP in controls (if not available among all controls, among the control group with the largest sample size). If the associated locus is a haplotype the haplotype frequency will be extracted.
Whether this variant allele has been part of a genome-wide study or not.
Undocumented.
genes
A tibble listing author reported genes. Columns:
GWAS Catalog association accession identifier, e.g., "20250".
A locus identifier referring to a single variant locus or to a multi-loci entity such as a multi-SNP haplotype.
Gene symbol according to HUGO Gene Nomenclature (HGNC).
ensembl_ids
A tibble listing Ensembl gene identifiers. Columns:
GWAS Catalog association accession identifier, e.g., "20250".
A locus identifier referring to a single variant locus or to a multi-loci entity such as a multi-SNP haplotype.
Gene symbol according to HUGO Gene Nomenclature (HGNC).
The Ensembl identifier of an Ensembl gene, see Section Gene annotation in Ensembl for more information.
entrez_ids
A tibble listing Entrez gene identifiers. Columns:
GWAS Catalog association accession identifier, e.g., "20250".
A locus identifier referring to a single variant locus or to a multi-loci entity such as a multi-SNP haplotype.
Gene symbol according to HUGO Gene Nomenclature (HGNC).
The Entrez identifier of a gene, see ref. doi:10.1093/nar/gkq1237 for more information.
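Because every table carries association_id as a key, the slots can be combined with an ordinary join. The sketch below uses two toy base R data frames as stand-ins for slots; the column names `pvalue` and `variant_id`, the p-values, and the pairing of variants with associations are illustrative assumptions, not Catalog data:

```r
# Toy stand-ins for two slots of an associations object.
# All values are made up for illustration.
associations <- data.frame(
  association_id = c("20250", "22509"),
  pvalue = c(2e-9, 4e-10),
  stringsAsFactors = FALSE
)
risk_alleles <- data.frame(
  association_id = c("20250", "22509", "22509"),
  variant_id = c("rs1333048", "rs56261590", "rs4725504"),
  stringsAsFactors = FALSE
)

# Join on the shared key: one row per (association, variant) pair.
joined <- merge(associations, risk_alleles, by = "association_id")
print(joined)
```

An association involving several variants appears on several rows after the join, which is the usual behaviour of a one-to-many relationship.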
Binds together GWAS Catalog objects of the same class. Note that bind() preserves duplicates whereas union() does not.
bind(x, ...)
x | An object of class: studies, associations, variants, or traits. |
... | Objects of the same class as x. |
An object of the same class as x.
# Join two studies objects.
bind(studies_ex01, studies_ex02)
# Join two associations objects.
bind(associations_ex01, associations_ex02)
# Join two variants objects.
bind(variants_ex01, variants_ex02)
# Join two traits objects.
bind(traits_ex01, traits_ex02)
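The difference between bind() and union() can be illustrated on plain identifier vectors. This is only an analogy in base R, not the package's implementation; the identifiers are the study accessions of the two bundled example objects as listed in this manual:

```r
# Study identifiers of studies_ex01 and studies_ex02.
ids_ex01 <- c("GCST001585", "GCST003985")
ids_ex02 <- c("GCST001585", "GCST006655")

# Analogous to bind(): concatenation keeps the duplicated 'GCST001585'.
bound <- c(ids_ex01, ids_ex02)

# Analogous to union(): the duplicated identifier is kept only once.
merged <- base::union(ids_ex01, ids_ex02)

print(length(bound))
print(length(merged))
```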
A dataset containing the GRCh38 human cytogenetic bands and their genomic coordinates.
cytogenetic_bands
A data frame with 862 rows and 8 variables:
Cytogenetic band name. See Cytogenetic Nomenclature below.
Chromosome name: 1 through 22 (the autosomes), X or Y.
Genomic start position of the cytogenetic band. Starts at 1.
Genomic end position of the cytogenetic band. End position is included in the band interval.
Length of the genomic interval of cytogenetic band.
Assembly version, should be 'GRCh38'.
Giemsa stain results: Giemsa negative, 'gneg'; Giemsa positive, of increasing intensities, 'gpos25', 'gpos50', 'gpos75', and 'gpos100'; centromeric region, 'acen'; heterochromatin, either pericentric or telomeric, 'gvar'; and short arm of acrocentric chromosomes 13, 14, 15, 21, and 22, coded as 'stalk'.
Time stamp of last time this dataset was downloaded from Ensembl.
Genomic coordinates are for fully closed intervals.
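With fully closed, 1-based intervals both endpoints belong to the band, so the length is end - start + 1. A small sketch with made-up coordinates (the data frame and the helper `in_band()` are hypothetical, not part of the package):

```r
# Toy band coordinates (made up), 1-based and fully closed.
bands <- data.frame(
  start = c(1, 2300001),
  end   = c(2300000, 5300000)
)

# Both endpoints are included in the interval, hence the + 1.
bands$length <- bands$end - bands$start + 1

# A position lies in a band when start <= position <= end.
in_band <- function(position, start, end) {
  position >= start & position <= end
}

# The end position of the first band belongs to the first band only.
hits <- in_band(2300000, bands$start, bands$end)
print(bands)
print(hits)
```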
Cytogenetic bands are numbered from the centromere outwards in both directions towards the telomeres on the shorter p arm and the longer q arm.
The first number or letter represents the chromosome. Chromosomes 1 through 22 (the autosomes) are designated by their chromosome number. The sex chromosomes are designated by X or Y. The next letter represents the arm of the chromosome: p or q.
The numbers cannot be read in the normal decimal numeric system e.g. 36, but rather 3-6 (region 3 band 6). Counting starts at the centromere as region 1 (or 1-0), to 11 (1-1) to 21 (2-1) to 22 (2-2) etc. Subbands are added in a similar way, e.g. 21.1 to 21.2, if the bands are small or only appear at a higher resolution.
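The nomenclature above can be made concrete by splitting a band name into its parts. The helper `parse_band()` below is a hypothetical base R sketch, not a function of the package, and it assumes names of the form chromosome, arm, region digit, band digit, and an optional sub-band after a dot:

```r
# Split cytogenetic band names such as '1p36.11' into their components.
parse_band <- function(x) {
  # chromosome, arm, region digit, band digit, optional sub-band digits
  pattern <- "^([0-9]{1,2}|X|Y)([pq])([0-9])([0-9])\\.?([0-9]*)$"
  m <- regmatches(x, regexec(pattern, x))
  parts <- do.call(rbind, lapply(m, function(p) {
    # Names that do not match the pattern yield NA in every component.
    if (length(p) == 0L) rep(NA_character_, 5L) else p[-1L]
  }))
  data.frame(
    name = x,
    chromosome = parts[, 1],
    arm = parts[, 2],
    region = parts[, 3],
    band = parts[, 4],
    sub_band = parts[, 5],
    stringsAsFactors = FALSE
  )
}

parsed <- parse_band(c("1p36.11", "2q37.1", "Xq28"))
print(parsed)
```

Here '1p36.11' reads as chromosome 1, short arm, region 3, band 6, sub-band 11; a name without a sub-band gets an empty string in the last column.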
https://rest.ensembl.org/info/assembly/homo_sapiens?content-type=application/json&bands=1
This function attempts to get a variant by its variant identifier and checks the response code. If the response code is 200 then the response has been successful, meaning that the variant does exist in the GWAS Catalog. If the response is 404 then the variant is not found in the Catalog database. Other errors are mapped to NA.
exists_variant(variant_id = NULL, verbose = FALSE, page_size = 20L)
variant_id | A character vector of GWAS Catalog variant identifiers. |
verbose | Whether the function should be verbose about the different queries or not. |
page_size | An integer scalar indicating the page value to be used in the JSON requests, can be between |
A named logical vector, TRUE indicates that the variant does exist in the Catalog, FALSE otherwise. NA codes other types of errors. The names of the vector are the variant identifiers passed as variant_id.
exists_variant('rs12345')
exists_variant('rs11235813')
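The mapping from response code to return value described above can be sketched as follows. `status_to_exists()` is a hypothetical helper written for illustration, not the package's internal code, and the status codes and names are invented:

```r
# Map HTTP status codes to TRUE (200), FALSE (404) or NA (anything else).
status_to_exists <- function(status_code) {
  out <- rep(NA, length(status_code))          # default: other errors
  out[which(status_code == 200L)] <- TRUE      # found in the Catalog
  out[which(status_code == 404L)] <- FALSE     # not found in the Catalog
  out
}

# Hypothetical status codes for three lookups.
codes <- c(id_a = 200L, id_b = 404L, id_c = 500L)
result <- stats::setNames(status_to_exists(codes), names(codes))
print(result)
```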
These are examples of GWAS Catalog entities shipped with gwasrapidd:
studies_ex01
studies_ex02
associations_ex01
associations_ex02
variants_ex01
variants_ex02
traits_ex01
traits_ex02
An S4 studies object of 2 studies: 'GCST001585' and 'GCST003985'.
An S4 studies object of 2 studies: 'GCST001585' and 'GCST006655'.
An S4 associations object of 4 associations: '22509', '22505', '19537565' and '19537593'.
An S4 associations object of 3 associations: '19537593', '31665940' and '34944736'.
An S4 variants object of 3 variants: 'rs146992477', 'rs56261590' and 'rs4725504'.
An S4 variants object of 4 variants: 'rs56261590', 'rs4725504', 'rs11099757' and 'rs16871509'.
An S4 traits object of 3 traits: 'EFO_0004884', 'EFO_0004343' and 'EFO_0005299'.
An S4 traits object of 4 traits: 'EFO_0007845', 'EFO_0004699', 'EFO_0004884' and 'EFO_0004875'.
An object of class studies of length 1.
An object of class associations of length 1.
An object of class associations of length 1.
An object of class variants of length 1.
An object of class variants of length 1.
An object of class traits of length 1.
An object of class traits of length 1.
Retrieves associations via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all associations that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the associations that match all the provided criteria simultaneously, then set set_operation to 'intersection'.
get_associations(
  study_id = NULL,
  association_id = NULL,
  variant_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  efo_trait = NULL,
  set_operation = "union",
  interactive = TRUE,
  verbose = FALSE,
  warnings = TRUE
)
study_id | A character vector of GWAS Catalog study accession identifiers. |
association_id | A character vector of GWAS Catalog association identifiers. |
variant_id | A character vector of GWAS Catalog variant identifiers. |
efo_id | A character vector of EFO identifiers. |
pubmed_id | An integer vector of PubMed identifiers. |
efo_trait | A character vector of EFO trait descriptions. |
set_operation | Either 'union' or 'intersection'. |
interactive | A logical. If all associations are requested, whether to ask interactively if we really want to proceed. |
verbose | A logical indicating whether the function should be verbose about the different queries or not. |
warnings | A logical indicating whether to print warnings. |
Please note that all search criteria are vectorised, thus allowing for batch mode search, e.g., one can search by multiple variant identifiers at once by passing a vector of identifiers to variant_id.
An associations object.
## Not run:
# Get an association by study identifier
get_associations(study_id = 'GCST001085', warnings = FALSE)
# Get an association by association identifier
get_associations(association_id = '25389945', warnings = FALSE)
# Get associations by variant identifier
get_associations(variant_id = 'rs3798440', warnings = FALSE)
# Get associations by EFO trait identifier
get_associations(efo_id = 'EFO_0005537', warnings = FALSE)
## End(Not run)
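The effect of set_operation can be pictured on the identifiers returned by each criterion. The sketch below is a base R analogy with invented association identifiers; it does not query the REST API and does not reproduce how the package combines results internally:

```r
# Hypothetical association identifiers matched by two separate criteria.
by_study   <- c("A1", "A2", "A3")
by_variant <- c("A2", "A3", "A4")
results <- list(by_study, by_variant)

# set_operation = 'union': anything matching at least one criterion.
any_criterion <- Reduce(base::union, results)

# set_operation = 'intersection': only what matches every criterion.
all_criteria <- Reduce(base::intersect, results)

print(any_criterion)
print(all_criteria)
```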
Get all child terms of this trait in the EFO hierarchy.
get_child_efo(
  efo_id,
  verbose = FALSE,
  warnings = TRUE,
  page_size = 20L,
  progress_bar = TRUE
)
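efo_id | A character vector of EFO identifiers. |
verbose | A logical indicating whether the function should be verbose about the different queries or not. |
warnings | A logical indicating whether to print warnings. |
page_size | An integer scalar indicating the page value to be used in the JSON requests. |
progress_bar | Whether to show a progress bar as the paginated resources are retrieved. |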
A named list whose values are character vectors of EFO identifiers.
## Not run:
get_child_efo(c('EFO_0004884', 'EFO_0004343', 'EFO_0005299'))
## End(Not run)
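A named list of character vectors is often easier to work with as a two-column table of parent and child identifiers. The sketch below uses a hand-made list with placeholder identifiers; it also assumes that a term without children is represented by an empty character vector, which this manual does not confirm:

```r
# Hypothetical return value: names are the queried terms,
# values are their child terms (placeholder identifiers).
children <- list(
  EFO_PARENT_A = c("EFO_CHILD_1", "EFO_CHILD_2"),
  EFO_PARENT_B = character(0)
)

# One row per parent-child pair; terms without children contribute no rows.
pairs <- data.frame(
  parent_efo_id = rep(names(children), lengths(children)),
  child_efo_id = unlist(children, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(pairs)
```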
Provides a list of the resources the GWAS Catalog data is currently mapped against: Ensembl release number, genome build version and dbSNP version. In addition, the date since this combination of resource versions has been in use is also returned.
get_metadata(verbose = FALSE, warnings = TRUE)
verbose | Whether to be chatty. |
warnings | Whether to trigger a warning if the request is not successful. |
ensembl_release_number: Ensembl release number;
genome_build_version: Genome build version;
dbsnp_version: dbSNP version;
usage_start_date: Date since this combination of resource versions has been in use.
## Not run:
get_metadata(warnings = FALSE)
## End(Not run)
Retrieves studies via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all studies that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the studies that match all the provided criteria simultaneously, then set set_operation to 'intersection'.
get_studies(
  study_id = NULL,
  association_id = NULL,
  variant_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  user_requested = NULL,
  full_pvalue_set = NULL,
  efo_uri = NULL,
  efo_trait = NULL,
  reported_trait = NULL,
  set_operation = "union",
  interactive = TRUE,
  verbose = FALSE,
  warnings = TRUE
)
study_id | A character vector of GWAS Catalog study accession identifiers. |
association_id | A character vector of GWAS Catalog association identifiers. |
variant_id | A character vector of GWAS Catalog variant identifiers. |
efo_id | A character vector of EFO identifiers. |
pubmed_id | An integer vector of PubMed identifiers. |
user_requested | A |
full_pvalue_set | A |
efo_uri | A character vector of EFO URIs. |
efo_trait | A character vector of EFO trait descriptions, e.g., |
reported_trait | A character vector of phenotypic traits as reported by the original authors of the study. |
set_operation | Either 'union' or 'intersection'. |
interactive | A logical. If all studies are requested, whether to ask interactively if we really want to proceed. |
verbose | Whether the function should be verbose about the different queries or not. |
warnings | Whether to print warnings. |
Please note that all search criteria are vectorised, thus allowing for batch mode search, e.g., one can search by multiple variant identifiers at once by passing a vector of identifiers to variant_id.
A studies object.
## Not run:
# Get a study by its accession identifier
get_studies(study_id = 'GCST001085', warnings = FALSE)
# Get a study by association identifier
get_studies(association_id = '25389945', warnings = FALSE)
# Get studies by variant identifier
get_studies(variant_id = 'rs3798440', warnings = FALSE)
# Get studies by EFO trait identifier
get_studies(efo_id = 'EFO_0005537', warnings = FALSE)
## End(Not run)
Retrieves traits via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all traits that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the traits that match all the provided criteria simultaneously, then set set_operation to 'intersection'.
get_traits(
  study_id = NULL,
  association_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  efo_uri = NULL,
  efo_trait = NULL,
  set_operation = "union",
  verbose = FALSE,
  warnings = TRUE
)
study_id | A character vector of GWAS Catalog study accession identifiers. |
association_id | A character vector of GWAS Catalog association identifiers. |
efo_id | A character vector of EFO identifiers. |
pubmed_id | An integer vector of PubMed identifiers. |
efo_uri | A character vector of EFO URIs. |
efo_trait | A character vector of EFO trait descriptions. |
set_operation | Either 'union' or 'intersection'. |
verbose | A logical indicating whether the function should be verbose about the different queries or not. |
warnings | A logical indicating whether to print warnings. |
Please note that all search criteria are vectorised, thus allowing for batch mode search, e.g., one can search by multiple trait identifiers at once by passing a vector of identifiers to efo_id.
A traits object.
## Not run:
# Get traits by study identifier
get_traits(study_id = 'GCST001085', warnings = FALSE)
# Get traits by association identifier
get_traits(association_id = '25389945', warnings = FALSE)
# Get a trait by its EFO identifier
get_traits(efo_id = 'EFO_0005537', warnings = FALSE)
## End(Not run)
Retrieves variants via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all variants that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default option set_operation set to 'union'. If you would rather have only the variants that match all the provided criteria simultaneously, then set set_operation to 'intersection'.
get_variants(
  study_id = NULL,
  association_id = NULL,
  variant_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  genomic_range = NULL,
  cytogenetic_band = NULL,
  gene_name = NULL,
  efo_trait = NULL,
  reported_trait = NULL,
  set_operation = "union",
  interactive = TRUE,
  std_chromosomes_only = TRUE,
  verbose = FALSE,
  warnings = TRUE
)
study_id | A character vector of GWAS Catalog study accession identifiers. |
association_id | A character vector of GWAS Catalog association identifiers. |
variant_id | A character vector of GWAS Catalog variant identifiers. |
efo_id | A character vector of EFO identifiers. |
pubmed_id | An integer vector of PubMed identifiers. |
genomic_range | A named list of three vectors: The three vectors need to be of the same length so that |
cytogenetic_band | A character vector of cytogenetic bands of the form |
gene_name | Gene symbol according to HUGO Gene Nomenclature (HGNC). |
efo_trait | A character vector of EFO trait descriptions, e.g., |
reported_trait | A character vector of phenotypic traits as reported by the original authors of the study. |
set_operation | Either 'union' or 'intersection'. |
interactive | A logical. If all variants are requested, whether to ask interactively if we really want to proceed. |
std_chromosomes_only | Whether to return only variants mapped to standard chromosomes: 1 through 22, X, Y, and MT. |
verbose | Whether the function should be verbose about the different queries or not. |
warnings | Whether to print warnings. |
Please note that all search criteria are vectorised, thus allowing for batch mode search, e.g., one can search by multiple variant identifiers at once by passing a vector of identifiers to variant_id.
A variants object.
# Get variants by study identifier
get_variants(study_id = 'GCST001085', warnings = FALSE)
# Get a variant by its identifier
## Not run:
get_variants(variant_id = 'rs3798440', warnings = FALSE)
## End(Not run)
Check if the EBI server where the GWAS Catalog REST API server is running is reachable. This function attempts to connect to https://www.ebi.ac.uk, returning TRUE on success, and FALSE otherwise. Set chatty = TRUE for a step by step description of the connection attempt.
is_ebi_reachable(url = "https://www.ebi.ac.uk", port = 443L, chatty = FALSE)
url | NHGRI-EBI GWAS Catalog server URL. Default is https://www.ebi.ac.uk. You should not need to change this parameter. |
port | Network port on which to ping the server. You should not need to change this parameter. |
chatty | Whether to be verbose (TRUE) or not (FALSE). |
A logical value: TRUE if EBI server is reachable, FALSE otherwise.
# Check if the GWAS Catalog Server is reachable
is_ebi_reachable() # Returns TRUE or FALSE.
# Check if the GWAS Catalog Server is reachable
# and show exactly at what step it is failing (if that is the case)
is_ebi_reachable(chatty = TRUE)
This function returns the number of entities in a GWAS Catalog object. Set unique = TRUE to count only unique entities.
n(x, unique = FALSE)
## S4 method for signature 'studies'
n(x, unique = FALSE)
## S4 method for signature 'associations'
n(x, unique = FALSE)
## S4 method for signature 'variants'
n(x, unique = FALSE)
## S4 method for signature 'traits'
n(x, unique = FALSE)
x | A studies, an associations, a variants, or a traits object. |
unique | Whether to count only unique entries (TRUE) or not (FALSE). |
An integer scalar.
# Determine number of studies
n(studies_ex01)
# Determine number of associations
n(associations_ex01)
# Determine number of variants
n(variants_ex01)
# Determine number of traits
n(traits_ex01)
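The role of the unique argument can be pictured with a plain vector of identifiers. `count_entities()` is a hypothetical stand-in written in base R, under the assumption that unique = TRUE counts each identifier once; it is not the package's n() method:

```r
# Count entries in a vector of identifiers, optionally ignoring repeats.
count_entities <- function(ids, unique = FALSE) {
  if (unique) length(base::unique(ids)) else length(ids)
}

# Identifiers with one repeated entry, e.g. after binding two objects.
ids <- c("GCST001585", "GCST003985", "GCST001585", "GCST006655")

n_all <- count_entities(ids)
n_unique <- count_entities(ids, unique = TRUE)
print(c(all = n_all, unique = n_unique))
```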
This function launches the web browser at dbSNP and opens a tab for each SNP identifier.
open_in_dbsnp(variant_id)
variant_id | A variant identifier, a character vector. |
Returns TRUE if successful. Note however that this function is run for its side effect.
open_in_dbsnp('rs56261590')
This function launches the web browser at the GTEx Portal and opens a tab for each SNP identifier.
open_in_gtex(variant_id)
variant_id | A variant identifier, a character vector. |
Returns TRUE if successful. Note however that this function is run for its side effect.
open_in_gtex('rs56261590')
This function launches the web browser and opens a tab for each identifier on the GWAS web graphical user interface: https://www.ebi.ac.uk/gwas.
open_in_gwas_catalog(
  identifier,
  gwas_catalog_entity = c("study", "variant", "trait", "gene", "region", "publication")
)
identifier | A vector of identifiers. The identifiers can be: study accession identifiers, variant identifiers, EFO trait identifiers, gene symbol names, cytogenetic regions, or PubMed identifiers. |
gwas_catalog_entity | Either 'study', 'variant', 'trait', 'gene', 'region', or 'publication'. |
Returns TRUE if successful, or FALSE otherwise. But note that this function is run for its side effect.
# Open studies in GWAS Web Graphical User Interface
open_in_gwas_catalog(c('GCST000016', 'GCST001115'))
# Open variants
open_in_gwas_catalog(c('rs146992477', 'rs56261590'), gwas_catalog_entity = 'variant')
# Open EFO traits
open_in_gwas_catalog(c('EFO_0004884', 'EFO_0004343'), gwas_catalog_entity = 'trait')
# Open genes
open_in_gwas_catalog(c('DPP6', 'MCCC2'), gwas_catalog_entity = 'gene')
# Open cytogenetic regions
open_in_gwas_catalog(c('2q37.1', '1p36.11'), gwas_catalog_entity = 'region')
# Open publications
open_in_gwas_catalog(c('25533513', '24376627'), gwas_catalog_entity = 'publication')
This function launches the web browser and opens a tab for each PubMed citation.
open_in_pubmed(pubmed_id)
pubmed_id | A PubMed identifier, either a character or an integer vector. |
Returns TRUE if successful. Note however that this function is run for its side effect.
open_in_pubmed(c('26301688', '30595370'))
Performs set union, intersection, and (asymmetric!) difference on two objects of either class studies, associations, variants, or traits. Note that union() removes duplicated entities, whereas bind() does not.
union(x, y, ...)
intersect(x, y, ...)
setdiff(x, y, ...)
setequal(x, y, ...)
x, y | Objects of either class studies, associations, variants, or traits. |
... | other arguments passed on to methods. |
An object of the same class as x and y, i.e., studies, associations, variants, or traits.
#
# union()
#
# Combine studies and remove duplicates
union(studies_ex01, studies_ex02)
# Combine associations and remove duplicates
union(associations_ex01, associations_ex02)
# Combine variants and remove duplicates
union(variants_ex01, variants_ex02)
# Combine traits and remove duplicates
union(traits_ex01, traits_ex02)
#
# intersect()
#
# Intersect common studies
intersect(studies_ex01, studies_ex02)
# Intersect common associations
intersect(associations_ex01, associations_ex02)
# Intersect common variants
intersect(variants_ex01, variants_ex02)
# Intersect common traits
intersect(traits_ex01, traits_ex02)
#
# setdiff()
#
# Remove studies from ex01 that are also present in ex02
setdiff(studies_ex01, studies_ex02)
# Remove associations from ex01 that are also present in ex02
setdiff(associations_ex01, associations_ex02)
# Remove variants from ex01 that are also present in ex02
setdiff(variants_ex01, variants_ex02)
# Remove traits from ex01 that are also present in ex02
setdiff(traits_ex01, traits_ex02)
#
# setequal()
#
# Compare two studies objects
setequal(studies_ex01, studies_ex01)
setequal(studies_ex01, studies_ex02)
# Compare two associations objects
setequal(associations_ex01, associations_ex01)
setequal(associations_ex01, associations_ex02)
# Compare two variants objects
setequal(variants_ex01, variants_ex01)
setequal(variants_ex01, variants_ex02)
# Compare two traits objects
setequal(traits_ex01, traits_ex01)
setequal(traits_ex01, traits_ex02)
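The asymmetry of the difference is easiest to see on identifiers alone. The sketch below applies the base R set functions to the variant identifiers of variants_ex01 and variants_ex02 as listed in this manual; it illustrates the set semantics only and does not operate on the S4 objects themselves:

```r
# Variant identifiers of the two bundled example objects.
ids_ex01 <- c("rs146992477", "rs56261590", "rs4725504")
ids_ex02 <- c("rs56261590", "rs4725504", "rs11099757", "rs16871509")

# Identifiers present in both sets.
common <- base::intersect(ids_ex01, ids_ex02)

# The difference depends on the order of the arguments.
only_in_ex01 <- base::setdiff(ids_ex01, ids_ex02)
only_in_ex02 <- base::setdiff(ids_ex02, ids_ex01)

# The two sets are not equal, but a set always equals itself.
same <- base::setequal(ids_ex01, ids_ex02)

print(common)
print(only_in_ex01)
print(only_in_ex02)
print(same)
```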
The studies object consists of eight slots, each a table (tibble), that combined form a relational database of a subset of GWAS Catalog studies. Each study is an observation (row) in the studies table, the main table. All tables have the column study_id as primary key.
studies
GWAS Catalog study accession identifier, e.g., "GCST002735".
Phenotypic trait as reported by the authors of the study, e.g., "Breast cancer".
Free text description of the initial cohort sample size.
Free text description of the replication cohort sample size.
Whether the study investigates a gene-environment interaction.
Whether the study investigates a gene-gene interaction.
Number of variants passing quality control.
Qualifier of number of variants passing quality control.
Whether variants were imputed.
Whether samples were pooled.
Any other relevant study design information.
Whether full summary statistics are available for this study.
Whether the addition of this study to the GWAS Catalog was requested by a user.
genotyping_techs
A tibble listing genotyping technologies employed in each study. Columns:
GWAS Catalog study accession identifier.
Genotyping technology employed, e.g., "Exome genotyping array", "Exome-wide sequencing", "Genome-wide genotyping array", "Genome-wide sequencing", or "Targeted genotyping array".
platforms
A tibble listing platforms used per study.
GWAS Catalog study accession identifier.
Platform manufacturer, e.g., "Affymetrix", "Illumina", or "Perlegen".
ancestries
A tibble listing ancestry of samples used in each study.
GWAS Catalog study accession identifier.
Ancestry identifier.
Stage of the ancestry sample: either 'initial' or 'replication'.
Number of individuals comprising this ancestry sample.
ancestral_groups
A tibble listing ancestral groups used in each ancestry.
GWAS Catalog study accession identifier.
Ancestry identifier.
Genetic ancestry groups present in the sample.
countries_of_origin
A tibble listing countries of origin of samples.
GWAS Catalog study accession identifier.
Ancestry identifier.
Country name, according to The United Nations M49 Standard of Geographic Regions.
Region name, according to The United Nations M49 Standard of Geographic Regions.
Sub-region name, according to The United Nations M49 Standard of Geographic Regions.
countries_of_recruitment
A tibble listing countries of recruitment of samples.
GWAS Catalog study accession identifier.
Ancestry identifier.
Country name, according to The United Nations M49 Standard of Geographic Regions.
Region name, according to The United Nations M49 Standard of Geographic Regions.
Sub-region name, according to The United Nations M49 Standard of Geographic Regions.
publications
A tibble listing publications associated with each study.
GWAS Catalog study accession identifier.
PubMed identifier.
Publication date (online date if available) formatted as ymd.
Abbreviated journal name.
Publication title.
Last name and initials of first author.
Author's ORCID iD (Open Researcher and Contributor ID).
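Because every table shares the study_id key, the main table can be joined to any of the other slots. The following is a minimal sketch, assuming the gwasrapidd package is installed, that the bundled example object studies_ex01 is available, and that slots are read with the S4 `@` accessor; base R's merge() is only one of several ways to perform the join.

```r
library(gwasrapidd)

# Each slot of the S4 object is a tibble, read here with `@` (assumed accessor).
studies_tbl <- studies_ex01@studies
publications_tbl <- studies_ex01@publications

# Join the main table to the publications table on the shared key `study_id`.
studies_with_pubs <- merge(studies_tbl, publications_tbl, by = "study_id")

# Inspect the columns of the combined table.
names(studies_with_pubs)
```

The same pattern should apply to the other slots, since they are all keyed by study_id.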
Map a study accession identifier to an association accession identifier.
study_to_association(study_id, verbose = FALSE, warnings = TRUE)
study_id |
A character vector of study accession identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the study identifier and the second column is the association identifier.
## Not run:
# Map GWAS study identifiers to association identifiers
study_to_association(c('GCST001084', 'GCST001085'))
## End(Not run)
Map a study accession identifier to an EFO trait identifier.
study_to_trait(study_id, verbose = FALSE, warnings = TRUE)
study_id |
A character vector of study accession identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the study identifier and the second column is the EFO identifier.
## Not run:
# Map GWAS study identifiers to EFO trait identifiers
study_to_trait(c('GCST001084', 'GCST001085'))
## End(Not run)
Map a study accession identifier to a variant accession identifier.
study_to_variant(study_id, verbose = FALSE, warnings = TRUE)
study_id |
A character vector of study accession identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the study identifier and the second column is the variant identifier.
## Not run:
# Map GWAS study identifiers to variant identifiers
study_to_variant(c('GCST001084', 'GCST001085'))
## End(Not run)
You can subset associations by identifier or by position using the `[` operator.
## S4 method for signature 'associations,missing,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'associations,numeric,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'associations,character,missing,missing'
x[i, j, ..., drop = FALSE]
x |
An associations object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
An associations object.
# Subset an associations object by identifier
associations_ex01['22505']

# Or by its position in table associations
associations_ex01[2]

# Keep all associations except the second
associations_ex01[-2]
You can subset studies by identifier or by position using the `[` operator.
## S4 method for signature 'studies,missing,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'studies,numeric,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'studies,character,missing,missing'
x[i, j, ..., drop = FALSE]
x |
A studies object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
A studies object.
# Subset a studies object by identifier
studies_ex01['GCST001585']

# Or by its position in table studies
studies_ex01[1]

# Keep all studies except the first
studies_ex01[-1]
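To illustrate that the two forms of indexing address the same rows, the sketch below subsets once by position and once by the identifier found at that position. It assumes the gwasrapidd package is installed and that slots are read with the S4 `@` accessor; the variable names are illustrative only.

```r
library(gwasrapidd)

# Identifier of the first study in the main table (slot read with `@`).
first_id <- studies_ex01@studies$study_id[1]

# Subset once by position and once by that identifier.
by_position <- studies_ex01[1]
by_identifier <- studies_ex01[first_id]

# Both subsets should hold the same single study.
identical(by_position@studies$study_id, by_identifier@studies$study_id)
```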
You can subset traits by identifier or by position using the `[` operator.
## S4 method for signature 'traits,missing,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'traits,numeric,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'traits,character,missing,missing'
x[i, j, ..., drop = FALSE]
x |
A traits object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
A traits object.
# Subset a traits object by identifier
traits_ex01['EFO_0004884']

# Or by its position in table traits
traits_ex01[1]

# Keep all traits except the second
traits_ex01[-2]
You can subset variants by identifier or by position using the `[` operator.
## S4 method for signature 'variants,missing,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'variants,numeric,missing,missing'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'variants,character,missing,missing'
x[i, j, ..., drop = FALSE]
x |
A variants object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
A variants object.
# Subset a variants object by identifier
variants_ex01['rs4725504']

# Or by its position in table variants
variants_ex01[3]

# Keep all variants except the third
variants_ex01[-3]
Map an EFO trait id to an association identifier.
trait_to_association(efo_id, verbose = FALSE, warnings = TRUE)
efo_id |
A character vector of EFO trait identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the EFO trait identifier and the second column is the association identifier.
## Not run:
# Map EFO trait identifiers to association identifiers
trait_to_association(c('EFO_0005108', 'EFO_0005109'))
## End(Not run)
Map an EFO trait id to a study accession identifier.
trait_to_study(efo_id, verbose = FALSE, warnings = TRUE)
efo_id |
A character vector of EFO trait identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the EFO trait identifier and the second column is the study identifier.
## Not run:
# Map EFO trait identifiers to study identifiers
trait_to_study(c('EFO_0005108', 'EFO_0005109'))
## End(Not run)
Map an EFO trait id to a variant identifier.
trait_to_variant(efo_id, verbose = FALSE, warnings = TRUE)
efo_id |
A character vector of EFO trait identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the EFO trait identifier and the second column is the variant identifier.
## Not run:
# Map EFO trait identifiers to variant identifiers
trait_to_variant('EFO_0005229')
## End(Not run)
The traits object consists of one slot only, a table (tibble) of GWAS Catalog EFO traits. Each EFO trait is an observation (row) in the traits table, the main table.
traits
A tibble listing EFO traits. Columns:
Map a variant identifier to an association identifier.
variant_to_association(variant_id, verbose = FALSE, warnings = TRUE)
variant_id |
A character vector of variant identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the variant identifier and the second column is the association identifier.
## Not run:
# Map GWAS variant identifiers to association identifiers
variant_to_association(c('rs7904579', 'rs138331350'))
## End(Not run)
Map a variant identifier to a study accession identifier.
variant_to_study(variant_id, verbose = FALSE, warnings = TRUE)
variant_id |
A character vector of variant identifiers. |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two identifiers. First column is the variant identifier and the second column is the study identifier.
## Not run:
# Map GWAS variant identifiers to study identifiers
variant_to_study(c('rs7904579', 'rs138331350'))
## End(Not run)
Map a variant identifier to an EFO trait identifier. Variants are first mapped to association identifiers, and then to EFO traits. Set the option keep_association_id to TRUE to keep the intermediate mapping, i.e., the association identifiers.
variant_to_trait(
  variant_id,
  keep_association_id = FALSE,
  verbose = FALSE,
  warnings = TRUE
)
variant_id |
A character vector of variant identifiers. |
keep_association_id |
Whether to keep the association identifier in the final output (default is FALSE). |
verbose |
Whether the function should be verbose about the different queries or not. |
warnings |
Whether to print warnings. |
A dataframe of two or three identifiers. If keep_association_id is set to FALSE, the first column is the variant identifier and the second column is the EFO trait identifier, otherwise the variable association_id is also included as the second column.
## Not run:
# Map GWAS variant identifiers to EFO trait identifiers
variant_to_trait(c('rs7904579', 'rs138331350'))

# Map GWAS variant identifiers to EFO trait identifiers
# but keep the intermediate association identifier
variant_to_trait(c('rs7904579', 'rs138331350'), keep_association_id = TRUE)
## End(Not run)
The variants object consists of four slots, each a table (tibble), that combined form a relational database of a subset of GWAS Catalog variants. Each variant is an observation (row) in the variants table, the main table. All tables have the column variant_id as primary key.
variants
A tibble listing variants. Columns:
Variant identifier, e.g., 'rs1333048'.
Whether this SNP has been merged with another SNP in a newer genome build.
Class according to Ensembl's predicted consequences that each variant allele may have on transcripts. See Ensembl Variation - Calculated variant consequences.
Chromosome name.
Chromosome position.
Last time this variant was updated.
genomic_contexts
A tibble listing genomic contexts associated with each variant. Columns:
Variant identifier.
Gene symbol according to HUGO Gene Nomenclature (HGNC).
Chromosome name.
Chromosome position.
Genomic distance between the variant and the gene (in base pairs).
Whether this is a mapped gene to this variant. A mapped gene is either an overlapping gene with the variant or the two closest genes upstream and downstream of the variant. Moreover, only genes whose mapping source is 'Ensembl' are considered.
Whether this is the closest gene to this variant.
Whether this variant is intergenic, i.e., if there is no gene upstream or downstream within 100 kb.
Whether this variant is upstream of this gene.
Whether this variant is downstream of this gene.
Gene mapping source, either Ensembl or NCBI.
Gene mapping method.
ensembl_ids
A tibble listing gene Ensembl identifiers associated with each genomic context. Columns:
Variant identifier.
Gene symbol according to HUGO Gene Nomenclature (HGNC).
The Ensembl identifier of an Ensembl gene, see Section Gene annotation in Ensembl for more information.
entrez_ids
A tibble listing gene Entrez identifiers associated with each genomic context. Columns:
Variant identifier.
Gene symbol according to HUGO Gene Nomenclature (HGNC).
The Entrez identifier of a gene, see ref. doi:10.1093/nar/gkq1237 for more information.
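A quick way to get an overview of such an object is to list its slots and count the rows in each table. This is a minimal sketch, assuming the gwasrapidd package is installed and the bundled example object variants_ex01 is available; slotNames() and slot() come from R's methods package, and the helper variable names are illustrative.

```r
library(gwasrapidd)

# Names of the slots of the variants object (four, per the description above).
slot_names <- methods::slotNames(variants_ex01)

# Number of rows in each slot's table.
row_counts <- vapply(
  slot_names,
  function(s) nrow(methods::slot(variants_ex01, s)),
  integer(1)
)
row_counts
```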
This function exports a GWAS Catalog object to a Microsoft Excel xlsx file. Each table (slot) is saved in its own sheet.
write_xlsx(x, file = stop("`file` must be specified"))
x |
A studies, associations, variants or traits object. |
file |
A file name to write to. |
Although this function is run for its side effect of writing an xlsx file, the path to the exported file is returned.
# Initial setup
.old_wd <- setwd(tempdir())

# Save an `associations` object, e.g. `associations_ex01`, to xlsx.
write_xlsx(associations_ex01, "associations.xlsx")

# Cleanup
unlink("associations.xlsx")
setwd(.old_wd)
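An alternative that avoids changing the working directory is to pass a full path inside the temporary directory. This is a sketch only, assuming the gwasrapidd package is installed; the file name studies.xlsx and the variable names are hypothetical, and the function is called with its namespace prefix in case another attached package also provides a write_xlsx().

```r
library(gwasrapidd)

# Hypothetical target path inside the session's temporary directory.
xlsx_path <- file.path(tempdir(), "studies.xlsx")

# Export; each slot of the object is written to its own sheet.
gwasrapidd::write_xlsx(studies_ex01, xlsx_path)

# Confirm the file was created, then clean up.
created <- file.exists(xlsx_path)
unlink(xlsx_path)
created
```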