Abstract
How to query the Ontology Lookup Service directly from R and how to create and parse controlled vocabulary.
rols is a Bioconductor package and should hence be installed using the dedicated functionality
## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("rols")
To get help, either post your question on the Bioconductor support site or open an issue on the rols GitHub page.
The Ontology Lookup Service (OLS) [1, 2] was originally a spin-off of the PRoteomics IDEntifications database (PRIDE) service, located at the EBI, and is now developed and maintained by the Samples, Phenotypes and Ontologies team at EMBL-EBI.
The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The rols package makes this possible from within R. To do so, it relies on the httr package to query the REST interface, and to access and retrieve data.
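To give a feel for what such a REST query involves, here is a minimal sketch. It is not how rols is implemented: the base URL, the /ontologies/ path and the helper names ols_ontology_url and fetch_ontology are assumptions made for illustration. Only httr::GET, httr::stop_for_status and httr::content are existing httr functions.

```r
## Assumption: the OLS REST API is served under this base URL; adjust it
## if the service is hosted elsewhere.
ols_base <- "https://www.ebi.ac.uk/ols4/api"

## Hypothetical helper: build the URL describing a single ontology.
ols_ontology_url <- function(ontology, base = ols_base) {
    paste0(base, "/ontologies/",
           utils::URLencode(tolower(ontology), reserved = TRUE))
}

## Hypothetical helper: fetch and parse the JSON description of an
## ontology. Requires network access and the httr package.
fetch_ontology <- function(ontology) {
    resp <- httr::GET(ols_ontology_url(ontology))
    httr::stop_for_status(resp)  # turn HTTP errors into R errors
    httr::content(resp, as = "parsed", type = "application/json")
}

ols_ontology_url("GO")
```

In practice, the rols classes described below hide these details, so that URLs and JSON parsing never need to be handled by hand.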
There are 244 ontologies available in the OLS, listed in the table below. Their names are used to define which ontology to query.
The rols package is built around a few classes that make it possible to query the OLS and to retrieve, store and manipulate data. Each of these classes is described in more detail in its respective manual page. We start by loading the package.
library("rols")
The Ontology and Ontologies classes can store information about single or multiple ontologies. The latter can be easily subset using [ and [[, as one would for lists.
ol <- Ontologies()
ol
## Object of class 'Ontologies' with 244 entries
## AISM, AMPHX ... CPONT, CCF
head(olsNamespace(ol))
## aism amphx ado apo agro apollo_sv
## "aism" "amphx" "ado" "apo" "agro" "apollo_sv"
ol[["go"]]
## Ontology: (go)
##
## Loaded: Updated: 2023-10-09 Version:
## 0 terms 0 properties 0 individuals
It is also possible to initialise a single ontology
go <- Ontology("go")
go
## Ontology: (go)
##
## Loaded: Updated: 2023-10-09 Version:
## 0 terms 0 properties 0 individuals
Single ontology terms are stored in Term objects. When more terms need to be manipulated, they are stored as Terms objects. It is easy to obtain all terms of an ontology of interest, and the resulting Terms object can be subset using [ and [[, as one would for lists.
gotrms <- terms(go) ## or terms("go")
gotrms
## Object of class 'Terms' with 51053 entries
## From the GO ontology
## GO:0005230, GO:0015276 ... GO:0032942, GO:0032947
gotrms[1:10]
## Object of class 'Terms' with 10 entries
## From the GO ontology
## GO:0005230, GO:0015276 ... GO:0001819, GO:0044831
gotrms[["GO:0090575"]]
## A Term from the GO ontology: GO:0090575
## Label: RNA polymerase II transcription regulator complex
## No description
It is also possible to initialise a single term
trm <- term(go, "GO:0090575")
termId(trm)
## [1] "GO:0090575"
termLabel(trm)
## [1] "RNA polymerase II transcription regulator complex"
It is then possible to extract the ancestors, descendants, parents and children terms. Each of these functions returns a Terms object
parents(trm)
## Object of class 'Terms' with 2 entries
## From the GO ontology
## GO:0005667, GO:0140513
children(trm)
## Object of class 'Terms' with 38 entries
## From the GO ontology
## GO:0062071, GO:0008230 ... GO:0034718, GO:0030232
Similarly, the partOf and derivesFrom functions return, for an input term, the terms it is a part of and derived from.
Finally, a single term or terms object can be coerced to a data.frame using as(x, "data.frame").
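As a minimal sketch of this coercion, assuming network access to the OLS and the term() constructor used elsewhere in this vignette (the variable names go_trm, trm_df and prnts_df are chosen here for illustration):

```r
library("rols")

## Requires network access to the OLS.
go_trm <- term("go", "GO:0090575")

## Coerce a single term ...
trm_df <- as(go_trm, "data.frame")
## ... or a set of terms, here the parents of that term.
prnts_df <- as(parents(go_trm), "data.frame")

## Inspect which variables are available.
names(prnts_df)
```

A data.frame is convenient when the terms need to be filtered, merged with other annotations or exported to a file.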
Properties (relationships) of single or multiple terms or complete ontologies can be queried with the properties method, as briefly illustrated below.
trm <- term("uberon", "UBERON:0002107")
trm
## A Term from the UBERON ontology: UBERON:0002107
## Label: liver
## No description
p <- properties(trm)
p
## Object of class 'Properties' with 147 entries
## From the UBERON ontology
## digestive system gland, abdomen element ... liver lobule, Vertebrata <vertebrates>
p[[1]]
## A Property from the UBERON ontology: UBERON:0006925
## Label: digestive system gland
termLabel(p[[1]])
## [1] "digestive system gland"
A researcher might be interested in the trans-Golgi network. Searching the OLS is handled by the OlsSearch and olsSearch classes/functions. The first step is to define the search query with OlsSearch, as shown below. This creates a search object of class OlsSearch that stores the query and its parameters. It records the number of requested results (default is 20) and the total number of possible results (there are 17884 results across all ontologies, in this case). At this stage, the results have not yet been downloaded, as shown by the 0 responses.
OlsSearch(q = "trans-golgi network")
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 17884)
## response(s): 0
17884 results are probably too many to be relevant. Below we show how to perform an exact search by setting exact = TRUE, how to limit the search to the GO ontology by specifying ontology = "GO", and how to do both.
OlsSearch(q = "trans-golgi network", exact = TRUE)
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 12)
## response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO")
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 20 (out of 353)
## response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE)
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 20 (out of 1)
## response(s): 0
One can set the rows argument to specify the number of desired results.
OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200)
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 200 (out of 353)
## response(s): 0
Alternatively, one can call the allRows function to request all results.
(tgnq <- OlsSearch(q = "trans-golgi network", ontology = "GO"))
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 20 (out of 353)
## response(s): 0
(tgnq <- allRows(tgnq))
## Object of class 'OlsSearch':
## ontolgy: GO
## query: trans-golgi network
## requested: 353 (out of 353)
## response(s): 0
Let’s proceed with the exact search and retrieve the results. Even if we request the default 20 results, only the 12 relevant results will be retrieved. The olsSearch function updates the previously created object (called qry below) by adding the results to it.
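A minimal sketch of these two steps, assuming the query is the exact search defined earlier and that network access to the OLS is available:

```r
library("rols")

## Define the exact query; no results are downloaded at this stage.
qry <- OlsSearch(q = "trans-golgi network", exact = TRUE)

## Retrieve the results; the updated object replaces qry.
(qry <- olsSearch(qry))
```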
## Object of class 'OlsSearch':
## query: trans-golgi network
## requested: 20 (out of 12)
## response(s): 12
We can now transform this search result object into a fully fledged Terms object or a data.frame.
(qtrms <- as(qry, "Terms"))
## Warning in asMethod(object): 1 term failed to be instantiated.
## Object of class 'Terms' with 11 entries
## From 8 ontologies
## NCIT:C33802, OMIT:0020822 ... GO:0005802, GO:0005802
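The data.frame coercion can be sketched in the same way. The snippet below assumes a populated search object as described above, rebuilt here so that the example stands on its own; qdrf is a name chosen for illustration, and str() gives a compact overview of the columns.

```r
library("rols")

## Populated search object (requires network access to the OLS).
qry <- olsSearch(OlsSearch(q = "trans-golgi network", exact = TRUE))

## Coerce the search results to a data.frame and inspect its structure.
qdrf <- as(qry, "data.frame")
str(qdrf)
```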
## 'data.frame': 12 obs. of 10 variables:
## $ id : chr "ncit:class:http://purl.obolibrary.org/obo/NCIT_C33802" "omit:class:http://purl.obolibrary.org/obo/OMIT_0020822" "go:class:http://purl.obolibrary.org/obo/GO_0005802" "cco:http://purl.obolibrary.org/obo/GO_0005802" ...
## $ iri : chr "http://purl.obolibrary.org/obo/NCIT_C33802" "http://purl.obolibrary.org/obo/OMIT_0020822" "http://purl.obolibrary.org/obo/GO_0005802" "http://purl.obolibrary.org/obo/GO_0005802" ...
## $ short_form : chr "NCIT_C33802" "OMIT_0020822" "GO_0005802" "GO_0005802" ...
## $ obo_id : chr "NCIT:C33802" "OMIT:0020822" "GO:0005802" "GO:0005802" ...
## $ label : chr "Trans-Golgi Network" "trans-Golgi Network" "trans-Golgi network" "trans-Golgi network" ...
## $ description :List of 12
## ..$ : chr "A network of membrane components where vesicles bud off the Golgi apparatus to bring proteins, membranes and ot"| __truncated__
## ..$ : NULL
## ..$ : NULL
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## ..$ : chr "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__
## $ ontology_name : chr "ncit" "omit" "go" "cco" ...
## $ ontology_prefix : chr "NCIT" "OMIT" "GO" "CCO" ...
## $ type : chr "class" "class" "class" "class" ...
## $ is_defining_ontology: logi TRUE TRUE TRUE FALSE FALSE FALSE ...
In this case, we can see that we actually retrieve the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. Here, this would have been equivalent to searching only the ncit, omit and go ontologies.
qtrms <- unique(qtrms)
termOntology(qtrms)
## NCIT:C33802 OMIT:0020822 GO:0005802
## "ncit" "omit" "go"
termNamespace(qtrms)
## $`NCIT:C33802`
## NULL
##
## $`OMIT:0020822`
## NULL
##
## $`GO:0005802`
## [1] "cellular_component"
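The idea of keeping non-redundant term instances can also be illustrated on the data.frame representation, using base R only. This is a sketch on toy data copied from the first four rows of the search result shown earlier; it is not a description of how unique() is implemented for Terms objects.

```r
## Toy data: first four rows of the search result data.frame above.
hits <- data.frame(
    obo_id = c("NCIT:C33802", "OMIT:0020822", "GO:0005802", "GO:0005802"),
    ontology_name = c("ncit", "omit", "go", "cco"),
    is_defining_ontology = c(TRUE, TRUE, TRUE, FALSE),
    stringsAsFactors = FALSE
)

## Keep each term once, from the ontology that defines it.
defining <- hits[hits$is_defining_ontology, ]
defining$ontology_name

## Alternative: keep the first occurrence of each identifier.
first_hit <- hits[!duplicated(hits$obo_id), ]
first_hit$ontology_name
```

On these four rows, both filters keep the ncit, omit and go entries and drop the cco copy of GO:0005802.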
Below, we execute the same query using the GO.db package.
library("GO.db")
GOTERM[["GO:0005802"]]
## GOID: GO:0005802
## Term: trans-Golgi network
## Ontology: CC
## Definition: The network of interconnected tubular and cisternal
## structures located within the Golgi apparatus on the side distal to
## the endoplasmic reticulum, from which secretory vesicles emerge.
## The trans-Golgi network is important in the later stages of protein
## secretion where it is thought to play a key role in the sorting and
## targeting of secreted proteins to the correct destination.
## Synonym: TGN
## Synonym: trans Golgi network
## Synonym: Golgi trans face
## Synonym: Golgi trans-face
## Synonym: late Golgi
## Synonym: maturing face
## Synonym: trans face
It is possible to observe different results with rols and GO.db, as a result of the different ways they access the data. rols or biomaRt perform direct online queries, while GO.db and other annotation packages use database snapshots that are updated every release.
Both approaches have advantages. While online queries provide the most up-to-date information, they rely on network availability and quality. If reproducibility is a major concern, the version of the database to be queried can easily be controlled with off-line approaches. In the case of rols, although the load date of a specific ontology can be queried with olsVersion, it is not possible to query a specific version of an ontology.
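Since a specific version cannot be requested, one pragmatic option is to record what was queried and when, alongside the results. The sketch below assumes that olsVersion, olsLoaded and olsUpdated accept an Ontology object and that network access is available; the provenance list and its field names are made up for illustration.

```r
library("rols")

## Requires network access to the OLS.
go <- Ontology("go")

## Keep track of the ontology state at query time.
provenance <- list(
    ontology = "go",
    version = olsVersion(go),
    loaded = olsLoaded(go),
    updated = olsUpdated(go),
    queried = Sys.time()
)
str(provenance)
```

Storing such a record with the analysis output does not make the query reproducible, but it documents which state of the ontology the results refer to.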
The rols 2.0 interface has changed substantially. The table below shows correspondences between the old and new interfaces, but not every function has a direct equivalent. The new interface relies on the Ontology/Ontologies, Term/Terms and OlsSearch classes, which need to be instantiated and can then be queried, as described above.
version < 1.99 | version >= 1.99 |
---|---|
ontologyLoadDate | olsLoaded and olsUpdated |
ontologyNames | Ontologies |
olsVersion | olsVersion |
allIds | terms |
isIdObsolete | isObsolete |
rootId | olsRoot |
olsQuery | OlsSearch and olsSearch |
Not all functionality is currently available. If there is anything that you need but that is not available in the new version, please contact the maintainer by opening an issue on the package development site.
The CVParam class is used to handle controlled vocabulary. It can be used for user-defined parameters
CVParam(name = "A user param", value = "the value")
## [, , A user param, the value]
or official controlled vocabulary (which triggers a query to the OLS service)
CVParam(label = "MS", accession = "MS:1000073")
## [MS, MS:1000073, electrospray ionization, ]
See ?CVParam for more details and examples.
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DT_0.30 rols_2.29.1 GO.db_3.18.0
## [4] AnnotationDbi_1.63.2 IRanges_2.35.3 S4Vectors_0.39.3
## [7] Biobase_2.61.0 BiocGenerics_0.47.0 BiocStyle_2.29.2
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.41.4 xfun_0.40 bslib_0.5.1
## [4] htmlwidgets_1.6.2 vctrs_0.6.4 tools_4.3.1
## [7] crosstalk_1.2.0 bitops_1.0-7 curl_5.1.0
## [10] RSQLite_2.3.1 blob_1.2.4 pkgconfig_2.0.3
## [13] desc_1.4.2 lifecycle_1.0.3 GenomeInfoDbData_1.2.10
## [16] compiler_4.3.1 stringr_1.5.0 textshaping_0.3.7
## [19] Biostrings_2.69.2 progress_1.2.2 GenomeInfoDb_1.37.6
## [22] htmltools_0.5.6.1 sass_0.4.7 RCurl_1.98-1.12
## [25] yaml_2.3.7 pkgdown_2.0.7.9000 crayon_1.5.2
## [28] jquerylib_0.1.4 ellipsis_0.3.2 cachem_1.0.8
## [31] digest_0.6.33 stringi_1.7.12 purrr_1.0.2
## [34] bookdown_0.35 rprojroot_2.0.3 fastmap_1.1.1
## [37] cli_3.6.1 magrittr_2.0.3 prettyunits_1.2.0
## [40] bit64_4.0.5 rmarkdown_2.25 XVector_0.41.1
## [43] httr_1.4.7 bit_4.0.5 ragg_1.2.6
## [46] png_0.1-8 hms_1.1.3 memoise_2.0.1
## [49] evaluate_0.22 knitr_1.44 rlang_1.1.1
## [52] glue_1.6.2 DBI_1.1.3 BiocManager_1.30.22
## [55] jsonlite_1.8.7 R6_2.5.1 systemfonts_1.0.5
## [58] fs_1.6.3 zlibbioc_1.47.0