Abstract

How to query the Ontology Lookup Service directly from R and how to create and parse controlled vocabulary.

Introduction

Installation

rols is a Bioconductor package and should hence be installed with BiocManager, the dedicated Bioconductor installer:

## try http:// if https:// URLs are not supported
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("rols")

Getting help

To get help, either post your question on the Bioconductor support site or open an issue on the rols GitHub page.

The resource

The Ontology Lookup Service (OLS) [1, 2] was originally a spin-off of the PRoteomics IDEntifications database (PRIDE) service, located at the EBI, and is now developed and maintained by the Samples, Phenotypes and Ontologies team at EMBL-EBI.

The package

The OLS provides a REST interface to hundreds of ontologies from a single location with a unified output format. The rols package makes this possible from within R. To do so, it relies on the httr2 package to query the REST interface, and to access and retrieve data.
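To give a rough idea of what such a REST query looks like, the sketch below builds (without sending) an httr2 request for a single ontology. The base URL and the `ontologies/<name>` path are assumptions about the layout of the public OLS4 API; they are not taken from the rols source, which may construct its requests differently.

```r
library("httr2")

## Assumed base URL of the OLS4 REST API (not taken from rols itself)
ols_base <- "https://www.ebi.ac.uk/ols4/api"

## Build the request for one ontology; nothing is sent at this point
req <- request(ols_base) |>
    req_url_path_append("ontologies", "bspo")
req$url

## Uncomment to perform the query and inspect the parsed JSON response
## resp <- req_perform(req)
## str(resp_body_json(resp), max.level = 1)
```

In practice there is no need to write such requests by hand, as the rols classes described below wrap them.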

There are 255 ontologies available in the OLS, listed in the table below. Their names are used to define which ontology to query.

A Brief rols overview

The rols package is built around a few classes that make it possible to query the OLS and to retrieve, store and manipulate data. Each of these classes is described in more detail in its respective manual page. We start by loading the package.

library("rols")

Ontologies

The olsOntology and olsOntologies classes can store information about single or multiple ontologies. The latter can be easily subset using [ and [[, as one would for lists.

ol <- olsOntologies()
ol
## Object of class 'olsOntologies' with 255 entries
##    ADO, AGRO ... CCF, CPONT
head(olsNamespace(ol))
## [1] "ado"       "agro"      "aism"      "amphx"     "apo"       "apollo_sv"
ol[["bspo"]]
## olsOntology: Biological Spatial Ontology (bspo)  
##   An ontology for respresenting spatial concepts, anatomical axes,
##   gradients, regions, planes, sides and surfaces. These concepts can be
##   used at multiple biological scales and in a diversity of taxa,
##   including plants, animals and fungi. The BSPO is used to provide a
##   source of anatomical location descriptors for logically defining
##   anatomical entity classes in anatomy ontologies.
##    Loaded: 2024-04-24 Updated: 2024-04-24 Version: 2023-05-27 
##    169 terms  236 properties  18 individuals

It is also possible to initialise a single ontology:

bspo <- olsOntology("bspo")
bspo
## olsOntology: Biological Spatial Ontology (bspo)  
##   An ontology for respresenting spatial concepts, anatomical axes,
##   gradients, regions, planes, sides and surfaces. These concepts can be
##   used at multiple biological scales and in a diversity of taxa,
##   including plants, animals and fungi. The BSPO is used to provide a
##   source of anatomical location descriptors for logically defining
##   anatomical entity classes in anatomy ontologies.
##    Loaded: 2024-04-24 Updated: 2024-04-24 Version: 2023-05-27 
##    169 terms  236 properties  18 individuals

Terms

Single ontology terms are stored in olsTerm objects. When more terms need to be manipulated, they are stored as olsTerms objects. It is easy to obtain all terms of an ontology of interest, and the resulting olsTerms object can be subset using [ and [[, as one would for lists.

bspotrms <- olsTerms(bspo) ## or olsTerms("bspo")
bspotrms
## Object of class 'olsTerms' with 169 entries
##  From the BSPO ontology
##   BFO:0000002, BFO:0000003 ... IAO:0000409, PATO:0000001
bspotrms[1:10]
## Object of class 'olsTerms' with 10 entries
##  From the BSPO ontology
##   BFO:0000002, BFO:0000003 ... BFO:0000023, BFO:0000031
bspotrms[["BSPO:0000092"]]
## A olsTerm from the BSPO ontology: BSPO:0000092 
##  Label: anatomical compartment boundary
##   to be merged into CARO

It is also possible to initialise a single term:

trm <- olsTerm(bspo, "BSPO:0000092")
termId(trm)
## [1] "BSPO:0000092"
termLabel(trm)
## [1] "anatomical compartment boundary"

It is then possible to extract the ancestors, descendants, parents and children of a term. Each of these functions returns an olsTerms object.

parents(trm)
## Object of class 'olsTerms' with 1 entries
##  From the BSPO ontology
## CARO:0000010
children(trm)
## Object of class 'olsTerms' with 6 entries
##  From the BSPO ontology
##   BSPO:0000094, BSPO:0000093 ... BSPO:0000041, BSPO:0000040
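
The four relationship functions can be gathered in one place. The sketch below assumes that ancestors() and descendants() are called in the same way as parents(); the helper name term_relatives is hypothetical and not part of rols.

```r
## Hypothetical helper (not part of rols): collect the relatives of a
## term in a named list. Calling it requires network access, as each
## function queries the OLS.
term_relatives <- function(trm) {
    list(parents = rols::parents(trm),
         children = rols::children(trm),
         ancestors = rols::ancestors(trm),
         descendants = rols::descendants(trm))
}

## Usage, with the term initialised above:
## rel <- term_relatives(trm)
## rel$ancestors
```

Each element of the returned list is expected to be an olsTerms object, as stated above.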

Finally, a single term or terms object can be coerced to a data.frame using as(x, "data.frame").

Properties

Properties (relationships) of single or multiple terms or complete ontologies can be queried with the olsProperties function, as briefly illustrated below.

trm <- olsTerm("uberon", "UBERON:0002107")
trm
## A olsTerm from the UBERON ontology: UBERON:0002107 
##  Label: liver
##   An exocrine gland which secretes bile and functions in metabolism of
##   protein and carbohydrate and fat, synthesizes substances involved in
##   the clotting of the blood, synthesizes vitamin A, detoxifies poisonous
##   substances, stores glycogen, and breaks down worn-out erythrocytes[GO].
p <- olsProperties(trm)
p
## Object of class 'olsProperties' with 160 entries
##  From the UBERON ontology
##   hepatobiliary system, exocrine system ... liver serosa, liver subserosa
p[[1]]
## A olsProperty from the UBERON ontology: UBERON:0002423 
##  Label: hepatobiliary system
termLabel(p[[1]])
## [1] "hepatobiliary system"

Use case

A researcher might be interested in the trans-Golgi network. Searching the OLS is handled by the OlsSearch class and the olsSearch function. The first step is to define the search query with OlsSearch, as shown below. This creates a search object of class OlsSearch that stores the query and its parameters. It records the number of requested results (default is 20) and the total number of possible results (there are 16850 results across all ontologies, in this case). At this stage, the results have not yet been downloaded, as shown by the 0 responses.

OlsSearch(q = "trans-golgi network")
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 16850)
##   response(s): 0

16850 results are probably too many to be relevant. Below we show how to perform an exact search by setting exact = TRUE, how to limit the search to the GO ontology by specifying ontology = "GO", and how to do both.

OlsSearch(q = "trans-golgi network", exact = TRUE)
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 217)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO")
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 1097)
##   response(s): 0
OlsSearch(q = "trans-golgi network", ontology = "GO", exact = TRUE)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 20 (out of 25)
##   response(s): 0

One can set the rows argument to define the number of desired results.

OlsSearch(q = "trans-golgi network", ontology = "GO", rows = 200)
## Object of class 'OlsSearch':
##   ontolgy: GO 
##   query: trans-golgi network 
##   requested: 200 (out of 1097)
##   response(s): 0

See ?OlsSearch for details about retrieving many results.

Let’s proceed with the exact search and retrieve the results. Since we request the default 20 results, only 20 of the 217 matching results will be retrieved. The olsSearch function updates the previously created object (called qry below) by adding the results to it.

qry <- OlsSearch(q = "trans-golgi network", exact = TRUE)
(qry <- olsSearch(qry))
## Object of class 'OlsSearch':
##   query: trans-golgi network 
##   requested: 20 (out of 217)
##   response(s): 20
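
The two-step pattern (define the query, then download the results) can be wrapped in a small function. This is only a sketch: the name run_search is hypothetical, and the extra arguments are simply passed on to OlsSearch as used above.

```r
## Hypothetical helper (not part of rols): define a search and run it.
## Extra arguments such as ontology, exact or rows are passed on to
## OlsSearch(). Calling the function requires network access.
run_search <- function(q, ...) {
    qry <- rols::OlsSearch(q = q, ...)  # define the query, no download yet
    rols::olsSearch(qry)                # download the requested results
}

## Usage:
## qry <- run_search("trans-golgi network", ontology = "GO", exact = TRUE)
```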

We can now transform this search result object into a fully fledged olsTerms object or a data.frame.

(qtrms <- as(qry, "olsTerms"))
## Object of class 'olsTerms' with 20 entries
##  From the NCIT, PR, GO, ZP, PW ontologies
##   NCIT:C33802, PR:O43493 ... GO:0042147, PW:0000426
str(qdrf <- as(qry, "data.frame"))
## 'data.frame':    20 obs. of  8 variables:
##  $ iri            : chr  "http://purl.obolibrary.org/obo/NCIT_C33802" "http://purl.obolibrary.org/obo/PR_O43493" "http://purl.obolibrary.org/obo/GO_0005802" "http://purl.obolibrary.org/obo/GO_0032588" ...
##  $ ontology_name  : chr  "ncit" "pr" "go" "go" ...
##  $ ontology_prefix: chr  "NCIT" "PR" "GO" "GO" ...
##  $ short_form     : chr  "NCIT_C33802" "PR_O43493" "GO_0005802" "GO_0032588" ...
##  $ description    :List of 20
##   ..$ : chr "A network of membrane components where vesicles bud off the Golgi apparatus to bring proteins, membranes and ot"| __truncated__
##   ..$ : chr  "A trans-Golgi network integral membrane protein 2 that is encoded in the genome of human." "Category=organism-gene."
##   ..$ : chr  "The network of interconnected tubular and cisternal structures located within the Golgi apparatus on the side d"| __truncated__ "There are different opinions about whether the TGN should be considered part of the Golgi apparatus or not. We "| __truncated__
##   ..$ : chr "The lipid bilayer surrounding any of the compartments that make up the trans-Golgi network."
##   ..$ : chr "Abnormal(ly) mislocalised (of) enterocyte of trans-Golgi network."
##   ..$ : chr "A vesicle that mediates transport between the trans-Golgi network and other parts of the cell."
##   ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the lumen."
##   ..$ : chr "The side (leaflet) of the trans-Golgi network transport vesicle membrane that faces the cytoplasm."
##   ..$ : chr "The lipid bilayer surrounding a vesicle transporting substances between the trans-Golgi network and other parts of the cell."
##   ..$ : chr "The volume enclosed within the membrane of a trans-Golgi network transport vesicle."
##   ..$ : chr "A clathrin coat found on a vesicle of the trans-Golgi network."
##   ..$ : chr  "A trans-Golgi network integral membrane protein 1 that is encoded in the genome of rat." "Category=organism-gene."
##   ..$ : chr "A process which results in the assembly, arrangement of constituent parts, or disassembly of a trans-Golgi network membrane."
##   ..$ : chr  "A trans-Golgi network integral membrane protein 1 that is encoded in the genome of mouse." "Category=organism-gene."
##   ..$ : chr "The directed movement of substances, in membrane-bounded vesicles, from the trans-Golgi network to the recycling endosomes."
##   ..$ : chr "The directed movement of substances from the plasma membrane back to the trans-Golgi network, mediated by vesicles."
##   ..$ : chr "The directed movement of proteins from the Golgi to the plasma membrane in transport vesicles that move from th"| __truncated__
##   ..$ : chr "The directed movement of substances from the vacuole to the trans-Golgi network; this occurs in yeast via the p"| __truncated__
##   ..$ : chr "The directed movement of membrane-bounded vesicles from endosomes back to the trans-Golgi network where they ar"| __truncated__
##   ..$ : chr "In the secretory pathway, protein sorting, mainly in trans-Golgi Network (TGN), but also in other compartments,"| __truncated__
##  $ label          : chr  "Trans-Golgi Network" "trans-Golgi network integral membrane protein 2 (human)" "trans-Golgi network" "trans-Golgi network membrane" ...
##  $ obo_id         : chr  "NCIT:C33802" "PR:O43493" "GO:0005802" "GO:0032588" ...
##  $ type           : chr  "class" "class" "class" "class" ...
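
Note that description is a list column, since a term can have several description strings. This can get in the way when exporting the table, for instance to a CSV file. The sketch below collapses each element into a single string. To stay runnable off-line, it uses a two-row mock built from values shown in the output above rather than the real qdrf.

```r
## Mock of two rows of the search result shown above; in practice use
## qdrf <- as(qry, "data.frame")
qdrf <- data.frame(
    obo_id = c("PR:O43493", "GO:0032588"),
    label = c("trans-Golgi network integral membrane protein 2 (human)",
              "trans-Golgi network membrane"))
qdrf$description <- list(
    c("A trans-Golgi network integral membrane protein 2 that is encoded in the genome of human.",
      "Category=organism-gene."),
    "The lipid bilayer surrounding any of the compartments that make up the trans-Golgi network.")

## Collapse each list element into one string, so that the column
## becomes a plain character vector
qdrf$description <- vapply(qdrf$description, paste, character(1),
                           collapse = " ")
str(qdrf)
```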

In this case, we can see that we actually retrieve the same term used across different ontologies. In such cases, it might be useful to keep only non-redundant term instances. Here, the 20 results come from the ncit, pr, go, zp and pw ontologies, and all of them are retained.

qtrms <- unique(qtrms)
termOntology(qtrms)
## NCIT:C33802   PR:O43493  GO:0005802  GO:0032588  ZP:0142408  GO:0030140 
##      "ncit"        "pr"        "go"        "go"        "zp"        "go" 
##  GO:0098540  GO:0098541  GO:0012510  GO:0098564  GO:0030130   PR:P19814 
##        "go"        "go"        "go"        "go"        "go"        "pr" 
##  GO:0098629   PR:Q62313  GO:0044795  GO:0035526  GO:0043001  GO:0045018 
##        "go"        "pr"        "go"        "go"        "go"        "go" 
##  GO:0042147  PW:0000426 
##        "go"        "pw"
termNamespace(qtrms)
## $`NCIT:C33802`
## NULL
## 
## $`PR:O43493`
## [1] "protein"
## 
## $`GO:0005802`
## [1] "cellular_component"
## 
## $`GO:0032588`
## [1] "cellular_component"
## 
## $`ZP:0142408`
## NULL
## 
## $`GO:0030140`
## [1] "cellular_component"
## 
## $`GO:0098540`
## [1] "cellular_component"
## 
## $`GO:0098541`
## [1] "cellular_component"
## 
## $`GO:0012510`
## [1] "cellular_component"
## 
## $`GO:0098564`
## [1] "cellular_component"
## 
## $`GO:0030130`
## [1] "cellular_component"
## 
## $`PR:P19814`
## [1] "protein"
## 
## $`GO:0098629`
## [1] "biological_process"
## 
## $`PR:Q62313`
## [1] "protein"
## 
## $`GO:0044795`
## [1] "biological_process"
## 
## $`GO:0035526`
## [1] "biological_process"
## 
## $`GO:0043001`
## [1] "biological_process"
## 
## $`GO:0045018`
## [1] "biological_process"
## 
## $`GO:0042147`
## [1] "biological_process"
## 
## $`PW:0000426`
## [1] "pathway"
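
To summarise where the results come from, the ontology names can be tabulated with base R. The sketch below copies the 20 values from the termOntology output above so that it runs without network access. Using the resulting indices to subset qtrms with [ is an assumption based on the subsetting shown earlier for olsTerms objects.

```r
## Ontology of each of the 20 results, copied from the output above;
## in practice use onts <- termOntology(qtrms)
onts <- c("ncit", "pr", "go", "go", "zp", "go", "go", "go", "go", "go",
          "go", "pr", "go", "pr", "go", "go", "go", "go", "go", "pw")

## Number of results per ontology, most frequent first
sort(table(onts), decreasing = TRUE)

## Positions of the GO results, e.g. to keep only these with
## qtrms[which(onts == "go")]
which(onts == "go")
```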

Below, we execute the same query using the GO.db package.

library("GO.db")
GOTERM[["GO:0005802"]]
## GOID: GO:0005802
## Term: trans-Golgi network
## Ontology: CC
## Definition: The network of interconnected tubular and cisternal
##     structures located within the Golgi apparatus on the side distal to
##     the endoplasmic reticulum, from which secretory vesicles emerge.
##     The trans-Golgi network is important in the later stages of protein
##     secretion where it is thought to play a key role in the sorting and
##     targeting of secreted proteins to the correct destination.
## Synonym: TGN
## Synonym: trans Golgi network
## Synonym: Golgi trans face
## Synonym: Golgi trans-face
## Synonym: late Golgi
## Synonym: maturing face
## Synonym: trans face

On-line vs. off-line data

It is possible to observe different results with rols and GO.db, as a result of the different ways they access the data. rols and biomaRt perform direct online queries, while GO.db and other annotation packages use database snapshots that are updated every release.
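
One way to check for such a difference is to compare the label stored in the GO.db snapshot with the one currently served by the OLS. The helper below is a hypothetical sketch, not part of either package. It assumes that AnnotationDbi::Term() and rols::termLabel() both return the label as a character string.

```r
## Hypothetical helper: TRUE if the label in the GO.db snapshot matches
## the label currently served by the OLS. Calling it requires GO.db,
## AnnotationDbi, rols and network access.
same_go_label <- function(id) {
    offline <- AnnotationDbi::Term(GO.db::GOTERM[[id]])
    online <- rols::termLabel(rols::olsTerm("go", id))
    identical(unname(offline), unname(online))
}

## Usage:
## same_go_label("GO:0005802")
```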

Both approaches have advantages. While online queries make it possible to obtain the latest information, they rely on network availability and quality. If reproducibility is a major issue, the version of the database to be queried can easily be controlled with off-line approaches. In the case of rols, although the load date of a specific ontology can be queried with olsLoaded (and its version with olsVersion), it is not possible to query a specific version of an ontology.
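
Since a specific version cannot be requested, a pragmatic option is to record which version was served at query time and to store that information alongside the results. The sketch below is one possible way to do so. The helper name ontology_stamp is hypothetical, and it assumes that olsVersion, olsLoaded and olsUpdated all accept an olsOntology object.

```r
## Hypothetical helper (not part of rols): provenance information for
## one ontology at the time of the query. Calling it requires network
## access.
ontology_stamp <- function(name) {
    ont <- rols::olsOntology(name)
    list(ontology = name,
         version = rols::olsVersion(ont),
         loaded = rols::olsLoaded(ont),
         updated = rols::olsUpdated(ont),
         queried = Sys.time())
}

## Usage:
## stamp <- ontology_stamp("go")
## saveRDS(stamp, "go_stamp.rds")
```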

Changes

Version 2.0

rols 2.0 has substantially changed. The table below shows some correspondence between the old and new interfaces, but not every old function has a direct equivalent. The new interface relies on the Ontology/Ontologies, olsTerm/olsTerms and OlsSearch classes, which need to be instantiated and can then be queried, as described above.

version < 1.99      version >= 1.99
ontologyLoadDate    olsLoaded and olsUpdated
ontologyNames       Ontologies
olsVersion          olsVersion
allIds              terms
isIdObsolete        isObsolete
rootId              olsRoot
olsQuery            OlsSearch and olsSearch

Not all functionality is currently available. If there is anything that you need but that is not available in the new version, please contact the maintainer by opening an issue on the package development site.

Version 2.99

  • rols version >= 2.99 has been refactored to use the OLS4 REST API.
  • REST queries now use httr2 instead of superseded httr.
  • The term(s) constructors are capitalised as olsTerm() and olsTerms().
  • The properties constructor is capitalised as Properties().
  • Some class definitions have been updated to accommodate changes in the data received from OLS. Some functions have been dropped.
  • The Ontology and Ontologies classes and constructors have been renamed olsOntology and olsOntologies to avoid clashes with AnnotationDbi::Ontology().
  • The Term and Terms classes and constructors have been renamed olsTerm and olsTerms to avoid clashes with AnnotationDbi::Term().

CVParams

The CVParam class is used to handle controlled vocabulary. It can be used for user-defined parameters:

CVParam(name = "A user param", value = "the value")
## [, , A user param, the value]

or for official controlled vocabulary (which triggers a query to the OLS service):

CVParam(label = "GO", accession = "GO:0035145")
## [GO, GO:0035145, exon-exon junction complex, ]

See ?CVParam for more details and examples.

Session information

## R version 4.4.0 RC (2024-04-16 r86441)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] DT_0.33              rols_2.99.5          GO.db_3.19.1        
## [4] AnnotationDbi_1.65.2 IRanges_2.37.1       S4Vectors_0.41.7    
## [7] Biobase_2.63.1       BiocGenerics_0.49.1  BiocStyle_2.31.0    
## 
## loaded via a namespace (and not attached):
##  [1] rappdirs_0.3.3          sass_0.4.9              bitops_1.0-7           
##  [4] RSQLite_2.3.6           digest_0.6.35           magrittr_2.0.3         
##  [7] evaluate_0.23           bookdown_0.39           fastmap_1.1.1          
## [10] blob_1.2.4              jsonlite_1.8.8          GenomeInfoDb_1.39.14   
## [13] DBI_1.2.2               BiocManager_1.30.22     httr_1.4.7             
## [16] purrr_1.0.2             crosstalk_1.2.1         UCSC.utils_0.99.7      
## [19] Biostrings_2.71.6       httr2_1.0.1             textshaping_0.3.7      
## [22] jquerylib_0.1.4         cli_3.6.2               rlang_1.1.3            
## [25] crayon_1.5.2            XVector_0.43.1          bit64_4.0.5            
## [28] cachem_1.0.8            yaml_2.3.8              tools_4.4.0            
## [31] memoise_2.0.1           GenomeInfoDbData_1.2.12 curl_5.2.1             
## [34] vctrs_0.6.5             R6_2.5.1                png_0.1-8              
## [37] lifecycle_1.0.4         zlibbioc_1.49.3         KEGGREST_1.43.0        
## [40] fs_1.6.4                htmlwidgets_1.6.4       bit_4.0.5              
## [43] ragg_1.3.0              pkgconfig_2.0.3         desc_1.4.3             
## [46] pkgdown_2.0.9.9000      bslib_0.7.0             glue_1.7.0             
## [49] systemfonts_1.0.6       xfun_0.43               knitr_1.46             
## [52] htmltools_0.5.8.1       rmarkdown_2.26          compiler_4.4.0         
## [55] RCurl_1.98-1.14