Introduction

The fmsne package

The fmsne R package implements the fast multi-scale neighbour embedding methods developed by Cyril de Bodt.

The fast algorithms it implements are described in the article Fast Multiscale Neighbor Embedding, by Cyril de Bodt, Dounia Mulders, Michel Verleysen and John A. Lee, published in IEEE Transactions on Neural Networks and Learning Systems in 2020.

The underlying implementations are written in Python, with some C and Cython code for performance.

If you use the code in this repository or the article, please cite the article (Bodt et al. 2022):

  • C. de Bodt, D. Mulders, M. Verleysen and J. A. Lee, “Fast Multiscale Neighbor Embedding,” in IEEE Transactions on Neural Networks and Learning Systems, 2020, doi: 10.1109/TNNLS.2020.3042807.

and this package:

citation("fmsne")
#> Warning in citation("fmsne"): could not determine year for 'fmsne' from package
#> DESCRIPTION file
#> To cite package 'fmsne' in publications use:
#> 
#>   Cyril de Bodt, Laurent Gatto (????). _fmsne: Fast Multi-scale
#>   Neighbour Embedding_. R package version 0.8.1.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {fmsne: Fast Multi-scale Neighbour Embedding},
#>     author = {{Cyril de Bodt} and {Laurent Gatto}},
#>     note = {R package version 0.8.1},
#>   }

Installation instructions

To install this R package, simply run

BiocManager::install("lgatto/fmsne")
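BiocManager::install() hands GitHub repositories such as lgatto/fmsne over to the remotes package, so both helper packages need to be available first. A minimal sketch, assuming a fresh R library with a CRAN mirror already configured:

```r
## Install the two helper packages only if they are missing.
for (pkg in c("BiocManager", "remotes")) {
    if (!requireNamespace(pkg, quietly = TRUE))
        install.packages(pkg)
}

## Then install fmsne from its GitHub repository.
BiocManager::install("lgatto/fmsne")
```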

The package depends on the following Bioconductor packages:

  • SingleCellExperiment for the infrastructure to hold the single-cell and reduced dimension data.

  • basilisk to install and run the underlying Python implementation.

If you want to apply fast multi-scale neighbor embedding in Python, you can install the fmsne Python package with

pip install fmsne

Package functionality

Neighbor Embedding

  • runMSSNE(): nonlinear dimensionality reduction through multi-scale (MS) stochastic neighbor embedding (SNE) (Maaten and Hinton 2008; Van Der Maaten 2014), as presented in the reference (Lee, Peluffo-Ordóñez, and Verleysen 2015) below and summarized in (Bodt et al. 2022).

  • runMSTSNE(): nonlinear dimensionality reduction through multi-scale t-distributed SNE (t-SNE) (Maaten and Hinton 2008; Van Der Maaten 2014), as presented in the reference (Bodt et al., n.d.) below and summarized in (Bodt et al. 2022).

  • runFMSSNE(): nonlinear dimensionality reduction through fast multi-scale SNE (FMS SNE), as presented in the reference (Bodt et al. 2022).

  • runFMSTSNE(): nonlinear dimensionality reduction through fast multi-scale t-SNE (FMS t-SNE), as presented in the reference (Bodt et al. 2022).

See the manual page of each function for further details.

Quality control

  • drQuality(): unsupervised evaluation of the quality of a low-dimensional embedding, as introduced in (Lee and Verleysen 2009, 2010) and applied and summarized in (Bodt et al. 2022; Lee, Peluffo-Ordóñez, and Verleysen 2015; Lee et al. 2013). This function assesses the quality of a dimensionality reduction by measuring how well neighborhoods are preserved from the high-dimensional space to the low-dimensional one. The documentation of the function explains the meaning of the criteria and how to interpret them.
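To make the idea of neighborhood preservation concrete, the base-R sketch below computes two rank-based criteria from the cited literature: Q_NX(K), the average fraction of the K nearest neighbors shared between the two spaces, and R_NX(K), its rescaling so that a random embedding scores about 0 and a perfect one scores 1. The helper names knn_index(), q_nx() and r_nx() are hypothetical and written for this illustration; they are not part of fmsne, and whether drQuality() reports exactly these quantities is an assumption to check against its manual page.

```r
## Illustration only: rank-based neighbourhood preservation written from
## the definitions in Lee and Verleysen (2009) and Lee et al. (2013).
## These helpers are hypothetical and NOT the fmsne implementation.

## Return a k x n matrix whose column i holds the indices of the k
## nearest neighbours of point i (Euclidean distance).
knn_index <- function(x, k) {
    d <- as.matrix(dist(x))
    diag(d) <- Inf                      # a point is not its own neighbour
    nn <- apply(d, 1, function(di) order(di)[seq_len(k)])
    matrix(nn, nrow = k)                # keeps the shape when k == 1
}

## Q_NX(K): mean fraction of K-nearest neighbours shared by both spaces.
q_nx <- function(x_hd, x_ld, k) {
    n <- nrow(x_hd)
    stopifnot(nrow(x_ld) == n, k >= 1, k <= n - 1)
    nn_hd <- knn_index(x_hd, k)
    nn_ld <- knn_index(x_ld, k)
    shared <- vapply(seq_len(n),
                     function(i) length(intersect(nn_hd[, i], nn_ld[, i])),
                     integer(1))
    sum(shared) / (k * n)
}

## R_NX(K): Q_NX(K) rescaled against the value K / (N - 1) expected
## from a random embedding.
r_nx <- function(x_hd, x_ld, k) {
    n <- nrow(x_hd)
    stopifnot(k <= n - 2)
    ((n - 1) * q_nx(x_hd, x_ld, k) - k) / (n - 1 - k)
}

## Toy data: 200 points in 10 dimensions; the first two principal
## components stand in for an embedding (an assumption for this example).
set.seed(1)
x_hd <- matrix(rnorm(200 * 10), nrow = 200)
x_ld <- prcomp(x_hd)$x[, 1:2]

r_nx(x_hd, x_ld, k = 10)   # quality of the PCA projection at K = 10
r_nx(x_hd, x_hd, k = 10)   # comparing the data with itself gives 1
```

Evaluating r_nx() over a range of K values yields a curve: small K probes local structure, large K probes global structure.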

Session information

#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] BiocStyle_2.29.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.4         cli_3.6.1           knitr_1.44         
#>  [4] rlang_1.1.1         xfun_0.40           stringi_1.7.12     
#>  [7] purrr_1.0.2         textshaping_0.3.7   jsonlite_1.8.7     
#> [10] glue_1.6.2          rprojroot_2.0.3     htmltools_0.5.6.1  
#> [13] ragg_1.2.6          sass_0.4.7          rmarkdown_2.25     
#> [16] evaluate_0.22       jquerylib_0.1.4     fastmap_1.1.1      
#> [19] lifecycle_1.0.3     yaml_2.3.7          memoise_2.0.1      
#> [22] bookdown_0.35       BiocManager_1.30.22 stringr_1.5.0      
#> [25] compiler_4.3.1      fs_1.6.3            systemfonts_1.0.5  
#> [28] digest_0.6.33       R6_2.5.1            magrittr_2.0.3     
#> [31] bslib_0.5.1         tools_4.3.1         pkgdown_2.0.7.9000 
#> [34] cachem_1.0.8        desc_1.4.2

References

See also this shared bibliography (with pdfs).

Bodt, Cyril de, Dounia Mulders, Michel Verleysen, and John A Lee. n.d. “Perplexity-Free t-SNE and Twice Student tt-SNE.” In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 123–28. Bruges.
Bodt, Cyril de, Dounia Mulders, Michel Verleysen, and John Aldo Lee. 2022. “Fast Multiscale Neighbor Embedding.” IEEE Trans Neural Netw Learn Syst 33 (4): 1546–60.
Lee, John A, Diego H Peluffo-Ordóñez, and Michel Verleysen. 2015. “Multi-Scale Similarities in Stochastic Neighbour Embedding: Reducing Dimensionality While Preserving Both Local and Global Structure.” Neurocomputing 169 (December): 246–61.
Lee, John A, Emilie Renard, Guillaume Bernard, Pierre Dupont, and Michel Verleysen. 2013. “Type 1 and 2 Mixtures of Kullback–Leibler Divergences as Cost Functions in Dimensionality Reduction Based on Similarity Preservation.” Neurocomputing 112 (July): 92–108.
Lee, John A, and Michel Verleysen. 2009. “Quality Assessment of Dimensionality Reduction: Rank-Based Criteria.” Neurocomputing 72 (7): 1431–43.
———. 2010. “Scale-Independent Quality Criteria for Dimensionality Reduction.” Pattern Recognit. Lett. 31 (14): 2248–57.
Maaten, Laurens van der, and Geoffrey Hinton. 2008. “Visualizing Data Using t-SNE.” J. Mach. Learn. Res. 9 (86): 2579–2605.
Van Der Maaten, Laurens. 2014. “Accelerating t-SNE Using Tree-Based Algorithms.” J. Mach. Learn. Res. 15 (1): 3221–45.