Localisation of proteins using the TAGM MCMC method
Source:R/machinelearning-functions-tagm-mcmc.R
tagm-mcmc.Rd
These functions implement the T augmented Gaussian mixture (TAGM) model for mass spectrometry-based spatial proteomics datasets using Markov-chain Monte-Carlo (MCMC) for inference.
Usage
tagmMcmcTrain(
object,
fcol = "markers",
method = "MCMC",
numIter = 1000L,
burnin = 100L,
thin = 5L,
mu0 = NULL,
lambda0 = 0.01,
nu0 = NULL,
S0 = NULL,
beta0 = NULL,
u = 2,
v = 10,
numChains = 4L,
BPPARAM = BiocParallel::bpparam()
)
tagmMcmcPredict(
object,
params,
fcol = "markers",
probJoint = FALSE,
probOutlier = TRUE
)
tagmPredict(
object,
params,
fcol = "markers",
probJoint = FALSE,
probOutlier = TRUE
)
tagmMcmcProcess(params)
Arguments
- object
An
MSnbase::MSnSet
containing the spatial proteomics data to be passed totagmMcmcTrain
andtagmPredict
.- fcol
The feature meta-data containing marker definitions. Default is
markers
.- method
A
charachter()
describing the inference method for the TAGM algorithm. Default is"MCMC"
.- numIter
The number of iterations of the MCMC algorithm. Default is 1000.
- burnin
The number of samples to be discarded from the begining of the chain. Default is 100.
- thin
The thinning frequency to be applied to the MCMC chain. Default is 5.
- mu0
The prior mean. Default is
colMeans
of the expression data.- lambda0
The prior shrinkage. Default is 0.01.
- nu0
The prior degreed of freedom. Default is
ncol(exprs(object)) + 2
- S0
The prior inverse-wishart scale matrix. Empirical prior used by default.
- beta0
The prior Dirichlet distribution concentration. Default is 1 for each class.
- u
The prior shape parameter for Beta(u, v). Default is 2
- v
The prior shape parameter for Beta(u, v). Default is 10.
- numChains
The number of parrallel chains to be run. Default it 4.
- BPPARAM
Support for parallel processing using the
BiocParallel
infrastructure. When missing (default), the default registeredBiocParallelParam
parameters are used. Alternatively, one can pass a validBiocParallelParam
parameter instance:SnowParam
,MulticoreParam
,DoparParam
, ... see theBiocParallel
package for details.- params
An instance of class
MCMCParams
, as generated bytagmMcmcTrain()
.- probJoint
A
logical(1)
indicating whether to return the joint probability matrix, i.e. the probability for all classes as a newtagm.mcmc.joint
feature variable.- probOutlier
A
logical(1)
indicating whether to return the probability of being an outlier as a newtagm.mcmc.outlier
feature variable. A high value indicates that the protein is unlikely to belong to any annotated class (and is hence considered an outlier).
Value
tagmMcmcTrain
returns an instance of class
MCMCParams
.
tagmMcmcPredict
returns an instance of class
MSnbase::MSnSet
containing the localisation predictions as
a new tagm.mcmc.allocation
feature variable. The allocation
probability is encoded as tagm.mcmc.probability
(corresponding to the mean of the distribution
probability). In additionm the upper and lower quantiles of
the allocation probability distribution are available as
tagm.mcmc.probability.lowerquantile
and
tagm.mcmc.probability.upperquantile
feature variables. The
Shannon entropy is available in the tagm.mcmc.mean.shannon
feature variable, measuring the uncertainty in the allocations
(a high value representing high uncertainty; the highest value
is the natural logarithm of the number of classes).
tagmMcmcProcess
returns an instance of class
MCMCParams
with its summary slot populated.
Details
The tagmMcmcTrain
function generates the samples from the
posterior distributions (object or class MCMCParams
) based on an
annotated quantitative spatial proteomics dataset (object of class
MSnbase::MSnSet
). Both are then passed to the tagmPredict
function to predict the sub-cellular localisation of protein of
unknown localisation. See the pRoloc-bayesian vignette for
details and examples. In this implementation, if numerical instability
is detected in the covariance matrix of the data a small multiple of
the identity is added. A message is printed if this conditioning step
is performed.
References
A Bayesian Mixture Modelling Approach For Spatial Proteomics Oliver M Crook, Claire M Mulvey, Paul D. W. Kirk, Kathryn S Lilley, Laurent Gatto bioRxiv 282269; doi: https://doi.org/10.1101/282269
See also
The plotEllipse()
function can be used to visualise
TAGM models on PCA plots with ellipses.