Localisation of proteins using the TAGM MCMC method
Source:R/machinelearning-functions-tagm-mcmc.R
tagm-mcmc.RdThese functions implement the T augmented Gaussian mixture (TAGM) model for mass spectrometry-based spatial proteomics datasets using Markov-chain Monte-Carlo (MCMC) for inference.
Usage
tagmMcmcTrain(
object,
fcol = "markers",
method = "MCMC",
numIter = 1000L,
burnin = 100L,
thin = 5L,
mu0 = NULL,
lambda0 = 0.01,
nu0 = NULL,
S0 = NULL,
beta0 = NULL,
u = 2,
v = 10,
numChains = 4L,
BPPARAM = BiocParallel::bpparam()
)
tagmMcmcPredict(
object,
params,
fcol = "markers",
probJoint = FALSE,
probOutlier = TRUE
)
tagmPredict(
object,
params,
fcol = "markers",
probJoint = FALSE,
probOutlier = TRUE
)
tagmMcmcProcess(params)Arguments
- object
An
MSnbase::MSnSetcontaining the spatial proteomics data to be passed totagmMcmcTrainandtagmPredict.- fcol
The feature meta-data containing marker definitions. Default is
markers.- method
A
charachter()describing the inference method for the TAGM algorithm. Default is"MCMC".- numIter
The number of iterations of the MCMC algorithm. Default is 1000.
- burnin
The number of samples to be discarded from the begining of the chain. Default is 100.
- thin
The thinning frequency to be applied to the MCMC chain. Default is 5.
- mu0
The prior mean. Default is
colMeansof the expression data.- lambda0
The prior shrinkage. Default is 0.01.
- nu0
The prior degreed of freedom. Default is
ncol(exprs(object)) + 2- S0
The prior inverse-wishart scale matrix. Empirical prior used by default.
- beta0
The prior Dirichlet distribution concentration. Default is 1 for each class.
- u
The prior shape parameter for Beta(u, v). Default is 2
- v
The prior shape parameter for Beta(u, v). Default is 10.
- numChains
The number of parrallel chains to be run. Default it 4.
- BPPARAM
Support for parallel processing using the
BiocParallelinfrastructure. When missing (default), the default registeredBiocParallelParamparameters are used. Alternatively, one can pass a validBiocParallelParamparameter instance:SnowParam,MulticoreParam,DoparParam, ... see theBiocParallelpackage for details.- params
An instance of class
MCMCParams, as generated bytagmMcmcTrain().- probJoint
A
logical(1)indicating whether to return the joint probability matrix, i.e. the probability for all classes as a newtagm.mcmc.jointfeature variable.- probOutlier
A
logical(1)indicating whether to return the probability of being an outlier as a newtagm.mcmc.outlierfeature variable. A high value indicates that the protein is unlikely to belong to any annotated class (and is hence considered an outlier).
Value
tagmMcmcTrain returns an instance of class
MCMCParams.
tagmMcmcPredict returns an instance of class
MSnbase::MSnSet containing the localisation predictions as
a new tagm.mcmc.allocation feature variable. The allocation
probability is encoded as tagm.mcmc.probability
(corresponding to the mean of the distribution
probability). In additionm the upper and lower quantiles of
the allocation probability distribution are available as
tagm.mcmc.probability.lowerquantile and
tagm.mcmc.probability.upperquantile feature variables. The
Shannon entropy is available in the tagm.mcmc.mean.shannon
feature variable, measuring the uncertainty in the allocations
(a high value representing high uncertainty; the highest value
is the natural logarithm of the number of classes).
tagmMcmcProcess returns an instance of class
MCMCParams with its summary slot populated.
Details
The tagmMcmcTrain function generates the samples from the
posterior distributions (object or class MCMCParams) based on an
annotated quantitative spatial proteomics dataset (object of class
MSnbase::MSnSet). Both are then passed to the tagmPredict
function to predict the sub-cellular localisation of protein of
unknown localisation. See the pRoloc-bayesian vignette for
details and examples. In this implementation, if numerical instability
is detected in the covariance matrix of the data a small multiple of
the identity is added. A message is printed if this conditioning step
is performed.
References
A Bayesian Mixture Modelling Approach For Spatial Proteomics Oliver M Crook, Claire M Mulvey, Paul D. W. Kirk, Kathryn S Lilley, Laurent Gatto bioRxiv 282269; doi: https://doi.org/10.1101/282269
See also
The plotEllipse() function can be used to visualise
TAGM models on PCA plots with ellipses.