Localisation of proteins using the TAGM MCMC method
Source:R/machinelearning-functions-tagm-mcmc.R
      tagm-mcmc.RdThese functions implement the T augmented Gaussian mixture (TAGM) model for mass spectrometry-based spatial proteomics datasets using Markov-chain Monte-Carlo (MCMC) for inference.
Usage
tagmMcmcTrain(
  object,
  fcol = "markers",
  method = "MCMC",
  numIter = 1000L,
  burnin = 100L,
  thin = 5L,
  mu0 = NULL,
  lambda0 = 0.01,
  nu0 = NULL,
  S0 = NULL,
  beta0 = NULL,
  u = 2,
  v = 10,
  numChains = 4L,
  BPPARAM = BiocParallel::bpparam()
)
tagmMcmcPredict(
  object,
  params,
  fcol = "markers",
  probJoint = FALSE,
  probOutlier = TRUE
)
tagmPredict(
  object,
  params,
  fcol = "markers",
  probJoint = FALSE,
  probOutlier = TRUE
)
tagmMcmcProcess(params)Arguments
- object
- An - MSnbase::MSnSetcontaining the spatial proteomics data to be passed to- tagmMcmcTrainand- tagmPredict.
- fcol
- The feature meta-data containing marker definitions. Default is - markers.
- method
- A - charachter()describing the inference method for the TAGM algorithm. Default is- "MCMC".
- numIter
- The number of iterations of the MCMC algorithm. Default is 1000. 
- burnin
- The number of samples to be discarded from the begining of the chain. Default is 100. 
- thin
- The thinning frequency to be applied to the MCMC chain. Default is 5. 
- mu0
- The prior mean. Default is - colMeansof the expression data.
- lambda0
- The prior shrinkage. Default is 0.01. 
- nu0
- The prior degreed of freedom. Default is - ncol(exprs(object)) + 2
- S0
- The prior inverse-wishart scale matrix. Empirical prior used by default. 
- beta0
- The prior Dirichlet distribution concentration. Default is 1 for each class. 
- u
- The prior shape parameter for Beta(u, v). Default is 2 
- v
- The prior shape parameter for Beta(u, v). Default is 10. 
- numChains
- The number of parrallel chains to be run. Default it 4. 
- BPPARAM
- Support for parallel processing using the - BiocParallelinfrastructure. When missing (default), the default registered- BiocParallelParamparameters are used. Alternatively, one can pass a valid- BiocParallelParamparameter instance:- SnowParam,- MulticoreParam,- DoparParam, ... see the- BiocParallelpackage for details.
- params
- An instance of class - MCMCParams, as generated by- tagmMcmcTrain().
- probJoint
- A - logical(1)indicating whether to return the joint probability matrix, i.e. the probability for all classes as a new- tagm.mcmc.jointfeature variable.
- probOutlier
- A - logical(1)indicating whether to return the probability of being an outlier as a new- tagm.mcmc.outlierfeature variable. A high value indicates that the protein is unlikely to belong to any annotated class (and is hence considered an outlier).
Value
tagmMcmcTrain returns an instance of class
MCMCParams.
tagmMcmcPredict returns an instance of class
MSnbase::MSnSet containing the localisation predictions as
a new tagm.mcmc.allocation feature variable. The allocation
probability is encoded as tagm.mcmc.probability
(corresponding to the mean of the distribution
probability). In additionm the upper and lower quantiles of
the allocation probability distribution are available as
tagm.mcmc.probability.lowerquantile and
tagm.mcmc.probability.upperquantile feature variables. The
Shannon entropy is available in the tagm.mcmc.mean.shannon
feature variable, measuring the uncertainty in the allocations
(a high value representing high uncertainty; the highest value
is the natural logarithm of the number of classes).
tagmMcmcProcess returns an instance of class
MCMCParams with its summary slot populated.
Details
The tagmMcmcTrain function generates the samples from the
posterior distributions (object or class MCMCParams) based on an
annotated quantitative spatial proteomics dataset (object of class
MSnbase::MSnSet). Both are then passed to the tagmPredict
function to predict the sub-cellular localisation of protein of
unknown localisation. See the pRoloc-bayesian vignette for
details and examples. In this implementation, if numerical instability
is detected in the covariance matrix of the data a small multiple of
the identity is added. A message is printed if this conditioning step
is performed.
References
A Bayesian Mixture Modelling Approach For Spatial Proteomics Oliver M Crook, Claire M Mulvey, Paul D. W. Kirk, Kathryn S Lilley, Laurent Gatto bioRxiv 282269; doi: https://doi.org/10.1101/282269
See also
The plotEllipse() function can be used to visualise
TAGM models on PCA plots with ellipses.