Classification parameter optimisation for the KNN implementation of Wu and Dietterich's transfer learning schema
Usage
knntlOptimisation(
  primary,
  auxiliary,
  fcol = "markers",
  k,
  times = 50,
  test.size = 0.2,
  xval = 5,
  by = 0.5,
  length.out,
  th,
  xfolds,
  BPPARAM = BiocParallel::bpparam(),
  method = "Breckels",
  log = FALSE,
  seed
)
Arguments
- primary: An instance of class "MSnSet".
- auxiliary: An instance of class "MSnSet".
- fcol: The feature meta-data containing marker definitions. Default is "markers".
- k: Numeric vector of length 2, containing the best k parameters to use for the primary (k[1]) and auxiliary (k[2]) datasets. See knnOptimisation for generating the best k.
- times: The number of times cross-validation is performed. Default is 50.
- test.size: The size of the test (validation) data. Default is 0.2 (20 percent).
- xval: The number of rounds of cross-validation to perform. Default is 5.
- by: The increment for theta; must be one of c(1, 0.5, 0.25, 0.2, 0.15, 0.1, 0.05). Default is 0.5.
- length.out: Alternative to the by parameter; specifies the desired length of the sequence of theta values to test.
- th: A matrix of theta values to test for each class, as generated by the function thetas. The number of columns should equal the number of classes contained in fcol. Note: columns will be ordered according to getMarkerClasses(primary, fcol). This argument is only valid with the default method, 'Breckels'.
- xfolds: Option to pass specific folds for the cross-validation.
- BPPARAM: Required for parallelisation. If not specified, a default BiocParallelParam is selected from the global options or, failing that, the most recently registered() back-end.
- method: The k-NN transfer learning method to use. The default is 'Breckels', as described in Breckels et al. (2016). If 'Wu' is specified, the original method of Wu and Dietterich (2004) is used.
- log: A logical defining whether logging should be enabled. Default is FALSE. Note that logging produces considerably bigger objects.
- seed: The optional random number generator seed.
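Putting the arguments together, a minimal sketch of a typical call is shown below. The andy2011/andy2011goCC datasets from pRolocdata and the "markers.tl" feature column are assumptions based on the pRoloc transfer learning vignette and may differ in your setup; the k values would normally come from knnOptimisation (here they are illustrative only), and the small times value keeps the run short.

```r
library("pRoloc")      ## knntlOptimisation, thetas, getMarkerClasses
library("pRolocdata")  ## example data (assumed available)

data(andy2011)      ## primary: protein quantitation profiles (assumed)
data(andy2011goCC)  ## auxiliary: binary GO CC terms (assumed)

## Best k for each dataset; in practice obtained via knnOptimisation.
k <- c(3, 3)

## One theta column per marker class, in increments of 0.5.
m <- getMarkerClasses(andy2011, fcol = "markers.tl")
th <- thetas(length(m), by = 0.5)

topt <- knntlOptimisation(andy2011, andy2011goCC,
                          fcol = "markers.tl",
                          k = k, th = th,
                          times = 5, seed = 1)
```

With times = 5 the grid search is repeated over 5 random training/validation partitions; increase times for more stable F1 estimates at the cost of run time.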
Value
A list containing the theta combinations tested and the associated macro F1 score and accuracy for each combination over each round (specified by times).
Details
knntlOptimisation
implements a variation of Wu and Dietterich's transfer learning schema
(Wu and Dietterich, 2004; see References). A grid search for the best
theta is performed.
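The theta grid that is searched can be generated and inspected directly with the thetas function. In the sketch below, three marker classes are assumed; with by = 0.5, each class weight can take a value in {0, 0.5, 1}, giving 3^3 = 27 combinations to evaluate.

```r
library("pRoloc")

## Matrix of candidate theta combinations: one column per class,
## one row per combination in the grid search.
th <- thetas(3, by = 0.5)
dim(th)   ## 3 columns, one per marker class
head(th)
```

Passing such a matrix via the th argument restricts the search to exactly these combinations (only valid with method = 'Breckels').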
References
Breckels LM, Holden S, Wonjar D, Mulvey CM, Christoforou A, Groen AJ, Kohlbacher O, Lilley KS, Gatto L. Learning from heterogeneous data sources: an application in spatial proteomics. bioRxiv. doi: http://dx.doi.org/10.1101/022152
Wu P, Dietterich TG. Improving SVM Accuracy by Training on Auxiliary Data Sources. Proceedings of the 21st International Conference on Machine Learning (ICML); 2004.
See also
knntlClassification
and example therein.