public class TrainingUtils extends Object
Modifier and Type | Class and Description |
---|---|
static interface | TrainingUtils.ParameterSweep<C extends Classifier>: This is an interface to define a parameter sweep on some classifier. |
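The members of TrainingUtils.ParameterSweep are not listed on this page, so the following is only a sketch of how a parameter sweep in an outer loop might work. The names SweepSketch, Sweep, settings, apply and run are hypothetical and are not part of this API; the evaluation type E stands in for ClassifierEvaluation.

```java
import java.util.Comparator;
import java.util.function.Function;

final class SweepSketch {
    /** Hypothetical sweep shape: a number of settings and a way to apply the i-th one. */
    interface Sweep<C> {
        int settings();
        void apply(C classifier, int index);
    }

    /**
     * Applies each setting in turn, trains and evaluates the classifier, and
     * returns the evaluation that the comparator ranks highest.
     * Returns null if the sweep has no settings. The classifier is left
     * configured with the last setting, not necessarily the best one.
     */
    static <C, E> E run(C classifier, Sweep<C> sweep,
                        Function<C, E> trainAndEvaluate, Comparator<E> comparator) {
        E best = null;
        for (int i = 0; i < sweep.settings(); i++) {
            sweep.apply(classifier, i);                      // outer loop: one parameter setting
            E current = trainAndEvaluate.apply(classifier);  // inner step: train and evaluate
            if (best == null || comparator.compare(current, best) > 0) {
                best = current;
            }
        }
        return best;
    }
}
```

In this sketch the comparator plays the same role as the Comparator<ClassifierEvaluation> argument of the sweep methods: it alone decides which run counts as the best.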
Constructor and Description |
---|
TrainingUtils() |
Modifier and Type | Method and Description |
---|---|
static void | balanceDataSet(ArrayList<LabeledDataPoint> data): Preprocesses the given data and enforces balanced classes by inserting data points of the under-represented label multiple times into the given list. |
static ClassifierEvaluation | crossValidation(ArrayList<LabeledDataPoint> trainingData, Classifier classifier, int folds, int times, Comparator<ClassifierEvaluation> comparator, boolean verbose): Trains the given classifier using cross-validation, selecting the best evaluation result according to the given comparator. |
static <C extends Classifier> ClassifierEvaluation | crossValidationSweep(ArrayList<LabeledDataPoint> trainingData, C classifier, int folds, int times, TrainingUtils.ParameterSweep<C> sweep, Comparator<ClassifierEvaluation> comparator, boolean verbose): Performs cross-validation with the given parameter sweep in an outer loop. |
static <C extends Classifier> ClassifierEvaluation | simpleParameterSweep(ArrayList<LabeledDataPoint> trainingData, C classifier, double testRatio, TrainingUtils.ParameterSweep<C> sweep, int times, Comparator<ClassifierEvaluation> comparator, boolean verbose): Performs simple training with the given parameter sweep in an outer loop. |
static ClassifierEvaluation | simpleTraining(ArrayList<LabeledDataPoint> trainingData, Classifier classifier, double testRatio, int times, Comparator<ClassifierEvaluation> comparator, boolean verbose): Randomly selects a fraction testRatio of the data points for testing, trains the classifier on the rest, and reports precision, recall and accuracy for the test set. |
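The balancing step described for balanceDataSet (inserting points of the under-represented label repeatedly) can be sketched as follows. This is not the library's implementation: LabeledPoint is a hypothetical stand-in for LabeledDataPoint, and the sketch assumes binary labels, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LabeledDataPoint; the real class is not shown on this page.
final class LabeledPoint {
    final double[] features;
    final boolean label; // assumption: binary labels
    LabeledPoint(double[] features, boolean label) {
        this.features = features;
        this.label = label;
    }
}

final class BalanceSketch {
    /** Counts the points in data that carry the given label. */
    static int count(List<LabeledPoint> data, boolean label) {
        int n = 0;
        for (LabeledPoint p : data) {
            if (p.label == label) n++;
        }
        return n;
    }

    /**
     * Appends references to minority-label points, cycling through them in
     * order, until both labels occur equally often. Modifies data in place.
     */
    static void balance(ArrayList<LabeledPoint> data) {
        int positives = count(data, true);
        int negatives = data.size() - positives;
        if (positives == 0 || negatives == 0 || positives == negatives) {
            return; // nothing to duplicate, or already balanced
        }
        boolean minorityLabel = positives < negatives;
        List<LabeledPoint> minority = new ArrayList<>();
        for (LabeledPoint p : data) {
            if (p.label == minorityLabel) minority.add(p);
        }
        int missing = Math.abs(positives - negatives);
        for (int i = 0; i < missing; i++) {
            data.add(minority.get(i % minority.size()));
        }
    }
}
```

Because duplicates are appended at the end of the list, a caller that later splits the list into training and test portions would normally shuffle it first; otherwise copies of the same point can end up on both sides of the split.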
public static void balanceDataSet(ArrayList<LabeledDataPoint> data)
data - data points.

public static ClassifierEvaluation simpleTraining(ArrayList<LabeledDataPoint> trainingData, Classifier classifier, double testRatio, int times, Comparator<ClassifierEvaluation> comparator, boolean verbose)
trainingData - the original training data. This list will be manipulated! If you want to keep your data, construct a copy!
classifier - the classifier that shall be trained.
testRatio - the ratio of test to training data (between 0 and 1). A value of about 0.1 to 0.2 is recommended.
times - defines how many times the training process should be repeated.
comparator - defines an optimality criterion to choose between repeated runs.

public static ClassifierEvaluation crossValidation(ArrayList<LabeledDataPoint> trainingData, Classifier classifier, int folds, int times, Comparator<ClassifierEvaluation> comparator, boolean verbose)
trainingData - the original training data. This list will be manipulated! If you want to keep your data, construct a copy!
classifier - the classifier that shall be trained.
folds - the number of folds for the cross-validation.
times - sets how often the training should be repeated within one fold.
comparator - a comparator for ClassifierEvaluation.

public static <C extends Classifier> ClassifierEvaluation simpleParameterSweep(ArrayList<LabeledDataPoint> trainingData, C classifier, double testRatio, TrainingUtils.ParameterSweep<C> sweep, int times, Comparator<ClassifierEvaluation> comparator, boolean verbose)
C - the classifier class.
trainingData - the original training data.
classifier - the classifier that shall be trained.
testRatio - the ratio of test to training data (between 0 and 1). A value of about 0.1 to 0.2 is recommended.
sweep - the parameter sweep.
times - sets how many times a training run should be done for one setting.
comparator - a comparator for ClassifierEvaluation.

public static <C extends Classifier> ClassifierEvaluation crossValidationSweep(ArrayList<LabeledDataPoint> trainingData, C classifier, int folds, int times, TrainingUtils.ParameterSweep<C> sweep, Comparator<ClassifierEvaluation> comparator, boolean verbose)
C - the classifier class.
trainingData - the original training data.
classifier - the classifier that shall be trained.
folds - the number of folds for the cross-validation.
times - sets how often the training should be repeated within one fold.
sweep - the parameter sweep.
comparator - a comparator for ClassifierEvaluation.

Copyright © 2014. All rights reserved.