|
1 | 1 | <?xml version="1.0" encoding="utf-8"?> |
2 | 2 | <doc> |
3 | 3 | <members> |
4 | | - <!-- |
5 | | - The following text describes the FastTree algorithm details. |
6 | | - It's used for the remarks section of all FastTree-based trainers (binary, regression, ranking) |
7 | | - --> |
8 | | - <member name="FastTree_remarks"> |
9 | | - <remarks> |
10 | | - <para> |
11 | | - FastTree is an efficient implementation of the <a href='https://arxiv.org/abs/1505.01866'>MART</a> gradient boosting algorithm. |
12 | | - Gradient boosting is a machine learning technique for regression problems. |
13 | | - It builds each regression tree in a step-wise fashion, using a predefined loss function to measure the error at each step and correct for it in the next. |
14 | | - The resulting prediction model is therefore an ensemble of weaker prediction models: at each step, the tree that most reduces an arbitrary differentiable loss function is added to the series. |
15 | | - </para> |
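One way to make the step-wise construction above concrete is the standard gradient-boosting update. This formulation is general background on MART rather than something stated in this file; with loss L, learning rate \eta, and ensemble F_m after m steps:

    F_0(x) = \arg\min_c \sum_i L(y_i, c)
    r_{i,m} = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F = F_{m-1}}
    F_m(x) = F_{m-1}(x) + \eta \, h_m(x)

Here h_m is the regression tree fitted at step m to the pseudo-residuals r_{i,m}, so each new tree corrects the errors left by the ensemble built so far.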
16 | | - <para> |
17 | | - MART learns an ensemble of regression trees; a regression tree is a decision tree with scalar values in its leaves. |
18 | | - A decision (or regression) tree is a binary tree-like flow chart, where at each interior node one decides which of the two child nodes to continue to based on one of the feature values from the input. |
19 | | - At each leaf node, a value is returned. In the interior nodes, the decision is based on the test 'x &lt;= v' where x is the value of the feature in the input sample and v is one of the possible values of this feature. |
20 | | - The functions that can be produced by a regression tree are all the piece-wise constant functions. |
21 | | - </para> |
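The piece-wise constant behaviour described above follows directly from how a single tree is evaluated. A minimal C# sketch, with illustrative type and field names that are not part of the library:

    // Illustrative only: one node of a regression tree using the 'x <= v' split test.
    sealed class TreeNode
    {
        public int FeatureIndex;       // which feature x is tested at an interior node
        public double Threshold;       // the value v in the test 'x <= v'
        public TreeNode Left, Right;   // child nodes; both null at a leaf
        public double LeafValue;       // scalar value returned at a leaf

        public double Evaluate(double[] features)
        {
            if (Left == null && Right == null)
                return LeafValue;                       // leaf: constant output
            return features[FeatureIndex] <= Threshold  // interior: route to a child
                ? Left.Evaluate(features)
                : Right.Evaluate(features);
        }
    }

Because every input ends at exactly one leaf and each leaf holds a constant, the function computed by the tree is piece-wise constant.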
22 | | - <para> |
23 | | - The ensemble of trees is produced by computing, at each step, a regression tree that approximates the gradient of the loss function, and adding it to the previous ensemble with a coefficient that minimizes the loss of the new model. |
24 | | - The output of the ensemble produced by MART on a given instance is the sum of the tree outputs. |
25 | | - </para> |
26 | | - <list type='bullet'> |
27 | | - <item><description>In case of a binary classification problem, the output is converted to a probability by using some form of calibration.</description></item> |
28 | | - <item><description>In case of a regression problem, the output is the predicted value of the function.</description></item> |
29 | | - <item><description>In case of a ranking problem, the instances are ordered by the output value of the ensemble.</description></item> |
30 | | - </list> |
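A short sketch of the sum-of-trees output and the three interpretations listed above, reusing the illustrative TreeNode type from the earlier sketch; the sigmoid used for the probability is only a stand-in for "some form of calibration":

    using System;
    using System.Collections.Generic;

    static class EnsembleScoring
    {
        // MART output on an instance = sum of the tree outputs.
        public static double Score(IReadOnlyList<TreeNode> trees, double[] features)
        {
            double sum = 0;
            foreach (var tree in trees)
                sum += tree.Evaluate(features);
            return sum;
        }

        // Binary classification: map the raw score to a probability
        // (a plain sigmoid stands in for the calibration step).
        public static double Probability(double rawScore) => 1.0 / (1.0 + Math.Exp(-rawScore));

        // Regression: the raw score itself is the predicted value.
        // Ranking: instances are ordered by descending raw score.
    }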
31 | | - <para>For more information see:</para> |
32 | | - <list type="bullet"> |
33 | | - <item><description><a href='https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting'>Wikipedia: Gradient boosting (Gradient tree boosting).</a></description></item> |
34 | | - <item><description><a href='https://projecteuclid.org/DPubS?service=UI&amp;version=1.0&amp;verb=Display&amp;handle=euclid.aos/1013203451'>Greedy function approximation: A gradient boosting machine.</a></description></item> |
35 | | - </list> |
36 | | - </remarks> |
37 | | - </member> |
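For context, training one of these FastTree-based trainers from ML.NET typically looks roughly like the following. The input schema (HousingData) is hypothetical, and the option names (numberOfLeaves, numberOfTrees, learningRate) are assumptions that should be checked against the Microsoft.ML.FastTree package in use:

    using Microsoft.ML;
    using Microsoft.ML.Data;

    // Hypothetical input schema, for illustration only.
    class HousingData
    {
        [LoadColumn(0)] public float Label;
        [LoadColumn(1)] public float Size;
        [LoadColumn(2)] public float Rooms;
    }

    class Program
    {
        static void Main()
        {
            var mlContext = new MLContext();
            IDataView trainingData = mlContext.Data.LoadFromTextFile<HousingData>("train.tsv", hasHeader: true);

            // Concatenate the input columns into a single feature vector, then train.
            var pipeline = mlContext.Transforms.Concatenate("Features", "Size", "Rooms")
                .Append(mlContext.Regression.Trainers.FastTree(
                    labelColumnName: "Label",
                    numberOfLeaves: 20,
                    numberOfTrees: 100,
                    learningRate: 0.2));

            var model = pipeline.Fit(trainingData);
        }
    }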
38 | | - |
39 | | - <!-- |
40 | | - The following text describes the FastForest algorithm details. |
41 | | - It's used for the remarks section of all FastForest-based trainers (regression) |
42 | | - --> |
43 | | - <member name="FastForest_remarks"> |
44 | | - <remarks> |
45 | | - Decision trees are non-parametric models that perform a sequence of simple tests on inputs. |
46 | | - This decision procedure maps an input to outputs that, in the training dataset, were associated with similar inputs. |
47 | | - At each node of the binary tree, a decision based on a measure of similarity routes the instance recursively through the branches of the tree until the appropriate leaf node is reached and its output decision is returned. |
48 | | - <para>Decision trees have several advantages:</para> |
49 | | - <list type='bullet'> |
50 | | - <item><description>They are efficient in both computation and memory usage during training and prediction. </description></item> |
51 | | - <item><description>They can represent non-linear decision boundaries.</description></item> |
52 | | - <item><description>They perform integrated feature selection and classification. </description></item> |
53 | | - <item><description>They are resilient in the presence of noisy features.</description></item> |
54 | | - </list> |
55 | | - <para>Fast forest is a random forest implementation. |
56 | | - The model consists of an ensemble of decision trees, and each tree outputs a Gaussian distribution as its prediction. |
57 | | - An aggregation is performed over the ensemble of trees to find a Gaussian distribution closest to the combined distribution for all trees in the model.</para> |
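One plausible reading of this aggregation is to moment-match a single Gaussian to the equal-weight mixture of the per-tree Gaussians. This is an illustrative sketch, not the library's exact procedure:

    using System.Collections.Generic;
    using System.Linq;

    // Illustrative only: combine per-tree (mean, variance) outputs into one Gaussian.
    static class ForestAggregation
    {
        public static (double Mean, double Variance) Aggregate(
            IReadOnlyList<(double Mean, double Variance)> treeOutputs)
        {
            // Mean of the mixture: average of the per-tree means.
            double mean = treeOutputs.Average(t => t.Mean);

            // Variance of the mixture: average within-tree variance
            // plus the spread of the per-tree means around the overall mean.
            double variance = treeOutputs.Average(t => t.Variance + t.Mean * t.Mean) - mean * mean;

            return (mean, variance);
        }
    }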
59 | | - <para>Generally, ensemble models provide better coverage and accuracy than single decision trees.</para> |
61 | | - <para>For more information see:</para> |
62 | | - <list type='bullet'> |
63 | | - <item><description><a href='https://en.wikipedia.org/wiki/Random_forest'>Wikipedia: Random forest</a></description></item> |
64 | | - <item><description><a href='http://jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf'>Quantile regression forest</a></description></item> |
65 | | - <item><description><a href='https://blogs.technet.microsoft.com/machinelearning/2014/09/10/from-stumps-to-trees-to-forests/'>From Stumps to Trees to Forests</a></description></item> |
66 | | - </list> |
67 | | - </remarks> |
68 | | - </member> |
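As with the FastTree sketch above, a fast forest trainer is created from the ML.NET regression catalog; the option names here are likewise assumptions to verify against Microsoft.ML.FastTree:

    using Microsoft.ML;

    var mlContext = new MLContext();

    // Rough usage sketch of the random-forest style trainer.
    var trainer = mlContext.Regression.Trainers.FastForest(
        labelColumnName: "Label",
        numberOfLeaves: 20,
        numberOfTrees: 100);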
69 | | - |
70 | | - <!-- |
71 | | - The following text describes the GAM algorithm details. |
72 | | - It's used for the remarks section of all GAM-based trainers (regression, binary classification) |
73 | | - --> |
74 | | - <member name="GAM_remarks"> |
75 | | - <remarks> |
76 | | - <para> |
77 | | - Generalized Additive Models, or GAMs, model the data as a sum of independent per-feature contributions, |
78 | | - similar to a linear model. For each feature, the GAM trainer learns a non-linear function, |
79 | | - called a "shape function", that computes the response as a function of the feature's value. |
80 | | - (In contrast, a linear model fits a linear response (e.g. a line) to each feature.) |
81 | | - To score an example, the outputs of all the shape functions are summed; the score is this total. |
82 | | - </para> |
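The additive scoring just described can be sketched as follows; the shape functions are represented here as plain delegates, which is an illustration rather than how the trained model stores them:

    using System;
    using System.Collections.Generic;

    // Illustrative only: a GAM score is an intercept plus one shape-function output per feature.
    static class GamScoring
    {
        public static double Score(
            double intercept,                                   // average prediction (see the next paragraph)
            IReadOnlyList<Func<double, double>> shapeFunctions, // one learned function per feature
            double[] features)
        {
            double score = intercept;
            for (int i = 0; i < shapeFunctions.Count; i++)
                score += shapeFunctions[i](features[i]);        // each feature contributes independently
            return score;
        }
    }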
83 | | - <para> |
84 | | - This GAM trainer is implemented using shallow gradient boosted trees (e.g. tree stumps) to learn nonparametric |
85 | | - shape functions, and is based on the method described in Lou, Caruana, and Gehrke. |
86 | | - <a href='http://www.cs.cornell.edu/~yinlou/papers/lou-kdd12.pdf'>"Intelligible Models for Classification and Regression."</a> KDD'12, Beijing, China. 2012. |
87 | | - After training, an intercept is added to represent the average prediction over the training set, |
88 | | - and the shape functions are normalized to represent the deviation from the average prediction. This results |
89 | | - in models that are easily interpreted simply by inspecting the intercept and the shape functions. |
90 | | - See the sample below for an example of how to train a GAM model and inspect and interpret the results. |
91 | | - </para> |
92 | | - </remarks> |
93 | | - </member> |
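The text above refers to a sample for training and inspecting a GAM model. As a rough stand-in, training and inspection from ML.NET might look like the following; the member names (Gam, Bias, GetBinUpperBounds, GetBinEffects) and option names are recalled from the Microsoft.ML.FastTree API and should be verified before use:

    using System;
    using Microsoft.ML;

    static class GamInspection
    {
        // trainingData is assumed to have a "Label" column and a "Features" vector column,
        // loaded as in the FastTree sketch above.
        public static void TrainAndInspect(MLContext mlContext, IDataView trainingData)
        {
            // Train a regression GAM (option names are assumptions to verify).
            var model = mlContext.Regression.Trainers.Gam(maximumBinCountPerFeature: 16)
                .Fit(trainingData);

            // The intercept (bias) represents the average prediction over the training set.
            var gam = model.Model;
            Console.WriteLine($"Intercept: {gam.Bias}");

            // Each shape function is stored as bin boundaries plus a per-bin
            // deviation from the average prediction.
            var binUpperBounds = gam.GetBinUpperBounds(0);  // bin boundaries for feature 0
            var binEffects = gam.GetBinEffects(0);          // shape-function value in each bin
        }
    }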
94 | | - |
95 | 4 | <member name="TreeEnsembleFeaturizerTransform"> |
96 | 5 | <summary> |
97 | 6 | Trains a tree ensemble, or loads it from a file, then maps a numeric feature vector to outputs. |
|