<?xml version="1.0" encoding="utf-8" ?>
<doc>
<members>
<member name="NAFilter">
<summary>
Removes rows that contain missing values in any of the specified input columns.
</summary>
<remarks>
This transform removes the entire row if any of the input columns has a missing value in that row.
This preprocessing is required for many ML algorithms that cannot work with missing values,
and it is useful whenever a single missing entry invalidates the entire row.
If <see cref="P:Microsoft.ML.Transforms.MissingValuesRowDropper.Complement"/> is set to true, this transform does the exact opposite:
it keeps only the rows that have missing values.
</remarks>
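<example>
<para>
The row-dropping rule can be illustrated with a short, hypothetical sketch in plain C#. It does not use ML.NET and is not the transform's actual implementation.
It assumes that a row is an array of columns, that each column is a vector of <c>double</c> values (a scalar column is a vector of length 1),
and that <c>double.NaN</c> marks a missing value. The <c>complement</c> parameter is a hypothetical stand-in for the Complement option.
</para>
<code language="csharp">
<![CDATA[
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of the row-dropping rule; not ML.NET code.
// A row is an array of columns; each column is a vector of doubles,
// and double.NaN marks a missing slot.
public static class MissingRowFilterSketch
{
    // True if any slot of any selected column in the row is missing.
    public static bool HasMissing(double[][] row, int[] columnIndices) =>
        columnIndices.Any(c => row[c].Any(x => double.IsNaN(x)));

    // complement == false: keep rows with no missing values in the selected columns.
    // complement == true: keep only the rows that do have a missing value.
    public static List<double[][]> Filter(
        IEnumerable<double[][]> rows, int[] columnIndices, bool complement = false) =>
        rows.Where(r => HasMissing(r, columnIndices) == complement).ToList();
}
]]>
</code>
<para>
For example, given the rows <c>{ [1, 2], [3] }</c> and <c>{ [1, NaN], [3] }</c>, filtering on both columns keeps only the first row,
while setting <c>complement</c> to true keeps only the second.
</para>
</example>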
</member>
<example name="NAFilter">
<example>
<code language="csharp">
pipeline.Add(new MissingValuesRowDropper("Column1"));
</code>
</example>
</example>
<member name="TextToKey">
<summary>
Converts input values (words, numbers, etc.) to indices in a dictionary.
</summary>
<remarks>
The TextToKeyConverter transform builds up term vocabularies (dictionaries).
The TextToKeyConverter and the <see cref="T:Microsoft.ML.Transforms.HashConverter"/> are the two primary mechanisms by which raw input is transformed into keys.
If multiple columns are used, each column builds/uses exactly one vocabulary.
The output columns are KeyType-valued.
The Key value is the one-based index of the item in the dictionary.
If an input value is not found in the dictionary, it is assigned the missing key value.
This dictionary mapping values to keys is most commonly learned from the unique values in the input data,
but it can also be defined by other means: either directly on the command line or loaded from an external file.
</remarks>
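<example>
<para>
The mapping from values to keys can be illustrated with a short, hypothetical sketch in plain C#. It does not use ML.NET and is not the converter's actual implementation.
It assumes a vocabulary ordered by first occurrence (in the spirit of <c>TermTransformSortOrder.Occurrence</c>), one-based keys, and 0 as the missing key value.
</para>
<code language="csharp">
<![CDATA[
using System.Collections.Generic;

// Hypothetical sketch of term-to-key mapping; not ML.NET code.
public sealed class TermDictionarySketch
{
    // Assumption of this sketch: 0 stands for the missing key; valid keys start at 1.
    public const uint MissingKey = 0;

    private readonly Dictionary<string, uint> _keys = new Dictionary<string, uint>();

    public int Count => _keys.Count;

    // Learns the vocabulary from training values in order of first occurrence.
    public static TermDictionarySketch Train(IEnumerable<string> values)
    {
        var dict = new TermDictionarySketch();
        foreach (var v in values)
        {
            if (!dict._keys.ContainsKey(v))
                dict._keys[v] = (uint)dict._keys.Count + 1; // one-based index
        }
        return dict;
    }

    // Maps a value to its key, or to MissingKey if it was not seen in training.
    public uint Map(string value) =>
        _keys.TryGetValue(value, out var key) ? key : MissingKey;
}
]]>
</code>
<para>
Training on <c>"red", "green", "red", "blue"</c> yields the keys 1, 2, and 3 for red, green, and blue; an unseen value such as <c>"purple"</c> maps to 0.
</para>
</example>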
<seealso cref="T:Microsoft.ML.Transforms.HashConverter"/>
<seealso cref="T:Microsoft.ML.Transforms.KeyToTextConverter"/>
</member>
<example name="TextToKey">
<example>
<code language="csharp">
pipeline.Add(new TextToKeyConverter(("Column", "OutColumn"))
{
Sort = TermTransformSortOrder.Occurrence
});
</code>
</example>
</example>
<member name="ValueToKeyMappingEstimator">
<summary>
Converts input values (words, numbers, etc.) to indices in a dictionary.
</summary>
<remarks>
The ValueToKeyMappingEstimator builds up term vocabularies (dictionaries).
If multiple columns are used, each column builds/uses exactly one vocabulary.
The output columns are KeyType-valued.
The Key value is the one-based index of the item in the dictionary.
If an input value is not found in the dictionary, it is assigned the missing key value.
This dictionary mapping values to keys is most commonly learned from the unique values in the input data,
but it can also be defined by other means: either directly on the command line or loaded from an external file.
</remarks>
</member>
<member name="NAHandle">
<summary>
Handles missing values by replacing them with either the default value of the column type or a value computed from the data.
</summary>
<remarks>
This transform handles missing values in the input columns. For each input column, it creates an output column
in which the missing values are replaced by one of the following values:
<list type='bullet'>
<item>
<description>The default value of the appropriate type.</description>
</item>
<item>
<description>The mean of the non-missing values in the column.</description>
</item>
<item>
<description>The maximum of the non-missing values in the column.</description>
</item>
<item>
<description>The minimum of the non-missing values in the column.</description>
</item>
</list>
<para>The last three options work only for numeric, TimeSpan, and DateTime columns.</para>
<para>
The output column can also optionally include an indicator vector that marks which slots were missing in the input column.
This is possible only when the indicator vector's type can be converted to the input column type, that is, only for numeric columns.
</para>
<para>
When computing the mean, maximum, or minimum, there is also an option to compute it over the whole column instead of per slot.
This option defaults to true for variable-length vectors and to false for known-length vectors.
It can be changed to true for known-length vectors, but setting it to false for variable-length vectors results in an error.
</para>
</remarks>
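<example>
<para>
Mean replacement, per slot or over the whole column, can be illustrated with a short, hypothetical sketch in plain C#. It does not use ML.NET and is not the handler's actual implementation.
It assumes a known-length vector column stored as one <c>double[]</c> per row, <c>double.NaN</c> for missing slots, and a hypothetical <c>overWholeColumn</c> flag.
It also returns an indicator vector per row (1 for slots that were missing); when no values are observed, it falls back to 0, the default value of <c>double</c>.
</para>
<code language="csharp">
<![CDATA[
using System.Linq;

// Hypothetical sketch of mean replacement; not ML.NET code.
// Assumes every row has the same length (a known-length vector column).
public static class MeanReplacementSketch
{
    // Mean of the non-NaN values, or 0.0 (default(double)) if there are none.
    private static double MeanOrDefault(double[] xs)
    {
        var observed = xs.Where(x => !double.IsNaN(x)).ToArray();
        return observed.Length == 0 ? 0.0 : observed.Average();
    }

    // Replaces NaN slots with the per-slot mean (computed across rows) or, when
    // overWholeColumn is true, with the mean of all observed values in the column.
    // Indicators[i][s] is 1.0 where rows[i][s] was missing and 0.0 otherwise.
    public static (double[][] Values, double[][] Indicators) Replace(
        double[][] rows, bool overWholeColumn)
    {
        int width = rows.Length == 0 ? 0 : rows[0].Length;
        double columnMean = MeanOrDefault(rows.SelectMany(r => r).ToArray());
        double[] slotMeans = Enumerable.Range(0, width)
            .Select(s => MeanOrDefault(rows.Select(r => r[s]).ToArray()))
            .ToArray();

        var values = new double[rows.Length][];
        var indicators = new double[rows.Length][];
        for (int i = 0; i < rows.Length; i++)
        {
            values[i] = new double[width];
            indicators[i] = new double[width];
            for (int s = 0; s < width; s++)
            {
                bool missing = double.IsNaN(rows[i][s]);
                indicators[i][s] = missing ? 1.0 : 0.0;
                values[i][s] = !missing ? rows[i][s]
                    : overWholeColumn ? columnMean : slotMeans[s];
            }
        }
        return (values, indicators);
    }
}
]]>
</code>
<para>
For the rows <c>[1, NaN]</c>, <c>[3, 4]</c>, and <c>[NaN, 8]</c>, the per-slot means are 2 and 6, while the whole-column mean is 4.
A per-slot mean is only well defined when every row has the same length, which is why computing per slot is not an option for variable-length vectors.
</para>
</example>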
<seealso cref="T:Microsoft.ML.Data.DataKind"/>
</member>
<example name="NAHandle">
<example>
<code language="csharp">
pipeline.Add(new MissingValueHandler("FeatureCol", "CleanFeatureCol")
{
ReplaceWith = NAHandleTransformReplacementKind.Mean
});
</code>
</example>
</example>
</members>
</doc>