PMML 1.1 -- DTD for Normalization
|
|
| NormContinuous defines how to normalize an input field. The attribute 'field' must refer to a field in the data dictionary. If LinearNorm is missing, the input field is not normalized. |
|
LinearNorm* defines a sequence of points for a stepwise linear interpolation function. The sequence must contain at least two elements. To simplify processing, the LinearNorm elements within NormContinuous must be strictly sorted by ascending value of 'orig'.
Given two adjacent points (a1, b1) and (a2, b2), that is, two points with no other point (a3, b3) such that a1 < a3 < a2, the normalized value is
b1 + (x-a1)/(a2-a1)*(b2-b1) for a1 <= x <= a2 |
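The interpolation rule can be sketched in Python as follows. This is only an illustration, not part of PMML: the function name linear_norm and the representation of the LinearNorm sequence as a list of (orig, norm) tuples are assumptions made for the example. Values outside [a1..an] are rejected here because the text leaves outlier treatment to the caller.

```python
from bisect import bisect_right

def linear_norm(points, x):
    """Interpolate x over a list of (orig, norm) pairs (hypothetical helper)."""
    if len(points) < 2:
        raise ValueError("the sequence must contain at least two points")
    origs = [a for a, _ in points]
    if any(a2 <= a1 for a1, a2 in zip(origs, origs[1:])):
        raise ValueError("points must be strictly sorted by ascending orig")
    if not origs[0] <= x <= origs[-1]:
        raise ValueError("x is outside [a1..an]; outlier treatment is up to the caller")
    # Index of the left point of the segment containing x.
    # The clamp makes x == an fall into the last segment.
    i = min(bisect_right(origs, x) - 1, len(points) - 2)
    (a1, b1), (a2, b2) = points[i], points[i + 1]
    return b1 + (x - a1) / (a2 - a1) * (b2 - b1)

points = [(0.0, 0.0), (10.0, 0.5), (30.0, 1.0)]
print(linear_norm(points, 5.0))   # first segment
print(linear_norm(points, 20.0))  # second segment
```

With these sample points, 5.0 lies halfway along the first segment and 20.0 halfway along the second, so the results are 0.25 and 0.75.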
| Missing input values are mapped to missing output. If the input value is not within the range [a1..an], it is treated as an outlier. The specific method for outlier treatment must be provided by the caller; e.g., an outlier could be mapped to a missing value, or it could be mapped to the minimal or maximal value. |
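One way a caller might implement these rules is sketched below. The function name resolve_input and the policy names "as_missing" and "clamp" are hypothetical and do not come from PMML. None stands in for a missing value, and "clamp" reads "minimal or maximal value" as the nearest end of the range, which is an interpretation.

```python
import math

def resolve_input(x, lo, hi, outliers="as_missing"):
    """Return the value to normalize, or None if the output should be missing.

    lo and hi are the first and last 'orig' values (a1 and an).
    """
    # Missing input is mapped to missing output.
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return None
    if lo <= x <= hi:
        return x
    # x is an outlier: apply the caller-chosen policy.
    if outliers == "as_missing":
        return None
    if outliers == "clamp":
        return lo if x < lo else hi
    raise ValueError("unknown outlier policy: %r" % (outliers,))

print(resolve_input(12.0, 0.0, 10.0))           # outlier treated as missing
print(resolve_input(12.0, 0.0, 10.0, "clamp"))  # outlier mapped to the maximum
```

A value returned by this helper is always inside [a1..an], so it can be passed directly to the interpolation formula.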
|
|
A NormDiscrete element (f, v) defines that the unit has value 1.0 if the value of input field f is v; otherwise it is 0.0. The set of NormDiscrete instances that refer to a given input field defines a fan-out function which maps a single input field to a set of normalized fields. Missing input values are mapped to missing output. PMML 1.1 supports only one kind of discrete normalization; future versions could support other techniques such as thermometer encoding. Thermometer encoding can be used for ordinal values: the output is 1.0 if the value of input field f is greater than or equal to v, otherwise it is 0.0. Furthermore, there could also be a linear index mapping for ordinal values: given an ordering (a1, a2, ..., an), the normalized value for value ai is the number i. |
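The three encodings described above can be compared with a short Python sketch. The function names are hypothetical, and None stands in for a missing value. Only the first function corresponds to what PMML 1.1 supports; the other two illustrate the possible future techniques, for which the text does not define missing-value handling.

```python
def norm_discrete(values, x):
    """Fan-out: one output per value v; 1.0 where x equals v, else 0.0."""
    if x is None:
        # Missing input is mapped to missing output for every unit.
        return [None] * len(values)
    return [1.0 if x == v else 0.0 for v in values]

def thermometer(ordering, x):
    """Output for v is 1.0 if x is greater than or equal to v in the ordering."""
    rank = ordering.index(x)  # raises ValueError for values not in the ordering
    return [1.0 if rank >= i else 0.0 for i in range(len(ordering))]

def linear_index(ordering, x):
    """Map the value a_i to its 1-based position i in the ordering."""
    return ordering.index(x) + 1

levels = ["low", "mid", "high"]
print(norm_discrete(levels, "mid"))  # [0.0, 1.0, 0.0]
print(thermometer(levels, "mid"))    # [1.0, 1.0, 0.0]
print(linear_index(levels, "mid"))   # 2
```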
|
|