PMML 4.3 - Built-in functions

Almost all programming languages come with a set of predefined functions that perform low-level operations. PMML has a similar set of functions.
The definitions of functions in PMML generally follow the design of functions and operators in XQuery. Further ideas are taken from MathML, XPath, and Java date formats.

+, -, * and /

Functions for simple arithmetic.

Pseudo-declaration of PMML built-in function +:

<DefineFunction name="+" optype="continuous">
  <ParameterField name="a" optype="continuous"/>
  <ParameterField name="b" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>

The functions -, * and / are defined in the same way.

Example: Return the difference between input fields named A and B:
<Apply function="-">
  <FieldRef field="A"/>
  <FieldRef field="B"/>
</Apply>

min, max, sum, avg, median, product

Returns an aggregation of a variable number of input fields.

Pseudo-declaration of PMML built-in function min:

<DefineFunction name="min" optype="continuous">
  The function takes a variable number of <FieldRef/> as parameters
  ... implementation built-in ...
</DefineFunction>

The aggregation functions max, sum, avg, median, product are defined in the same way. Note that the number of input parameters is variable, but these functions do not aggregate values coming from multiple input records.

Example: Return the minimum value of input fields named A, B and C:

<Apply function="min">
  <FieldRef field="A"/>
  <FieldRef field="B"/>
  <FieldRef field="C"/>
</Apply>

log10, ln, sqrt, abs, exp, pow, threshold, floor, ceil, round

Further mathematical functions.

Pseudo-declaration of PMML built-in function log10:

<DefineFunction name="log10" optype="continuous">
  <ParameterField name="x" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>

The function log10 returns the logarithm to the base 10. The functions ln (natural log), sqrt (square root), abs (absolute value), exp (exponential) are defined in the same way. Semantics are as usual. See also MathML.

Example: Return the logarithm to the base 10 of input field A:
<Apply function="log10">
  <FieldRef field="A"/>
</Apply>

Pseudo-declaration of PMML built-in functions pow and floor:

<DefineFunction name="pow" optype="continuous">
  <ParameterField name="x" optype="continuous"/>
  <ParameterField name="y" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>

<DefineFunction name="floor" dataType="integer">
  <ParameterField name="x" optype="continuous"/>
  ... implementation built-in ...
</DefineFunction>

The function pow(x,y) returns the number x raised to the power y.

Example: Return the cube of input field A:

<Apply function="pow">
  <FieldRef field="A"/>
  <Constant dataType="integer">3</Constant>
</Apply>

isMissing, isNotMissing

Functions for boolean operations. Return true or false. Each function is applied to a single input parameter.

Pseudo-declaration of PMML built-in function isMissing:

<DefineFunction name="isMissing" dataType="boolean">
  <ParameterField name="input"/>
  ... implementation built-in ...
</DefineFunction>

Example: Check if field Str is missing. If so, returns true, else false.

<Apply function="isMissing">
  <FieldRef field="Str"/>
</Apply>

equal, notEqual, lessThan, lessOrEqual, greaterThan, greaterOrEqual

Further boolean functions. Return true or false. Each function is applied to two input parameters.

Pseudo-declaration of PMML built-in function lessThan:

<DefineFunction name="lessThan" dataType="boolean">
  <ParameterField name="x"/>
  <ParameterField name="y"/>
  ... implementation built-in ...
</DefineFunction>

Example: Check if field A is less than field B. If so, returns true, else false.

<Apply function="lessThan">
  <FieldRef field="A"/>
  <FieldRef field="B"/>
</Apply>

and, or

Further boolean functions. Combine two or more boolean values into a single boolean result.
Pseudo-declaration of PMML built-in function and:

<DefineFunction name="and" dataType="boolean">
  The function takes a variable number of fields as parameters
  ... implementation built-in ...
</DefineFunction>

Example: Check if field A is less than 3 and field B is less than 4. If so, returns true, else false.

<Apply function="and">
  <Apply function="lessThan">
    <FieldRef field="A"/>
    <Constant dataType="integer">3</Constant>
  </Apply>
  <Apply function="lessThan">
    <FieldRef field="B"/>
    <Constant dataType="integer">4</Constant>
  </Apply>
</Apply>

not

Further boolean function. Negates the input boolean value.

Pseudo-declaration of PMML built-in function not:

<DefineFunction name="not" dataType="boolean">
  <ParameterField name="x" dataType="boolean"/>
  ... implementation built-in ...
</DefineFunction>

Example: Check if field A is not less than B (i.e. greater than or equal to B). If so, returns true, else false.

<Apply function="not">
  <Apply function="lessThan">
    <FieldRef field="A"/>
    <FieldRef field="B"/>
  </Apply>
</Apply>

isIn, isNotIn

Further boolean functions. Evaluate whether a field value is contained in a given list of values.
Pseudo-declaration of PMML built-in function isIn:

<DefineFunction name="isIn" dataType="boolean">
  <ParameterField name="x"/>
  The list takes a variable number of fields as parameters
  ... implementation built-in ...
</DefineFunction>

Example: Check if field color is in (red, green, blue). If so, returns true, else false.

<Apply function="isIn">
  <FieldRef field="color"/>
  <Constant dataType="string">red</Constant>
  <Constant dataType="string">green</Constant>
  <Constant dataType="string">blue</Constant>
</Apply>

if

Implements IF-THEN-ELSE logic. The ELSE part is optional. If the ELSE part is absent and the boolean value is false, then a missing value is returned.

Pseudo-declaration of PMML built-in function if:

<DefineFunction name="if">
  <ParameterField name="x" dataType="boolean"/>
  <ParameterField name="A"/> THEN part is required
  <ParameterField name="B"/> ELSE part is optional
  ... implementation built-in ...
</DefineFunction>

Example: Check if field color is in (red, green, blue). If so, returns "primary", else "other".

<Apply function="if">
  <Apply function="isIn">
    <FieldRef field="color"/>
    <Constant dataType="string">red</Constant>
    <Constant dataType="string">green</Constant>
    <Constant dataType="string">blue</Constant>
  </Apply>
  <Constant dataType="string">primary</Constant>
  <Constant dataType="string">other</Constant>
</Apply>

uppercase

Returns a string where all lowercase characters in the input string are replaced by their uppercase variants.

Pseudo-declaration of PMML built-in function uppercase:

<DefineFunction name="uppercase" dataType="string">
  <ParameterField name="input" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>

The function uppercase uses the Unicode definitions for classifying characters as uppercase / lowercase. See XQuery fn:upper-case.

Example: Return the field Str with all characters in upper case.
<Apply function="uppercase">
  <FieldRef field="Str"/>
</Apply>

lowercase

Returns a string where all uppercase characters in the input string are replaced by their lowercase variants.

Pseudo-declaration of PMML built-in function lowercase:

<DefineFunction name="lowercase" dataType="string">
  <ParameterField name="input" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>

The function lowercase uses the Unicode definitions for classifying characters as uppercase / lowercase. See XQuery fn:lower-case.

Example: Return the field Str with all characters in lower case.

<Apply function="lowercase">
  <FieldRef field="Str"/>
</Apply>

substring

Extracts a substring from an input string.

Pseudo-declaration of PMML built-in function substring:

<DefineFunction name="substring" dataType="string">
  <ParameterField name="input" dataType="string"/>
  <ParameterField name="startPos" dataType="integer"/>
  <ParameterField name="length" dataType="integer"/>
  ... See XQuery fn:substring ...
</DefineFunction>
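The substring declaration defers to XQuery fn:substring, which counts character positions from 1 rather than 0. The following Python sketch illustrates that convention; the helper name pmml_substring, the sample value "aBc9x", and the choice to reject a startPos below 1 are assumptions made for illustration, not part of the PMML text.

```python
def pmml_substring(text: str, start_pos: int, length: int) -> str:
    """Sketch of substring(input, startPos, length) with a 1-based startPos.

    Assumption: startPos < 1 or a negative length is treated as an error here;
    the declaration above does not say how such inputs are handled.
    """
    if start_pos < 1 or length < 0:
        raise ValueError("startPos must be >= 1 and length must be >= 0")
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + length]


# Hypothetical input value, chosen only to show the position arithmetic.
print(pmml_substring("aBc9x", 2, 3))  # Bc9
```

Slicing past the end of the string simply returns the characters that are available, which is a property of Python slices and not a statement about PMML behavior.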
Example: Return the 3 characters of field Str starting at position 2:

<Apply function="substring">
  <FieldRef field="Str"/>
  <Constant dataType="integer">2</Constant>
  <Constant dataType="integer">3</Constant>
</Apply>

trimBlanks

Returns a string where leading and trailing blanks in the input string are removed. Note that trailing blanks in PMML, by definition, are not significant when strings are compared.

Pseudo-declaration of PMML built-in function trimBlanks:

<DefineFunction name="trimBlanks" dataType="string">
  <ParameterField name="input" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>

Blanks include tab and newline characters. Use definitions according to Unicode.

Example: Trim blanks of field Str:

<Apply function="trimBlanks">
  <FieldRef field="Str"/>
</Apply>

concat

Returns a string as a result of the concatenation of two or more parameters.

Pseudo-declaration of PMML built-in function concat:

<DefineFunction name="concat" dataType="string">
  <ParameterField name="x"/>
  <ParameterField name="y"/>
  ... See XQuery fn:concat ...
</DefineFunction>

Example: Concatenates field month, constant value "-" and field year.

<Apply function="concat">
  <FieldRef field="month"/>
  <Constant>-</Constant>
  <FieldRef field="year"/>
</Apply>

Assuming month="2" and year="2000", the result corresponding to this Apply element is "2-2000".

replace

Replaces each substring in a given input string that matches a given pattern or regular expression by another string. It returns the resulting string after replacement. Note that for regular expressions, PMML follows the specification implemented in the PCRE (Perl Compatible Regular Expressions) library.

Pseudo-declaration of PMML built-in function replace:

<DefineFunction name="replace" dataType="string">
  <ParameterField name="input" dataType="string"/>
  <ParameterField name="pattern" dataType="string"/>
  <ParameterField name="replacement" dataType="string"/>
  ... See XQuery fn:replace ...
</DefineFunction>

Example: Replaces a sequence of "B" letters by letter "c".

<Apply function="replace">
  <Constant>BBBB</Constant>
  <Constant>B+</Constant>
  <Constant>c</Constant>
</Apply>

matches

Attempts to match a pattern or regular expression against a given string. It returns a Boolean: "true" if a match is found or "false" if not. Note that for regular expressions, PMML follows the specification implemented in the PCRE (Perl Compatible Regular Expressions) library.

Pseudo-declaration of PMML built-in function matches:

<DefineFunction name="matches" dataType="boolean">
  <ParameterField name="input" dataType="string"/>
  <ParameterField name="pattern" dataType="string"/>
  ... See XQuery fn:matches ...
</DefineFunction>

Example: Attempts to match pattern "ar?y" against the value of field month.

<Apply function="matches">
  <FieldRef field="month"/>
  <Constant>ar?y</Constant>
</Apply>

Assuming month is either "January", "February" or "May", the result corresponding to this Apply element is "true". For any other month, the result is "false".

formatNumber

Formats numbers according to a pattern. The pattern uses the POSIX descriptors as used, e.g., in the C function printf.

Pseudo-declaration of PMML built-in function formatNumber:

<DefineFunction name="formatNumber" dataType="string">
  <ParameterField name="input" optype="continuous"/>
  <ParameterField name="pattern" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>

Example: Format the number in field Num using the pattern %3d:

<Apply function="formatNumber">
  <FieldRef field="Num"/>
  <Constant>%3d</Constant>
</Apply>

formatDatetime

Formats a date and time value according to a pattern. The pattern uses the POSIX descriptors as used, e.g., in the C function strftime or the Unix command date.

Pseudo-declaration of PMML built-in function formatDatetime:

<DefineFunction name="formatDatetime" optype="categorical">
  <ParameterField name="input" optype="ordinal"/>
  <ParameterField name="pattern" dataType="string"/>
  ... implementation built-in ...
</DefineFunction>
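Both formatNumber and formatDatetime take POSIX-style patterns, so their effect can be previewed with Python's printf-style string formatting and strftime, which accept the same descriptors. This is only a rough sketch: the helper names and the sample values (the number 2 and the date 2004-08-20) are hypothetical, and a PMML consumer may differ from Python in edge cases such as locale handling.

```python
from datetime import date


def format_number(value, pattern: str) -> str:
    """Apply a printf-style pattern such as '%3d' to a single number."""
    return pattern % value


def format_datetime(value: date, pattern: str) -> str:
    """Apply a strftime-style pattern such as '%m/%d/%y' to a date."""
    return value.strftime(pattern)


# Hypothetical sample values, chosen only to show the effect of each pattern.
print(repr(format_number(2, "%3d")))                   # '  2' (padded to width 3)
print(format_datetime(date(2004, 8, 20), "%m/%d/%y"))  # 08/20/04
```

Note that %y yields a two-digit year; a four-digit year would need %Y.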
Example: Format a date value as 'Month/Day/Year'.

<DerivedField name="StartDateUS" dataType="string" optype="categorical">
  <Apply function="formatDatetime">
    <FieldRef field="StartDate"/>
    <Constant>%m/%d/%y</Constant>
  </Apply>
</DerivedField>

dateDaysSinceYear

Function for transforming dates into integers. The type dateDaysSinceYear is a variant of the type date where the values are represented as the number of days since Year-01-01. The date January 1 of Year is represented by the number 0. January 2 of Year is represented by 1, February 1 of Year is represented by 31, etc. Dates before January 1 of Year are represented as negative numbers.

For example, values of type dateDaysSince[1960] are the number of days since January 1, 1960. The date January 1, 1960 is represented by the number 0. The date April 1, 2003 can be converted to the value 15796 of type dateDaysSince[1960].
Pseudo-declaration of PMML built-in function dateDaysSinceYear:

<DefineFunction name="dateDaysSinceYear" optype="continuous">
  <ParameterField name="input" optype="ordinal"/>
  <ParameterField name="referenceYear" optype="continuous"/>
</DefineFunction>
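The day counts quoted above can be checked with a few lines of Python. This is a minimal sketch assuming the proleptic Gregorian calendar used by Python's datetime module; the helper name date_days_since_year is hypothetical.

```python
from datetime import date


def date_days_since_year(value: date, reference_year: int) -> int:
    """Number of days between January 1 of reference_year and value."""
    return (value - date(reference_year, 1, 1)).days


print(date_days_since_year(date(1960, 1, 1), 1960))    # 0
print(date_days_since_year(date(1960, 2, 1), 1960))    # 31
print(date_days_since_year(date(2003, 4, 1), 1960))    # 15796
print(date_days_since_year(date(1959, 12, 31), 1960))  # -1
```

The last call shows that dates before January 1 of the reference year come out negative, as described in the text.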
Example: Calculate days since 1970.

<DerivedField name="PurchaseDateDays" dataType="integer" optype="continuous">
  <Apply function="dateDaysSinceYear">
    <FieldRef field="PurchaseDate"/>
    <Constant>1970</Constant>
  </Apply>
</DerivedField>

dateSecondsSinceYear

Function for transforming dates into integers. The type dateSecondsSinceYear is a variant of the type date where the values are represented as the number of seconds since midnight starting the first day of Year (which is represented by 0). 1 minute after midnight on January 1 of Year is represented by 60, 1 hour after midnight on January 1 of Year is represented by 3600, etc. Times before January 1 of Year are represented as negative numbers.

For example, values of type dateSecondsSince[1960] are the number of seconds since the midnight starting January 1, 1960. 30 minutes and 3 seconds after 3 o'clock in the morning of January 3, 1960 can be converted to the value 185403 of type dateSecondsSince[1960].
Pseudo-declaration of PMML built-in function dateSecondsSinceYear:

<DefineFunction name="dateSecondsSinceYear" optype="continuous">
  <ParameterField name="input" optype="ordinal"/>
  <ParameterField name="referenceYear" optype="continuous"/>
</DefineFunction>
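The value 185403 quoted above is 2 full days (2 × 86400 seconds) plus 3 hours, 30 minutes and 3 seconds. The Python sketch below reproduces that arithmetic; it assumes naive timestamps with no time zone or leap-second handling, and the helper name date_seconds_since_year is hypothetical.

```python
from datetime import datetime


def date_seconds_since_year(value: datetime, reference_year: int) -> int:
    """Whole seconds between midnight on January 1 of reference_year and value."""
    delta = value - datetime(reference_year, 1, 1)
    # timedelta keeps days and seconds separately; combine them as integers.
    return delta.days * 86400 + delta.seconds


print(date_seconds_since_year(datetime(1960, 1, 1, 0, 1, 0), 1960))     # 60
print(date_seconds_since_year(datetime(1960, 1, 1, 1, 0, 0), 1960))     # 3600
print(date_seconds_since_year(datetime(1960, 1, 3, 3, 30, 3), 1960))    # 185403
print(date_seconds_since_year(datetime(1959, 12, 31, 23, 59, 59), 1960))  # -1
```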
Example: Create a new field PurchaseDateSeconds holding the seconds since 1970:

<DerivedField name="PurchaseDateSeconds" dataType="integer" optype="continuous">
  <Apply function="dateSecondsSinceYear">
    <FieldRef field="PurchaseDate"/>
    <Constant>1970</Constant>
  </Apply>
</DerivedField>

dateSecondsSinceMidnight

Function for transforming dates into integers. For example, midnight returns a value of 0, 1 second after midnight (00:00:01) returns a value of 1, one minute after midnight returns a value of 60, etc. 23 minutes and 30 seconds after 5 o'clock in the morning returns 19410.

Pseudo-declaration of PMML built-in function dateSecondsSinceMidnight:

<DefineFunction name="dateSecondsSinceMidnight" optype="continuous">
  <ParameterField name="input" optype="ordinal"/>
</DefineFunction>
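The seconds-since-midnight values above follow from hours × 3600 + minutes × 60 + seconds. A minimal Python sketch of that calculation, with a hypothetical helper name and fractional seconds ignored by assumption:

```python
from datetime import time


def date_seconds_since_midnight(value: time) -> int:
    """Whole seconds elapsed since 00:00:00 on the same day."""
    return value.hour * 3600 + value.minute * 60 + value.second


print(date_seconds_since_midnight(time(0, 0, 0)))    # 0
print(date_seconds_since_midnight(time(0, 0, 1)))    # 1
print(date_seconds_since_midnight(time(0, 1, 0)))    # 60
print(date_seconds_since_midnight(time(5, 23, 30)))  # 19410
```

Because datetime objects expose the same hour, minute and second attributes, the helper also accepts a full timestamp and ignores its date part.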
Example: Create a new field PurchaseDateSeconds holding the seconds since midnight:

<DerivedField name="PurchaseDateSeconds" dataType="integer" optype="continuous">
  <Apply function="dateSecondsSinceMidnight">
    <FieldRef field="PurchaseDate"/>
  </Apply>
</DerivedField>

normalCDF, normalPDF, stdNormalCDF, stdNormalPDF, erf, normalIDF, stdNormalIDF

Functions for the normal distribution are widely used in statistical applications. Wikipedia has the following information at https://en.wikipedia.org/wiki/Normal_distribution:

In probability theory, the normal (or Gaussian) distribution is a very common continuous probability distribution. Normal distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables.

The probability density function (PDF) of the normal distribution with mean μ and standard deviation σ is:

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

If μ = 0 and σ = 1, the distribution is called the standard normal distribution or the unit normal distribution.

The cumulative distribution function (CDF) of the standard normal distribution, usually denoted with the capital Greek letter Φ (phi), is the integral

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt$$

In statistics one often uses the related error function, or erf(x), defined as the probability of a random variable with normal distribution of mean 0 and variance 1/2 falling in the range [-x, x]:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} \, dt$$

These integrals cannot be expressed in terms of elementary functions, and are often said to be special functions. However, many numerical approximations are known. The two functions are closely related, namely

$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]$$

For a generic normal distribution f with mean μ and standard deviation σ, the cumulative distribution function is

$$\Phi(x, \mu, \sigma) = \Phi\!\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$

The inverse of the normal CDF is called the quantile function. The quantile function of the standard normal distribution is called the probit function, and can be expressed in terms of the inverse error function:

$$\Phi^{-1}(p) = \sqrt{2} \, \operatorname{erf}^{-1}(2p - 1), \quad p \in (0, 1)$$

For a normal random variable with mean μ and variance σ², the quantile function is

$$F^{-1}(p) = \mu + \sigma \, \Phi^{-1}(p) = \mu + \sigma\sqrt{2} \, \operatorname{erf}^{-1}(2p - 1), \quad p \in (0, 1)$$

PMML defines the following built-in functions related to the normal distribution: normalCDF, normalPDF, normalIDF, stdNormalCDF, stdNormalPDF, stdNormalIDF, erf.

Pseudo-declaration of PMML built-in function normalCDF:

<DefineFunction name="normalCDF" optype="continuous" dataType="double">
  <ParameterField name="x" optype="continuous" dataType="double"/>
  <ParameterField name="mu" optype="continuous" dataType="double"/>
  <ParameterField name="sigma" optype="continuous" dataType="double"/>
  ... implementation built-in ...
</DefineFunction>

The function normalCDF(x, mu, sigma) returns the value Φ(x, μ, σ) defined above. Note that σ must be positive.

The function stdNormalCDF(x) returns the cumulative distribution function value of x for the standard normal distribution. Its pseudo-declaration is:

<DefineFunction name="stdNormalCDF" optype="continuous" dataType="double">
  <ParameterField name="x" optype="continuous" dataType="double"/>
  ... implementation built-in ...
</DefineFunction>
The functions normalPDF(x, μ, σ), normalIDF(p, μ, σ), stdNormalPDF(x) and stdNormalIDF(p) have pseudo-declarations similar to those above. They compute the probability density function and the inverse distribution function of the normal distribution with mean μ and positive standard deviation σ, and of the standard normal distribution, respectively. The PMML function erf is declared in the same way as stdNormalCDF and computes erf(x) as described above.
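The relationships between the normal CDF, PDF, inverse distribution function and erf can be sketched in Python using only the standard library. The function names below mirror the PMML built-ins but are hypothetical helpers, not a reference implementation; the inverse functions delegate to statistics.NormalDist.inv_cdf (Python 3.8 or later), and the rejection of non-positive sigma follows the positivity requirement stated in the text.

```python
import math
from statistics import NormalDist


def std_normal_pdf(x: float) -> float:
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    """Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _check_sigma(sigma: float) -> None:
    if sigma <= 0:
        raise ValueError("sigma must be positive")


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution with mean mu and std. deviation sigma."""
    _check_sigma(sigma)
    return std_normal_pdf((x - mu) / sigma) / sigma


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Phi((x - mu) / sigma)."""
    _check_sigma(sigma)
    return std_normal_cdf((x - mu) / sigma)


def std_normal_idf(p: float) -> float:
    """Probit function: inverse of Phi for 0 < p < 1."""
    return NormalDist().inv_cdf(p)


def normal_idf(p: float, mu: float, sigma: float) -> float:
    """Quantile function mu + sigma * probit(p) for 0 < p < 1."""
    _check_sigma(sigma)
    return mu + sigma * std_normal_idf(p)


print(std_normal_cdf(0.0))                           # 0.5
print(normal_cdf(normal_idf(0.9, 10.0, 2.0), 10.0, 2.0))  # approximately 0.9
```

The final line is a round-trip check: applying the CDF to a quantile should recover the original probability up to floating-point error.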