PMML 4.3 - RuleSet

Ruleset models can be thought of as flattened decision tree models. A ruleset consists of a number of rules. Each rule contains a predicate and a predicted class value, plus some information collected at training or testing time on the performance of the rule. For example, the following text describes a rule:

PREDICATE: BP="HIGH" AND K > 0.045804001 AND Age <= 50 AND Na <= 0.77240998
PREDICTION: "drugB"
CONFIDENCE: 0.9

Rulesets can be applied to new instances to derive predictions and associated confidences (scoring). When a case is scored, a rule is said to fire if its predicate evaluates to TRUE on the instance. The ruleset can also have an optional default prediction and associated confidence that are used to score a case if no rules fire. If missing values in fields mentioned in a rule's predicate cause the predicate to evaluate to UNKNOWN, the rule does not fire.
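The firing condition described above can be sketched in Python. This is a minimal illustration, not part of the PMML specification: the helper names (`compare`, `tri_and`, `fires`) and the sample case values are hypothetical, `None` stands in for UNKNOWN, and the AND truth table is assumed to follow ordinary three-valued logic (FALSE dominates UNKNOWN).

```python
from typing import Iterable, Optional

# Three-valued result: True, False, or None (standing in for UNKNOWN).
Tri = Optional[bool]


def compare(value, op: str, threshold) -> Tri:
    """Evaluate one comparison; a missing value yields UNKNOWN (None)."""
    if value is None:
        return None
    if op == "equal":
        return value == threshold
    if op == "greaterThan":
        return value > threshold
    if op == "lessOrEqual":
        return value <= threshold
    raise ValueError(f"unsupported operator: {op}")


def tri_and(results: Iterable[Tri]) -> Tri:
    """Assumed AND semantics: FALSE if any operand is FALSE,
    otherwise UNKNOWN if any operand is UNKNOWN, otherwise TRUE."""
    results = list(results)
    if any(r is False for r in results):
        return False
    if any(r is None for r in results):
        return None
    return True


# The example rule from the text, as (field, operator, value) triples.
PREDICATE = [
    ("BP", "equal", "HIGH"),
    ("K", "greaterThan", 0.045804001),
    ("Age", "lessOrEqual", 50),
    ("Na", "lessOrEqual", 0.77240998),
]


def fires(case: dict) -> bool:
    """A rule fires only when its predicate is TRUE (not FALSE, not UNKNOWN)."""
    outcome = tri_and(compare(case.get(f), op, v) for f, op, v in PREDICATE)
    return outcome is True


# Hypothetical cases: one complete, one with Na missing.
complete_case = {"BP": "HIGH", "K": 0.0621, "Age": 36, "Na": 0.5023}
missing_na = {"BP": "HIGH", "K": 0.0621, "Age": 36}
```

With these hypothetical cases, `fires(complete_case)` is true, while `fires(missing_na)` is false because the missing Na value makes the predicate UNKNOWN.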
Each rule can have a confidence and a weight that are set at model build time, likely by considering each rule's performance on the training data. The method used to compute confidence and weight is chosen by the application authoring the PMML and lies outside the scope of the PMML model description.

<xs:element name="RuleSetModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="Output" minOccurs="0"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="ModelExplanation" minOccurs="0"/>
      <xs:element ref="Targets" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="RuleSet"/>
      <xs:element ref="ModelVerification" minOccurs="0"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string" use="optional"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string" use="optional"/>
    <xs:attribute name="isScorable" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>

Definitions:

RuleSetModel: starts the definition for a ruleset model.
RuleSet: this element describes a list of rules that make up a ruleset model. The order of rules in the list is important when considering how to score the ruleset.
modelName: the value of modelName in a RuleSetModel element identifies the model with a unique name in the context of the PMML file. See General Structure of PMML models.
isScorable: this attribute indicates whether the model is valid for scoring. If this attribute is true or missing, the model should be processed normally. If the attribute is false, the model producer has indicated that this model is intended for information purposes only and should not be used to generate results.
In order to be valid PMML, all required elements and attributes must be present, even for non-scoring models. For more details, see General Structure.

<xs:element name="RuleSet">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="RuleSelectionMethod" minOccurs="1" maxOccurs="unbounded"/>
      <xs:element ref="ScoreDistribution" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="Rule" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="recordCount" type="NUMBER" use="optional"/>
    <xs:attribute name="nbCorrect" type="NUMBER" use="optional"/>
    <xs:attribute name="defaultScore" type="xs:string" use="optional"/>
    <xs:attribute name="defaultConfidence" type="NUMBER" use="optional"/>
  </xs:complexType>
</xs:element>

Definitions:

recordCount: the number of training/test cases to which the ruleset was applied to generate support and confidence measures for individual rules.
nbCorrect: indicates the number of training/test instances for which the default score is correct.
defaultScore: the value of defaultScore in a RuleSet serves as the default predicted value when scoring a case and no rules in the ruleset fire.
defaultConfidence: provides a confidence to be returned with the default score (when scoring a case and no rules in the ruleset fire).
ScoreDistribution: describes the distribution of the predicted value in the test/training data.
Rule: contains 0 or more rules which comprise the ruleset.
The RuleSelectionMethod describes how rules are selected to apply the model to a new case, and consists of:

<xs:element name="RuleSelectionMethod">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="criterion" use="required">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="weightedSum"/>
          <xs:enumeration value="weightedMax"/>
          <xs:enumeration value="firstHit"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Definitions:

criterion: explains how to determine and rank predictions and their associated confidences from the ruleset in case multiple rules fire. There are many possible ways of applying rulesets, but three useful approaches are covered: weightedSum, weightedMax and firstHit.
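To make the three criterion values concrete, here is a hedged Python sketch of one plausible reading of each. The semantics are assumptions, since the text above only names the criteria: firstHit is taken to mean the first firing rule in ruleset order, weightedMax the firing rule with the largest weight, and weightedSum the class whose firing rules have the largest total weight, with ties going to whichever comes first. The `FiredRule` type, the `select` function and the sample rules are hypothetical, and confidence calculation is deliberately left out.

```python
from typing import Dict, List, NamedTuple, Optional


class FiredRule(NamedTuple):
    """A rule that has already fired for the case being scored (hypothetical type)."""
    rule_id: str
    score: str
    weight: float


def select(fired: List[FiredRule], criterion: str,
           default_score: Optional[str] = None) -> Optional[str]:
    """Pick a predicted class from the fired rules under an assumed reading
    of each criterion; fall back to the default score if nothing fired."""
    if not fired:
        return default_score
    if criterion == "firstHit":
        # Assumed: rules are listed in ruleset order, so take the first one.
        return fired[0].score
    if criterion == "weightedMax":
        # max() returns the first maximal element, so earlier rules win ties.
        return max(fired, key=lambda r: r.weight).score
    if criterion == "weightedSum":
        totals: Dict[str, float] = {}
        for rule in fired:
            totals[rule.score] = totals.get(rule.score, 0.0) + rule.weight
        # dicts keep insertion order, so the class seen first wins ties.
        return max(totals, key=totals.get)
    raise ValueError(f"unknown criterion: {criterion}")


# Hypothetical fired rules, in ruleset order.
FIRED = [
    FiredRule("r1", "drugB", 0.9),
    FiredRule("r2", "drugA", 0.6),
    FiredRule("r3", "drugA", 0.36),
]
```

Under these assumptions, firstHit and weightedMax both select drugB from `FIRED`, while weightedSum selects drugA because its combined weight (0.6 + 0.36) exceeds 0.9. An empty list returns whatever default score is passed in.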
Each Rule can be either a SimpleRule or a CompoundRule.

<xs:group name="Rule">
  <xs:choice>
    <xs:element ref="SimpleRule"/>
    <xs:element ref="CompoundRule"/>
  </xs:choice>
</xs:group>

Each SimpleRule consists of an identifier, a predicate, a score and information on rule performance.

<xs:element name="SimpleRule">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="PREDICATE"/>
      <xs:element ref="ScoreDistribution" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="optional"/>
    <xs:attribute name="score" type="xs:string" use="required"/>
    <xs:attribute name="recordCount" type="NUMBER" use="optional"/>
    <xs:attribute name="nbCorrect" type="NUMBER" use="optional"/>
    <xs:attribute name="confidence" type="NUMBER" use="optional" default="1"/>
    <xs:attribute name="weight" type="NUMBER" use="optional" default="1"/>
  </xs:complexType>
</xs:element>

Definitions:
Each CompoundRule consists of a predicate and one or more rules. CompoundRules offer a shorthand for a more compact representation of rulesets and suggest a more efficient execution mechanism.

<xs:element name="CompoundRule">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="PREDICATE"/>
      <xs:group ref="Rule" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Definitions:
A ruleset containing both compound rules and simple rules has the same meaning as an equivalent ruleset containing only simple rules. It is possible to derive a ruleset containing only simple rules by repeating the following transformation:

The original rule

<CompoundRule>
  <PREDICATE1/>
  <SimpleRule id="1" ...>
    <PREDICATE2/>
    ... contents of simple rule 1 ...
  </SimpleRule>
  ... further rules ...
</CompoundRule>

transforms to

<SimpleRule id="1" ...>
  <CompoundPredicate booleanOperator="and">
    <PREDICATE1/>
    <PREDICATE2/>
  </CompoundPredicate>
  ... contents of simple rule 1 ...
</SimpleRule>
<CompoundRule>
  <PREDICATE1/>
  ... further rules ...
</CompoundRule>

In other words, a simple rule is said to fire if its predicate evaluates to TRUE and the predicates of all compound rules that contain the simple rule also evaluate to TRUE.

A Complete RuleSet Example

Consider a ruleset with three rules:

RULE1:
PREDICATE: BP="HIGH" AND K > 0.045804001 AND Age <= 50 AND Na <= 0.77240998
PREDICTION: drugB
Training/test measures:
  recordCount 79
  nbCorrect 76
  confidence 0.9
  weight 0.9

RULE2:
PREDICATE: K > 0.057789002 AND BP="HIGH" AND Age <= 50
PREDICTION: drugA
Training/test measures:
  recordCount 278
  nbCorrect 168
  confidence 0.6
  weight 0.6

RULE3:
PREDICATE: BP="HIGH" AND Na > 0.21
PREDICTION: drugA
Training/test measures:
  recordCount 100
  nbCorrect 50
  confidence 0.36
  weight 0.36

PMML for the example (using only simple rules)

<PMML xmlns="https://www.dmg.org/PMML-4_3" version="4.3">
  <Header copyright="MyCopyright">
    <Application name="MyApplication" version="1.0"/>
  </Header>
  <DataDictionary numberOfFields="7">
    <DataField name="BP" displayName="BP" optype="categorical" dataType="string">
      <Value value="HIGH" property="valid"/>
      <Value value="LOW" property="valid"/>
      <Value value="NORMAL" property="valid"/>
    </DataField>
    <DataField name="K" displayName="K" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.020152" rightMargin="0.079925"/>
    </DataField>
    <DataField name="Age" displayName="Age" optype="continuous" dataType="integer"/>
    <DataField name="Na" displayName="Na" optype="continuous" dataType="double"/>
    <DataField name="Cholesterol" displayName="Cholesterol" optype="categorical" dataType="string">
      <Value value="HIGH" property="valid"/>
      <Value value="NORMAL" property="valid"/>
    </DataField>
    <DataField name="$C-Drug" displayName="$C-Drug" optype="categorical" dataType="string">
      <Value value="drugA" property="valid"/>
      <Value value="drugB" property="valid"/>
      <Value value="drugC" property="valid"/>
      <Value value="drugX" property="valid"/>
      <Value value="drugY" property="valid"/>
    </DataField>
    <DataField name="$CC-Drug" displayName="$CC-Drug" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RuleSetModel modelName="NestedDrug" functionName="classification" algorithmName="RuleSet">
    <MiningSchema>
      <MiningField name="BP" usageType="active"/>
      <MiningField name="K" usageType="active"/>
      <MiningField name="Age" usageType="active"/>
      <MiningField name="Na" usageType="active"/>
      <MiningField name="Cholesterol" usageType="active"/>
      <MiningField name="$C-Drug" usageType="target"/>
      <MiningField name="$CC-Drug" usageType="supplementary"/>
    </MiningSchema>
    <RuleSet defaultScore="drugY" recordCount="1000" nbCorrect="149" defaultConfidence="0.0">
      <RuleSelectionMethod criterion="weightedSum"/>
      <RuleSelectionMethod criterion="weightedMax"/>
      <RuleSelectionMethod criterion="firstHit"/>
      <SimpleRule id="RULE1" score="drugB" recordCount="79" nbCorrect="76" confidence="0.9" weight="0.9">
        <CompoundPredicate booleanOperator="and">
          <SimplePredicate field="BP" operator="equal" value="HIGH"/>
          <SimplePredicate field="K" operator="greaterThan" value="0.045804001"/>
          <SimplePredicate field="Age" operator="lessOrEqual" value="50"/>
          <SimplePredicate field="Na" operator="lessOrEqual" value="0.77240998"/>
        </CompoundPredicate>
        <ScoreDistribution value="drugA" recordCount="2"/>
        <ScoreDistribution value="drugB" recordCount="76"/>
        <ScoreDistribution value="drugC" recordCount="1"/>
        <ScoreDistribution value="drugX" recordCount="0"/>
        <ScoreDistribution value="drugY" recordCount="0"/>
      </SimpleRule>
      <SimpleRule id="RULE2" score="drugA" recordCount="278" nbCorrect="168" confidence="0.6" weight="0.6">
        <CompoundPredicate booleanOperator="and">
          <SimplePredicate field="K" operator="greaterThan" value="0.057789002"/>
          <SimplePredicate field="BP" operator="equal" value="HIGH"/>
          <SimplePredicate field="Age" operator="lessOrEqual" value="50"/>
        </CompoundPredicate>
        <ScoreDistribution value="drugA" recordCount="168"/>
        <ScoreDistribution value="drugB" recordCount="40"/>
        <ScoreDistribution value="drugC" recordCount="12"/>
        <ScoreDistribution value="drugX" recordCount="14"/>
        <ScoreDistribution value="drugY" recordCount="24"/>
      </SimpleRule>
      <SimpleRule id="RULE3" score="drugA" recordCount="100" nbCorrect="50" confidence="0.36" weight="0.36">
        <CompoundPredicate booleanOperator="and">
          <SimplePredicate field="BP" operator="equal" value="HIGH"/>
          <SimplePredicate field="Na" operator="greaterThan" value="0.21"/>
        </CompoundPredicate>
        <ScoreDistribution value="drugA" recordCount="50"/>
        <ScoreDistribution value="drugB" recordCount="10"/>
        <ScoreDistribution value="drugC" recordCount="12"/>
        <ScoreDistribution value="drugX" recordCount="7"/>
        <ScoreDistribution value="drugY" recordCount="11"/>
      </SimpleRule>
    </RuleSet>
  </RuleSetModel>
</PMML>

Scoring Procedure for the Example

We will use the above example to illustrate the steps that should be followed in the scoring process. Suppose we wish to score an instance where:

PMML for the example (using compound rules)

The following PMML shows how the example model can be described using compound rules.
<PMML xmlns="https://www.dmg.org/PMML-4_3" version="4.3">
  <Header copyright="MyCopyright">
    <Application name="MyApplication" version="1.0"/>
  </Header>
  <DataDictionary numberOfFields="7">
    <DataField name="BP" displayName="BP" optype="categorical" dataType="string">
      <Value value="HIGH" property="valid"/>
      <Value value="LOW" property="valid"/>
      <Value value="NORMAL" property="valid"/>
    </DataField>
    <DataField name="K" displayName="K" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.020152" rightMargin="0.079925"/>
    </DataField>
    <DataField name="Age" displayName="Age" optype="continuous" dataType="integer">
      <Interval closure="closedClosed" leftMargin="15" rightMargin="74"/>
    </DataField>
    <DataField name="Na" displayName="Na" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0.500517" rightMargin="0.899774"/>
    </DataField>
    <DataField name="Cholesterol" displayName="Cholesterol" optype="categorical" dataType="string">
      <Value value="HIGH" property="valid"/>
      <Value value="NORMAL" property="valid"/>
    </DataField>
    <DataField name="$C-Drug" displayName="$C-Drug" optype="categorical" dataType="string">
      <Value value="drugA" property="valid"/>
      <Value value="drugB" property="valid"/>
      <Value value="drugC" property="valid"/>
      <Value value="drugX" property="valid"/>
      <Value value="drugY" property="valid"/>
    </DataField>
    <DataField name="$CC-Drug" displayName="$CC-Drug" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="0" rightMargin="1"/>
    </DataField>
  </DataDictionary>
  <RuleSetModel modelName="Drug" functionName="classification" algorithmName="RuleSet">
    <MiningSchema>
      <MiningField name="BP" usageType="active"/>
      <MiningField name="K" usageType="active"/>
      <MiningField name="Age" usageType="active"/>
      <MiningField name="Na" usageType="active"/>
      <MiningField name="Cholesterol" usageType="active"/>
      <MiningField name="$C-Drug" usageType="target"/>
      <MiningField name="$CC-Drug" usageType="supplementary"/>
    </MiningSchema>
    <RuleSet defaultScore="drugY" recordCount="1000" nbCorrect="149" defaultConfidence="0.0">
      <RuleSelectionMethod criterion="weightedSum"/>
      <RuleSelectionMethod criterion="weightedMax"/>
      <RuleSelectionMethod criterion="firstHit"/>
      <CompoundRule>
        <SimplePredicate field="BP" operator="equal" value="HIGH"/>
        <CompoundRule>
          <SimplePredicate field="Age" operator="lessOrEqual" value="50"/>
          <SimpleRule id="RULE1" score="drugB" recordCount="79" nbCorrect="76" confidence="0.9" weight="0.9">
            <CompoundPredicate booleanOperator="and">
              <SimplePredicate field="K" operator="greaterThan" value="0.045804001"/>
              <SimplePredicate field="Na" operator="lessOrEqual" value="0.77240998"/>
            </CompoundPredicate>
            <ScoreDistribution value="drugA" recordCount="2"/>
            <ScoreDistribution value="drugB" recordCount="76"/>
            <ScoreDistribution value="drugC" recordCount="1"/>
            <ScoreDistribution value="drugX" recordCount="0"/>
            <ScoreDistribution value="drugY" recordCount="0"/>
          </SimpleRule>
          <SimpleRule id="RULE2" score="drugA" recordCount="278" nbCorrect="168" confidence="0.6" weight="0.6">
            <SimplePredicate field="K" operator="greaterThan" value="0.057789002"/>
            <ScoreDistribution value="drugA" recordCount="168"/>
            <ScoreDistribution value="drugB" recordCount="40"/>
            <ScoreDistribution value="drugC" recordCount="12"/>
            <ScoreDistribution value="drugX" recordCount="14"/>
            <ScoreDistribution value="drugY" recordCount="24"/>
          </SimpleRule>
        </CompoundRule>
        <SimpleRule id="RULE3" score="drugA" recordCount="100" nbCorrect="50" confidence="0.36" weight="0.36">
          <SimplePredicate field="Na" operator="greaterThan" value="0.21"/>
          <ScoreDistribution value="drugA" recordCount="50"/>
          <ScoreDistribution value="drugB" recordCount="10"/>
          <ScoreDistribution value="drugC" recordCount="12"/>
          <ScoreDistribution value="drugX" recordCount="7"/>
          <ScoreDistribution value="drugY" recordCount="11"/>
        </SimpleRule>
      </CompoundRule>
    </RuleSet>
  </RuleSetModel>
</PMML>
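The equivalence between the compound-rule form and the simple-rule form can be checked with a short Python sketch. It is illustrative only: the `SimpleRule` and `CompoundRule` dataclasses and the `flatten` function are hypothetical stand-ins for the PMML elements, predicates are reduced to (field, operator, value) triples that are implicitly ANDed, and performance attributes such as recordCount are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Predicate = Tuple[str, str, object]  # (field, operator, value), implicitly ANDed


@dataclass
class SimpleRule:
    rule_id: str
    score: str
    predicates: List[Predicate]


@dataclass
class CompoundRule:
    predicates: List[Predicate]
    rules: List[Union["SimpleRule", "CompoundRule"]]


def flatten(rule, inherited: Tuple[Predicate, ...] = ()) -> List[SimpleRule]:
    """Push every enclosing compound-rule predicate down into the simple
    rules it contains, preserving the original rule order."""
    if isinstance(rule, SimpleRule):
        return [SimpleRule(rule.rule_id, rule.score,
                           list(inherited) + list(rule.predicates))]
    flat: List[SimpleRule] = []
    for child in rule.rules:
        flat.extend(flatten(child, inherited + tuple(rule.predicates)))
    return flat


# The nested structure from the compound-rule PMML above.
nested = CompoundRule(
    [("BP", "equal", "HIGH")],
    [
        CompoundRule(
            [("Age", "lessOrEqual", 50)],
            [
                SimpleRule("RULE1", "drugB", [("K", "greaterThan", 0.045804001),
                                              ("Na", "lessOrEqual", 0.77240998)]),
                SimpleRule("RULE2", "drugA", [("K", "greaterThan", 0.057789002)]),
            ],
        ),
        SimpleRule("RULE3", "drugA", [("Na", "greaterThan", 0.21)]),
    ],
)

flat_rules = flatten(nested)
```

After flattening, each rule carries the same set of conditions as its counterpart in the simple-rule PMML. Only the order of conditions within a rule differs, which does not matter for a conjunction.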