PMML 1.1 -- DTD of Association Rules Model
The Association Rule model represents rules where some set
of items is associated with another set of items.
For example, a rule can express that a certain product is
often bought in combination with a certain set of other products.
The attribute definitions of the association rule model use
the entity ELEMENT-ID to express the semantic constraint that
a value must be unique among all elements of the same type
contained in the same XML document.
|
<!ENTITY % ELEMENT-ID "CDATA">
|
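Because ELEMENT-ID expands to plain CDATA, a DTD validator does not enforce this uniqueness; an application has to check it. The following is a minimal sketch of such a check in Python, assuming the standard library's xml.etree.ElementTree. The helper name duplicate_ids and the sample fragment are hypothetical and not part of PMML.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(root, tag, attr="id"):
    """Return the id values that occur more than once among elements named `tag`."""
    counts = Counter(el.get(attr) for el in root.iter(tag))
    return sorted(v for v, n in counts.items() if v is not None and n > 1)

# Hypothetical fragment with a deliberately repeated AssocItem id.
doc = ET.fromstring(
    '<AssociationModel>'
    '<AssocItem id="1" value="Cracker"/>'
    '<AssocItem id="1" value="Coke"/>'
    '<AssocItemset id="1" support="1.0" numberOfItems="1"/>'
    '</AssociationModel>'
)

print(duplicate_ids(doc, "AssocItem"))     # ['1']
print(duplicate_ids(doc, "AssocItemset"))  # []
```

The check runs per element type, so an AssocItem and an AssocItemset may share the same id value without conflict.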
An Association Rule model consists of four major parts
(input statistics, items, itemsets, and rules):
|
<!ELEMENT AssociationModel (Extension*, AssocInputStats,
AssocItem+, AssocItemset+, AssocRule+)>
<!ATTLIST AssociationModel
modelName CDATA #IMPLIED
>
|
Basic information of the input data:
|
<!ELEMENT AssocInputStats EMPTY>
<!ATTLIST AssocInputStats
numberOfTransactions %INT-NUMBER; #REQUIRED
maxNumberOfItemsPerTA %INT-NUMBER; #IMPLIED
avgNumberOfItemsPerTA %REAL-NUMBER; #IMPLIED
minimumSupport %PROB-NUMBER; #REQUIRED
minimumConfidence %PROB-NUMBER; #REQUIRED
lengthLimit %INT-NUMBER; #IMPLIED
numberOfItems %INT-NUMBER; #REQUIRED
numberOfItemsets %INT-NUMBER; #REQUIRED
numberOfRules %INT-NUMBER; #REQUIRED
>
|
Attribute description:
numberOfTransactions
: The number of transactions (baskets of items) contained in the input
data
maxNumberOfItemsPerTA
: The number of items contained in the largest transaction
avgNumberOfItemsPerTA
: The average number of items contained in a transaction
minimumSupport
: The minimum relative support value (#supporting transactions / #total
transactions) satisfied by all rules
minimumConfidence
: The minimum confidence value satisfied by all rules. Confidence is
calculated as (support(rule) / support(antecedent))
lengthLimit
: The maximum number of items allowed in a rule; this limit was used to
restrict the number of rules
numberOfItems
: The number of different items contained in the input data
numberOfItemsets
: The number of itemsets contained in the model
numberOfRules
: The number of rules contained in the model
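The support and confidence formulas above can be sketched in a few lines of Python. This is only an illustration: the baskets are hypothetical, the function names are not part of PMML, and support(rule) is assumed to mean the support of the antecedent and consequent items taken together.

```python
def support(itemset, transactions):
    """Relative support: #supporting transactions / #total transactions."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(rule) / support(antecedent), with the rule taken as the union of both sides."""
    rule_items = set(antecedent) | set(consequent)
    return support(rule_items, transactions) / support(antecedent, transactions)

# Hypothetical baskets, chosen only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Counterparts of maxNumberOfItemsPerTA and avgNumberOfItemsPerTA.
max_items = max(len(t) for t in transactions)
avg_items = sum(len(t) for t in transactions) / len(transactions)

print(max_items, avg_items)                           # 3 2.0
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.5 / 0.75
```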
Items contained in itemsets:
|
<!ELEMENT AssocItem EMPTY>
<!ATTLIST AssocItem
id %ELEMENT-ID; #REQUIRED
value CDATA #REQUIRED
mappedValue CDATA #IMPLIED
weight %REAL-NUMBER; #IMPLIED
>
|
Attribute description:
id
: An identifier that uniquely identifies an item
value
: The value of the item as in the input data
mappedValue
: Optional, a value to which the original item value is mapped.
For instance, this could be a product name
if the original value is an EAN code.
weight
: The weight of the item. For example, the price or value of an item.
Itemsets which are contained in rules:
|
<!ELEMENT AssocItemset (Extension*, AssocItemRef+)>
<!ATTLIST AssocItemset
id %ELEMENT-ID; #REQUIRED
support %PROB-NUMBER; #REQUIRED
numberOfItems %INT-NUMBER; #REQUIRED
>
|
Attribute description:
id
: An identifier that uniquely identifies an itemset
support
: The relative support of the itemset
numberOfItems
: The number of items contained in this itemset
Subelements: Item references that point to AssocItem elements.
|
<!ELEMENT AssocItemRef EMPTY>
<!ATTLIST AssocItemRef
itemRef %ELEMENT-ID; #REQUIRED
>
|
Attribute description:
itemRef
: The id value of an item element
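A consumer of the model has to follow each itemRef back to the AssocItem with the matching id. The sketch below shows one way to do that in Python with xml.etree.ElementTree, and also checks that numberOfItems agrees with the number of AssocItemRef children. The function name resolve_itemset is hypothetical, and the fragment is deliberately incomplete.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: AssocInputStats and AssocRule are omitted,
# so this is not a complete, DTD-valid model.
model = ET.fromstring("""
<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="3" value="Water"/>
  <AssocItemset id="3" support="1.0" numberOfItems="2">
    <AssocItemRef itemRef="1"/>
    <AssocItemRef itemRef="3"/>
  </AssocItemset>
</AssociationModel>
""")

# Map each AssocItem id to its value.
items = {el.get("id"): el.get("value") for el in model.iter("AssocItem")}

def resolve_itemset(itemset):
    """Return the item values an AssocItemset refers to, checking its references."""
    refs = [ref.get("itemRef") for ref in itemset.findall("AssocItemRef")]
    missing = [r for r in refs if r not in items]
    if missing:
        raise ValueError(f"itemRef without matching AssocItem: {missing}")
    if len(refs) != int(itemset.get("numberOfItems")):
        raise ValueError("numberOfItems does not match the number of AssocItemRef children")
    return [items[r] for r in refs]

resolved = {s.get("id"): resolve_itemset(s) for s in model.iter("AssocItemset")}
print(resolved)  # {'3': ['Cracker', 'Water']}
```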
Rules: Elements of the form <antecedent itemset> => <consequent itemset>
|
<!ELEMENT AssocRule ( Extension* )>
<!ATTLIST AssocRule
support %PROB-NUMBER; #REQUIRED
confidence %PROB-NUMBER; #REQUIRED
antecedent %ELEMENT-ID; #REQUIRED
consequent %ELEMENT-ID; #REQUIRED
>
|
Attribute description:
support
: The relative support of the rule
confidence
: The confidence of the rule
antecedent
: The id value of the itemset which is the antecedent of the rule
consequent
: The id value of the itemset which is the consequent of the rule
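Since antecedent and consequent hold itemset ids rather than item values, reading a rule takes two lookups: rule to itemset, then itemset to items. The following Python sketch renders a rule in the <antecedent itemset> => <consequent itemset> form. The function name describe and the output format are assumptions made for illustration, and the fragment omits AssocInputStats.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; AssocInputStats is omitted for brevity.
model = ET.fromstring("""
<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="3" value="Water"/>
  <AssocItemset id="1" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="1"/>
  </AssocItemset>
  <AssocItemset id="2" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="3"/>
  </AssocItemset>
  <AssocRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
</AssociationModel>
""")

# First lookup table: item id -> item value.
items = {el.get("id"): el.get("value") for el in model.iter("AssocItem")}

# Second lookup table: itemset id -> list of item values.
itemsets = {
    s.get("id"): [items[r.get("itemRef")] for r in s.findall("AssocItemRef")]
    for s in model.iter("AssocItemset")
}

def describe(rule):
    """Render an AssocRule as '{antecedent} => {consequent}' with its measures."""
    lhs = ", ".join(itemsets[rule.get("antecedent")])
    rhs = ", ".join(itemsets[rule.get("consequent")])
    return (f"{{{lhs}}} => {{{rhs}}} "
            f"(support={rule.get('support')}, confidence={rule.get('confidence')})")

lines = [describe(rule) for rule in model.iter("AssocRule")]
print(lines[0])  # {Cracker} => {Water} (support=1.0, confidence=1.0)
```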
Example:
Let's assume we have four transactions with the following data:
t1: Cracker, Coke, Water
t2: Cracker, Water
t3: Cracker, Water
t4: Cracker, Coke, Water
|
<?xml version="1.0" ?>
<PMML version="1.1">
<Header copyright="www.dmg.org"
description="example model for association rules"/>
<DataDictionary numberOfFields="1" >
<DataField name="item" optype="categorical" />
</DataDictionary>
<AssociationModel>
<AssocInputStats numberOfTransactions="4" numberOfItems="3"
minimumSupport="0.6" minimumConfidence="0.5"
numberOfItemsets="3" numberOfRules="2"/>
<!-- We have three items in our input data -->
<AssocItem id="1" value="Cracker" />
<AssocItem id="2" value="Coke" />
<AssocItem id="3" value="Water" />
<!-- and two frequent itemsets with a single item -->
<AssocItemset id="1" support="1.0" numberOfItems="1">
<AssocItemRef itemRef="1" />
</AssocItemset>
<AssocItemset id="2" support="1.0" numberOfItems="1">
<AssocItemRef itemRef="3" />
</AssocItemset>
<!-- and one frequent itemset with two items. -->
<AssocItemset id="3" support="1.0" numberOfItems="2">
<AssocItemRef itemRef="1" />
<AssocItemRef itemRef="3" />
</AssocItemset>
<!-- Two rules satisfy the requirements -->
<AssocRule support="1.0" confidence="1.0"
antecedent="1" consequent="2" />
<AssocRule support="1.0" confidence="1.0"
antecedent="2" consequent="1" />
</AssociationModel>
</PMML>
|
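The numbers in the example can be reproduced from the four transactions. The sketch below uses a brute-force enumeration in Python, which is adequate for three items but is not how a real mining tool would work, and the document does not prescribe any algorithm. It assumes that a rule is built from two disjoint frequent itemsets whose union is also frequent.

```python
from itertools import combinations

transactions = [
    {"Cracker", "Coke", "Water"},  # t1
    {"Cracker", "Water"},          # t2
    {"Cracker", "Water"},          # t3
    {"Cracker", "Coke", "Water"},  # t4
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.6, 0.5  # values from AssocInputStats

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Enumerate every non-empty combination of items and keep the frequent ones.
all_items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(all_items) + 1):
    for combo in combinations(all_items, size):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# Pair disjoint frequent itemsets whose union is frequent as well.
rules = []
for ante in frequent:
    for cons in frequent:
        union = ante | cons
        if ante.isdisjoint(cons) and union in frequent:
            conf = frequent[union] / frequent[ante]
            if conf >= MIN_CONFIDENCE:
                rules.append((sorted(ante), sorted(cons), frequent[union], conf))

print(len(all_items), len(frequent), len(rules))  # 3 3 2
for ante, cons, sup, conf in rules:
    print(ante, "=>", cons, sup, conf)
```

Coke appears in only two of the four transactions (support 0.5), which is below the minimum support of 0.6. It is therefore listed as an AssocItem but does not occur in any itemset or rule.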