DMG.ORG

PMML 1.1 -- DTD of Association Rules Model

The Association Rule model represents rules in which one set of items is associated with another set of items. For example, a rule can express that a certain product is often bought in combination with a certain set of other products.


The attribute definitions of the association rule model use the entity ELEMENT-ID to express a semantic constraint: a value must be unique among the elements of the same type contained in the same XML document. Because the entity expands to CDATA, a DTD validator does not enforce this constraint itself.


<!ENTITY % ELEMENT-ID   "CDATA">
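
Since the uniqueness constraint is only semantic, a consumer of the model may want to check it separately. The following is a minimal sketch of such a check using Python's standard `xml.etree.ElementTree`. The helper `duplicate_ids` and the fragment are hypothetical and not part of PMML; the fragment deliberately reuses an id and omits AssocInputStats for brevity.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: the second AssocItem reuses id "1" on purpose.
FRAGMENT = """
<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="1" value="Coke"/>
  <AssocItemset id="1" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="1"/>
  </AssocItemset>
</AssociationModel>
"""

def duplicate_ids(root, tag):
    """Return the id values used by more than one element named `tag`."""
    counts = Counter(el.get("id") for el in root.iter(tag))
    return sorted(value for value, n in counts.items() if n > 1)

root = ET.fromstring(FRAGMENT)
for tag in ("AssocItem", "AssocItemset"):
    # Uniqueness is checked per element type, so an item and an
    # itemset may share the same id value without conflict.
    print(tag, duplicate_ids(root, tag))
```

Only the AssocItem elements are reported here; the itemset with id "1" is not a conflict because the constraint applies to elements of the same type.
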
An Association Rule model consists of four major parts: basic input statistics, items, itemsets, and rules.

<!ELEMENT AssociationModel (Extension*, AssocInputStats,
                            AssocItem+, AssocItemset+, AssocRule+)>
<!ATTLIST AssociationModel
      modelName  CDATA    #IMPLIED
>



Basic information of the input data:


<!ELEMENT AssocInputStats EMPTY>
<!ATTLIST AssocInputStats
  numberOfTransactions  %INT-NUMBER;  #REQUIRED
  maxNumberOfItemsPerTA %INT-NUMBER;  #IMPLIED
  avgNumberOfItemsPerTA %REAL-NUMBER; #IMPLIED
  minimumSupport        %PROB-NUMBER; #REQUIRED
  minimumConfidence     %PROB-NUMBER; #REQUIRED
  lengthLimit           %INT-NUMBER;  #IMPLIED
  numberOfItems         %INT-NUMBER;  #REQUIRED
  numberOfItemsets      %INT-NUMBER;  #REQUIRED
  numberOfRules         %INT-NUMBER;  #REQUIRED
>

Attribute description:

numberOfTransactions : The number of transactions (baskets of items) contained in the input data
maxNumberOfItemsPerTA : The number of items contained in the largest transaction
avgNumberOfItemsPerTA : The average number of items contained in a transaction
minimumSupport : The minimum relative support value (#supporting transactions / #total transactions) satisfied by all rules
minimumConfidence : The minimum confidence value satisfied by all rules. Confidence is calculated as support(rule) / support(antecedent)
lengthLimit : The maximum number of items allowed in a rule; this limit was used to restrict the number of rules
numberOfItems : The number of different items contained in the input data
numberOfItemsets : The number of itemsets contained in the model
numberOfRules : The number of rules contained in the model
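
The attributes that describe the input data can be derived directly from the transactions. The sketch below assumes each transaction is represented as a Python set of item values. The function name `input_stats` and the sample data are hypothetical and serve only as illustration.

```python
def input_stats(transactions):
    """Compute the data-derived AssocInputStats attributes.

    `transactions` is a non-empty list of sets of item values.
    """
    sizes = [len(t) for t in transactions]
    return {
        "numberOfTransactions": len(transactions),
        "maxNumberOfItemsPerTA": max(sizes),
        "avgNumberOfItemsPerTA": sum(sizes) / len(sizes),
        # Number of different items across all transactions.
        "numberOfItems": len(set().union(*transactions)),
    }

# Hypothetical input data, not taken from the specification.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Butter", "Milk"},
    {"Butter"},
]
stats = input_stats(transactions)
for name, value in stats.items():
    print(name, value)
```

The remaining attributes (minimumSupport, minimumConfidence, lengthLimit, numberOfItemsets, numberOfRules) depend on the mining run and its results rather than on the input data alone.
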



Items contained in itemsets


<!ELEMENT AssocItem EMPTY>
<!ATTLIST AssocItem
  id                    %ELEMENT-ID;  #REQUIRED
  value                 CDATA         #REQUIRED
  mappedValue           CDATA         #IMPLIED
  weight                %REAL-NUMBER; #IMPLIED
>

Attribute description:

id : An identifier that uniquely identifies an item
value : The value of the item as in the input data
mappedValue : Optional, a value to which the original item value is mapped. For instance, this could be a product name if the original value is an EAN code.
weight : The weight of the item, for example its price or value



Itemsets which are contained in rules


<!ELEMENT AssocItemset (Extension*, AssocItemRef+)>
<!ATTLIST AssocItemset
  id                    %ELEMENT-ID;  #REQUIRED
  support               %PROB-NUMBER; #REQUIRED
  numberOfItems         %INT-NUMBER;  #REQUIRED
>

Attribute description:

id : An identifier that uniquely identifies an itemset
support : The relative support of the itemset
numberOfItems : The number of items contained in this itemset

Subelements: Item references that point to elements of type AssocItem.


<!ELEMENT AssocItemRef EMPTY>
<!ATTLIST AssocItemRef
  itemRef     %ELEMENT-ID;  #REQUIRED
>

Attribute description:

itemRef : The id value of an AssocItem element



Rules: Elements of the form <antecedent itemset> => <consequent itemset>


<!ELEMENT AssocRule ( Extension* )>
<!ATTLIST AssocRule
  support               %PROB-NUMBER; #REQUIRED
  confidence            %PROB-NUMBER; #REQUIRED
  antecedent            %ELEMENT-ID;  #REQUIRED
  consequent            %ELEMENT-ID;  #REQUIRED
>

Attribute description:

support : The relative support of the rule
confidence : The confidence of the rule
antecedent : The id value of the itemset which is the antecedent of the rule
consequent : The id value of the itemset which is the consequent of the rule
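
Because rules refer to itemsets by id, and itemsets refer to items by id, reading a rule requires two lookups. The following sketch shows one way to resolve these references with Python's standard `xml.etree.ElementTree`. The function `readable_rules` and the abbreviated fragment (AssocInputStats omitted) are hypothetical illustrations, not part of PMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated model fragment using the elements defined above.
MODEL = """
<AssociationModel>
  <AssocItem id="1" value="Cracker"/>
  <AssocItem id="2" value="Water"/>
  <AssocItemset id="1" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="1"/>
  </AssocItemset>
  <AssocItemset id="2" support="1.0" numberOfItems="1">
    <AssocItemRef itemRef="2"/>
  </AssocItemset>
  <AssocRule support="1.0" confidence="1.0" antecedent="1" consequent="2"/>
</AssociationModel>
"""

def readable_rules(root):
    """Resolve each AssocRule to (antecedent values, consequent values,
    support, confidence)."""
    # Item ids and itemset ids are unique only within their own element
    # type, so they are kept in two separate lookup tables.
    items = {el.get("id"): el.get("value") for el in root.iter("AssocItem")}
    itemsets = {
        el.get("id"): [items[ref.get("itemRef")]
                       for ref in el.iter("AssocItemRef")]
        for el in root.iter("AssocItemset")
    }
    return [
        (itemsets[r.get("antecedent")], itemsets[r.get("consequent")],
         float(r.get("support")), float(r.get("confidence")))
        for r in root.iter("AssocRule")
    ]

rules = readable_rules(ET.fromstring(MODEL))
for antecedent, consequent, support, confidence in rules:
    print(f"{', '.join(antecedent)} => {', '.join(consequent)} "
          f"(support={support}, confidence={confidence})")
```

A dangling reference raises a `KeyError` in this sketch; a production reader would probably report it more gracefully.
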



Example:

Let's assume we have four transactions with the following data:

t1: Cracker, Coke, Water
t2: Cracker, Water
t3: Cracker, Water
t4: Cracker, Coke, Water


<?xml version="1.0" ?>
<PMML version="1.1">
  <Header copyright="www.dmg.org"
          description="example model for association rules"/>
  <DataDictionary numberOfFields="1">
    <DataField name="item" optype="categorical"/>
  </DataDictionary>

<AssociationModel>

<AssocInputStats numberOfTransactions="4" numberOfItems="3"
                 minimumSupport="0.6"     minimumConfidence="0.5"
                 numberOfItemsets="3"     numberOfRules="2"/>

<!-- We have three items in our input data -->

<AssocItem id="1" value="Cracker" />
<AssocItem id="2" value="Coke" />
<AssocItem id="3" value="Water" />

<!-- and two frequent itemsets with a single item
     (Coke occurs in 2 of 4 transactions, so its support of 0.5
     is below the minimumSupport of 0.6) -->

<AssocItemset id="1" support="1.0" numberOfItems="1">
   <AssocItemRef itemRef="1" />
</AssocItemset>

<AssocItemset id="2" support="1.0" numberOfItems="1">
   <AssocItemRef itemRef="3" />
</AssocItemset>

<!-- and one frequent itemset with two items. -->

<AssocItemset id="3" support="1.0" numberOfItems="2">
   <AssocItemRef itemRef="1" />
   <AssocItemRef itemRef="3" />
</AssocItemset>


<!-- Two rules satisfy the requirements -->

<AssocRule support="1.0" confidence="1.0"
                 antecedent="1" consequent="2" />

<AssocRule support="1.0" confidence="1.0"
                 antecedent="2" consequent="1" />

</AssociationModel>
</PMML>
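
The itemsets and rules in this example can be reproduced from the four transactions. The sketch below uses brute-force enumeration, which is only practical for tiny inputs like this one; it is an illustration of the support and confidence definitions, not the algorithm used by any particular PMML producer. All names in the code are hypothetical.

```python
from itertools import combinations

transactions = [
    {"Cracker", "Coke", "Water"},   # t1
    {"Cracker", "Water"},           # t2
    {"Cracker", "Water"},           # t3
    {"Cracker", "Coke", "Water"},   # t4
]
MIN_SUPPORT = 0.6
MIN_CONFIDENCE = 0.5

def support(itemset):
    """Relative support: supporting transactions / total transactions."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

items = sorted(set().union(*transactions))

# Enumerate every non-empty itemset and keep those reaching MIN_SUPPORT.
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(frozenset(combo))
        if s >= MIN_SUPPORT:
            frequent[frozenset(combo)] = s

# A rule A => C is kept when A and C are disjoint frequent itemsets,
# their union is frequent, and support(A | C) / support(A) reaches
# MIN_CONFIDENCE.
rules = []
for a in frequent:
    for c in frequent:
        if a & c or (a | c) not in frequent:
            continue
        conf = frequent[a | c] / frequent[a]
        if conf >= MIN_CONFIDENCE:
            rules.append((sorted(a), sorted(c), frequent[a | c], conf))

print(len(frequent), "frequent itemsets")
for antecedent, consequent, s, conf in rules:
    print(antecedent, "=>", consequent, "support", s, "confidence", conf)
```

Under these assumptions the enumeration finds three frequent itemsets ({Cracker}, {Water}, {Cracker, Water}) and two rules, matching numberOfItemsets="3" and numberOfRules="2" in the example.
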



Copyright © 1999 DMG.org All Rights Reserved.