|
PMML Sample Models:
The PMML files provided below are examples of predictive models
that use the PMML standard. These samples are not intended for
performance or vendor comparisons; they are provided solely to help users gain
a better understanding of PMML. No representation is made as to the
accuracy or applicability of these models. Also included are the
datasets used to train and validate these predictive models.
| PMML Version | Model Type | Vendor | Application | Dataset | PMML File |
| 2.0 | Association | Oracle | Oracle 9i Data Mining, 9.2.0 | Iris | View |
| 2.0 | Center-based Clustering | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View |
| 2.0 | Distribution-based Clustering | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View |
| 2.0 | Naïve Bayes | Oracle | Oracle 9i Data Mining, 9.2.0 | Iris | View |
| 2.0 | Neural Network (Classification) | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View |
| 2.0 | Neural Network (Regression) | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View |
| 2.0 | Regression | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View |
| 2.0 | Tree | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View |
| 2.1 | Association | IBM | DB2 Intelligent Miner Modeling V8.2 | Voting | View |
| 2.1 | Clustering | IBM | DB2 Intelligent Miner Modeling V8.2 | Robustness | View |
| 2.1 | Tree | IBM | DB2 Intelligent Miner Modeling V8.2 | Robustness | View |
| 3.0 | Association | IBM | DB2 Data Warehouse Edition V9.1 | Shopping | View |
| 3.0 | Association | SPSS | Clementine, 10.0 | Shopping | View |
| 3.0 | Distribution-based Clustering | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View |
| 3.0 | Center-based Clustering | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View |
| 3.0 | Clustering | SPSS | Clementine, 10.0 | Iris | View |
| 3.0 | Model Composition | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View |
| 3.0 | Neural Network | SPSS | Clementine, 10.0 | Iris | View |
| 3.0 | Neural Network | SPSS | Clementine, 10.0 | Heart | View |
| 3.0 | Neural Network | SPSS | Clementine, 10.0 | Iris | View |
| 3.0 | Neural Network | SPSS | Clementine, 10.0 | Heart | View |
| 3.0 | General Regression | SPSS | Clementine, 10.0 | Iris | View |
| 3.0 | Regression | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View |
| 3.0 | Regression | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View |
| 3.0 | Regression | SPSS | Clementine, 10.0 | Elnino | View |
| 3.0 | Regression | SPSS | Clementine, 10.0 | Elnino | View |
| 3.0 | Regression | SPSS | Clementine, 10.0 | Heart | View |
| 3.0 | Ruleset | SPSS | Clementine, 10.0 | Heart | View |
| 3.0 | Sequence | SPSS | Clementine, 10.0 | Visits | View |
| 3.0 | Tree | IBM | DB2 Data Warehouse Edition V9.1 | Heart | View |
| 3.0 | Tree | SPSS | Clementine, 10.0 | Iris | View |
| 3.0 | Tree | SPSS | Clementine, 10.0 | Heart | View |
| 3.1 | Sequence | IBM | DB2 Data Warehouse Edition V9.1 | Visits | View |
| 3.1 | Association | SAS | SAS 9.2 | Unknown | View |
| 3.1 | Ann | SAS | SAS 9.2 | Iris | View |
| 3.1 | Cluster | SAS | SAS 9.2 | Iris | View |
| 3.1 | Logistic Reg. | SAS | SAS 9.2 | Iris | View |
| 3.1 | Tree | SAS | SAS 9.2 | Iris | View |
The Data Mining Group is always looking to increase the
variety of these samples. If you would like to submit samples,
please see the instructions below.
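Because each sample is an XML document, its PMML version, producing application, and model type can be read programmatically before the file is studied in full. The following is a minimal sketch in Python, not an official DMG tool. The embedded document is a hand-written stand-in, not one of the samples, and the function name `summarize_pmml` is hypothetical. It also assumes that every top-level element other than Header, MiningBuildTask, DataDictionary, TransformationDictionary, and Extension is a model, which should be checked against the PMML version in use.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in document for illustration only (not one of the samples).
SAMPLE = """<PMML version="3.0" xmlns="http://www.dmg.org/PMML-3_0">
  <Header copyright="example">
    <Application name="ExampleApp" version="1.0"/>
  </Header>
  <DataDictionary numberOfFields="1">
    <DataField name="x" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TreeModel modelName="demo" functionName="classification">
    <MiningSchema>
      <MiningField name="x"/>
    </MiningSchema>
    <Node score="a"><True/></Node>
  </TreeModel>
</PMML>
"""

# Assumption: top-level elements with these names are not models.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}


def local_name(tag):
    """Drop a '{namespace}' prefix so files with or without one are handled."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(xml_text):
    """Return the version, application details, and model element names."""
    root = ET.fromstring(xml_text)
    summary = {"version": root.get("version"), "application": None,
               "application_version": None, "models": []}
    for child in root:
        name = local_name(child.tag)
        if name == "Header":
            for item in child:
                if local_name(item.tag) == "Application":
                    summary["application"] = item.get("name")
                    summary["application_version"] = item.get("version")
        elif name not in NON_MODEL:
            summary["models"].append(name)
    return summary


print(summarize_pmml(SAMPLE))
```

For a file saved to disk, `ET.parse(path).getroot()` could replace the `ET.fromstring` call. Matching on local names rather than a fixed namespace is a deliberate choice here, since the namespace differs between PMML versions.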
Datasets for PMML Sample Models
These datasets are used in conjunction with the sample
PMML models. While a high-level description is provided here, more details
can be found in the ReadMe text file associated with each dataset. If you
publish material based on these datasets, please note the source in your
acknowledgements.
| Dataset | Description | Source | Comma-Delimited File |
| Elnino | Contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The "small" dataset is provided here; larger datasets are available via the UCI KDD Archive. The data is expected to aid in the understanding and prediction of El Nino/Southern Oscillation (ENSO) cycles (from National Oceanic and Atmospheric Administration, donated by Dr. Di Cook of Iowa State University). Click here for more info... | UCI KDD Archive | View |
| Heart | Data provided by the Cleveland Clinic Foundation on the diagnosis of heart disease. The data file consists of 13 potential predictors and a target field (num) that identifies patients diagnosed with more than 50% diameter narrowing of the arteries (value >50); otherwise the value <50 is assigned. In the original file, categorical values were represented by numeric codes; these have been replaced with representative strings for ease of use. | UCI Machine Learning Repository | View |
| Iris | Perhaps the best known database to be found in the pattern recognition literature. R. A. Fisher's 1936 paper is a classic in the field and is referenced frequently to this day. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other (from Fisher, R.A. "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7, Part II, 179-188, 1936). Click here for more info... | UCI Machine Learning Repository | View |
| Robustness | This dataset is aimed at finding flaws in PMML export implementations. In terms of data mining, the data makes no sense at all, since the values are randomly distributed and in no way meant to be correlated. If you receive a meaningful model, you most probably did something wrong. Click here for more info | IBM | View Apply Data, View Train Data |
| Shopping | Contains data for SPSS SHOPPING_ASSOC.xml | SPSS | View |
| Visits | Describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category and are recorded in time order (from David Heckerman of Microsoft Corporation). Visits_Small.csv contains about 65,000 visits; Visits_Large.csv contains over 880,000 visits. Click here for more info... | UCI KDD Archive | View 65KB, View 880KB |
| Voting | Includes votes for each of the U.S. House of Representatives Congresspersons on 16 key votes (from Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc. Washington, D.C., 1985. Donated by Jeff Schlimmer at Carnegie-Mellon University). Click here for more info... | UCI Machine Learning Repository | View |
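Since the datasets are distributed as comma-delimited files, a quick sanity check after downloading one is to read it and count the records per class. The sketch below shows one way to do this with Python's standard library. The four inline rows and the column names (including `class` for the target field) are illustrative assumptions, not the contents or layout of the actual files, which are documented in each dataset's ReadMe.

```python
import csv
import io
from collections import Counter

# Illustrative rows only; the column names are assumptions, not the real file layout.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,class
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
4.9,3.0,1.4,0.2,Iris-setosa
"""


def class_counts(csv_file, target):
    """Count how often each value of the target column occurs."""
    reader = csv.DictReader(csv_file)  # first row supplies the column names
    if target not in (reader.fieldnames or []):
        raise KeyError(f"column {target!r} not found in header row")
    return Counter(row[target] for row in reader)


counts = class_counts(io.StringIO(SAMPLE_CSV), "class")
for label, count in sorted(counts.items()):
    print(label, count)
```

Run against the full Iris file with the correct target column name, each of the 3 classes should appear 50 times, as stated in the description above. To read from disk, pass `open(path, newline="")` instead of the `io.StringIO` object.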
Additional PMML Examples
These models are additional examples of PMML, not based on
the datasets listed above. Datasets marked * can be found by searching the
UCI Machine Learning Repository; datasets marked N/A are not available.
These models are included here to provide a wider range
of PMML examples for inspection and understanding.
| PMML Version | Model Type | Vendor | Application | Dataset | PMML File |
| 3.0 | Regression | Salford Systems | MARS | N/A | View |
| 2.0 | Tree | Weka | Weka 3-3-5 | Anneal* | View |
| 2.0 | Tree | Weka | Weka 3-3-5 | Audiology* | View |
| 2.0 | Tree | Weka | Weka 3-3-5 | Autos* | View |
| 2.0 | Tree | Weka | Weka 3-3-5 | Balance Scale* | View |
| 2.0 | Tree | Weka | Weka 3-3-5 | Breast Cancer* | View |
| 2.0 | Tree | Weka | Weka 3-3-5 | Wisconsin Breast Cancer* | View |
How to Submit PMML Models:
If you wish to provide PMML models, please send the
following to info@dmg.org. In the body of your message please provide:
- Text describing the model, including (* = unless
included in the PMML Header element):
  - PMML Version *
  - Application *
  - Application Version *
  - Submitting Organization
  - Any special characteristics
- If you do not use one of the datasets already listed
here, please provide text describing the dataset, including:
  - Dataset Title
  - Source(s), including any acknowledgements
  - Any past usage
  - Description and other relevant information
  - Number of Records (items, occurrences, rows, etc.)
  - Number of Variables (fields, columns, etc.)
  - Variable Information, especially data types, categorical values, valid ranges, etc.
  - Missing Values Descriptions
  - Summary Statistics & Data Distributions
  - Output/Scoring Information
- If you do use one of the datasets already listed here,
please provide the output of your model for inclusion with the existing
dataset.
- Contact information, and whether you want your
information included on this webpage
Also, attach the following files:
- The PMML model
- The dataset used to train and validate the model.
Include in the dataset the output of the model so other users can verify
their results. The first line (row) should contain the variable names
(column headers).
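One possible way to prepare such a file is sketched below in Python: it writes the variable names as the first row, appends the model output as an extra column, and then shows how another user might re-score each row to verify the stored output. The function names, the `predicted` column name, and the `toy_score` rule are all hypothetical; `toy_score` merely stands in for whatever scoring engine evaluates the submitted PMML model.

```python
import csv
import io


def write_with_output(fieldnames, records, outputs, output_name="predicted"):
    """Return CSV text: header row first, then each record plus its model output."""
    if len(records) != len(outputs):
        raise ValueError("need exactly one model output per record")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(fieldnames) + [output_name])
    for record, output in zip(records, outputs):
        writer.writerow(list(record) + [output])
    return buffer.getvalue()


def find_mismatches(csv_text, score, output_name="predicted"):
    """Re-score every row and return the 1-based data-row numbers that disagree."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mismatches = []
    for number, row in enumerate(reader, start=1):
        stored = row.pop(output_name)  # remaining keys are the input variables
        if str(score(row)) != stored:
            mismatches.append(number)
    return mismatches


def toy_score(row):
    """Hypothetical stand-in for scoring a row with the submitted model."""
    return "wide" if float(row["petal_width"]) > 1.0 else "narrow"


text = write_with_output(
    ["petal_length", "petal_width"],
    [(1.4, 0.2), (4.7, 1.4)],
    ["narrow", "wide"],
)
print(text)
print(find_mismatches(text, toy_score))
```

Comparing outputs as strings keeps the sketch simple; numeric model outputs would more reasonably be compared within a tolerance.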
Acknowledgements:
The Data Mining Group thanks the UCI Repository of Machine
Learning Databases for being a valuable resource:
Blake, C.L. & Merz, C.J. (1998). UCI Repository of
machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html].
Irvine, CA: University of California, Department of Information and Computer
Science.
|