DMG.ORG

PMML 1.1 -- Polynomial Regression



Model DTD and Tag Description

The regression functions are used to determine the relationship between the dependent variable (target variable) and one or more independent variables. The dependent variable is the one whose values you want to predict, whereas the independent variables are the variables that you base your prediction on.

The regression formula is:

Dependent variable = intercept + Sum_i (coefficient_i * independent variable_i) + error


<!-- regression model.  -->
<!ELEMENT RegressionModel (Extension*, RegressionTable) >
<!ATTLIST RegressionModel
   modelType          (linearRegression|
                       stepwisePolynomialRegression) #REQUIRED
   targetVariableName %FIELD-NAME;                   #REQUIRED
   modelName          CDATA                          #IMPLIED
>

<!ELEMENT RegressionTable ((NumericPredictor*), (CategoricalPredictor*))>
<!ATTLIST RegressionTable
    intercept                      %REAL-NUMBER;        #REQUIRED
>

<!ELEMENT NumericPredictor EMPTY>
<!ATTLIST NumericPredictor
    name                           %FIELD-NAME;         #REQUIRED
    exponent                       %INT-NUMBER;         #REQUIRED
    coefficient                    %REAL-NUMBER;        #REQUIRED
    mean                           %REAL-NUMBER;        #IMPLIED
>
<!ELEMENT CategoricalPredictor EMPTY>
<!ATTLIST CategoricalPredictor
    name                           %FIELD-NAME;         #REQUIRED
    value                          CDATA                #REQUIRED
    coefficient                    %REAL-NUMBER;        #REQUIRED
>
RegressionModel : The root element of an XML regression model. Each instance of a regression model must start with this element.

modelName : This is a unique identifier specifying the name of the regression model.

modelType : Specifies the type of a regression model. This information is used to select the appropriate mathematical formulas during the scoring phase. The supported regression algorithms are linearRegression and stepwisePolynomialRegression.

targetVariableName : The name of the target variable (also called response variable).

RegressionTable : A table that lists all predictors (independent variables) together with their coefficients. Its intercept attribute holds the constant term of the regression formula.

NumericPredictor : Defines a numeric independent variable. The list of valid attributes comprises the name of the variable, the exponent to be used, and the coefficient by which the values of this variable must be multiplied. If the independent variable contains missing values, the mean attribute is used to replace the missing values with the mean value.

CategoricalPredictor : Defines a categorical independent variable. The list of attributes comprises the name of the variable, the value attribute, and the coefficient that is applied when the variable has that value. Because categorical values cannot be used in arithmetic directly, each value is represented by an indicator term. If the independent variable has the specified value in a record, the term variable_name(value) is replaced with 1, so the coefficient is multiplied by 1. If the variable has any other value, the term variable_name(value) is replaced with 0 so that the product

coefficient × variable_name(value)
yields 0. Consequently, the term contributes nothing to the result.
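
The two kinds of terms described above can be sketched in Python. This is only an illustration and not part of the PMML specification: the function names numeric_term and categorical_term are hypothetical, a missing value is assumed to be represented by None, and raising an error when a value is missing and no mean is given is an assumption, because the text does not say what should happen in that case.

```python
from typing import Optional


def numeric_term(value: Optional[float], coefficient: float,
                 exponent: int, mean: Optional[float] = None) -> float:
    """Contribution of one NumericPredictor: coefficient * value**exponent."""
    if value is None:
        # Missing value: substitute the mean attribute, as described above.
        if mean is None:
            # Assumption: behaviour without a mean attribute is not specified.
            raise ValueError("missing value and no mean attribute")
        value = mean
    return coefficient * value ** exponent


def categorical_term(actual: str, value: str, coefficient: float) -> float:
    """Contribution of one CategoricalPredictor: coefficient * indicator."""
    # The indicator is 1 when the record holds the predictor's value, else 0.
    indicator = 1 if actual == value else 0
    return coefficient * indicator
```

For example, categorical_term("street", "carpark", 41.1) evaluates to 0, because the record's value does not match the predictor's value attribute.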


Example:

The following regression formula is used to predict the number of insurance claims:
number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 car location( carpark ) + 325.03 car location( street )

If the value carpark was specified for car location in a particular record, you would get the following formula:

number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 × 1 + 325.03 × 0
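
To make the substitution concrete, the formula can be evaluated for a single record. The record below (age 30, salary 50000, car location carpark) is made up for illustration and does not come from the specification. The variable names are likewise hypothetical.

```python
# Hypothetical record; only the coefficients come from the formula above.
record = {"age": 30, "salary": 50000, "car location": "carpark"}

# Indicator terms: 1 for the value present in the record, 0 otherwise.
carpark = 1 if record["car location"] == "carpark" else 0
street = 1 if record["car location"] == "street" else 0

number_of_claims = (132.37
                    + 7.1 * record["age"]
                    + 0.01 * record["salary"]
                    + 41.1 * carpark
                    + 325.03 * street)

print(round(number_of_claims, 2))
```

For this record the non-zero terms are 132.37, 213, 500 and 41.1, which add up to 886.47. The street term drops out because its indicator is 0.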


Linear Regression Sample

This is a linear regression equation that predicts the number of insurance claims from the values of the independent variables age, salary, and car location. Car location is the only categorical variable. Its value attribute can take two possible values, carpark and street.

number of claims = 132.37 + 7.1 age + 0.01 salary + 41.1 car location( carpark ) + 325.03 car location( street )

The corresponding XML model is:


<RegressionModel
   modelName="Sample for linear regression"
   modelType="linearRegression"
   targetVariableName="number of claims">

   <RegressionTable intercept="132.37">
       <NumericPredictor name="age" 
                         exponent="1" coefficient="7.1"/>
       <NumericPredictor name="salary" 
                         exponent="1" coefficient="0.01"/>
       <CategoricalPredictor name="car location"
                         value="carpark" coefficient="41.1"/>
       <CategoricalPredictor name="car location"
                         value="street" coefficient="325.03"/>
   </RegressionTable>

</RegressionModel>
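
A consumer of such a model might read the XML and apply the regression formula roughly as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree module, not a reference implementation. The function name score and the input record are hypothetical, and the sketch assumes that every field is present in the record, so it ignores the mean attribute.

```python
import xml.etree.ElementTree as ET

MODEL_XML = """
<RegressionModel
   modelName="Sample for linear regression"
   modelType="linearRegression"
   targetVariableName="number of claims">
   <RegressionTable intercept="132.37">
       <NumericPredictor name="age" exponent="1" coefficient="7.1"/>
       <NumericPredictor name="salary" exponent="1" coefficient="0.01"/>
       <CategoricalPredictor name="car location"
                         value="carpark" coefficient="41.1"/>
       <CategoricalPredictor name="car location"
                         value="street" coefficient="325.03"/>
   </RegressionTable>
</RegressionModel>
"""


def score(model_xml: str, record: dict) -> float:
    """Evaluate the regression formula stored in model_xml for one record."""
    table = ET.fromstring(model_xml.strip()).find("RegressionTable")
    result = float(table.get("intercept"))
    for p in table.findall("NumericPredictor"):
        x = record[p.get("name")]
        result += float(p.get("coefficient")) * x ** int(p.get("exponent"))
    for p in table.findall("CategoricalPredictor"):
        # Indicator term: the coefficient counts only when the values match.
        if record[p.get("name")] == p.get("value"):
            result += float(p.get("coefficient"))
    return result


# Hypothetical record, not taken from the specification.
claims = score(MODEL_XML, {"age": 30, "salary": 50000, "car location": "street"})
```

Because the loop over NumericPredictor elements already applies the exponent attribute, the same function would also handle a model in which one field appears with several exponents.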


Stepwise Polynomial Regression Sample

This is a stepwise polynomial regression equation that predicts the number of insurance claims from the values of the independent variables salary and car location. Car location is a categorical variable. Its value attribute can take two possible values, carpark and street.

number of claims = 3216.38 - 0.08 salary + 9.54E-7 salary**2 - 2.67E-12 salary**3 + 93.78 car location( carpark ) + 288.75 car location( street )

The corresponding XML model is:


<RegressionModel
   modelName="Sample for stepwise polynomial regression"
   modelType="stepwisePolynomialRegression"
   targetVariableName="number of claims">

   <RegressionTable intercept="3216.38">
       <NumericPredictor name="salary" 
                         exponent="1" coefficient="-0.08"/>
       <NumericPredictor name="salary" 
                         exponent="2" coefficient="9.54E-7"/>
       <NumericPredictor name="salary" 
                         exponent="3" coefficient="-2.67E-12"/>
       <CategoricalPredictor name="car location"
                         value="carpark" coefficient="93.78"/>
       <CategoricalPredictor name="car location"
                         value="street" coefficient="288.75"/>
   </RegressionTable>

</RegressionModel>
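
The sample lists salary three times, once per exponent, and each entry contributes its own power term. The following sketch evaluates the model with the coefficients copied into plain Python data. The names INTERCEPT, SALARY_TERMS, CAR_LOCATION_TERMS and predict_claims are hypothetical, and the inputs are made up for illustration.

```python
# Values copied from the sample model above.
INTERCEPT = 3216.38
SALARY_TERMS = [(1, -0.08), (2, 9.54e-7), (3, -2.67e-12)]  # (exponent, coefficient)
CAR_LOCATION_TERMS = {"carpark": 93.78, "street": 288.75}


def predict_claims(salary: float, car_location: str) -> float:
    """Sum the intercept, the salary power terms, and the matching category term."""
    result = INTERCEPT
    for exponent, coefficient in SALARY_TERMS:
        result += coefficient * salary ** exponent
    # Only the coefficient whose value matches the record contributes;
    # for any other value all indicator terms are 0.
    result += CAR_LOCATION_TERMS.get(car_location, 0.0)
    return result


# Hypothetical inputs chosen for illustration only.
print(round(predict_claims(50000, "street"), 2))
```

For a salary of 50000 the three salary terms are -4000, 2385 and -333.75. Together with the intercept and the street coefficient they give 1556.38.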


Copyright © 2000 DMG.org All Rights Reserved.