weka.datagenerators.classifiers.classification
Class RDG1

java.lang.Object
  extended by weka.datagenerators.DataGenerator
      extended by weka.datagenerators.ClassificationGenerator
          extended by weka.datagenerators.classifiers.classification.RDG1
All Implemented Interfaces:
java.io.Serializable, OptionHandler, Randomizable, RevisionHandler

public class RDG1
extends ClassificationGenerator

A data generator that produces data randomly by producing a decision list.
The decision list consists of rules.
Instances are generated randomly one by one. If the decision list fails to classify the current instance, a new rule based on that instance is generated and added to the decision list.
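
The generation loop described above can be sketched as follows. This is a minimal illustration restricted to boolean attributes, not the Weka source: the class name DecisionListSketch, the fixed ruleSize parameter, the random choice of tested attributes and the random choice of the class for a new rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; this is not the Weka source code.
class DecisionListSketch {

    /** A rule: every tested attribute must have the required value. */
    static final class Rule {
        final int[] attrs;        // indices of the tested attributes
        final boolean[] wanted;   // required value per tested attribute
        final int classValue;     // class assigned when all tests hold

        Rule(int[] attrs, boolean[] wanted, int classValue) {
            this.attrs = attrs;
            this.wanted = wanted;
            this.classValue = classValue;
        }

        boolean covers(boolean[] x) {
            for (int i = 0; i < attrs.length; i++) {
                if (x[attrs[i]] != wanted[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Returns the class of the first covering rule, or -1 if no rule covers x. */
    static int classify(List<Rule> rules, boolean[] x) {
        for (Rule r : rules) {
            if (r.covers(x)) {
                return r.classValue;
            }
        }
        return -1;
    }

    /** Fills data with random instances and returns their class labels. */
    static int[] generate(int numAttrs, int numClasses, int ruleSize,
                          Random rnd, List<Rule> rules, boolean[][] data) {
        int[] labels = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            boolean[] x = new boolean[numAttrs];
            for (int a = 0; a < numAttrs; a++) {
                x[a] = rnd.nextBoolean();
            }
            int cls = classify(rules, x);
            if (cls < 0) {
                // No rule fires: derive a new rule from this instance
                // (assumed strategy: test ruleSize randomly chosen attributes).
                List<Integer> idx = new ArrayList<>();
                for (int a = 0; a < numAttrs; a++) {
                    idx.add(a);
                }
                Collections.shuffle(idx, rnd);
                int[] attrs = new int[ruleSize];
                boolean[] wanted = new boolean[ruleSize];
                for (int t = 0; t < ruleSize; t++) {
                    attrs[t] = idx.get(t);
                    wanted[t] = x[attrs[t]];
                }
                cls = rnd.nextInt(numClasses);
                rules.add(new Rule(attrs, wanted, cls));
            }
            data[i] = x;
            labels[i] = cls;
        }
        return labels;
    }
}
```

Because new rules are only appended to the end of the list, an instance keeps the class it received when it was generated as long as the first matching rule decides.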

The option -V switches on voting, which means that at the end of the generation all instances are reclassified to the class value that is supported by the most rules.
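
The difference between first-match classification and voting can be sketched as follows. This is an assumption-laden illustration rather than the Weka implementation: the class VotingSketch, its Rule type and the tie-breaking rule (lowest class index wins) are invented for the example.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only; names and tie-breaking are assumptions.
class VotingSketch {

    /** A rule body plus the class it predicts. */
    static final class Rule {
        final Predicate<boolean[]> body;
        final int classValue;

        Rule(Predicate<boolean[]> body, int classValue) {
            this.body = body;
            this.classValue = classValue;
        }
    }

    /** Decision-list semantics: the first rule that fires decides. */
    static int firstMatch(List<Rule> rules, boolean[] x) {
        for (Rule r : rules) {
            if (r.body.test(x)) {
                return r.classValue;
            }
        }
        return -1;
    }

    /** Voting semantics: the class supported by the most firing rules wins. */
    static int vote(List<Rule> rules, boolean[] x, int numClasses) {
        int[] votes = new int[numClasses];
        for (Rule r : rules) {
            if (r.body.test(x)) {
                votes[r.classValue]++;
            }
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (votes[c] > votes[best]) {
                best = c; // ties keep the lower class index (assumption)
            }
        }
        return best;
    }
}
```

With three rules where the first predicts class 0 and the next two predict class 1, an instance covered by all three is labelled 0 by the decision list but 1 after voting.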

This data generator can generate 'boolean' attributes (= nominal with the values {true, false}) and numeric attributes. The rules can be 'A' or 'NOT A' for boolean values and 'B < random_value' or 'B >= random_value' for numeric values.
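
The four kinds of tests named above can be modelled as shown below. This is only a sketch: the class RuleTestSketch and the enum Kind are hypothetical, and boolean values are assumed to be encoded as 0.0 (false) and 1.0 (true), following the value order {false,true} of the generated attribute.

```java
// Illustrative sketch only; the names and the 0.0/1.0 encoding are assumptions.
class RuleTestSketch {

    /** 'A', 'NOT A', 'B < random_value', 'B >= random_value'. */
    enum Kind { IS_TRUE, IS_FALSE, LESS_THAN, AT_LEAST }

    /** Evaluates one test against a single attribute value. */
    static boolean holds(Kind kind, double value, double threshold) {
        switch (kind) {
            case IS_TRUE:
                return value == 1.0;
            case IS_FALSE:
                return value == 0.0;
            case LESS_THAN:
                return value < threshold;
            case AT_LEAST:
                return value >= threshold;
            default:
                throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }

    /** Renders a test in the style of the generator's rule listing. */
    static String describe(Kind kind, String attr, double threshold) {
        switch (kind) {
            case IS_TRUE:
                return attr;
            case IS_FALSE:
                return "not(" + attr + ")";
            case LESS_THAN:
                return attr + " < " + threshold;
            case AT_LEAST:
                return attr + " >= " + threshold;
            default:
                throw new IllegalArgumentException("Unknown kind: " + kind);
        }
    }
}
```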

Valid options are:

 -h
  Prints this help.
 -o <file>
  The name of the output file, otherwise the generated data is
  printed to stdout.
 -r <name>
  The name of the relation.
 -d
  Whether to print debug information.
 -S <num>
  The seed for the random function (default 1)
 -n <num>
  The number of examples to generate (default 100)
 -a <num>
  The number of attributes (default 10).
 -c <num>
  The number of classes (default 2)
 -R <num>
  maximum size for rules (default 10) 
 -M <num>
  minimum size for rules (default 1) 
 -I <num>
  number of irrelevant attributes (default 0)
 -N <num>
  number of numeric attributes (default 0)
 -V
  switch on voting (default is no voting)
The following is an example of a generated dataset:
 %
 % weka.datagenerators.RDG1 -r expl -a 2 -c 3 -n 4 -N 1 -I 0 -M 2 -R 10 -S 2
 %
 @relation expl

 @attribute a0 {false,true}
 @attribute a1 numeric
 @attribute class {c0,c1,c2}

 @data

 true,0.496823,c0
 false,0.743158,c1
 false,0.408285,c1
 false,0.993687,c2
 %
 % Number of attributes chosen as irrelevant = 0
 %
 % DECISIONLIST (number of rules = 3):
 % RULE 0:   c0 := a1 < 0.986, a0
 % RULE 1:   c1 := a1 < 0.95, not(a0)
 % RULE 2:   c2 := not(a0), a1 >= 0.562
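
To see how the listed rules account for the four data rows, the decision list of this example can be transcribed by hand. The sketch below is not generated by Weka; the class name ExampleDecisionList is hypothetical and the thresholds are copied from the rule listing above.

```java
// Hand transcription of the example decision list; illustrative only.
class ExampleDecisionList {

    /** Applies RULE 0 to RULE 2 in order; the first rule that fires decides. */
    static String classify(boolean a0, double a1) {
        if (a1 < 0.986 && a0) {
            return "c0"; // RULE 0: c0 := a1 < 0.986, a0
        }
        if (a1 < 0.95 && !a0) {
            return "c1"; // RULE 1: c1 := a1 < 0.95, not(a0)
        }
        if (!a0 && a1 >= 0.562) {
            return "c2"; // RULE 2: c2 := not(a0), a1 >= 0.562
        }
        return null; // no rule fires; the generator would add a new rule here
    }
}
```

The last row (false, 0.993687) fails RULE 0 because a0 is false and fails RULE 1 because a1 is not below 0.95, so RULE 2 assigns c2.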
 

Version:
$Revision: 1.5 $
Author:
Gabi Schmidberger (gabi@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
RDG1()
          Initializes the generator with default values.
 
Method Summary
 java.lang.String attList_IrrTipText()
          Returns the tip text for this property
 Instances defineDataFormat()
          Initializes the format for the dataset produced.
 Instance generateExample()
          Generates an example of the dataset.
 Instances generateExamples()
          Generates all examples of the dataset.
 Instances generateExamples(int num, java.util.Random random, Instances format)
          Generates all examples of the dataset.
 java.lang.String generateFinished()
          Compiles documentation about the data generation.
 java.lang.String generateStart()
          Generates a comment string that documents the data generator.
 boolean[] getAttList_Irr()
          Gets the array that defines which of the attributes are seen to be irrelevant.
 int getMaxRuleSize()
          Gets the maximum number of tests in rules.
 int getMinRuleSize()
          Gets the minimum number of tests in rules.
 int getNumAttributes()
          Gets the number of attributes that should be produced.
 int getNumClasses()
          Gets the number of classes the dataset should have.
 int getNumIrrelevant()
          Gets the number of irrelevant attributes.
 int getNumNumeric()
          Gets the number of numerical attributes.
 java.lang.String[] getOptions()
          Gets the current settings of the datagenerator RDG1.
 java.lang.String getRevision()
          Returns the revision string.
 boolean getSingleModeFlag()
          Gets the single mode flag.
 boolean getVoteFlag()
          Gets the vote flag.
 java.lang.String globalInfo()
          Returns a string describing this data generator.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for testing this class.
 java.lang.String maxRuleSizeTipText()
          Returns the tip text for this property
 java.lang.String minRuleSizeTipText()
          Returns the tip text for this property
 java.lang.String numAttributesTipText()
          Returns the tip text for this property
 java.lang.String numClassesTipText()
          Returns the tip text for this property
 java.lang.String numIrrelevantTipText()
          Returns the tip text for this property
 java.lang.String numNumericTipText()
          Returns the tip text for this property
 void setAttList_Irr(boolean[] newAttList_Irr)
          Sets the array that defines which of the attributes are seen to be irrelevant.
 void setMaxRuleSize(int newMaxRuleSize)
          Sets the maximum number of tests in rules.
 void setMinRuleSize(int newMinRuleSize)
          Sets the minimum number of tests in rules.
 void setNumAttributes(int numAttributes)
          Sets the number of attributes the dataset should have.
 void setNumClasses(int numClasses)
          Sets the number of classes the dataset should have.
 void setNumIrrelevant(int newNumIrrelevant)
          Sets the number of irrelevant attributes.
 void setNumNumeric(int newNumNumeric)
          Sets the number of numerical attributes.
 void setOptions(java.lang.String[] options)
          Parses a list of options for this object.
 void setVoteFlag(boolean newVoteFlag)
          Sets the vote flag.
 java.lang.String voteFlagTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.datagenerators.ClassificationGenerator
getNumExamples, numExamplesTipText, setNumExamples
 
Methods inherited from class weka.datagenerators.DataGenerator
debugTipText, defaultOutput, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getSeed, makeData, outputTipText, randomTipText, relationNameTipText, seedTipText, setDatasetFormat, setDebug, setOutput, setRandom, setRelationName, setSeed
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RDG1

public RDG1()
Initializes the generator with default values.

Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this data generator.

Returns:
a description of the data generator suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class ClassificationGenerator
Returns:
an enumeration of all the available options

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a list of options for this object.

Valid options are:

 -h
  Prints this help.
 -o <file>
  The name of the output file, otherwise the generated data is
  printed to stdout.
 -r <name>
  The name of the relation.
 -d
  Whether to print debug information.
 -S <num>
  The seed for the random function (default 1)
 -n <num>
  The number of examples to generate (default 100)
 -a <num>
  The number of attributes (default 10).
 -c <num>
  The number of classes (default 2)
 -R <num>
  maximum size for rules (default 10) 
 -M <num>
  minimum size for rules (default 1) 
 -I <num>
  number of irrelevant attributes (default 0)
 -N <num>
  number of numeric attributes (default 0)
 -V
  switch on voting (default is no voting)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class ClassificationGenerator
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
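
How the documented flags and defaults fit together can be sketched with a small stand-alone parser. This is not Weka's option handling: the class OptionSketch and its map-based result are hypothetical, and the sketch assumes that -S and -N take a value, as they do in the example command line of the class description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; Weka parses options differently.
class OptionSketch {

    /** Parses the documented flags into a map, starting from the documented defaults. */
    static Map<String, String> parse(String[] options) {
        Map<String, String> o = new LinkedHashMap<>();
        o.put("-S", "1");    // seed
        o.put("-n", "100");  // number of examples
        o.put("-a", "10");   // number of attributes
        o.put("-c", "2");    // number of classes
        o.put("-R", "10");   // maximum rule size
        o.put("-M", "1");    // minimum rule size
        o.put("-I", "0");    // irrelevant attributes
        o.put("-N", "0");    // numeric attributes
        o.put("-V", "false"); // voting off
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            switch (flag) {
                case "-h": case "-d": case "-V":
                    o.put(flag, "true"); // switches without a value
                    break;
                case "-o": case "-r": case "-S": case "-n": case "-a":
                case "-c": case "-R": case "-M": case "-I": case "-N":
                    if (i + 1 >= options.length) {
                        throw new IllegalArgumentException("Missing value for " + flag);
                    }
                    o.put(flag, options[++i]);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return o;
    }
}
```

Parsing the option string of the example dataset, "-r expl -a 2 -c 3 -n 4 -N 1 -I 0 -M 2 -R 10 -S 2", yields two attributes, three classes, four examples and voting switched off.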

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the datagenerator RDG1.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class ClassificationGenerator
Returns:
an array of strings suitable for passing to setOptions
See Also:
DataGenerator.removeBlacklist(String[])

setNumAttributes

public void setNumAttributes(int numAttributes)
Sets the number of attributes the dataset should have.

Parameters:
numAttributes - the new number of attributes

getNumAttributes

public int getNumAttributes()
Gets the number of attributes that should be produced.

Returns:
the number of attributes that should be produced

numAttributesTipText

public java.lang.String numAttributesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumClasses

public void setNumClasses(int numClasses)
Sets the number of classes the dataset should have.

Parameters:
numClasses - the new number of classes

getNumClasses

public int getNumClasses()
Gets the number of classes the dataset should have.

Returns:
the number of classes the dataset should have

numClassesTipText

public java.lang.String numClassesTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMaxRuleSize

public int getMaxRuleSize()
Gets the maximum number of tests in rules.

Returns:
the maximum number of tests allowed in rules

setMaxRuleSize

public void setMaxRuleSize(int newMaxRuleSize)
Sets the maximum number of tests in rules.

Parameters:
newMaxRuleSize - new maximum number of tests allowed in rules.

maxRuleSizeTipText

public java.lang.String maxRuleSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMinRuleSize

public int getMinRuleSize()
Gets the minimum number of tests in rules.

Returns:
the minimum number of tests allowed in rules

setMinRuleSize

public void setMinRuleSize(int newMinRuleSize)
Sets the minimum number of tests in rules.

Parameters:
newMinRuleSize - new minimum number of tests in rules.
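
One plausible way the minimum and maximum rule sizes could interact is sketched below. The source does not describe the actual selection; the uniform draw, the cap at the number of usable attributes, and the names RuleSizeSketch and pickRuleSize are assumptions. The cap is suggested by the example dataset, where -M 2 -R 10 with two attributes produces rules of two tests.

```java
import java.util.Random;

// Illustrative sketch only; the real selection strategy may differ.
class RuleSizeSketch {

    /** Draws a rule size in [minSize, maxSize], capped at the usable attributes. */
    static int pickRuleSize(int minSize, int maxSize, int usableAttributes, Random rnd) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("Need 1 <= minSize <= maxSize");
        }
        int size = minSize + rnd.nextInt(maxSize - minSize + 1);
        return Math.min(size, usableAttributes);
    }
}
```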

minRuleSizeTipText

public java.lang.String minRuleSizeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumIrrelevant

public int getNumIrrelevant()
Gets the number of irrelevant attributes.

Returns:
the number of irrelevant attributes

setNumIrrelevant

public void setNumIrrelevant(int newNumIrrelevant)
Sets the number of irrelevant attributes.

Parameters:
newNumIrrelevant - the number of irrelevant attributes.

numIrrelevantTipText

public java.lang.String numIrrelevantTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumNumeric

public int getNumNumeric()
Gets the number of numerical attributes.

Returns:
the number of numerical attributes.

setNumNumeric

public void setNumNumeric(int newNumNumeric)
Sets the number of numerical attributes.

Parameters:
newNumNumeric - the number of numerical attributes.

numNumericTipText

public java.lang.String numNumericTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getVoteFlag

public boolean getVoteFlag()
Gets the vote flag.

Returns:
voting flag.

setVoteFlag

public void setVoteFlag(boolean newVoteFlag)
Sets the vote flag.

Parameters:
newVoteFlag - boolean with the new setting of the vote flag.

voteFlagTipText

public java.lang.String voteFlagTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSingleModeFlag

public boolean getSingleModeFlag()
Gets the single mode flag.

Specified by:
getSingleModeFlag in class DataGenerator
Returns:
true if the method generateExample can be used.

getAttList_Irr

public boolean[] getAttList_Irr()
Gets the array that defines which of the attributes are seen to be irrelevant.

Returns:
the array that defines the irrelevant attributes

setAttList_Irr

public void setAttList_Irr(boolean[] newAttList_Irr)
Sets the array that defines which of the attributes are seen to be irrelevant.

Parameters:
newAttList_Irr - array that defines the irrelevant attributes.
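
The array can be read as a mask with one entry per attribute, where true marks an attribute that no rule should test. How the generator fills it is not stated here; the sketch below assumes a uniform random choice, and the names IrrelevantMaskSketch and chooseIrrelevant are hypothetical.

```java
import java.util.Random;

// Illustrative sketch only; the selection strategy is an assumption.
class IrrelevantMaskSketch {

    /** Marks numIrrelevant distinct attributes as irrelevant (true). */
    static boolean[] chooseIrrelevant(int numAttributes, int numIrrelevant, Random rnd) {
        if (numIrrelevant < 0 || numIrrelevant > numAttributes) {
            throw new IllegalArgumentException("numIrrelevant out of range");
        }
        boolean[] irrelevant = new boolean[numAttributes];
        int chosen = 0;
        while (chosen < numIrrelevant) {
            int a = rnd.nextInt(numAttributes);
            if (!irrelevant[a]) { // skip attributes that are already marked
                irrelevant[a] = true;
                chosen++;
            }
        }
        return irrelevant;
    }
}
```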

attList_IrrTipText

public java.lang.String attList_IrrTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

defineDataFormat

public Instances defineDataFormat()
                           throws java.lang.Exception
Initializes the format for the dataset produced.

Overrides:
defineDataFormat in class DataGenerator
Returns:
the output data format
Throws:
java.lang.Exception - data format could not be defined
See Also:
DataGenerator.defaultRelationName()

generateExample

public Instance generateExample()
                         throws java.lang.Exception
Generates an example of the dataset.

Specified by:
generateExample in class DataGenerator
Returns:
the instance generated
Throws:
java.lang.Exception - if format not defined or generating
examples one by one is not possible, because voting is chosen
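
The relationship between voting and one-by-one generation can be sketched as a simple guard. This is an illustration under the assumption that the single mode flag is the negation of the vote flag, which the text above implies but does not state; the class SingleModeSketch and its integer stand-in for an instance are hypothetical.

```java
// Illustrative sketch only; assumes single mode is available exactly when voting is off.
class SingleModeSketch {
    private boolean voteFlag;
    private int counter;

    void setVoteFlag(boolean newVoteFlag) {
        voteFlag = newVoteFlag;
    }

    /** One-by-one generation is possible only without voting. */
    boolean getSingleModeFlag() {
        return !voteFlag;
    }

    /** Returns a stand-in for the next instance (here just a running number). */
    int generateExample() throws Exception {
        if (!getSingleModeFlag()) {
            // Voting relabels instances after all of them exist,
            // so a single instance cannot be finalized on its own.
            throw new Exception("Examples cannot be generated one by one when voting is on.");
        }
        return counter++;
    }
}
```

A caller would check getSingleModeFlag() first and fall back to generateExamples() when it returns false.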

generateExamples

public Instances generateExamples()
                           throws java.lang.Exception
Generates all examples of the dataset.

Specified by:
generateExamples in class DataGenerator
Returns:
the instances generated
Throws:
java.lang.Exception - if format not defined or generating
examples one by one is not possible, because voting is chosen

generateExamples

public Instances generateExamples(int num,
                                  java.util.Random random,
                                  Instances format)
                           throws java.lang.Exception
Generates all examples of the dataset.

Parameters:
num - the number of examples to generate
random - the random number generator to use
format - the dataset format
Returns:
the instances generated
Throws:
java.lang.Exception - if format not defined or generating
examples one by one is not possible, because voting is chosen

generateStart

public java.lang.String generateStart()
Generates a comment string that documents the data generator. By default this string is added at the beginning of the ARFF output, directly after the options.

Specified by:
generateStart in class DataGenerator
Returns:
string contains info about the generated rules

generateFinished

public java.lang.String generateFinished()
                                  throws java.lang.Exception
Compiles documentation about the data generation: the number of irrelevant attributes and the decision list with all rules. Because the decision list may be extended until the last instance is generated, this method should be called at the end of the data generation process.

Specified by:
generateFinished in class DataGenerator
Returns:
string with additional information about generated dataset
Throws:
java.lang.Exception - no input structure has been defined

getRevision

public java.lang.String getRevision()
Returns the revision string.

Returns:
the revision

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - should contain arguments for the data producer: