java.lang.Object
weka.filters.Filter
weka.filters.SimpleFilter
weka.filters.SimpleBatchFilter
weka.filters.unsupervised.attribute.InterquartileRange
public class InterquartileRange
A filter for detecting outliers and extreme values based on interquartile ranges. The filter skips the class attribute.
Outliers:
Q3 + OF*IQR < x <= Q3 + EVF*IQR
or
Q1 - EVF*IQR <= x < Q1 - OF*IQR
Extreme values:
x > Q3 + EVF*IQR
or
x < Q1 - EVF*IQR
Key:
Q1 = 25% quartile
Q3 = 75% quartile
IQR = Interquartile Range, difference between Q1 and Q3
OF = Outlier Factor
EVF = Extreme Value Factor
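The interval definitions above can be sketched in plain Java. This is an illustration only, not the filter's own code: the class `IqrClassifySketch`, the enum `Tag`, and the method `classify` are hypothetical names, Q1 and Q3 are assumed to be computed elsewhere, and EVF is assumed to be at least as large as OF.

```java
// Sketch only: classifies one numeric value using the interval definitions above.
// IqrClassifySketch, Tag and classify are hypothetical names, not part of Weka.
class IqrClassifySketch {
    enum Tag { NORMAL, OUTLIER, EXTREME }

    static Tag classify(double x, double q1, double q3, double of, double evf) {
        double iqr = q3 - q1; // IQR = difference between Q3 and Q1
        // Extreme values: x > Q3 + EVF*IQR or x < Q1 - EVF*IQR
        if (x > q3 + evf * iqr || x < q1 - evf * iqr) {
            return Tag.EXTREME;
        }
        // Outliers: Q3 + OF*IQR < x <= Q3 + EVF*IQR or Q1 - EVF*IQR <= x < Q1 - OF*IQR
        // (the EVF bounds are already guaranteed by the check above)
        if (x > q3 + of * iqr || x < q1 - of * iqr) {
            return Tag.OUTLIER;
        }
        return Tag.NORMAL;
    }

    public static void main(String[] args) {
        // Q1 = 10 and Q3 = 20 give IQR = 10; OF = 3 and EVF = 2*OF = 6
        System.out.println(classify(15, 10, 20, 3, 6)); // prints NORMAL
        System.out.println(classify(55, 10, 20, 3, 6)); // prints OUTLIER
        System.out.println(classify(85, 10, 20, 3, 6)); // prints EXTREME
    }
}
```

With these numbers the upper outlier band is (50, 80] and anything above 80 is extreme; the lower bands mirror this at [-50, -20) and below -50.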
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. An instance is tagged as an outlier/extreme value if it qualifies as one in at least one of those attributes. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme value detection. (default: 2*Outlier Factor)
-E-as-O Also tags extreme values as outliers. (default: off)
-P Generates an Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
Thanks to Dale for a few brainstorming sessions.
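The -R and -E-as-O descriptions imply that, without -P, a single Outlier/ExtremeValue pair summarises all selected attributes. The following sketch shows one way to read that rule. It is an assumption-laden illustration rather than the filter's actual code: `InstanceTagSketch` and `tagInstance` are hypothetical names, per-attribute quartiles are passed in as arrays, and the "any attribute" aggregation is inferred from the option text.

```java
import java.util.Arrays;

// Sketch only: combines per-attribute checks into one {outlier, extreme} pair.
// InstanceTagSketch and tagInstance are hypothetical names, not part of Weka.
class InstanceTagSketch {
    static boolean[] tagInstance(double[] values, double[] q1, double[] q3,
                                 double of, double evf, boolean extremeAsOutlier) {
        boolean anyOutlier = false;
        boolean anyExtreme = false;
        for (int i = 0; i < values.length; i++) {
            double iqr = q3[i] - q1[i];
            double x = values[i];
            boolean extreme = x > q3[i] + evf * iqr || x < q1[i] - evf * iqr;
            boolean outlier = !extreme && (x > q3[i] + of * iqr || x < q1[i] - of * iqr);
            if (extremeAsOutlier) {
                outlier = outlier || extreme; // mirrors -E-as-O
            }
            // Assumption: one qualifying attribute is enough to tag the instance
            anyOutlier = anyOutlier || outlier;
            anyExtreme = anyExtreme || extreme;
        }
        return new boolean[] { anyOutlier, anyExtreme };
    }

    public static void main(String[] args) {
        double[] q1 = { 10, 10 };
        double[] q3 = { 20, 20 };
        double[] instance = { 15, 85 }; // second attribute is extreme
        System.out.println(Arrays.toString(tagInstance(instance, q1, q3, 3, 6, false))); // prints [false, true]
        System.out.println(Arrays.toString(tagInstance(instance, q1, q3, 3, 6, true)));  // prints [true, true]
    }
}
```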
Field Summary | |
---|---|
static int | NON_NUMERIC: indicator for non-numeric attributes |
Constructor Summary | |
---|---|
InterquartileRange() | |
Method Summary | |
---|---|
java.lang.String | attributeIndicesTipText(): Returns the tip text for this property |
java.lang.String | detectionPerAttributeTipText(): Returns the tip text for this property |
java.lang.String | extremeValuesAsOutliersTipText(): Returns the tip text for this property |
java.lang.String | extremeValuesFactorTipText(): Returns the tip text for this property |
java.lang.String | getAttributeIndices(): Gets the current range selection |
Capabilities | getCapabilities(): Returns the Capabilities of this filter. |
boolean | getDetectionPerAttribute(): Gets whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false"). |
boolean | getExtremeValuesAsOutliers(): Get whether extreme values are also tagged as outliers. |
double | getExtremeValuesFactor(): Gets the factor for determining the thresholds for extreme values. |
java.lang.String[] | getOptions(): Gets the current settings of the filter. |
double | getOutlierFactor(): Gets the factor for determining the thresholds for outliers. |
boolean | getOutputOffsetMultiplier(): Gets whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR. |
java.lang.String | getRevision(): Returns the revision string. |
java.lang.String | globalInfo(): Returns a string describing this filter |
java.util.Enumeration | listOptions(): Returns an enumeration describing the available options. |
static void | main(java.lang.String[] args): Main method for testing this class. |
java.lang.String | outlierFactorTipText(): Returns the tip text for this property |
java.lang.String | outputOffsetMultiplierTipText(): Returns the tip text for this property |
void | setAttributeIndices(java.lang.String value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used). |
void | setAttributeIndicesArray(int[] value): Sets which attributes are to be used for interquartile calculations and outlier/extreme value detection (only numeric attributes among the selection will be used). |
void | setDetectionPerAttribute(boolean value): Set whether an Outlier/ExtremeValue attribute pair is generated for each numeric attribute ("true") or just one pair for all numeric attributes together ("false"). |
void | setExtremeValuesAsOutliers(boolean value): Set whether extreme values are also tagged as outliers. |
void | setExtremeValuesFactor(double value): Sets the factor for determining the thresholds for extreme values. |
void | setOptions(java.lang.String[] options): Parses a list of options for this object. |
void | setOutlierFactor(double value): Sets the factor for determining the thresholds for outliers. |
void | setOutputOffsetMultiplier(boolean value): Set whether an additional attribute "Offset" is generated per Outlier/ExtremeValue attribute pair that lists the multiplier the value is off the median: value = median + 'multiplier' * IQR. |
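setAttributeIndices takes a range string whose indexes start at 1, while setAttributeIndicesArray takes indexes that start at 0. The sketch below shows how the two conventions relate. It is a simplified stand-in and not Weka's own range parsing: `RangeIndexSketch`, `toZeroBased`, and `parseIndex` are hypothetical names, and details such as ordering or duplicate handling are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: converts a 1-based range list such as "first,3-5,last"
// into 0-based indexes. All names here are hypothetical, not part of Weka.
class RangeIndexSketch {
    static int[] toZeroBased(String ranges, int numAttributes) {
        List<Integer> out = new ArrayList<>();
        for (String part : ranges.split(",")) {
            String[] ends = part.trim().split("-");
            if (ends.length > 2) {
                throw new IllegalArgumentException("invalid range: " + part);
            }
            int lo = parseIndex(ends[0], numAttributes);
            int hi = ends.length == 2 ? parseIndex(ends[1], numAttributes) : lo;
            if (lo > hi) {
                throw new IllegalArgumentException("invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                out.add(i - 1); // shift from 1-based to 0-based
            }
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    // Resolves 'first', 'last', or a 1-based number to a 1-based index.
    static int parseIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) return 1;
        if (t.equals("last")) return numAttributes;
        int v = Integer.parseInt(t); // NumberFormatException is an IllegalArgumentException
        if (v < 1 || v > numAttributes) {
            throw new IllegalArgumentException("index out of range: " + t);
        }
        return v;
    }

    public static void main(String[] args) {
        // With 10 attributes, "first,3-5,last" selects 0-based indexes 0, 2, 3, 4, 9
        System.out.println(java.util.Arrays.toString(toZeroBased("first,3-5,last", 10))); // prints [0, 2, 3, 4, 9]
    }
}
```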
Methods inherited from class weka.filters.SimpleBatchFilter |
---|
batchFinished, input |
Methods inherited from class weka.filters.SimpleFilter |
---|
debugTipText, getDebug, setDebug, setInputFormat |
Methods inherited from class weka.filters.Filter |
---|
batchFilterFile, filterFile, getCapabilities, getOutputFormat, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputPeek, toString, useFilter, wekaStaticWrapper |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final int NON_NUMERIC
Constructor Detail |
---|
public InterquartileRange()
Method Detail |
---|
public java.lang.String globalInfo()
globalInfo
in class SimpleFilter
public java.util.Enumeration listOptions()
listOptions
in interface OptionHandler
listOptions
in class SimpleFilter
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns on which to base outlier/extreme value detection. An instance is tagged as an outlier/extreme value if it qualifies as one in at least one of those attributes. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme value detection. (default: 2*Outlier Factor)
-E-as-O Also tags extreme values as outliers. (default: off)
-P Generates an Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
setOptions
in interface OptionHandler
setOptions
in class SimpleFilter
options
- the list of options as an array of strings
java.lang.Exception
- if an option is not supported
See also: SimpleFilter.reset()
public java.lang.String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class SimpleFilter
public java.lang.String attributeIndicesTipText()
public java.lang.String getAttributeIndices()
public void setAttributeIndices(java.lang.String value)
value
- a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
java.lang.IllegalArgumentException
- if an invalid range list is supplied
public void setAttributeIndicesArray(int[] value)
value
- an array containing indexes of attributes to work on. Since the array will typically come from a program, attributes are indexed from 0.
java.lang.IllegalArgumentException
- if an invalid set of ranges is supplied
public java.lang.String outlierFactorTipText()
public void setOutlierFactor(double value)
value
- the factor.
public double getOutlierFactor()
public java.lang.String extremeValuesFactorTipText()
public void setExtremeValuesFactor(double value)
value
- the factor.
public double getExtremeValuesFactor()
public java.lang.String extremeValuesAsOutliersTipText()
public void setExtremeValuesAsOutliers(boolean value)
value
- whether or not to tag extreme values also as outliers.
public boolean getExtremeValuesAsOutliers()
public java.lang.String detectionPerAttributeTipText()
public void setDetectionPerAttribute(boolean value)
value
- whether or not to generate indicator attribute pairs for each numeric attribute.
public boolean getDetectionPerAttribute()
public java.lang.String outputOffsetMultiplierTipText()
public void setOutputOffsetMultiplier(boolean value)
value
- whether or not to generate the additional attribute.
public boolean getOutputOffsetMultiplier()
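The 'Offset' attribute is described by the relation value = median + 'multiplier' * IQR, so the multiplier follows by rearranging for it. A minimal sketch of that arithmetic, assuming the median and IQR of the attribute are already known; `OffsetMultiplierSketch` and `multiplier` are hypothetical names, and rejecting a zero IQR is this sketch's choice, not documented filter behaviour.

```java
// Sketch only: solves value = median + multiplier * IQR for the multiplier.
// OffsetMultiplierSketch and multiplier are hypothetical names, not part of Weka.
class OffsetMultiplierSketch {
    static double multiplier(double value, double median, double iqr) {
        if (iqr == 0.0) {
            // Assumption: this sketch refuses a zero IQR rather than dividing by it
            throw new IllegalArgumentException("IQR must be non-zero");
        }
        return (value - median) / iqr;
    }

    public static void main(String[] args) {
        // median = 15 and IQR = 10: a value of 35 lies two IQRs above the median
        System.out.println(multiplier(35, 15, 10)); // prints 2.0
    }
}
```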
public Capabilities getCapabilities()
getCapabilities
in interface CapabilitiesHandler
getCapabilities
in class Filter
See also: Capabilities
public java.lang.String getRevision()
getRevision
in interface RevisionHandler
getRevision
in class Filter
public static void main(java.lang.String[] args)
args
- should contain arguments to the filter: use -h for help