java.lang.Object
  weka.clusterers.AbstractClusterer
    weka.clusterers.RandomizableClusterer
      weka.clusterers.XMeans
public class XMeans
Cluster data using the X-means algorithm.
X-Means is K-Means extended by an Improve-Structure part. In this part of the algorithm, each center is tentatively split within its region. The decision between keeping a center and replacing it with its children is made by comparing the BIC values of the two structures.
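The BIC comparison described above can be illustrated with a minimal, simplified sketch. It is not Weka's implementation: it assumes one-dimensional data, hard assignment of points to their nearest center, and a spherical Gaussian model with a single pooled variance. The class and method names (`BicSplitSketch`, `bic`, `shouldSplit`) are hypothetical.

```java
/** Simplified 1-D illustration of a BIC-based split test (an assumption-laden sketch, not Weka's code). */
class BicSplitSketch {

    /** Returns the index of the center nearest to x. */
    static int nearest(double x, double[] centers) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) {
                best = j;
            }
        }
        return best;
    }

    /**
     * BIC of a hard-assignment Gaussian model with one pooled variance.
     * Higher is better. Assumes more points than centers and non-zero scatter.
     */
    static double bic(double[] data, double[] centers) {
        int r = data.length;
        int k = centers.length;
        int[] counts = new int[k];
        double sse = 0.0;
        for (double x : data) {
            int j = nearest(x, centers);
            counts[j]++;
            sse += (x - centers[j]) * (x - centers[j]);
        }
        double variance = sse / (r - k); // pooled variance estimate
        // Gaussian density terms; with this variance the exponent sums to (r - k) / 2.
        double logLik = -0.5 * r * Math.log(2 * Math.PI * variance) - 0.5 * (r - k);
        for (int c : counts) {
            if (c > 0) {
                logLik += c * Math.log((double) c / r); // cluster prior term
            }
        }
        int freeParams = (k - 1) + k + 1; // priors + 1-D means + shared variance
        return logLik - 0.5 * freeParams * Math.log(r);
    }

    /** True if the two children score a higher BIC than the single parent center. */
    static boolean shouldSplit(double[] data, double parent, double childA, double childB) {
        return bic(data, new double[] {childA, childB}) > bic(data, new double[] {parent});
    }
}
```

Under these assumptions, two well-separated groups of points favor the split, while a single tight group keeps its parent center because the extra parameters are penalized.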
For more information see:
Dan Pelleg, Andrew W. Moore: X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In: Seventeenth International Conference on Machine Learning, 727-734, 2000.
@inproceedings{Pelleg2000,
   author = {Dan Pelleg and Andrew W. Moore},
   booktitle = {Seventeenth International Conference on Machine Learning},
   pages = {727-734},
   publisher = {Morgan Kaufmann},
   title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
   year = {2000}
}
Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of KDTree class to use, followed by scheme options. E.g.: "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options. (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level. (default 0)
-Y <file name> The debug vectors file.
-S <num> Random number seed. (default 10)
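As a small illustration of how the flags listed above fit together, the following sketch assembles an option array of the kind that `setOptions(java.lang.String[])` accepts. The helper class `XMeansOptionsSketch`, the method `buildOptions`, and its argument check are hypothetical and not part of Weka; only the flag names `-L`, `-H`, `-S` and `-use-kdtree` come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles an XMeans option array from the documented flags. */
class XMeansOptionsSketch {

    /**
     * Builds an option array with the minimum and maximum number of clusters,
     * the random seed and, optionally, the KDTree switch.
     * The range check is an assumption of this sketch, not documented behaviour.
     */
    static String[] buildOptions(int minClusters, int maxClusters, int seed, boolean useKdTree) {
        if (minClusters < 1 || maxClusters < minClusters) {
            throw new IllegalArgumentException("expected 1 <= minClusters <= maxClusters");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-L");
        opts.add(Integer.toString(minClusters));
        opts.add("-H");
        opts.add(Integer.toString(maxClusters));
        opts.add("-S");
        opts.add(Integer.toString(seed));
        if (useKdTree) {
            opts.add("-use-kdtree"); // flag without an argument
        }
        return opts.toArray(new String[0]);
    }
}
```

With Weka on the classpath, the resulting array could be handed to `setOptions` on an `XMeans` instance before calling `buildClusterer`.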
See Also: RandomizableClusterer, Serialized Form
Field Summary | |
---|---|
static int | D_CONVCHCLOSER - Have a closer look at converge children. |
static int | D_CURR - For current debug. |
static int | D_FOLLOWSPLIT - Follows the splitting of the centers. |
static int | D_GENERAL - General debugging. |
static int | D_ITERCOUNT - Follow iterations. |
static int | D_KDTREE - Check on kdtree. |
static int | D_METH_MISUSE - Functions were maybe misused. |
static int | D_PRINTCENTERS - Print the centers. |
static int | D_RANDOMVECTOR - Check on random vectors. |
boolean | m_CurrDebugFlag - Flag: I'm debugging. |
static int | R_HIGH - Index in ranges for HIGH. |
static int | R_LOW - Index in ranges for LOW. |
static int | R_WIDTH - Index in ranges for WIDTH. |
Constructor Summary | |
---|---|
XMeans() - The default constructor. |
Method Summary | |
---|---|
java.lang.String | binValueTipText() - Returns the tip text for this property. |
void | buildClusterer(Instances data) - Generates the X-Means clusterer. |
boolean | checkForNominalAttributes(Instances data) - Checks for nominal attributes in the dataset. |
int | clusterInstance(Instance instance) - Assigns a given instance to a cluster. |
java.lang.String | cutOffFactorTipText() - Returns the tip text for this property. |
java.lang.String | debugLevelTipText() - Returns the tip text for this property. |
java.lang.String | debugVectorsFileTipText() - Returns the tip text for this property. |
java.lang.String | distanceFTipText() - Returns the tip text for this property. |
double | getBinValue() - Gets the value that represents true in a new numeric attribute. |
Capabilities | getCapabilities() - Returns default capabilities of the clusterer. |
double | getCutOffFactor() - Gets the cutoff factor. |
int | getDebugLevel() - Gets the debug level. |
java.io.File | getDebugVectorsFile() - Gets the file name for a file that has the random vectors stored. |
DistanceFunction | getDistanceF() - Gets the distance function. |
java.io.File | getInputCenterFile() - Gets the file to read the list of centers from. |
KDTree | getKDTree() - Gets the KDTree class. |
int | getMaxIterations() - Gets the maximum number of iterations. |
int | getMaxKMeans() - Gets the maximum number of iterations in KMeans. |
int | getMaxKMeansForChildren() - Gets the maximum number of KMeans iterations performed on the child centers. |
int | getMaxNumClusters() - Gets the maximum number of clusters to generate. |
int | getMinNumClusters() - Gets the minimum number of clusters to generate. |
Instance | getNextDebugVectorsInstance(Instances model) - Reads an instance from the debug vectors file. |
java.lang.String[] | getOptions() - Gets the current settings of XMeans. |
java.io.File | getOutputCenterFile() - Gets the file to write the list of centers to. |
java.lang.String | getRevision() - Returns the revision string. |
TechnicalInformation | getTechnicalInformation() - Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
boolean | getUseKDTree() - Gets whether the KDTree is used or not. |
java.lang.String | globalInfo() - Returns a string describing this clusterer. |
void | initDebugVectorsInput() - Initialises the debug vector input. |
java.lang.String | inputCenterFileTipText() - Returns the tip text for this property. |
java.lang.String | KDTreeTipText() - Returns the tip text for this property. |
java.util.Enumeration | listOptions() - Returns an enumeration describing the available options. |
static void | main(java.lang.String[] argv) - Main method for testing this class. |
java.lang.String | maxIterationsTipText() - Returns the tip text for this property. |
java.lang.String | maxKMeansForChildrenTipText() - Returns the tip text for this property. |
java.lang.String | maxKMeansTipText() - Returns the tip text for this property. |
java.lang.String | maxNumClustersTipText() - Returns the tip text for this property. |
java.lang.String | minNumClustersTipText() - Returns the tip text for this property. |
int | numberOfClusters() - Returns the number of clusters. |
java.lang.String | outputCenterFileTipText() - Returns the tip text for this property. |
void | setBinValue(double value) - Sets the distance value between true and false of binary attributes. |
void | setCutOffFactor(double i) - Sets a new cutoff factor. |
void | setDebugLevel(int d) - Sets the debug level. |
void | setDebugVectorsFile(java.io.File value) - Sets the file that has the random vectors stored. |
void | setDistanceF(DistanceFunction distanceF) - Sets the distance function. |
void | setInputCenterFile(java.io.File value) - Sets the file to read the list of centers from. |
void | setKDTree(KDTree k) - Sets the KDTree class. |
void | setMaxIterations(int i) - Sets the maximum number of iterations to perform. |
void | setMaxKMeans(int i) - Sets the maximum number of iterations to perform in KMeans. |
void | setMaxKMeansForChildren(int i) - Sets the maximum number of KMeans iterations performed on the child centers. |
void | setMaxNumClusters(int n) - Sets the maximum number of clusters to generate. |
void | setMinNumClusters(int n) - Sets the minimum number of clusters to generate. |
void | setOptions(java.lang.String[] options) - Parses a given list of options. |
void | setOutputCenterFile(java.io.File value) - Sets the file to write the list of centers to. |
void | setUseKDTree(boolean value) - Sets whether to use the KDTree or not. |
java.lang.String | toString() - Returns a string describing this clusterer. |
java.lang.String | useKDTreeTipText() - Returns the tip text for this property. |
Methods inherited from class weka.clusterers.RandomizableClusterer |
---|
getSeed, seedTipText, setSeed |
Methods inherited from class weka.clusterers.AbstractClusterer |
---|
distributionForInstance, forName, makeCopies, makeCopy |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static int R_LOW
public static int R_HIGH
public static int R_WIDTH
public static int D_PRINTCENTERS
public static int D_FOLLOWSPLIT
public static int D_CONVCHCLOSER
public static int D_RANDOMVECTOR
public static int D_KDTREE
public static int D_ITERCOUNT
public static int D_METH_MISUSE
public static int D_CURR
public static int D_GENERAL
public boolean m_CurrDebugFlag
Constructor Detail |
---|
public XMeans()
Method Detail |
---|
public java.lang.String globalInfo()
public TechnicalInformation getTechnicalInformation()
getTechnicalInformation in interface TechnicalInformationHandler
public Capabilities getCapabilities()
getCapabilities in interface Clusterer
getCapabilities in interface CapabilitiesHandler
getCapabilities in class AbstractClusterer
See Also: Capabilities
public void buildClusterer(Instances data) throws java.lang.Exception
buildClusterer in interface Clusterer
buildClusterer in class AbstractClusterer
Parameters: data - set of instances serving as training data
Throws: java.lang.Exception - if the clusterer has not been generated successfully
public boolean checkForNominalAttributes(Instances data)
Parameters: data - the data to check
public int clusterInstance(Instance instance) throws java.lang.Exception
clusterInstance in interface Clusterer
clusterInstance in class AbstractClusterer
Parameters: instance - the instance to be assigned to a cluster
Throws: java.lang.Exception - if the instance could not be classified successfully
public int numberOfClusters()
numberOfClusters in interface Clusterer
numberOfClusters in class AbstractClusterer
public java.util.Enumeration listOptions()
listOptions in interface OptionHandler
listOptions in class RandomizableClusterer
public java.lang.String minNumClustersTipText()
public void setMinNumClusters(int n)
Parameters: n - the minimum number of clusters to generate
public int getMinNumClusters()
public java.lang.String maxNumClustersTipText()
public void setMaxNumClusters(int n)
Parameters: n - the maximum number of clusters to generate
public int getMaxNumClusters()
public java.lang.String maxIterationsTipText()
public void setMaxIterations(int i) throws java.lang.Exception
Parameters: i - the number of iterations
Throws: java.lang.Exception - if i is less than 1
public int getMaxIterations()
public java.lang.String maxKMeansTipText()
public void setMaxKMeans(int i)
Parameters: i - the number of iterations
public int getMaxKMeans()
public java.lang.String maxKMeansForChildrenTipText()
public void setMaxKMeansForChildren(int i)
Parameters: i - the number of iterations
public int getMaxKMeansForChildren()
public java.lang.String cutOffFactorTipText()
public void setCutOffFactor(double i)
Parameters: i - the new cutoff factor
public double getCutOffFactor()
public java.lang.String binValueTipText()
public double getBinValue()
public void setBinValue(double value)
Parameters: value - the distance
public java.lang.String distanceFTipText()
public void setDistanceF(DistanceFunction distanceF)
Parameters: distanceF - the distance function with all options set
public DistanceFunction getDistanceF()
public java.lang.String debugVectorsFileTipText()
public void setDebugVectorsFile(java.io.File value)
Parameters: value - the file to read the random vectors from
public java.io.File getDebugVectorsFile()
public void initDebugVectorsInput() throws java.lang.Exception
Throws: java.lang.Exception - if there is an error opening the debug input file
public Instance getNextDebugVectorsInstance(Instances model) throws java.lang.Exception
Parameters: model - the data model for the instance
Throws: java.lang.Exception - if there are no debug vectors in m_DebugVectors
public java.lang.String inputCenterFileTipText()
public void setInputCenterFile(java.io.File value)
Parameters: value - the file to read centers from
public java.io.File getInputCenterFile()
public java.lang.String outputCenterFileTipText()
public void setOutputCenterFile(java.io.File value)
Parameters: value - the file to write centers to
public java.io.File getOutputCenterFile()
public java.lang.String KDTreeTipText()
public void setKDTree(KDTree k)
Parameters: k - a KDTree object with all options set
public KDTree getKDTree()
public java.lang.String useKDTreeTipText()
public void setUseKDTree(boolean value)
Parameters: value - if true, the KDTree is used
public boolean getUseKDTree()
public java.lang.String debugLevelTipText()
public void setDebugLevel(int d)
Parameters: d - the debug level
public int getDebugLevel()
public void setOptions(java.lang.String[] options) throws java.lang.Exception
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of KDTree class to use, followed by scheme options. E.g.: "weka.core.neighboursearch.kdtrees.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options. (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level. (default 0)
-Y <file name> The debug vectors file.
-S <num> Random number seed. (default 10)
setOptions in interface OptionHandler
setOptions in class RandomizableClusterer
Parameters: options - the list of options as an array of strings
Throws: java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
getOptions in class RandomizableClusterer
public java.lang.String toString()
toString in class java.lang.Object
public java.lang.String getRevision()
getRevision in interface RevisionHandler
public static void main(java.lang.String[] argv)
Parameters: argv - should contain options