- Partition-based clustering
- k-means, k-median, fuzzy c-means
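The partition-based idea can be shown with a minimal pure-Python k-means sketch: assign each point to its nearest centroid, recompute centroids as cluster means, and repeat. This is illustrative only; a real project would typically use a library implementation such as scikit-learn's `KMeans`.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Cluster 2-D points into k groups: repeatedly assign each point to
    the nearest centroid, then move each centroid to its cluster's mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k random points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster emptied out
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, clusters

# two well-separated blobs of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
```

Note that k requires the number of clusters up front, which is exactly the knob hierarchical clustering (below) avoids.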
- Hierarchical clustering
- Produces trees of clusters
- Agglomerative, divisive
- Advantages
- It does not require the number of clusters to be specified.
- Produces a dendrogram which helps with understanding the data.
- Disadvantages
- Once a merge or split is made, it can never be undone later in the algorithm.
- Sometimes difficult to identify the number of clusters by the dendrogram.
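A toy agglomerative (single-linkage) sketch makes both points above concrete: each merge is chosen greedily and never undone, and the recorded merge order and distances are exactly what a dendrogram visualizes. Illustrative only; `scipy.cluster.hierarchy` provides production implementations.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points with single linkage.
    Returns the merge history: (merge distance, merged cluster)."""
    clusters = [[p] for p in points]
    merges = []  # greedy merges -- never undone once made
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
        merges.append((d, merged))
    return merges

merges = single_linkage([1, 2, 6, 8])
```

Cutting the merge history at a chosen distance yields a flat clustering, which is how a cluster count is read off a dendrogram.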
- Density-based clustering
- Produces arbitrary shaped clusters
- Locates regions of high density, and separates outliers
- DBSCAN
- Does not require specification of the number of clusters
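A compact pure-Python DBSCAN sketch (illustrative; scikit-learn's `DBSCAN` is the usual choice): points with at least `min_pts` neighbors within radius `eps` are core points, clusters grow outward from cores, and points reachable from no core are labeled noise (`-1`).

```python
def dbscan(points, eps, min_pts):
    """Label each 2-D point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        return [j for j in range(len(points))
                if (points[i][0] - points[j][0]) ** 2
                 + (points[i][1] - points[j][1]) ** 2 <= eps ** 2]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise (may later become a border point)
            continue
        cluster += 1        # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: absorbed, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:   # expand only through core points
                queue.extend(jn)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

No cluster count is supplied: the dense block of four points forms one cluster and the isolated point is flagged as noise.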
- Time-series clustering by features
- Raw data.
- Autocorrelation.
- Spectral density.
- Extreme value behavior.
- Model-based time-series clustering.
- Forecast-based clustering.
- Model with a cluster structure.
- Time-series clustering by dependence.
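Feature-based time-series clustering can be sketched with one of the features listed above: represent each series by its lag-1 autocorrelation, then group series on that feature value. A minimal illustration (real pipelines use richer feature sets and a proper clustering step):

```python
def lag1_autocorr(x):
    """Sample autocorrelation of a series at lag 1."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    return cov / var

# a smooth, trending series vs. a rapidly alternating one
smooth = [t / 10 for t in range(20)]
alternating = [(-1) ** t for t in range(20)]
features = [lag1_autocorr(s) for s in (smooth, alternating)]

# trivial "clustering" on the one feature: sign of the autocorrelation
groups = ["persistent" if f > 0 else "oscillating" for f in features]
```

The point is that clustering operates on the extracted features, not on the raw observations, so series of different lengths or scales become directly comparable.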
- Clustering high dimensional data
- Many clustering algorithms are designed for low-dimensional data (roughly 1-3 dimensions)
- These methods may not work well once the number of dimensions grows to 20 or more, where distances between points become less meaningful
- Methods for clustering high dimensional data
- Methods can be grouped into two categories
- Subspace clustering
- CLIQUE, ProClus, and bi-clustering approaches
- Dimensionality reduction approaches
- Spectral clustering and various dimensionality reduction methods
- Subspace clustering
- Clusters may exist only in subsets of the dimensions, so clustering should consider the relevant attributes/features rather than the full space
- Feature selection
- Feature transformation
- Principal component analysis, singular value decomposition
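Feature transformation via PCA can be sketched in pure Python using power iteration on the covariance matrix (illustrative; NumPy or scikit-learn are the practical tools). For 2-D data, repeatedly multiplying a vector by the covariance matrix and normalizing converges to the top principal direction, along which the data varies most.

```python
import math

def first_principal_component(points, iters=100):
    """Top principal direction of 2-D points via power iteration."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):  # power iteration -> dominant eigenvector
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return v

# points lying almost exactly on the line y = x
pc = first_principal_component([(i, i + 0.01 * (-1) ** i) for i in range(10)])
```

Projecting onto the top few such directions reduces dimensionality before clustering, which is the "feature transformation" route above (SVD reaches the same components by factorizing the centered data matrix directly).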
Friday, December 28, 2018
Clustering