Friday, December 28, 2018

Clustering

  • Partition-based clustering
    • k-means, k-median, fuzzy c-means
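As a quick illustration of the partition-based family, here is a k-means sketch using scikit-learn; the two-blob toy data and parameter values are made up for demonstration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)),
               rng.normal(5, 0.3, size=(50, 2))])

# Partition-based methods require the number of clusters k up front
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```

Note that k must be specified in advance, which is the main practical difference from the hierarchical and density-based methods below.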
  • Hierarchical clustering
    • Produces trees of clusters
    • Agglomerative, divisive
    • Advantages
      • It does not require the number of clusters to be specified in advance.
      • Produces a dendrogram, which helps with understanding the data.
    • Disadvantages
      • It is greedy: once a merge or split is made, it can never be undone in later steps.
      • It can sometimes be difficult to identify the number of clusters from the dendrogram.
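A minimal agglomerative sketch with SciPy (toy data; the linkage method and cut level are illustrative). The tree is built first, and the number of clusters is chosen afterwards by cutting it:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, size=(20, 2)),
               rng.normal(4, 0.2, size=(20, 2))])

# Agglomerative (bottom-up) clustering; Z encodes the full merge tree
Z = linkage(X, method="ward")

# Cut the tree into 2 clusters after the fact -- no k needed during fitting
labels = fcluster(Z, t=2, criterion="maxclust")
# scipy.cluster.hierarchy.dendrogram(Z) would plot the merge tree for inspection
```

The divisive variant works top-down instead, splitting one all-inclusive cluster repeatedly.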
  • Density-based clustering
    • Produces arbitrary shaped clusters
    • Locates regions of high density, and separates outliers
    • DBSCAN
      • Does not require specification of the number of clusters
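A DBSCAN sketch with scikit-learn (toy data; `eps` and `min_samples` are illustrative). Points in low-density regions are labeled as noise rather than forced into a cluster:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
dense = rng.normal(0, 0.2, size=(40, 2))   # one high-density region
outlier = np.array([[10.0, 10.0]])          # an isolated point far away
X = np.vstack([dense, outlier])

# eps: neighborhood radius; min_samples: density threshold for a core point
db = DBSCAN(eps=0.8, min_samples=5).fit(X)
labels = db.labels_  # outliers get the special label -1
```

Instead of k, DBSCAN takes a neighborhood radius and a density threshold, which is why it can separate outliers and find arbitrarily shaped clusters.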
  • Time-series clustering
    • Time-series clustering by features
      • Raw data
      • Autocorrelation
      • Spectral density
      • Extreme value behavior
    • Model-based time-series clustering
      • Forecast-based clustering
      • Model with a cluster structure
    • Time-series clustering by dependence
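As a sketch of feature-based time-series clustering, each series can be summarized by its first few autocorrelation coefficients and the resulting feature vectors clustered with k-means. The `acf_features` helper, the toy series, and the lag count are all illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

def acf_features(series, nlags=5):
    # First nlags sample autocorrelation coefficients as a feature vector
    s = series - series.mean()
    denom = np.dot(s, s)
    return np.array([np.dot(s[:-k], s[k:]) / denom for k in range(1, nlags + 1)])

rng = np.random.default_rng(3)
t = np.arange(200)
# Two groups: slowly oscillating sine waves vs. white noise
slow = [np.sin(2 * np.pi * t / 50) + 0.1 * rng.normal(size=200) for _ in range(5)]
noise = [rng.normal(size=200) for _ in range(5)]

# Cluster in feature space, not on the raw series
F = np.array([acf_features(s) for s in slow + noise])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(F)
```

The sine waves have strong positive autocorrelation at small lags while white noise has none, so the two groups separate cleanly in feature space even though the raw series all have similar scale.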
  • Clustering high dimensional data
    • Many clustering algorithms are designed for low-dimensional data (roughly 1-3 dimensions)
    • These methods may not work well as the number of dimensions grows to 20 or more, since distances between points become less discriminative in high-dimensional space
  • Methods for clustering high dimensional data
    • Methods can be grouped into two categories
      • Subspace clustering
        • CLIQUE, ProClus, and bi-clustering approaches
      • Dimensionality reduction approaches
        • Spectral clustering and various dimensionality reduction methods
    • Clustering should consider not only the number of dimensions but also which attributes/features are used
      • Feature selection
      • Feature transformation
        • Principal component analysis, singular value decomposition
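A sketch of the dimensionality-reduction approach: project high-dimensional data onto a few principal components with PCA, then cluster in the reduced space. The 50-dimensional toy data and the choice of two components are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
# 50-dimensional data whose cluster structure lives in a few directions
X = np.vstack([rng.normal(0, 0.5, size=(30, 50)) + 3,
               rng.normal(0, 0.5, size=(30, 50)) - 3])

# Reduce to a handful of components, then cluster in the low-dimensional space
Z = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
```

Subspace methods such as CLIQUE instead search for clusters within subsets of the original attributes, keeping the dimensions interpretable rather than transforming them.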
