aaanalysis.AAclust
- class aaanalysis.AAclust(model_class=<class 'sklearn.cluster._kmeans.KMeans'>, model_kwargs=None, verbose=True, random_state=None)
Bases: Wrapper
Amino Acid clustering (AAclust) class: A k-optimized clustering wrapper for selecting redundancy-reduced sets of numerical scales [Breimann24a].
AAclust uses clustering models that require a pre-defined number of clusters (k, set by n_clusters), such as k-means or other scikit-learn clustering models. It optimizes the value of k by utilizing Pearson correlation and then selects a representative sample (‘medoid’) for each cluster closest to the center, resulting in a redundancy-reduced sample set.
Added in version 0.1.0.
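The two ideas in the description above (checking a clustering with Pearson correlation and picking one medoid per cluster) can be illustrated with a minimal, standard-library sketch. It is not the AAclust implementation: the helper names `pearson` and `centers_and_medoids` are hypothetical, the toy data is made up, and using a minimum-correlation threshold (named `min_th` here after the `fit()` parameter) to accept a given k is an assumption about how such a check could work.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def centers_and_medoids(X, labels):
    """Return {label: (center, medoid_index, min_corr)} for a given clustering."""
    result = {}
    for label in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == label]
        # Center: feature-wise average over all samples in the cluster
        center = [mean(col) for col in zip(*(X[i] for i in members))]
        # Correlation of every member with its cluster center
        corrs = {i: pearson(X[i], center) for i in members}
        # Medoid: the member most strongly correlated with the center
        medoid = max(corrs, key=corrs.get)
        result[label] = (center, medoid, min(corrs.values()))
    return result


# Toy data: four samples (rows) described by five features (columns)
X = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 3.0, 4.0, 6.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [6.0, 4.0, 3.0, 2.0, 1.0],
]
labels = [0, 0, 1, 1]  # labels as a clustering model with k=2 might return them

summary = centers_and_medoids(X, labels)
min_th = 0.9  # assumed threshold on the within-cluster correlation
# Accept this k only if every member correlates well enough with its center
k_is_sufficient = all(min_corr >= min_th for _, _, min_corr in summary.values())
```

In this sketch, a k that fails the check would be increased and the clustering repeated; the medoids of the accepted clustering form the redundancy-reduced set.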
- labels_
Cluster labels in the order of samples in X.
- Type:
array-like, shape (n_samples)
- centers_
Average scale values corresponding to each cluster.
- Type:
array-like, shape (n_clusters, n_features)
- labels_centers_
Cluster labels for each cluster center.
- Type:
array-like, shape (n_clusters)
- medoids_
Representative samples, one for each cluster.
- Type:
array-like, shape (n_clusters, n_features)
- labels_medoids_
Cluster labels for each medoid.
- Type:
array-like, shape (n_clusters)
- is_medoid_
Array indicating whether each sample is a medoid (1) or not (0). Same order as labels_.
- Type:
array-like, shape (n_samples)
- Parameters:
Methods
- comp_centers(X[, labels]): Computes the center of each cluster based on the given labels.
- comp_correlation(X[, labels, X_ref, ...]): Computes the Pearson correlation of given data with reference data.
- comp_coverage([names, names_ref]): Computes the percentage of unique names from names that are present in names_ref.
- comp_medoids(X[, labels, metric]): Computes the medoid of each cluster based on the given labels.
- eval(X[, list_labels, names_datasets]): Evaluates the quality of different clustering results.
- filter_coverage(X[, scale_ids, names_ref, ...]): Selects a redundancy-reduced set of numerical scales with defined subcategory coverage.
- fit(X[, n_clusters, on_center, min_th, ...]): Applies the AAclust algorithm to a feature matrix (X).
- name_clusters(X[, labels, names, shorten_names]): Assigns names to clusters based on the frequency of names.
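As a rough illustration of the coverage idea summarized for comp_coverage, the following standard-library sketch computes a percentage from two name lists. The function `coverage_percent` is hypothetical and not part of aaanalysis; in particular, dividing by the number of unique reference names is an assumption, since the summary above does not spell out the denominator.

```python
def coverage_percent(names, names_ref):
    """Percentage of unique reference names that also occur in names.

    Assumption: the denominator is the number of unique names in names_ref.
    """
    unique_ref = set(names_ref)
    if not unique_ref:
        raise ValueError("names_ref must not be empty")
    # Duplicates in names are ignored because only unique names count
    covered = set(names) & unique_ref
    return 100.0 * len(covered) / len(unique_ref)


# Made-up category names, for illustration only
names = ["Hydrophobicity", "Polarity", "Polarity"]
names_ref = ["Hydrophobicity", "Polarity", "Charge", "Volume"]
coverage = coverage_percent(names, names_ref)  # 2 of 4 reference names covered
```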
- __init__(model_class=<class 'sklearn.cluster._kmeans.KMeans'>, model_kwargs=None, verbose=True, random_state=None)
- Parameters:
model_class (Type[ClusterMixin], default=KMeans) – A clustering model class with an n_clusters parameter.
model_kwargs (dict, optional) – Keyword arguments to pass to the selected clustering model.
verbose (bool, default=True) – If True, verbose outputs are enabled.
random_state (int, optional) – The seed used by the random number generator. If a positive integer, results of stochastic processes are consistent, enabling reproducibility. If None, stochastic processes are not seeded and may differ between runs.
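To make the interplay of model_class, model_kwargs, and random_state concrete, here is a minimal sketch of how a wrapper could instantiate a clustering model for a given k. It is an assumption about the general pattern, not the AAclust source: `build_model` and `ToyClusterer` are hypothetical names, and `ToyClusterer` merely stands in for a scikit-learn style class so that the example runs without third-party packages.

```python
import inspect


class ToyClusterer:
    """Hypothetical stand-in for a scikit-learn style clustering class."""

    def __init__(self, n_clusters=8, random_state=None, max_iter=300):
        self.n_clusters = n_clusters
        self.random_state = random_state
        self.max_iter = max_iter


def build_model(model_class, n_clusters, model_kwargs=None, random_state=None):
    """Instantiate model_class for a given k (illustrative sketch only)."""
    params = inspect.signature(model_class).parameters
    if "n_clusters" not in params:
        raise ValueError(f"{model_class.__name__} has no 'n_clusters' parameter")
    kwargs = dict(model_kwargs or {})
    # k is controlled by the wrapper, so drop any user-supplied value
    kwargs.pop("n_clusters", None)
    # Forward the seed only if the model accepts one and none was given explicitly
    if random_state is not None and "random_state" in params:
        kwargs.setdefault("random_state", random_state)
    return model_class(n_clusters=n_clusters, **kwargs)


model = build_model(ToyClusterer, n_clusters=3,
                    model_kwargs={"max_iter": 50}, random_state=42)
```

Checking the constructor signature up front gives an early, readable error for models that do not take n_clusters, rather than a failure in the middle of the search over k.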
Notes
- All attributes are set during fitting via the AAclust.fit() method and can be directly accessed.
- AAclust is designed primarily for amino acid scales but can be used for any set of numerical indices.
See also
- AAclustPlot: the respective plotting class.
- Scikit-learn clustering model classes.
Examples
The AAclust clustering wrapper class can utilize any clustering model that uses the n_clusters parameter:

    from sklearn.cluster import (KMeans, AgglomerativeClustering,
                                 MiniBatchKMeans, SpectralClustering)
    import aaanalysis as aa

    # AAclust with KMeans (default)
    aac = aa.AAclust(model_class=KMeans)

    # AAclust with MiniBatchKMeans
    aac = aa.AAclust(model_class=MiniBatchKMeans)

    # AAclust with SpectralClustering
    aac = aa.AAclust(model_class=SpectralClustering)

The hierarchical agglomerative clustering model supports four linkage criteria, which can be passed to AAclust through its model_kwargs parameter:

    # AAclust using AgglomerativeClustering with average linkage
    # (Euclidean distance is the scikit-learn default)
    aac = aa.AAclust(model_class=AgglomerativeClustering,
                     model_kwargs=dict(linkage='average'))

    # Other linkage methods are 'ward', 'complete', and 'single'