aaanalysis.AAclust

class aaanalysis.AAclust(model_class=<class 'sklearn.cluster._kmeans.KMeans'>, model_kwargs=None, verbose=True, random_state=None)

Bases: Wrapper

Amino Acid clustering (AAclust) class: A k-optimized clustering wrapper for selecting redundancy-reduced sets of numerical scales [Breimann24a].

AAclust uses clustering models that require a pre-defined number of clusters (k, set by n_clusters), such as k-means or other scikit-learn clustering models. It optimizes the value of k using Pearson correlation and then selects, for each cluster, the representative sample (‘medoid’) closest to the cluster center, yielding a redundancy-reduced sample set.

Added in version 0.1.0.
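
The medoid selection described above can be illustrated with a minimal, standard-library sketch. It is not the AAclust implementation: the helper names pearson and select_medoids are hypothetical, and using Pearson correlation to the cluster mean as the measure of closeness is an assumption made for illustration (the labels are given here rather than produced by a clustering model).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_medoids(X, labels):
    """Map each cluster label to the row index of its medoid.

    Assumption: the medoid is the cluster member with the highest
    Pearson correlation to the cluster center (mean of its members).
    """
    n_features = len(X[0])
    medoids = {}
    for label in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == label]
        # Cluster center: feature-wise mean over all members
        center = [sum(X[i][j] for i in members) / len(members)
                  for j in range(n_features)]
        medoids[label] = max(members, key=lambda i: pearson(X[i], center))
    return medoids


# Toy data: five samples (rows) with four features each
X = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 5.0],
    [1.0, 3.0, 2.0, 6.0],
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 3.0, 2.0, 1.0],
]
labels = [0, 0, 0, 1, 1]
medoids = select_medoids(X, labels)
```

Keeping only the rows indexed by medoids gives one representative per cluster, which is the sense in which the resulting sample set is redundancy-reduced.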

model

The fitted clustering model object after calling the fit method.

Type:

object

n_clusters

Number of clusters obtained by AAclust.

Type:

int

labels_

Cluster labels in the order of samples in X.

Type:

array-like, shape (n_samples)

centers_

Average scale values corresponding to each cluster.

Type:

array-like, shape (n_clusters, n_features)

labels_centers_

Cluster labels for each cluster center.

Type:

array-like, shape (n_clusters)

medoids_

Representative samples, one for each cluster.

Type:

array-like, shape (n_clusters, n_features)

labels_medoids_

Cluster labels for each medoid.

Type:

array-like, shape (n_clusters)

is_medoid_

Array indicating samples being medoids (1) or not (0). Same order as labels_.

Type:

array-like, shape (n_samples)

medoid_names_

Names of the medoids. Set if names is provided to .fit.

Type:

list

Methods

comp_centers(X[, labels])

Computes the center of each cluster based on the given labels.

comp_correlation(X[, labels, X_ref, ...])

Computes the Pearson correlation of given data with reference data.

comp_coverage([names, names_ref])

Computes the percentage of unique names from names that are present in names_ref.

comp_medoids(X[, labels, metric])

Computes the medoid of each cluster based on the given labels.

eval(X[, list_labels, names_datasets])

Evaluates the quality of different clustering results.

filter_coverage(X[, scale_ids, names_ref, ...])

Select a redundancy-reduced set of numerical scales with defined subcategory coverage.

fit(X[, n_clusters, on_center, min_th, ...])

Applies AAclust algorithm to feature matrix (X).

name_clusters(X[, labels, names, shorten_names])

Assigns names to clusters based on the frequency of names.
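
As a rough illustration of frequency-based naming, as in the name_clusters summary above, the sketch below gives each cluster the most frequent name among its members. The function name_clusters_by_frequency is hypothetical and uses only the standard library. The actual method may resolve ties, enforce unique names, or shorten names differently.

```python
from collections import Counter


def name_clusters_by_frequency(labels, names):
    """Map each cluster label to the most frequent name among its members.

    Assumption: ties are resolved in favor of the name seen first.
    """
    counts = {}
    for label, name in zip(labels, names):
        counts.setdefault(label, Counter())[name] += 1
    return {label: counter.most_common(1)[0][0]
            for label, counter in counts.items()}


# Toy input: one cluster label and one name per sample
labels = [0, 0, 0, 1, 1]
names = ["Hydrophobicity", "Hydrophobicity", "Polarity",
         "Conformation", "Conformation"]
cluster_names = name_clusters_by_frequency(labels, names)
```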

__init__(model_class=<class 'sklearn.cluster._kmeans.KMeans'>, model_kwargs=None, verbose=True, random_state=None)
Parameters:
  • model_class (Type[ClusterMixin], default=KMeans) – A clustering model class with n_clusters parameter.

  • model_kwargs (dict, optional) – Keyword arguments to pass to the selected clustering model.

  • verbose (bool, default=True) – If True, verbose outputs are enabled.

  • random_state (int, optional) – The seed used by the random number generator. If a positive integer, results of stochastic processes are consistent, enabling reproducibility. If None, no seed is set and results of stochastic processes may differ between runs.

Notes

  • All attributes are set during fitting via the AAclust.fit() method and can be directly accessed.

  • AAclust is designed primarily for amino acid scales but can be used for any set of numerical indices.

Examples

The AAclust clustering wrapper class can utilize any clustering model that uses the n_clusters parameter:

from sklearn.cluster import (KMeans, AgglomerativeClustering, MiniBatchKMeans, SpectralClustering)
import aaanalysis as aa

# AAclust with KMeans (default)
aac = aa.AAclust(model_class=KMeans)
# AAclust with MiniBatchKMeans
aac = aa.AAclust(model_class=MiniBatchKMeans)
# AAclust with SpectralClustering
aac = aa.AAclust(model_class=SpectralClustering)

The hierarchical agglomerative clustering model supports four different linkage criteria, which can be passed to AAclust via its model_kwargs parameter:

# AAclust using AgglomerativeClustering with average linkage
aac = aa.AAclust(model_class=AgglomerativeClustering, model_kwargs=dict(linkage='average'))
# Other linkage methods are 'ward', 'complete', and 'single'