AAclust.comp_medoids

static AAclust.comp_medoids(X, labels, metric='correlation')

Computes the medoid of each cluster based on the given labels.

For each cluster, the medoid is the sample closest to the cluster center as measured by the chosen metric (default Pearson correlation) [Breimann24a]. Use this method independently of AAclust.fit() when cluster labels are already available.

Added in version 0.1.0.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix. Rows typically correspond to scales and columns to amino acids.

  • labels (array-like, shape (n_samples,)) – Cluster labels for each sample in X.

  • metric ({'correlation', 'euclidean', 'manhattan', 'cosine'}, default='correlation') –

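    Similarity or distance measure used to select each medoid relative to its cluster center. The sample with the maximum similarity or the minimum distance is chosen: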

    • correlation: Pearson correlation (maximum)

    • euclidean: Euclidean distance (minimum)

    • manhattan: Manhattan distance (minimum)

    • cosine: Cosine distance (minimum)

Returns:

  • medoids (array-like, shape (n_clusters, n_features)) – The medoid for each cluster.

  • labels_medoids (array-like, shape (n_clusters,)) – The labels corresponding to each medoid.
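The selection rule described above can be illustrated with a short pure-Python sketch. It is not the AAclust implementation: it assumes that the cluster center is the feature-wise mean of the cluster members (the text above does not specify how the center is defined), and sketch_medoids, SCORES, and the two helper functions are hypothetical names introduced only for this illustration.

```python
import math
from collections import defaultdict


def _pearson(a, b):
    # Pearson correlation of two equal-length sequences (0.0 if either is constant)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def _cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity (1.0 if either vector is all zeros)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na > 0 and nb > 0 else 1.0


# Each metric is turned into a score where larger means "closer to the center":
# correlation is maximized, the three distances are negated so they are minimized.
SCORES = {
    "correlation": _pearson,
    "euclidean": lambda a, b: -math.dist(a, b),
    "manhattan": lambda a, b: -sum(abs(x - y) for x, y in zip(a, b)),
    "cosine": lambda a, b: -_cosine_distance(a, b),
}


def sketch_medoids(X, labels, metric="correlation"):
    """Return (medoids, labels_medoids) for the rows of X grouped by labels."""
    score = SCORES[metric]
    clusters = defaultdict(list)
    for row, label in zip(X, labels):
        clusters[label].append(list(row))
    medoids, labels_medoids = [], []
    for label, rows in clusters.items():
        # ASSUMPTION: the cluster center is the feature-wise mean of its members
        center = [sum(col) / len(rows) for col in zip(*rows)]
        # The medoid is the actual sample that scores best against the center
        medoids.append(max(rows, key=lambda r: score(r, center)))
        labels_medoids.append(label)
    return medoids, labels_medoids


# Toy data: 6 samples with 3 features, already assigned to two clusters
X = [
    [0.0, 0.0, 1.0],
    [0.1, 0.1, 0.9],
    [0.4, 0.3, 1.0],
    [1.0, 1.0, 0.0],
    [0.8, 0.8, 0.2],
    [0.6, 0.5, 0.4],
]
labels = [0, 0, 0, 1, 1, 1]
medoids, labels_medoids = sketch_medoids(X, labels, metric="euclidean")
```

Unlike a cluster mean, each returned medoid is one of the original rows of X, so it can be traced back to a concrete sample (here, a scale).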

Examples

The representative samples for each cluster (called ‘medoids’) can be obtained using the AAclust().comp_medoids() method:

import aaanalysis as aa
import pandas as pd
# Create example dataset comprising 100 scales
df_scales = aa.load_scales().T.sample(100).T
X = df_scales.T

# Fit AAclust model and obtain the medoids for 5 clusters
aac = aa.AAclust()
labels = aac.fit(X, n_clusters=5).labels_
medoids, labels_medoids = aac.comp_medoids(X=X, labels=labels)

# Create DataFrame with one column per cluster medoid
columns = [f"Cluster {i}" for i in labels_medoids]
df_medoids = pd.DataFrame(medoids.T, columns=columns, index=df_scales.index)
aa.display_df(df_medoids)
AA  Cluster 4  Cluster 2  Cluster 1  Cluster 3  Cluster 0
A    0.863000   0.908000   0.174000   0.100000   0.616000
C    0.589000   0.454000   0.115000   0.200000   0.680000
D    0.619000   0.532000   1.000000   0.400000   0.028000
E    0.495000   0.979000   0.802000   0.500000   0.043000
F    0.463000   0.688000   0.068000   0.700000   1.000000
G    1.000000   0.135000   0.328000   0.000000   0.501000
H    0.269000   0.553000   0.650000   0.600000   0.165000
I    0.563000   0.582000   0.000000   0.400000   0.943000
K    0.439000   0.716000   0.809000   0.500000   0.283000
L    0.577000   0.759000   0.012000   0.400000   0.943000
M    0.453000   1.000000   0.122000   0.400000   0.738000
N    0.535000   0.248000   0.648000   0.400000   0.236000
P    0.850000   0.000000   0.692000   0.300000   0.711000
Q    0.452000   0.723000   0.724000   0.500000   0.251000
R    0.332000   0.652000   0.778000   0.700000   0.000000
S    0.678000   0.284000   0.352000   0.200000   0.359000
T    0.611000   0.433000   0.274000   0.300000   0.450000
V    0.638000   0.482000   0.071000   0.300000   0.825000
W    0.000000   0.567000   0.220000   1.000000   0.878000
Y    0.232000   0.319000   0.313000   0.800000   0.880000
