aaanalysis.AAclust.comp_medoids

static AAclust.comp_medoids(X, labels=None, metric='correlation')

Computes the medoid of each cluster based on the given labels.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix. Rows typically correspond to scales and columns to amino acids.

  • labels (array-like, shape (n_samples,)) – Cluster labels for each sample in X.

  • metric ({'correlation', 'euclidean', 'manhattan', 'cosine'}, default='correlation') –

    Similarity or distance measure used to select the medoid of each cluster (whether it is maximized or minimized is given in parentheses):

    • correlation: Pearson correlation (maximum)

    • euclidean: Euclidean distance (minimum)

    • manhattan: Manhattan distance (minimum)

    • cosine: Cosine distance (minimum)
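To make the 'correlation (maximum)' rule concrete, the following dependency-free sketch picks, for each cluster, the sample with the highest Pearson correlation to the cluster mean. This is an illustration under stated assumptions, not the library's implementation: the helper names `pearson` and `sketch_medoids` are hypothetical, and choosing the member closest to the cluster mean is an assumed definition of the medoid.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu_u = sum(u) / len(u)
    mu_v = sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def sketch_medoids(X, labels):
    """Hypothetical helper (not part of aaanalysis).

    Assumption: the medoid of a cluster is the member that is most
    correlated with the cluster mean.
    """
    medoids, labels_medoids = [], []
    for label in sorted(set(labels)):
        # Collect all samples assigned to this cluster
        rows = [x for x, lab in zip(X, labels) if lab == label]
        # Feature-wise mean of the cluster
        center = [sum(col) / len(rows) for col in zip(*rows)]
        # Member with the maximum correlation to the mean
        best = max(rows, key=lambda row: pearson(row, center))
        medoids.append(best)
        labels_medoids.append(label)
    return medoids, labels_medoids


# Toy data: 5 samples, 4 features, 2 clusters
X = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 5.0],
    [1.0, 3.0, 2.0, 6.0],
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 3.0, 2.5, 1.0],
]
labels = [0, 0, 0, 1, 1]
medoids, labels_medoids = sketch_medoids(X, labels)
print(labels_medoids)
print(medoids)
```

For the distance-based metrics (euclidean, manhattan, cosine), the same sketch would apply with `max` replaced by `min` and `pearson` replaced by the corresponding distance function.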

Returns:

  • medoids (array-like, shape (n_clusters, n_features)) – The medoid (representative sample) for each cluster.

  • labels_medoids (array-like, shape (n_clusters,)) – The cluster label of each medoid, in the same order as medoids.

Examples

The representative samples for each cluster (called ‘medoids’) can be obtained using the AAclust().comp_medoids() method:

import aaanalysis as aa
import pandas as pd
# Create example dataset comprising 100 randomly sampled scales
df_scales = aa.load_scales().T.sample(100).T
X = df_scales.T

# Fit AAclust model and obtain the medoids for 5 clusters
aac = aa.AAclust()
labels = aac.fit(X, n_clusters=5).labels_
medoids, labels_medoids = aac.comp_medoids(X=X, labels=labels)

# Create DataFrame with one column per medoid (rows are amino acids)
columns = [f"Cluster {i}" for i in labels_medoids]
df_medoids = pd.DataFrame(medoids.T, columns=columns, index=df_scales.index)
aa.display_df(df_medoids)
AA  Cluster 4  Cluster 2  Cluster 1  Cluster 3  Cluster 0
A    0.863000   0.908000   0.174000   0.100000   0.616000
C    0.589000   0.454000   0.115000   0.200000   0.680000
D    0.619000   0.532000   1.000000   0.400000   0.028000
E    0.495000   0.979000   0.802000   0.500000   0.043000
F    0.463000   0.688000   0.068000   0.700000   1.000000
G    1.000000   0.135000   0.328000   0.000000   0.501000
H    0.269000   0.553000   0.650000   0.600000   0.165000
I    0.563000   0.582000   0.000000   0.400000   0.943000
K    0.439000   0.716000   0.809000   0.500000   0.283000
L    0.577000   0.759000   0.012000   0.400000   0.943000
M    0.453000   1.000000   0.122000   0.400000   0.738000
N    0.535000   0.248000   0.648000   0.400000   0.236000
P    0.850000   0.000000   0.692000   0.300000   0.711000
Q    0.452000   0.723000   0.724000   0.500000   0.251000
R    0.332000   0.652000   0.778000   0.700000   0.000000
S    0.678000   0.284000   0.352000   0.200000   0.359000
T    0.611000   0.433000   0.274000   0.300000   0.450000
V    0.638000   0.482000   0.071000   0.300000   0.825000
W    0.000000   0.567000   0.220000   1.000000   0.878000
Y    0.232000   0.319000   0.313000   0.800000   0.880000