aaanalysis.AAclust.comp_medoids
- static AAclust.comp_medoids(X, labels=None, metric='correlation')
Computes the medoid of each cluster based on the given labels.
- Parameters:
X (array-like, shape (n_samples, n_features)) – Feature matrix. Rows typically correspond to scales and columns to amino acids.
labels (array-like, shape (n_samples,)) – Cluster labels for each sample in X.
metric ({'correlation', 'euclidean', 'manhattan', 'cosine'}, default='correlation') –
Similarity or distance measure used to select the medoids:
  - correlation: Pearson correlation (the maximum is selected)
  - euclidean: Euclidean distance (the minimum is selected)
  - manhattan: Manhattan distance (the minimum is selected)
  - cosine: Cosine distance (the minimum is selected)
- Returns:
medoids (array-like, shape (n_clusters, n_features)) – The medoid (a representative sample of X) for each cluster.
labels_medoids (array-like, shape (n_clusters,)) – The labels corresponding to each medoid.
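The parameter list names the measure and whether its maximum or minimum is used, but not what each sample is compared against. The following is a minimal sketch, not the aaanalysis implementation, assuming that each cluster's medoid is the member most similar to the cluster mean and that clusters are returned in order of first appearance in `labels`. The function name `select_medoids` is hypothetical.

```python
import numpy as np

def select_medoids(X, labels, metric="correlation"):
    """Hypothetical sketch: per cluster, pick the member closest to the cluster mean.

    Assumptions (not confirmed by the aaanalysis docs): the reference point is
    the cluster mean, and clusters are ordered by first appearance in labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Unique labels in order of first appearance
    _, first_idx = np.unique(labels, return_index=True)
    unique_labels = labels[np.sort(first_idx)]
    medoids, labels_medoids = [], []
    for label in unique_labels:
        members = X[labels == label]
        center = members.mean(axis=0)
        if metric == "correlation":
            # Pearson correlation of each member with the center; keep the maximum.
            # Constant rows yield NaN, which is mapped to -inf so it never wins.
            scores = np.array([np.corrcoef(m, center)[0, 1] for m in members])
            scores = np.where(np.isnan(scores), -np.inf, scores)
            idx = int(np.argmax(scores))
        else:
            diff = members - center
            if metric == "euclidean":
                dist = np.sqrt((diff ** 2).sum(axis=1))
            elif metric == "manhattan":
                dist = np.abs(diff).sum(axis=1)
            elif metric == "cosine":
                norms = np.linalg.norm(members, axis=1) * np.linalg.norm(center)
                dist = 1.0 - (members @ center) / norms
            else:
                raise ValueError(f"Unknown metric: {metric}")
            # Distances: keep the minimum (ties resolve to the first member)
            idx = int(np.argmin(dist))
        medoids.append(members[idx])
        labels_medoids.append(label)
    return np.array(medoids), np.array(labels_medoids)

# Usage: two clusters of three and two samples
X_toy = np.array([[0, 0, 0], [1, 0, 0], [5, 0, 0], [10, 10, 10], [11, 10, 10]])
medoids, labels_medoids = select_medoids(X_toy, [0, 0, 0, 1, 1], metric="euclidean")
# expected medoids: rows [1, 0, 0] and [10, 10, 10]; labels_medoids: [0, 1]
```

Unlike a cluster mean, each returned row is an actual sample of X, which is what makes medoids usable as representative scales.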
Examples
The representative samples for each cluster (called ‘medoids’) can be obtained using the AAclust.comp_medoids() method:

```python
import aaanalysis as aa
import pandas as pd

# Create example dataset comprising 100 randomly sampled scales
df_scales = aa.load_scales().T.sample(100).T
X = df_scales.T

# Fit AAclust model with 5 clusters and obtain one medoid per cluster
aac = aa.AAclust()
labels = aac.fit(X, n_clusters=5).labels_
medoids, labels_medoids = aac.comp_medoids(X=X, labels=labels)

# Create DataFrame with one column per cluster medoid
columns = [f"Cluster {i}" for i in labels_medoids]
df_medoids = pd.DataFrame(medoids.T, columns=columns, index=df_scales.index)
aa.display_df(df_medoids)
```

Because the scales are sampled at random, the medoids and the cluster numbering vary between runs. One possible output:

```
    Cluster 4  Cluster 2  Cluster 1  Cluster 3  Cluster 0
AA
A    0.863000   0.908000   0.174000   0.100000   0.616000
C    0.589000   0.454000   0.115000   0.200000   0.680000
D    0.619000   0.532000   1.000000   0.400000   0.028000
E    0.495000   0.979000   0.802000   0.500000   0.043000
F    0.463000   0.688000   0.068000   0.700000   1.000000
G    1.000000   0.135000   0.328000   0.000000   0.501000
H    0.269000   0.553000   0.650000   0.600000   0.165000
I    0.563000   0.582000   0.000000   0.400000   0.943000
K    0.439000   0.716000   0.809000   0.500000   0.283000
L    0.577000   0.759000   0.012000   0.400000   0.943000
M    0.453000   1.000000   0.122000   0.400000   0.738000
N    0.535000   0.248000   0.648000   0.400000   0.236000
P    0.850000   0.000000   0.692000   0.300000   0.711000
Q    0.452000   0.723000   0.724000   0.500000   0.251000
R    0.332000   0.652000   0.778000   0.700000   0.000000
S    0.678000   0.284000   0.352000   0.200000   0.359000
T    0.611000   0.433000   0.274000   0.300000   0.450000
V    0.638000   0.482000   0.071000   0.300000   0.825000
W    0.000000   0.567000   0.220000   1.000000   0.878000
Y    0.232000   0.319000   0.313000   0.800000   0.880000
```
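Since each medoid is an actual sample, the scale it comes from can be recovered by matching it against the rows of X. The sketch below uses a hypothetical helper, `medoid_names` (not part of aaanalysis). It assumes that X is a DataFrame indexed by scale ID, as in the example above, and that the returned medoids are copies of rows of X up to floating-point tolerance.

```python
import numpy as np
import pandas as pd

def medoid_names(X, medoids):
    """Hypothetical helper: map each medoid back to the index label of its row in X.

    Assumes every medoid equals (within np.isclose tolerance) some row of X;
    if several rows match, the first matching label is returned.
    """
    values = X.to_numpy(dtype=float)
    names = []
    for medoid in np.asarray(medoids, dtype=float):
        # Rows of X whose values match this medoid in every feature
        matches = np.flatnonzero(np.isclose(values, medoid).all(axis=1))
        if matches.size == 0:
            raise ValueError("medoid does not match any row of X")
        names.append(X.index[matches[0]])
    return names

# Toy stand-in for X: rows are scales, columns are amino acids (made-up values)
X_toy = pd.DataFrame([[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]],
                     index=["scale_a", "scale_b", "scale_c"], columns=["A", "C"])
print(medoid_names(X_toy, np.array([[0.7, 0.3], [0.1, 0.9]])))  # ['scale_c', 'scale_a']
```

With the example's X and the first return value of comp_medoids, the resulting scale IDs could serve as column labels in place of the bare cluster numbers.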