AAclust.comp_medoids
- static AAclust.comp_medoids(X, labels, metric='correlation')
Computes the medoid of each cluster based on the given labels.
For each cluster, the medoid is the sample closest to the cluster center as measured by the chosen
metric (default: Pearson correlation) [Breimann24a]. Use this method independently of AAclust.fit() when cluster labels are already available.
Added in version 0.1.0.
- Parameters:
X (array-like, shape (n_samples, n_features)) – Feature matrix. Rows typically correspond to scales and columns to amino acids.
labels (array-like, shape (n_samples,)) – Cluster labels for each sample in X.
metric ({'correlation', 'euclidean', 'manhattan', 'cosine'}, default='correlation') – Similarity measure used to obtain medoids:
  - correlation: Pearson correlation (maximum)
  - euclidean: Euclidean distance (minimum)
  - manhattan: Manhattan distance (minimum)
  - cosine: Cosine distance (minimum)
- Returns:
medoids (array-like, shape (n_clusters, n_features)) – The medoid for each cluster.
labels_medoids (array-like, shape (n_clusters,)) – The labels corresponding to each medoid.
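To make the definition above concrete, the following is a minimal NumPy sketch of medoid selection with the default metric. It is not the aaanalysis implementation: the function name medoids_by_correlation is hypothetical, and it assumes that the "cluster center" is the arithmetic mean of the cluster members and that no sample has constant values (which would make the Pearson correlation undefined).

```python
import numpy as np


def medoids_by_correlation(X, labels):
    """Illustrative sketch only (hypothetical helper, not part of aaanalysis)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    medoids, medoid_labels = [], []
    for label in np.unique(labels):
        members = X[labels == label]
        # Assumption: the cluster center is the mean of the members
        center = members.mean(axis=0)
        # Pearson correlation of each member with the center
        members_c = members - members.mean(axis=1, keepdims=True)
        center_c = center - center.mean()
        corr = (members_c @ center_c) / (
            np.linalg.norm(members_c, axis=1) * np.linalg.norm(center_c)
        )
        # The medoid is the member with the highest correlation
        medoids.append(members[np.argmax(corr)])
        medoid_labels.append(label)
    return np.array(medoids), np.array(medoid_labels)


# Toy data: two clusters of three samples with four features each
X_toy = np.array([
    [1, 2, 3, 4], [1, 3, 2, 4], [1, 2.5, 2.5, 4],
    [4, 3, 2, 1], [4, 2, 3, 1], [4, 2.5, 2.5, 1],
])
labels_toy = [0, 0, 0, 1, 1, 1]
medoids_toy, labels_medoids_toy = medoids_by_correlation(X_toy, labels_toy)
print(medoids_toy.shape, labels_medoids_toy)
```

As in the documented return values, the sketch returns one row per cluster together with the matching cluster label, and every returned medoid is an actual sample from X rather than an averaged profile.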
Examples
The representative samples for each cluster (called ‘medoids’) can be obtained using the AAclust().comp_medoids() method:

    import aaanalysis as aa
    import pandas as pd

    # Create example dataset comprising 100 scales
    df_scales = aa.load_scales().T.sample(100).T
    X = df_scales.T

    # Fit AAclust model and obtain clustering centers for 5 clusters
    aac = aa.AAclust()
    labels = aac.fit(X, n_clusters=5).labels_
    centers, labels_centers = aac.comp_medoids(X=X, labels=labels)

    # Create DataFrame with cluster centers
    columns = [f"Cluster {i}" for i in labels_centers]
    df_medoids = pd.DataFrame(centers.T, columns=columns, index=df_scales.index)
    aa.display_df(df_medoids)

    AA  Cluster 4  Cluster 2  Cluster 1  Cluster 3  Cluster 0
    A   0.863000   0.908000   0.174000   0.100000   0.616000
    C   0.589000   0.454000   0.115000   0.200000   0.680000
    D   0.619000   0.532000   1.000000   0.400000   0.028000
    E   0.495000   0.979000   0.802000   0.500000   0.043000
    F   0.463000   0.688000   0.068000   0.700000   1.000000
    G   1.000000   0.135000   0.328000   0.000000   0.501000
    H   0.269000   0.553000   0.650000   0.600000   0.165000
    I   0.563000   0.582000   0.000000   0.400000   0.943000
    K   0.439000   0.716000   0.809000   0.500000   0.283000
    L   0.577000   0.759000   0.012000   0.400000   0.943000
    M   0.453000   1.000000   0.122000   0.400000   0.738000
    N   0.535000   0.248000   0.648000   0.400000   0.236000
    P   0.850000   0.000000   0.692000   0.300000   0.711000
    Q   0.452000   0.723000   0.724000   0.500000   0.251000
    R   0.332000   0.652000   0.778000   0.700000   0.000000
    S   0.678000   0.284000   0.352000   0.200000   0.359000
    T   0.611000   0.433000   0.274000   0.300000   0.450000
    V   0.638000   0.482000   0.071000   0.300000   0.825000
    W   0.000000   0.567000   0.220000   1.000000   0.878000
    Y   0.232000   0.319000   0.313000   0.800000   0.880000

Further parameters.
AAclust.comp_medoids also accepts:
metric – Similarity measure used to obtain medoids:
  - correlation: Pearson correlation (maximum)
  - euclidean: Euclidean distance (minimum)
  - manhattan: Manhattan distance (minimum)
  - cosine: Cosine distance (minimum)
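The four metric options can select different samples from the same cluster, because correlation and cosine compare the shape of a profile while Euclidean and Manhattan distance also depend on its magnitude. The sketch below illustrates this for a single toy cluster using plain NumPy. The helper medoid_index is hypothetical and is not the library's code; it assumes the cluster center is the arithmetic mean of the members and that no sample is constant or all zeros.

```python
import numpy as np


def medoid_index(members, metric="correlation"):
    """Index of the member closest to the cluster mean (illustrative sketch)."""
    members = np.asarray(members, dtype=float)
    # Assumption: the cluster center is the mean of the members
    center = members.mean(axis=0)
    if metric == "correlation":
        # Pearson correlation with the center: pick the maximum
        members_c = members - members.mean(axis=1, keepdims=True)
        center_c = center - center.mean()
        corr = (members_c @ center_c) / (
            np.linalg.norm(members_c, axis=1) * np.linalg.norm(center_c)
        )
        return int(np.argmax(corr))
    if metric == "euclidean":
        return int(np.argmin(np.linalg.norm(members - center, axis=1)))
    if metric == "manhattan":
        return int(np.argmin(np.abs(members - center).sum(axis=1)))
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity: pick the minimum
        cos = (members @ center) / (
            np.linalg.norm(members, axis=1) * np.linalg.norm(center)
        )
        return int(np.argmin(1.0 - cos))
    raise ValueError(f"Unknown metric: {metric!r}")


# One toy cluster with three samples and four features
cluster = np.array([[1, 2, 3, 4], [2, 4, 6, 9], [3, 3, 5, 5]], dtype=float)
selected = {
    metric: medoid_index(cluster, metric)
    for metric in ("correlation", "euclidean", "manhattan", "cosine")
}
print(selected)
```

In this toy cluster, correlation and cosine select the first sample, whose profile shape best matches the mean, while Euclidean and Manhattan distance select the third sample, which lies closest to the mean in absolute terms.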