aaanalysis.AAclustPlot.medoids

AAclustPlot.medoids(X, labels=None, component_x=1, component_y=2, metric='euclidean', ax=None, figsize=(7, 6), legend=True, dot_size=100, dot_alpha=0.75, palette=None)

PCA plot of clustering with medoids highlighted

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix. Rows typically correspond to scales and columns to amino acids.

  • labels (array-like, shape (n_samples,)) – Cluster labels for each sample in X. If None, no grouping is used.

  • component_x (int, default=1) – Index of the PCA component for the x-axis. Must be >= 1.

  • component_y (int, default=2) – Index of the PCA component for the y-axis. Must be >= 1.

  • metric ({'correlation', 'euclidean', 'manhattan', 'cosine'}, default='euclidean') –

    The distance metric used to calculate the medoids.

    • correlation: Pearson correlation (maximum)

    • euclidean: Euclidean distance (minimum)

    • manhattan: Manhattan distance (minimum)

    • cosine: Cosine distance (minimum)

  • ax (plt.Axes, optional) – Pre-defined Axes object to plot on. If None, a new Axes object is created.

  • figsize (tuple, default=(7, 6)) – Figure dimensions (width, height) in inches.

  • legend (bool, default=True) – Whether to show the legend.

  • dot_size (int, default=100) – Size of the plotted dots.

  • dot_alpha (float or int, default=0.75) – Transparency (alpha) of the plotted dots, in the range [0, 1].

  • palette (list or str, optional) – List of colors or name of a colormap for the labels. If None, a default colormap is used.

Returns:

  • ax (plt.Axes) – PCA plot axes object.

  • df_components (pd.DataFrame) – DataFrame with the PCA components.

Notes

  • Ensure X and labels are in the same order to avoid mislabeling.
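One way to guard against mislabeling is to keep labels keyed by sample name and rebuild the label array in the row order of X just before plotting. The sketch below is only an illustration: align_labels, row_names, and label_by_name are hypothetical names, not part of aaanalysis.

```python
import numpy as np

def align_labels(row_names, label_by_name):
    """Return labels ordered like row_names (hypothetical helper, not an aaanalysis function)."""
    missing = [name for name in row_names if name not in label_by_name]
    if missing:
        raise KeyError(f"No label found for rows: {missing}")
    # Look up each row's label in the order the rows appear in X
    return np.array([label_by_name[name] for name in row_names])

# Toy example: the rows of X are stored in a different order than the label mapping
row_names = ["scale_b", "scale_a", "scale_c"]
label_by_name = {"scale_a": 0, "scale_b": 1, "scale_c": 0}
labels = align_labels(row_names, label_by_name)
```

If X is a DataFrame, as in the examples on this page, list(X.index) could serve as row_names.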

Examples

We first create an example dataset for AAclustPlot().medoids(), which visualizes the medoids obtained by the AAclust().comp_medoids() method:

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import aaanalysis as aa
aa.options["verbose"] = False
# Obtain example scale dataset
df_scales = aa.load_scales()
X = df_scales.T
# Fit the AAclust model and retrieve the cluster labels
aac = aa.AAclust()
labels = aac.fit(X, n_clusters=5).labels_

The PCA plot shows all data points, with the representative sample of each cluster (its medoid) highlighted by a larger dot:

aac_plot = aa.AAclustPlot(model_class=PCA)
aa.plot_settings()
ax, df_components = aac_plot.medoids(X, labels=labels)
plt.show()
# The DataFrame with the selected components is returned as well
aa.display_df(df_components, n_rows=10, show_shape=True)
../_images/aac_plot_medoids_1_output_3_0.png
DataFrame shape: (586, 2)
     PC1 (33.6%)  PC2 (17.7%)
1      -0.181292     0.579504
2       0.823876    -0.591823
3       0.723627    -0.838029
4       0.860664    -0.746315
5       0.645413     0.481089
6       1.266436    -0.148832
7      -0.753006     0.412799
8      -1.074425     0.348078
9       0.501059     0.261917
10      1.304114    -0.139382

Select other PCs using the component_x and component_y parameters:

aac_plot.medoids(X, labels=labels, component_x=3, component_y=4)
plt.show()
../_images/aac_plot_medoids_2_output_5_0.png
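To illustrate what selecting components means, here is a minimal NumPy sketch of a PCA projection. It assumes that the 1-based component_x and component_y map to zero-based columns of the projected data, and that the percentages in the column names are explained variance ratios. The function pca_components is hypothetical, and the library's own preprocessing may differ.

```python
import numpy as np

def pca_components(X, component_x=1, component_y=2):
    """Project X onto two principal components selected by 1-based index (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    # Center the features, then obtain principal axes via singular value decomposition
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    n_components = len(S)
    for c in (component_x, component_y):
        if c < 1 or c > n_components:
            raise ValueError(f"component index must be in [1, {n_components}], got {c}")
    scores = U * S                      # sample coordinates on every component
    explained = S**2 / np.sum(S**2)     # explained variance ratio per component
    idx = [component_x - 1, component_y - 1]  # 1-based index -> zero-based column
    names = [f"PC{c} ({explained[c - 1] * 100:.1f}%)" for c in (component_x, component_y)]
    return scores[:, idx], explained[idx], names

# Toy data: 20 samples with 6 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 6))
coords, ratios, names = pca_components(X_demo, component_x=3, component_y=4)
```

Later components explain less variance than earlier ones, so plots of PC3 against PC4 usually show less cluster separation than PC1 against PC2.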

Medoids can be computed with different distance metrics. To compare them in a single figure, use the ax and legend parameters:

list_metrics = ["correlation", "euclidean", "manhattan", "cosine"]
fig, axes = plt.subplots(4, 1, figsize=(7, 14), sharex=True, sharey=True)
for i, metric in enumerate(list_metrics):
    ax = axes[i]
    # Set legend only for first subplot
    aac_plot.medoids(X, labels=labels, ax=ax, legend=i==0, metric=metric)
    ax.set_title(metric)
plt.tight_layout()
plt.show()
plt.close()
../_images/aac_plot_medoids_3_output_7_0.png
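To make the effect of the metric more concrete, the following sketch computes one medoid per cluster with NumPy. It assumes the common definition of a medoid as the cluster member with the smallest summed distance to the other members, and expresses correlation as the distance 1 - r so that maximum correlation corresponds to minimum distance. How AAclust computes medoids internally is not shown on this page, so pairwise_distance and medoid_indices are hypothetical helpers, not the library's implementation.

```python
import numpy as np

def pairwise_distance(A, metric="euclidean"):
    """Pairwise distances between the rows of A (illustrative helper)."""
    A = np.asarray(A, dtype=float)
    if metric in ("euclidean", "manhattan"):
        diff = A[:, None, :] - A[None, :, :]
        if metric == "euclidean":
            return np.sqrt((diff**2).sum(axis=-1))
        return np.abs(diff).sum(axis=-1)
    if metric == "cosine":
        unit = A / np.linalg.norm(A, axis=1, keepdims=True)
        return 1.0 - unit @ unit.T
    if metric == "correlation":
        # Pearson correlation r turned into a distance: highest r -> smallest distance
        return 1.0 - np.corrcoef(A)
    raise ValueError(f"Unknown metric: {metric}")

def medoid_indices(X, labels, metric="euclidean"):
    """Map each cluster label to the row index of its medoid (assumed definition)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    medoids = {}
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        if len(members) == 1:
            # A single-member cluster is its own medoid
            medoids[label.item()] = int(members[0])
            continue
        D = pairwise_distance(X[members], metric)
        # The medoid has the smallest total distance to all other members
        medoids[label.item()] = int(members[np.argmin(D.sum(axis=1))])
    return medoids

# Toy data: two clusters with three points each
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.4, 0.1],
                  [10.0, 10.0], [10.0, 11.0], [10.0, 12.0]])
labels_toy = np.array([0, 0, 0, 1, 1, 1])
medoids_toy = medoid_indices(X_toy, labels_toy, metric="euclidean")
```

Because each metric ranks cluster members differently, the highlighted medoid of a cluster can change between subplots even though the cluster labels stay the same.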

Adjust the size and transparency of the dots with the dot_size and dot_alpha arguments:

aac_plot = aa.AAclustPlot(model_class=PCA)
aac_plot.medoids(X, labels=labels, dot_size=50, dot_alpha=1)
plt.show()
../_images/aac_plot_medoids_4_output_9_0.png

The cluster colors can be changed with the palette argument, which accepts either a list of colors or a colormap:

colors = aa.plot_get_clist(n_colors=5)
aac_plot.medoids(X, labels=labels, palette=colors)
plt.show()
aac_plot.medoids(X, labels=labels, palette="viridis")
plt.show()
../_images/aac_plot_medoids_5_output_11_0.png ../_images/aac_plot_medoids_6_output_11_1.png