aaanalysis.AAclustPlot.centers

AAclustPlot.centers(X, labels=None, component_x=1, component_y=2, ax=None, figsize=(7, 6), legend=True, dot_size=100, dot_alpha=0.75, palette=None)

PCA plot of clustering with centers highlighted

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix. Rows typically correspond to scales and columns to amino acids.

  • labels (array-like, shape (n_samples,), optional) – Cluster labels for each sample in X. If None, no grouping is used.

  • component_x (int, default=1) – Index of the PCA component for the x-axis. Must be >= 1.

  • component_y (int, default=2) – Index of the PCA component for the y-axis. Must be >= 1.

  • ax (plt.Axes, optional) – Pre-defined Axes object to plot on. If None, a new Axes object is created.

  • figsize (tuple, default=(7, 6)) – Figure dimensions (width, height) in inches.

  • legend (bool, default=True) – Whether to show the legend.

  • dot_size (int, default=100) – Size of the plotted dots.

  • dot_alpha (float or int, default=0.75) – The transparency alpha value [0-1] of the plotted dots.

  • palette (list, optional) – Colormap for the labels or list of colors. If None, a default colormap is used.

Returns:

  • ax (plt.Axes) – PCA plot axes object.

  • df_components (pd.DataFrame) – DataFrame with the PCA components.

Notes

  • Ensure X and labels are in the same order to avoid mislabeling.
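The ordering requirement above can be guarded with a small check before plotting. The following is a minimal sketch on toy data, not part of aaanalysis: check_aligned is a hypothetical helper that only catches shape mismatches, and the second half shows how to reorder samples and labels with one shared index array so that each row keeps its label.

```python
import numpy as np

def check_aligned(X, labels):
    """Hypothetical helper: basic shape check that X and labels match."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    if X.ndim != 2:
        raise ValueError(f"X should be 2D (n_samples, n_features), got {X.ndim}D")
    if labels.shape != (X.shape[0],):
        raise ValueError(f"labels should have shape ({X.shape[0]},), got {labels.shape}")
    return X, labels

# Toy data (assumption for illustration): 6 samples, 4 features, 3 clusters
rng = np.random.default_rng(0)
X_demo = rng.random((6, 4))
labels_demo = np.array([0, 0, 1, 1, 2, 2])
X_demo, labels_demo = check_aligned(X_demo, labels_demo)

# Reorder rows and labels with the same index array so that pairs stay intact
order = rng.permutation(len(X_demo))
X_shuffled, labels_shuffled = X_demo[order], labels_demo[order]
```

A shape check cannot detect a permuted label vector of the correct length, so applying every reordering to X and labels together remains the main safeguard.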

See also

Examples

We first create an example dataset for the AAclustPlot().centers() method, which visualizes cluster ‘centers’ as obtained by the AAclust().comp_centers() method:

from sklearn.decomposition import PCA, KernelPCA, FastICA, TruncatedSVD, NMF
import matplotlib.pyplot as plt
import aaanalysis as aa
aa.options["verbose"] = False
# Obtain example scale dataset
df_scales = aa.load_scales()
X = df_scales.T
# Fit the AAclust model and retrieve the labels used to compute centers
aac = aa.AAclust()
labels = aac.fit(X, n_clusters=5).labels_
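For intuition about what a cluster 'center' is, the sketch below computes one center per cluster as the arithmetic mean of its members, using toy data. That AAclust().comp_centers() uses the mean is an assumption here, so consult that method's documentation for the exact definition; cluster_means and the toy arrays are hypothetical names introduced for illustration.

```python
import numpy as np

def cluster_means(X, labels):
    """Hypothetical helper: mean feature vector per cluster (assumed center definition)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    center_labels = np.unique(labels)  # sorted unique cluster labels
    # One row per cluster: average over all samples carrying that label
    centers = np.vstack([X[labels == k].mean(axis=0) for k in center_labels])
    return centers, center_labels

# Toy data: two clusters with two samples each
X_toy = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 0.0], [12.0, 4.0]])
labels_toy = np.array([0, 0, 1, 1])
centers, center_labels = cluster_means(X_toy, labels_toy)
print(centers)
# [[ 1.  1.]
#  [11.  2.]]
```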

All data points are visualized in the PCA plot, with the cluster centers highlighted by an ‘x’:

aac_plot = aa.AAclustPlot(model_class=PCA)
aa.plot_settings()
ax, df_components = aac_plot.centers(X, labels=labels)
plt.show()
# A DataFrame with the respective components is returned
aa.display_df(df_components, n_rows=10, show_shape=True)
DataFrame shape: (586, 2)
      PC1 (33.6%)  PC2 (17.7%)
1       -0.181292     0.579504
2        0.823876    -0.591823
3        0.723627    -0.838029
4        0.860664    -0.746315
5        0.645413     0.481089
6        1.266436    -0.148832
7       -0.753006     0.412799
8       -1.074425     0.348078
9        0.501059     0.261917
10       1.304114    -0.139382
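The column names above pair each component with a percentage, which presumably is its explained variance ratio. As a rough sketch of how such values and labels can be derived, the code below runs a plain PCA through NumPy's SVD on random toy data and selects components by 1-based index. The helper pca_components and the label format are assumptions made for illustration, not the aaanalysis implementation, and component signs may differ from those of other PCA implementations.

```python
import numpy as np

def pca_components(X, component_x=1, component_y=2):
    """Hypothetical helper: project X onto two principal components (1-based indices)."""
    X = np.asarray(X, dtype=float)
    n_max = min(X.shape)
    for c in (component_x, component_y):
        if not 1 <= c <= n_max:
            raise ValueError(f"component index should be in [1, {n_max}], got {c}")
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered features
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = U * S                   # sample coordinates on each component
    ratio = S**2 / np.sum(S**2)      # explained variance ratio per component
    selected = (component_x, component_y)
    names = [f"PC{c} ({ratio[c - 1] * 100:.1f}%)" for c in selected]
    # Convert the 1-based component numbers to 0-based column positions
    values = scores[:, [c - 1 for c in selected]]
    return values, names

# Toy data (assumption for illustration): 20 samples, 5 features
rng = np.random.default_rng(42)
X_demo = rng.random((20, 5))
values, names = pca_components(X_demo)           # PC1 vs. PC2
values34, names34 = pca_components(X_demo, 3, 4)  # PC3 vs. PC4
print(names, names34)
```

Because the component arguments are 1-based, component_x=1 maps to the first column of the projected data, which is why values below 1 are rejected.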

Select other PCs using the component_x and component_y parameters:

aac_plot.centers(X, labels=labels, component_x=3, component_y=4)
plt.show()

To compare how different decomposition models (scikit-learn transformers) compress the feature space within a single figure, use the ax and legend parameters:

list_models = [KernelPCA, FastICA, TruncatedSVD, NMF]
model_names = ["KernelPCA", "FastICA", "TruncatedSVD", "NMF"]
dict_models = dict(zip(model_names, list_models))
fig, axes = plt.subplots(4, 1, figsize=(7, 14))
for i, model_name in enumerate(dict_models):
    ax = axes[i]
    aac_plot = aa.AAclustPlot(model_class=dict_models[model_name])
    # Set legend only for first subplot
    aac_plot.centers(X, labels=labels, ax=ax, legend=i==0)
plt.tight_layout()
plt.show()
plt.close()

Adjust the style of the scatter plot using the dot_size and dot_alpha arguments to change the size of the dots and their transparency:

aac_plot = aa.AAclustPlot(model_class=PCA)
aac_plot.centers(X, labels=labels, dot_size=50, dot_alpha=1)
plt.show()

The cluster colors can be adjusted via the palette argument, which accepts either a list of colors or the name of a colormap:

colors = aa.plot_get_clist(n_colors=5)
aac_plot.centers(X, labels=labels, palette=colors)
plt.show()
aac_plot.centers(X, labels=labels, palette="viridis")
plt.show()
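When passing a list of colors, it helps to provide at least one color per cluster, as in the example above with five colors for five clusters. The sketch below illustrates one plausible label-to-color assignment in plain Python. How AAclustPlot maps colors internally is not stated here, so colors_for_labels and its sorted-label ordering are assumptions for illustration only.

```python
def colors_for_labels(labels, palette):
    """Hypothetical helper: assign one palette color per unique label."""
    unique_labels = sorted(set(labels))
    if len(palette) < len(unique_labels):
        raise ValueError(
            f"palette has {len(palette)} colors but {len(unique_labels)} clusters were found")
    # Pair sorted labels with colors in palette order
    label_to_color = dict(zip(unique_labels, palette))
    return [label_to_color[label] for label in labels], label_to_color

labels_demo = [0, 2, 1, 0, 2]
palette_demo = ["tab:blue", "tab:orange", "tab:green"]
point_colors, label_to_color = colors_for_labels(labels_demo, palette_demo)
print(point_colors)
# ['tab:blue', 'tab:green', 'tab:orange', 'tab:blue', 'tab:green']
```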