aaanalysis.AAclust.name_clusters
- static AAclust.name_clusters(X, labels=None, names=None, shorten_names=True)
Assigns names to clusters based on the frequency of names.
Names with higher frequency are prioritized. If a name is already assigned to another cluster, or the cluster contains only one sample, the cluster's name is set to ‘Unclassified’.
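The naming rule described above can be sketched in plain Python. This is a minimal illustration and not the aaanalysis implementation: the function name_clusters_sketch is hypothetical, ties are broken by input order, and falling back to a cluster's next most frequent name when its top name is already taken is an assumption about how "prioritized" is meant.

```python
from collections import Counter


def name_clusters_sketch(labels, names, unclassified="Unclassified"):
    """Hypothetical frequency-based cluster naming (not aaanalysis code)."""
    # Count how often each name occurs within each cluster
    counts = {}
    for label, name in zip(labels, names):
        counts.setdefault(label, Counter())[name] += 1
    sizes = Counter(labels)

    # Rank all (frequency, cluster, name) triples, most frequent first;
    # sorted() is stable, so ties keep their input order
    ranked = sorted(
        ((n, label, name) for label, c in counts.items() for name, n in c.items()),
        key=lambda t: -t[0],
    )

    cluster_to_name, used = {}, set()
    for _, label, name in ranked:
        # Skip clusters that are already named, names that are already
        # taken, and clusters containing a single sample
        if label in cluster_to_name or name in used or sizes[label] == 1:
            continue
        cluster_to_name[label] = name
        used.add(name)

    # One name per sample; clusters left without a name are 'Unclassified'
    return [cluster_to_name.get(label, unclassified) for label in labels]


toy_labels = [0, 0, 0, 1, 1, 2]
toy_names = ["A", "A", "B", "A", "C", "B"]
toy_result = name_clusters_sketch(toy_labels, toy_names)
print(toy_result)
```

In this toy input, cluster 0 takes "A" (its most frequent name), cluster 1 cannot reuse "A" and takes "C" instead, and the single-sample cluster 2 stays unclassified.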
- Parameters:
X (array-like, shape (n_samples, n_features)) – Feature matrix. Rows typically correspond to scales and columns to amino acids.
labels (array-like, shape (n_samples,)) – Cluster labels for each sample in X.
names (list of str) – List of sample names corresponding to X.
shorten_names (bool, default=True) – If True, shortened versions of the names will be used.
- Returns:
cluster_names – A list of renamed clusters based on names.
- Return type:
Examples
We first create an example dataset of 100 scales and obtain their AAontology subcategory names to showcase automatic cluster naming by the AAclust().name_clusters() method:

    import aaanalysis as aa

    # Create example dataset comprising 100 scales
    df_scales = aa.load_scales().T.sample(100).T
    X = df_scales.T
    df_cat = aa.load_scales(name="scales_cat")
    dict_scale_name = dict(zip(df_cat["scale_id"], df_cat["subcategory"]))
    names = [dict_scale_name[s] for s in list(df_scales)]

    # Fit AAclust model and obtain clustering labels for 7 clusters
    aac = aa.AAclust()
    aac.fit(X, n_clusters=7)
    labels = aac.labels_
We can now provide the feature matrix X, names, and labels to the AAclust().name_clusters() method:

    cluster_names = aac.name_clusters(X, labels=labels, names=names)
    print("Name of clusters:\n", list(sorted(set(cluster_names))))

    Name of clusters:
     ['Accessible surface area', 'Buried', 'Hydrophobicity', 'Side chain length', 'α-helix', 'α-helix (α-proteins)', 'β-turn']
These names are automatically shortened, which can be disabled by setting shorten_names=False:

    cluster_names = aac.name_clusters(X, labels=labels, names=names, shorten_names=False)
    print("Longer names:\n", list(sorted(set(cluster_names))))

    Longer names:
     ['AA composition', 'Accessible surface area (ASA)', 'Buried', 'Side chain length', 'α-helix', 'β-sheet', 'β-turn']
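To inspect which name each cluster received, the returned names can be paired with the cluster labels. The sketch below assumes that cluster_names holds one entry per sample, aligned with labels, which the use of set() in the examples suggests but which is not stated explicitly. The helper summarize_clusters is hypothetical, and the toy lists stand in for the real labels and cluster_names so that the snippet runs without aaanalysis.

```python
from collections import Counter

# Toy stand-ins for `labels` and `cluster_names` (illustrative values only)
labels = [0, 0, 1, 1, 1, 2]
cluster_names = [
    "α-helix", "α-helix",
    "Hydrophobicity", "Hydrophobicity", "Hydrophobicity",
    "Unclassified",
]


def summarize_clusters(labels, cluster_names):
    """Hypothetical helper: map each cluster label to (name, n_samples)."""
    label_to_name = {}
    for label, name in zip(labels, cluster_names):
        # Every sample of a cluster is expected to carry the same name
        if label_to_name.setdefault(label, name) != name:
            raise ValueError(f"Cluster {label} has more than one name")
    sizes = Counter(labels)
    return {label: (name, sizes[label]) for label, name in label_to_name.items()}


summary = summarize_clusters(labels, cluster_names)
for label, (name, size) in sorted(summary.items()):
    print(f"Cluster {label}: {name} ({size} scales)")
```

The consistency check is a cheap guard: if a label ever maps to two different names, the per-sample alignment assumed here does not hold.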