aaanalysis.CPPPlot

class aaanalysis.CPPPlot(df_scales=None, df_cat=None, jmd_n_len=10, jmd_c_len=10, accept_gaps=False, verbose=True)

Bases: object

Plotting class for CPP (Comparative Physicochemical Profiling) results [Breimann25a].

This class supports multiple plot types for group-level or sample-level analysis, including ranking plots, profiles, heatmaps, and feature maps.

Added in version 0.1.2.

Methods

eval([df_eval, figsize, dict_xlims, legend, ...])

Plot the evaluation output of CPP, comparing multiple identified feature sets.

feature([feature, df_seq, labels, ...])

Plot distributions of CPP feature values for test and reference datasets highlighting their mean difference.

feature_map([df_feat, col_cat, col_val, ...])

Plot CPP feature map showing feature value mean difference and feature importance per scale subcategory (y-axis) and residue position (x-axis).

heatmap([df_feat, shap_plot, col_cat, ...])

Plot a CPP/-SHAP heatmap showing the feature value mean difference/feature impact per scale subcategory (y-axis) and residue position (x-axis).

profile([df_feat, shap_plot, col_imp, ...])

Plot CPP/-SHAP profile showing feature importance/impact per residue position.

ranking([df_feat, shap_plot, col_dif, ...])

Plot CPP/-SHAP feature ranking based on feature importance or sample-specific feature impact.

update_seq_size([ax, fig, max_x_dist, ...])

Update the font size of the sequence characters to prevent overlap.

__init__(df_scales=None, df_cat=None, jmd_n_len=10, jmd_c_len=10, accept_gaps=False, verbose=True)
Parameters:
  • df_scales (pd.DataFrame, shape (n_letters, n_scales), optional) – DataFrame of scales with letters typically representing amino acids. Default from load_scales() unless specified in options['df_scales'].

  • df_cat (pd.DataFrame, shape (n_scales, n_scales_info), optional) – DataFrame of categories for physicochemical scales. Must contain all scales from df_scales. Default from load_scales() with name='scales_cat', unless specified in options['df_cat'].

  • jmd_n_len (int, default=10) – Length of JMD-N (>=0).

  • jmd_c_len (int, default=10) – Length of JMD-C (>=0).

  • accept_gaps (bool, default=False) – If True, missing values (gaps) are accepted and omitted from computations.

  • verbose (bool, default=True) – If True, verbose outputs are enabled.

Notes

Several methods provide the shap_plot parameter, which specifies whether a plot visualizes the results of the group-level CPP analysis (shap_plot=False) or the sample-level CPP-SHAP analysis (shap_plot=True).

  • CPP Analysis: Group-level analysis of the most discriminant features between a test and a reference group. The overall results are visualized by CPPPlot.feature_map(), which reveals the characteristic physicochemical signature of the test group.

  • CPP-SHAP Analysis: Sample-level analysis of the CPP feature impact with single-residue resolution.

The methods showing the CPP and CPP-SHAP analysis results are as follows:

  • CPPPlot.ranking(): the ‘CPP/-SHAP ranking plot’ shows the top n ranked features, their feature value differences, and feature importance/impact.

  • CPPPlot.profile(): the ‘CPP/-SHAP profile’ shows the cumulative feature importance/impact per residue position.

  • CPPPlot.heatmap(): the ‘CPP/-SHAP heatmap’ shows the feature value difference or feature impact per residue position (x-axis) and scale subcategory (y-axis).

See also

  • CPP: the respective computation class for the CPP Analysis.

  • ShapModel: the class combining CPP with the SHAP explainable Artificial Intelligence (AI) framework.

  • Anatomy of a figure: matplotlib guide on figure elements.

Examples

The CPPPlot object offers various visualizations of the features obtained by CPP and can be instantiated without setting any parameters:

import aaanalysis as aa
cpp_plot = aa.CPPPlot()

If your scales deviate from the AAanalysis defaults, provide matching df_scales and df_cat DataFrames:

import pandas as pd
df_scales = pd.DataFrame({"Scale 1": [1, 1, 2]}, index=["A", "B", "C"])
df_cat = pd.DataFrame({"scale_id": ["Scale 1"], "category": ["Category 1"],
                       "subcategory": ["subcategory 1"], "scale_name": ["Scale_name 1"]})
cpp_plot = aa.CPPPlot(df_scales=df_scales, df_cat=df_cat)
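
Since df_cat must contain all scales from df_scales, it can help to check the two DataFrames for consistency before instantiating CPPPlot. The following is a minimal sketch using only pandas. The helper find_uncategorized_scales is hypothetical (not part of AAanalysis), and it assumes that scale identifiers are the column names of df_scales and are listed in the scale_id column of df_cat, as in the example above:

```python
import pandas as pd


def find_uncategorized_scales(df_scales, df_cat, col_scale_id="scale_id"):
    """Hypothetical helper: list scales in df_scales that have no entry in df_cat.

    Assumes scale identifiers are the columns of df_scales and the values
    of df_cat[col_scale_id].
    """
    known = set(df_cat[col_scale_id])
    return [scale for scale in df_scales.columns if scale not in known]


# 'Scale 2' is deliberately left out of df_cat to show a mismatch
df_scales = pd.DataFrame({"Scale 1": [1, 1, 2], "Scale 2": [0, 1, 0]},
                         index=["A", "B", "C"])
df_cat = pd.DataFrame({"scale_id": ["Scale 1"], "category": ["Category 1"],
                       "subcategory": ["subcategory 1"], "scale_name": ["Scale_name 1"]})

missing = find_uncategorized_scales(df_scales, df_cat)
print(missing)  # ['Scale 2']
```

An empty list indicates that every scale in df_scales has a matching category entry under these assumptions.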

Adjust the lengths of the N- and C-terminal JMDs using the jmd_n_len and jmd_c_len parameters:

cpp_plot = aa.CPPPlot(jmd_c_len=0, jmd_n_len=20)