API

This Application Programming Interface (API) reference documents the public building blocks of the AAanalysis Python toolkit: the explicit objects and functions, imported via:

import aaanalysis as aa

You can then access all objects and functions via the `aa` alias, for example `aa.load_dataset`.

For the high-level, one-call golden pipelines (aap) that chain these building blocks into complete workflows, see the API (Pipelines) reference.

Data Handling

`load_dataset`([name, n, random, ...])	Load protein benchmarking datasets.
`load_scales`([name, just_aaindex, ...])	Load amino acid scales or their classification (AAontology).
`load_features`([name])	Load feature sets for protein benchmarking datasets.
`read_fasta`(file_path[, col_id, col_seq, ...])	Read a FASTA file into a DataFrame.
`to_fasta`(df_seq, file_path[, col_id, ...])	Write a sequence DataFrame to a FASTA file.
`SequencePreprocessor`()	Preprocessing class for representing protein sequences as numeric inputs [Breimann25].
`StructurePreprocessor`([verbose])	Preprocessing class ([pro], requires `aaanalysis[pro]`) for protein structure features (PDB / CIF / AlphaFold).
`EmbeddingPreprocessor`([verbose])	Preprocessing class for protein language model (PLM) embeddings.
`AnnotationPreprocessor`([verbose])	Preprocessing class ([pro], requires `aaanalysis[pro]`) for per-residue post-translational modification (PTM) / functional-site annotations.
`combine_dict_nums`(dict_nums)	Concatenate multiple per-residue `dict_num` inputs along the D axis.
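To make "concatenate along the D axis" concrete, here is a minimal, hypothetical sketch. It assumes each per-residue input is a dict mapping a protein ID to an L x D matrix (L residues, D features), stored here as plain lists of rows. The function name `combine_dict_nums_sketch` is illustrative only and is not the aaanalysis implementation.

```python
# Hypothetical sketch (not the aaanalysis implementation): concatenate several
# per-residue dicts {protein_id: L x D matrix} along the feature (D) axis.

def combine_dict_nums_sketch(dict_nums):
    """Return {protein_id: L x (D1 + D2 + ...)} from a list of per-residue dicts."""
    if not dict_nums:
        raise ValueError("dict_nums must contain at least one dict")
    ids = set(dict_nums[0])
    for d in dict_nums[1:]:
        if set(d) != ids:
            raise ValueError("All dicts must share the same protein IDs")
    combined = {}
    for pid in dict_nums[0]:
        mats = [d[pid] for d in dict_nums]
        n_res = len(mats[0])
        # The residue axis (L) must agree; only the feature axis (D) grows
        if any(len(m) != n_res for m in mats):
            raise ValueError(f"Sequence length (L) differs for {pid}")
        # Row i of the output joins row i of every input matrix
        combined[pid] = [[x for m in mats for x in m[i]] for i in range(n_res)]
    return combined

# Usage: a 1-D structural feature plus a 2-D embedding for a 3-residue protein
struct = {"P1": [[0.1], [0.2], [0.3]]}
emb = {"P1": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]}
combined = combine_dict_nums_sketch([struct, emb])
```

Each residue keeps its position; only the number of features per residue grows (here from 1 and 2 to 3).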

Sequence Analysis

`AAlogo`([logo_type])	Amino Acid logo (AAlogo) class for computing sequence logo matrices and conservation scores.
`AAlogoPlot`([logo_type, jmd_n_len, ...])	Amino Acid logo Plot (AAlogoPlot) class for visualizing sequence logos.
`AAWindowSampler`([verbose, random_state, ...])	Utility class for sampling amino-acid windows / segments from full protein sequences.
`comp_seq_sim`([seq1, seq2, df_seq])	Compute pairwise similarity between two or more sequences ([pro], requires `aaanalysis[pro]`).
`filter_seq`(df_seq[, method, ...])	Redundancy reduction of sequences using clustering-based algorithms ([pro], requires `aaanalysis[pro]`).
`scan_motif`(df_seq[, pos_col, n, ...])	Scan candidate proteins for statistically significant Position Weight Matrix (PWM) occurrences using FIMO ([pro], requires `aaanalysis[pro]`).
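Conservation scores in sequence logos are commonly expressed as per-position information content: log2(20) bits minus the Shannon entropy of the amino acid frequencies at that position. The sketch below shows this standard measure for equal-length, aligned sequences. It is an assumption that `AAlogo` uses this exact definition; the helper names are hypothetical, and no small-sample correction is applied.

```python
import math
from collections import Counter

N_AMINO_ACIDS = 20  # standard amino acid alphabet size

def position_frequencies(seqs):
    """Per-position amino acid frequencies for aligned, equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("All sequences must have the same length")
    n = len(seqs)
    return [{aa: c / n for aa, c in Counter(s[i] for s in seqs).items()}
            for i in range(length)]

def information_content(freq):
    """Conservation in bits: log2(20) minus the Shannon entropy of one column."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return math.log2(N_AMINO_ACIDS) - entropy

# Usage: position 1 is fully conserved, position 4 is the most variable
seqs = ["ACDE", "ACDF", "ACEF", "ACEG"]
ic = [information_content(f) for f in position_frequencies(seqs)]
```

A fully conserved column reaches the maximum of log2(20), about 4.32 bits, while a column split evenly between two residues loses exactly 1 bit.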

Feature Engineering

`AAclust`([model_class, model_kwargs, ...])	Amino Acid clustering (AAclust) class: A k-optimized clustering wrapper for selecting redundancy-reduced sets of numerical scales [Breimann24a].
`AAclustPlot`([model_class, model_kwargs, ...])	Plotting class for `AAclust` (Amino Acid clustering) results, providing dimensionality-reduction scatter plots, correlation heatmaps, and clustering evaluation charts [Breimann24a].
`SequenceFeature`([verbose])	Utility feature engineering class using sequences to create `CPP` feature components (Parts, Splits, and Scales) and data structures [Breimann25].
`NumericalFeature`()	Utility feature engineering class to process and filter numerical data structures, such as amino acid scales or a feature matrix.
`CPP`(df_parts[, split_kws, df_scales, ...])	Comparative Physicochemical Profiling (CPP) class to create and filter features that are most discriminant between two sets of sequences [Breimann25].
`CPPGrid`(df_seq, labels[, dict_num, ...])	Grid-style sweep over Comparative Physicochemical Profiling (CPP) configurations (Tool) [Breimann25].
`CPPPlot`([df_scales, df_cat, jmd_n_len, ...])	Plotting class for `CPP` (Comparative Physicochemical Profiling) results [Breimann25].
`CPPStructurePlot`([jmd_n_len, jmd_c_len, ...])	Plotting class for painting `CPP` feature impact onto a 3D protein structure ([pro], requires `aaanalysis[pro]`) [Breimann25].
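A CPP feature combines a Part (sequence region), a Split (e.g., one segment of that part), and a Scale (per-amino-acid property). The hypothetical sketch below computes one such feature value as the mean scale value over the i-th of n segments, then compares a test and a reference group by their mean difference. The segment boundaries, the use of raw Kyte-Doolittle hydropathy values, and the mean-difference score are simplifying assumptions for illustration, not how `SequenceFeature` or `CPP` are implemented.

```python
# Kyte-Doolittle hydropathy scale (raw values; assumed here for illustration)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def segment(part, i, n):
    """Hypothetical Split: the i-th (1-based) of n contiguous, near-equal segments."""
    if not 1 <= i <= n <= len(part):
        raise ValueError("Require 1 <= i <= n <= len(part)")
    start = (i - 1) * len(part) // n
    end = i * len(part) // n
    return part[start:end]

def feature_value(part, i, n, scale):
    """Mean scale value over one segment (a Part-Split-Scale feature)."""
    seg = segment(part, i, n)
    return sum(scale[aa] for aa in seg) / len(seg)

def mean_difference(test_parts, ref_parts, i, n, scale):
    """Difference of group means: a simple stand-in for a discriminance score."""
    test = [feature_value(p, i, n, scale) for p in test_parts]
    ref = [feature_value(p, i, n, scale) for p in ref_parts]
    return sum(test) / len(test) - sum(ref) / len(ref)

# Usage: test sequences are hydrophobic in their first half, reference ones are not
test_parts = ["LLLLKKKK", "IIVVDDEE"]
ref_parts = ["KKKKLLLL", "DDEEIIVV"]
diff = mean_difference(test_parts, ref_parts, i=1, n=2, scale=KD)
```

A large positive difference marks the first-half hydropathy feature as discriminant for this toy data.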

PU Learning

`dPULearn`([model_kwargs, verbose, random_state])	Deterministic Positive-Unlabeled Learning (dPULearn) class for identifying reliable negatives from unlabeled data [Breimann25].
`dPULearnPlot`()	Plotting class for `dPULearn` (deterministic Positive-Unlabeled Learning) results [Breimann25].

Explainable AI

`TreeModel`([list_model_classes, ...])	Tree Model class: A wrapper for tree-based models to obtain Monte Carlo estimates of feature importance and predictions [Breimann25].
`ShapModel`([explainer_class, ...])	SHAP Model class ([pro], requires `aaanalysis[pro]`): A wrapper for SHAP (SHapley Additive exPlanations) [Lundberg20] explainers to obtain Monte Carlo estimates for feature impact [Breimann25].
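"Monte Carlo estimates" means repeating a randomized procedure over several rounds and reporting the mean and spread of the result. As a self-contained illustration, the sketch below averages permutation importance over seeded shuffles for a fixed toy classifier. Permutation importance is swapped in here for the tree-based and SHAP importances that `TreeModel` and `ShapModel` compute, and all names are hypothetical.

```python
import random
import statistics

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mc_permutation_importance(predict, X, y, n_rounds=10, seed=0):
    """Hypothetical sketch: mean and std of the accuracy drop when one feature
    is shuffled, repeated over n_rounds seeded shuffles per feature."""
    base = accuracy(y, predict(X))
    results = []
    for j in range(len(X[0])):
        drops = []
        for r in range(n_rounds):
            rng = random.Random(seed + r)  # one reproducible shuffle per round
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(y, predict(X_perm)))
        std = statistics.stdev(drops) if n_rounds > 1 else 0.0
        results.append((statistics.mean(drops), std))
    return results

# Usage: a toy model that only looks at feature 0
predict = lambda X: [1 if row[0] > 0.5 else 0 for row in X]
X = [[0.1, 0.9], [0.9, 0.1], [0.2, 0.8], [0.8, 0.2],
     [0.3, 0.7], [0.7, 0.3], [0.4, 0.6], [0.6, 0.4]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
importance = mc_permutation_importance(predict, X, y)
```

The ignored feature gets zero importance with zero spread, while the used feature shows a positive mean drop.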

Prediction

`AAPred`([models, list_model_classes, ...])	AAPred: evaluate and deploy sequence-based prediction models (Wrapper) [Breimann25].
`AAPredPlot`()	Plotting class for `AAPred` evaluation and prediction results [Breimann25].
`ReliabilityModel`([verbose, random_state])	Assess how much to trust each prediction: the reliability of a score, not the score itself.
`ReliabilityModelPlot`()	Visualize `ReliabilityModel` outputs: calibration and the two trust axes.
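Calibration asks whether predicted probabilities match observed frequencies. A common summary is the expected calibration error (ECE): bin predictions by probability, then take the count-weighted average gap between the mean predicted probability and the observed fraction of positives in each bin. This hypothetical sketch shows that one measure; it is an assumption that `ReliabilityModel` or `ReliabilityModelPlot` use ECE or equal-width bins.

```python
def calibration_bins(probs, labels, n_bins=5):
    """Equal-width probability bins: (count, mean probability, fraction positive)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        if not 0.0 <= p <= 1.0:
            raise ValueError("Probabilities must lie in [0, 1]")
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes into the last bin
        bins[idx].append((p, y))
    summary = []
    for b in bins:
        if b:
            mean_prob = sum(p for p, _ in b) / len(b)
            frac_pos = sum(y for _, y in b) / len(b)
            summary.append((len(b), mean_prob, frac_pos))
    return summary

def expected_calibration_error(probs, labels, n_bins=5):
    """Count-weighted mean |mean probability - fraction positive| over bins."""
    n = len(probs)
    return sum(count / n * abs(mean_prob - frac_pos)
               for count, mean_prob, frac_pos in calibration_bins(probs, labels, n_bins))

# Usage: slightly under-confident predictions on a perfectly separable toy set
ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
```

Here each bin is off by 0.1, so the ECE is 0.1; a perfectly calibrated model would score 0.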

Protein Engineering

`AAMut`([verbose, df_scales])	Amino Acid Mutator (AAMut) class for analyzing the physicochemical impact of amino acid substitutions on property scales [Breimann24a].
`AAMutPlot`([verbose, df_scales])	Plotting class for `AAMut` (Amino Acid Mutator) results [Breimann24a].
`SeqMut`([verbose, df_scales, model, target_class])	Sequence Mutator (SeqMut) class for CPP-guided sequence mutation and ΔCPP analysis [Breimann24a].
`SeqMutPlot`([verbose])	Plotting class for `SeqMut` (Sequence Mutator) results [Breimann24a].
`SeqOpt`([mode, model, target_class, ...])	Sequence Optimizer (SeqOpt) class for multi-objective directed evolution over sequence variants [Breimann24a].
`SeqOptPlot`([verbose])	Plotting class for `SeqOpt` (Sequence Optimizer) results [Breimann24a].
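The physicochemical impact of a substitution on a property scale can be read as the difference between the mutant and wild-type scale values. The hypothetical sketch below illustrates this with the Kyte-Doolittle hydropathy scale and the conventional "A4L" mutation notation (wild type, 1-based position, mutant). The helper names and the choice of scale are assumptions; `AAMut` and `SeqMut` work with the toolkit's own scale sets.

```python
# Kyte-Doolittle hydropathy scale (raw values; assumed here for illustration)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def substitution_delta(wt, mut, scale=KD):
    """Scale change caused by substituting wild-type residue wt with mut."""
    return scale[mut] - scale[wt]

def apply_mutation(seq, mutation):
    """Apply a mutation such as 'A4L' (wild type, 1-based position, mutant)."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[pos - 1] != wt:
        raise ValueError(f"Expected {wt} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + mut + seq[pos:]

def mean_scale(seq, scale=KD):
    """Average scale value over a whole sequence."""
    return sum(scale[aa] for aa in seq) / len(seq)

# Usage: Ala -> Leu at position 4 increases hydropathy
seq = "MKTAYIA"
mutant = apply_mutation(seq, "A4L")
delta = substitution_delta("A", "L")
```

A single substitution shifts the sequence-wide mean by delta divided by the sequence length, which is why local (segment-level) views are often more informative than whole-sequence averages.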

Utility Functions

`comp_auc_adjusted`(X, labels[, label_test, ...])	Compute an adjusted Area Under the Curve (AUC) [-0.5, 0.5] assessing the similarity between two groups.
`comp_bic_score`(X, labels)	Compute an adjusted Bayesian Information Criterion (BIC) (-∞, ∞) for assessing clustering quality.
`comp_bootstrap_ci`(values[, n_rounds, ci, seed])	Compute a percentile bootstrap Confidence Interval (CI) of the mean.
`comp_detection_metrics`(list_scores, ...[, ...])	Compute pooled detection metrics at a fixed score threshold.
`comp_kld`(X, labels[, label_test, label_ref])	Compute the Kullback-Leibler Divergence (KLD) [0, ∞) for assessing the similarity between two groups.
`comp_per_protein_ap`(list_scores, list_positions)	Compute per-protein average precision (AP) for windowed site prediction.
`comp_smooth_scores`(scores[, method, window, ...])	Smooth a per-residue score vector with a NaN-aware, peak-preserving kernel.
`display_df`([df, max_width_pct, max_height, ...])	Display a DataFrame with a specific style as HTML output in Jupyter notebooks.
`options`	A class for managing system-level settings for AAanalysis.
`plot_gcfs`([option])	Get the current font size (or axes linewidth).
`plot_get_cdict`([name])	Get color dictionaries specified for AAanalysis.
`plot_get_clist`([n_colors, kind, cmap, ...])	Get a list of `n_colors` colors for a categorical, continuous, or diverging palette.
`plot_get_cmap`([name, n_colors, facecolor_dark])	Get colormaps specified for AAanalysis.
`plot_legend`([ax, dict_color, list_cat, ...])	Set an independently customizable plot legend.
`plot_settings`([font_scale, font, ...])	Configure general plot settings.
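Two of the statistics above can be sketched in a few lines. The adjusted AUC in [-0.5, 0.5] corresponds to the usual AUC minus 0.5, where AUC is the probability that a random test value exceeds a random reference value (ties count half). The percentile bootstrap CI resamples values with replacement and takes percentiles of the resampled means. These are hypothetical, stdlib-only sketches; the exact conventions of `comp_auc_adjusted` and `comp_bootstrap_ci` (tie handling, percentile interpolation) may differ.

```python
import random
import statistics

def auc_adjusted(test, ref):
    """Pairwise (Mann-Whitney) AUC of test vs. ref values, minus 0.5."""
    wins = 0.0
    for t in test:
        for r in ref:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5  # ties count half
    return wins / (len(test) * len(ref)) - 0.5

def bootstrap_ci_mean(values, n_rounds=1000, ci=0.95, seed=0):
    """Percentile bootstrap confidence interval of the mean."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(statistics.mean(rng.choices(values, k=n)) for _ in range(n_rounds))
    alpha = (1 - ci) / 2
    lo = means[int(alpha * (n_rounds - 1))]
    hi = means[int((1 - alpha) * (n_rounds - 1))]
    return lo, hi

# Usage
auc_sep = auc_adjusted([3, 4], [1, 2])      # test fully above reference
auc_same = auc_adjusted([1, 2], [1, 2])     # identical groups
values = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4]
lo, hi = bootstrap_ci_mean(values)
```

The adjusted AUC is 0 for indistinguishable groups and reaches +0.5 or -0.5 when the groups separate completely.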