API
This Application Programming Interface (API) reference lists the public objects and functions of our AAanalysis Python toolkit, which can be imported with:
import aaanalysis as aa
You can then access all methods and objects via the aa alias, such as aa.load_dataset.
Data Handling
|
Load protein benchmarking datasets. |
|
Load amino acid scales or their classification (AAontology). |
|
Load feature sets for protein benchmarking datasets. |
|
Read a FASTA file into a DataFrame. |
|
Write sequence DataFrame to a FASTA file. |
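To make the FASTA reading and writing entries above more concrete, here is a minimal standard-library sketch of a FASTA round trip. It is not the toolkit's implementation: the function names `read_fasta_records`, `write_fasta_records`, and `fasta_round_trip` are hypothetical, the `entry` and `sequence` keys are assumed column names, and plain dictionaries stand in for the DataFrame that the listed functions work with.

```python
import os
import tempfile


def read_fasta_records(path):
    """Parse a FASTA file into a list of {'entry', 'sequence'} dicts (hypothetical helper)."""
    records = []
    entry, chunks = None, []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any
                if entry is not None:
                    records.append({"entry": entry, "sequence": "".join(chunks)})
                entry, chunks = line[1:].strip(), []
            elif entry is not None:
                chunks.append(line)  # a sequence may span several lines
    if entry is not None:
        records.append({"entry": entry, "sequence": "".join(chunks)})
    return records


def write_fasta_records(records, path, width=60):
    """Write records to a FASTA file, wrapping sequences at `width` characters."""
    with open(path, "w", encoding="utf-8") as handle:
        for rec in records:
            handle.write(f">{rec['entry']}\n")
            seq = rec["sequence"]
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")


def fasta_round_trip(records):
    """Write records to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "example.fasta")
        write_fasta_records(records, path)
        return read_fasta_records(path)
```

Calling `fasta_round_trip` on a small list of records should return the same list, which is a quick way to check that wrapped sequence lines are joined correctly on reading.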
Preprocessing class for representing protein sequences as numeric inputs [Breimann25]. |
|
|
Preprocessing class ([pro], requires |
|
Preprocessing class for protein language model (PLM) embeddings. |
|
Preprocessing class ([pro], requires |
|
Concatenate multiple per-residue |
Sequence Analysis
|
Amino Acid logo (AAlogo) class for computing sequence logo matrices and conservation scores. |
|
Amino Acid logo Plot (AAlogoPlot) class for visualizing sequence logos. |
|
Utility class for sampling amino-acid windows / segments from full protein sequences. |
|
Compute pairwise similarity between two or more sequences ([pro], requires |
|
Redundancy reduction of sequences using clustering-based algorithms ([pro], requires |
|
Scan candidate proteins for statistically significant Position Weight Matrix (PWM) occurrences using FIMO ([pro], requires |
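As an illustration of what a sequence logo matrix and a conservation score can look like, the following sketch computes per-position amino acid frequencies and the classical information content in bits. It is a generic textbook formulation under stated assumptions, not necessarily what the AAlogo class computes: the function names are hypothetical, sequences are assumed to be aligned, of equal length, and limited to the 20 standard amino acids, and no small-sample correction is applied.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(sequences):
    """Return per-position amino acid frequencies for equal-length sequences."""
    if not sequences:
        raise ValueError("At least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("All sequences must have the same length")
    matrix = []
    for pos in range(length):
        # Assumption: every character is one of the 20 standard amino acids
        column = [seq[pos] for seq in sequences]
        matrix.append({aa: column.count(aa) / len(column) for aa in AMINO_ACIDS})
    return matrix


def conservation_bits(matrix):
    """Information content per position: log2(20) minus the Shannon entropy."""
    max_bits = math.log2(len(AMINO_ACIDS))
    scores = []
    for freqs in matrix:
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        scores.append(max_bits - entropy)
    return scores
```

A fully conserved position reaches the maximum of log2(20) bits, while a position split evenly between two residues loses exactly one bit.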
Feature Engineering
|
Amino Acid clustering (AAclust) class: A k-optimized clustering wrapper for selecting redundancy-reduced sets of numerical scales [Breimann24a]. |
|
Plotting class for |
|
Utility feature engineering class using sequences to create |
Utility feature engineering class to process and filter numerical data structures, such as amino acid scales or a feature matrix. |
|
|
Comparative Physicochemical Profiling (CPP) class to create and filter features that are most discriminant between two sets of sequences [Breimann25]. |
|
Grid-style sweep over Comparative Physicochemical Profiling (CPP) configurations (Tool) [Breimann25]. |
|
Plotting class for |
PU Learning
|
Deterministic Positive-Unlabeled Learning (dPULearn) class for identifying reliable negatives from unlabeled data [Breimann25]. |
Plotting class for |
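The idea of identifying reliable negatives from unlabeled data can be sketched with a simple deterministic distance heuristic: unlabeled samples that lie farthest from the centroid of the positives are taken as negatives. This is only an illustration in the same spirit and should not be read as the dPULearn algorithm, whose actual procedure is not described in this listing. The function name `select_reliable_negatives` and the label coding (1 for positive, 0 for unlabeled) are assumptions.

```python
import math


def select_reliable_negatives(X, labels, n_neg):
    """Pick the n_neg unlabeled samples farthest from the positive centroid.

    X: list of feature vectors; labels: 1 = positive, 0 = unlabeled (assumed coding).
    Returns the sorted indices of the selected unlabeled samples.
    """
    pos = [x for x, y in zip(X, labels) if y == 1]
    unl_idx = [i for i, y in enumerate(labels) if y == 0]
    if not pos:
        raise ValueError("At least one positive sample is required")
    if n_neg > len(unl_idx):
        raise ValueError("n_neg exceeds the number of unlabeled samples")
    n_feat = len(pos[0])
    centroid = [sum(x[j] for x in pos) / len(pos) for j in range(n_feat)]
    # Rank by descending Euclidean distance; ties are broken by index,
    # so the selection is deterministic for a given input
    ranked = sorted(unl_idx, key=lambda i: (-math.dist(X[i], centroid), i))
    return sorted(ranked[:n_neg])
```

Because no random sampling is involved, repeated calls on the same data always return the same indices.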
Explainable AI
|
Tree Model class: A wrapper for tree-based models to obtain Monte Carlo estimates of feature importance and predictions [Breimann25]. |
|
SHAP Model class ([pro], requires |
Protein Design
|
Amino Acid Mutator (AAMut) class for analyzing the physicochemical impact of amino acid substitutions on property scales [Breimann24a]. |
|
Plotting class for |
|
Sequence Mutator (SeqMut) class for CPP-guided sequence mutation and ΔCPP analysis [Breimann24a]. |
|
Plotting class for |
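The physicochemical impact of a substitution on a property scale can be illustrated as the difference between the scale values of the mutant and the wild-type residue. The sketch below uses the Kyte-Doolittle hydropathy index purely as an example scale; the function names are hypothetical, and the AAMut and SeqMut classes may use other scales, normalization, or aggregation than shown here.

```python
# Kyte-Doolittle hydropathy index, used here only as an example property scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def substitution_delta(wild_type, mutant, scale=KYTE_DOOLITTLE):
    """Change in scale value when `wild_type` is replaced by `mutant`."""
    return scale[mutant] - scale[wild_type]


def mutate_sequence(sequence, position, mutant):
    """Return the sequence with the residue at 1-based `position` replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    return sequence[:position - 1] + mutant + sequence[position:]


def mean_scale_value(sequence, scale=KYTE_DOOLITTLE):
    """Average scale value over all residues of a sequence."""
    return sum(scale[aa] for aa in sequence) / len(sequence)
```

For example, replacing leucine with lysine lowers the hydropathy value at that position by 7.7 units, and the mean over a four-residue sequence drops by a quarter of that amount.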
Utility Functions
|
Compute an adjusted Area Under the Curve (AUC), ranging over [-0.5, 0.5], for assessing the similarity between two groups. |
|
Compute an adjusted Bayesian Information Criterion (BIC), ranging over (-∞, ∞), for assessing clustering quality. |
|
Compute a percentile bootstrap Confidence Interval (CI) of the mean. |
|
Compute pooled detection metrics at a fixed score threshold. |
|
Compute the Kullback-Leibler Divergence (KLD), ranging over [0, ∞), for assessing the similarity between two groups. |
|
Compute per-protein average precision (AP) for windowed site prediction. |
|
Smooth a per-residue score vector with a NaN-aware, peak-preserving kernel. |
|
Display a DataFrame with a specific style as HTML output for Jupyter notebooks. |
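Two of the metrics listed above, the adjusted AUC and the percentile bootstrap confidence interval of the mean, can be sketched in plain Python as follows. These are generic formulations under stated assumptions rather than the toolkit's own code: the function names, parameters, and defaults (`n_boot`, `ci`, `seed`) are hypothetical, and the sign convention (positive when the first group tends to have higher values) is an assumption.

```python
import random


def auc_adjusted(group_a, group_b):
    """Rank-based AUC shifted by -0.5, so that 0 means no separation.

    Equals P(a > b) + 0.5 * P(a == b) - 0.5 over all cross-group pairs.
    """
    if not group_a or not group_b:
        raise ValueError("Both groups must be non-empty")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # ties count as half a win
    return wins / (len(group_a) * len(group_b)) - 0.5


def bootstrap_ci_mean(values, n_boot=2000, ci=95.0, seed=0):
    """Percentile bootstrap confidence interval of the mean."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    n = len(values)
    # Resample with replacement and collect the sorted bootstrap means
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    alpha = (100.0 - ci) / 2.0
    lower_idx = int(round(alpha / 100.0 * (n_boot - 1)))
    upper_idx = int(round((100.0 - alpha) / 100.0 * (n_boot - 1)))
    return means[lower_idx], means[upper_idx]
```

With perfectly separated groups the adjusted AUC reaches its bounds of 0.5 or -0.5, and identical groups give 0, which matches the [-0.5, 0.5] range stated in the listing.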
A class for managing system-level settings for AAanalysis. |
|
|
Get the current font size (or axes linewidth). |
|
Get color dictionaries specified for AAanalysis. |
|
Get a manually curated list of 2 to 9 colors, or a 'husl' palette for more than 9 colors. |
|
Get colormaps specified for AAanalysis. |
|
Set an independently customizable plot legend. |
|
Plot a per-protein rank scatter: the maximum score per protein, sorted by score and colored by group. |
|
Configure general plot settings. |