API
This Application Programming Interface (API) is the public interface for the objects and functions of our AAanalysis Python toolkit, which can be imported as follows:
import aaanalysis as aa
You can then access all methods and objects via the aa alias, such as aa.load_dataset.
Data Handling
Load protein benchmarking datasets.
Load amino acid scales or their classification (AAontology).
Load feature sets for protein benchmarking datasets.
Read a FASTA file into a DataFrame.
Write a sequence DataFrame to a FASTA file.
Compute pairwise similarity between two or more sequences.
Redundancy reduction of sequences using clustering-based algorithms.
Utility data preprocessing class to encode and represent protein sequences.
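The FASTA reading and writing entries above can be illustrated with a minimal, dependency-free sketch. It is not the toolkit's implementation: the function names `parse_fasta` and `format_fasta`, the record keys `entry` and `sequence`, and the `width` parameter are all hypothetical, and plain lists of dicts stand in for the DataFrame.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of {"entry", "sequence"} dicts.

    Hypothetical helper for illustration only; not part of AAanalysis.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append({"entry": header, "sequence": "".join(chunks)})
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append({"entry": header, "sequence": "".join(chunks)})
    return records


def format_fasta(records, width=60):
    """Serialize records back to FASTA text, wrapping sequences at `width`."""
    lines = []
    for rec in records:
        lines.append(">" + rec["entry"])
        seq = rec["sequence"]
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ">P1\nMKT\nAAL\n>P2\nGG\n"
    for record in parse_fasta(example):
        print(record["entry"], record["sequence"])
```

Each record maps naturally to one row of a table, which is why a list of dicts is used here as a stand-in for a DataFrame.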
Sequence Analysis
Amino Acid logo (AAlogo) class for computing sequence logo matrices and conservation scores.
Amino Acid logo Plot (AAlogoPlot) class for visualizing sequence logos.
Compute pairwise similarity between two or more sequences.
Redundancy reduction of sequences using clustering-based algorithms.
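To make "sequence logo matrices and conservation scores" concrete, the following is a minimal sketch of one common definition: per-position residue frequencies, and information content computed as log2(alphabet size) minus the Shannon entropy. Whether AAlogo uses this exact definition (for example, with small-sample corrections) is not stated here, so treat it as an assumption. The function names are hypothetical.

```python
import math
from collections import Counter


def position_frequencies(seqs):
    """Return one {residue: frequency} dict per alignment position.

    Assumes all sequences are aligned and of equal length.
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")
    freqs = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        freqs.append({aa: n / len(seqs) for aa, n in counts.items()})
    return freqs


def information_content(freqs, alphabet_size=20):
    """Per-position conservation in bits: log2(alphabet_size) - entropy.

    Uncorrected textbook definition; an assumption, not the toolkit's method.
    """
    max_bits = math.log2(alphabet_size)
    scores = []
    for column in freqs:
        entropy = -sum(p * math.log2(p) for p in column.values())
        scores.append(max_bits - entropy)
    return scores


if __name__ == "__main__":
    aligned = ["AKL", "AKV", "ARL", "ARV"]
    for pos, bits in enumerate(information_content(position_frequencies(aligned))):
        print(pos, round(bits, 3))
```

A fully conserved position reaches the maximum of log2(20) bits, while a position split evenly between two residues loses exactly one bit.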
Feature Engineering
Amino Acid clustering (AAclust) class: A k-optimized clustering wrapper for selecting redundancy-reduced sets of numerical scales [Breimann24a].
Plotting class for
Utility feature engineering class using sequences to create
Utility feature engineering class to process and filter numerical data structures, such as amino acid scales or a feature matrix.
Comparative Physicochemical Profiling (CPP) class to create and filter features that are most discriminant between two sets of sequences [Breimann25a].
Plotting class for
PU Learning
Deterministic Positive-Unlabeled Learning (dPULearn) class for identifying reliable negatives from unlabeled data [Breimann25a].
Plotting class for
Explainable AI
Tree Model class: A wrapper for tree-based models to obtain Monte Carlo estimates of feature importance and predictions [Breimann25a].
SHAP Model class: A wrapper for SHAP (SHapley Additive exPlanations) explainers to obtain Monte Carlo estimates for feature impact [Breimann25a].
Protein Design
UNDER CONSTRUCTION - Amino Acid Mutator (AAMut) class for analyzing the impact of amino acid substitutions on amino acid scales.
UNDER CONSTRUCTION - Plotting class for
UNDER CONSTRUCTION - Sequence Mutator (SeqMut) class for analyzing the impact of amino acid substitutions in protein sequences.
UNDER CONSTRUCTION - Plotting class for
Utility Functions
Compute an adjusted Area Under the Curve (AUC) [-0.5, 0.5] assessing the similarity between two groups.
Compute an adjusted Bayesian Information Criterion (BIC) (-∞, ∞) for assessing clustering quality.
Compute the Kullback-Leibler Divergence (KLD) [0, ∞) for assessing the similarity between two groups.
Display a DataFrame with a specific style as HTML output for Jupyter notebooks.
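The value ranges given for the adjusted AUC and the KLD above can be checked with a small sketch. It assumes that "adjusted" means the rank-based AUC shifted by 0.5, and it uses the discrete form of the KL divergence on probability vectors. Both are illustrative assumptions rather than the toolkit's actual computation, and the function names are hypothetical.

```python
import math


def auc_adjusted(group_a, group_b):
    """Rank-based AUC of group_a versus group_b, shifted to [-0.5, 0.5].

    The AUC is the fraction of (a, b) pairs with a > b, counting ties as half.
    Assumption: "adjusted" is taken to mean AUC - 0.5.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b)) - 0.5


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(p || q) in nats.

    p and q are probability vectors of equal length; the result lies in [0, inf).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    print(auc_adjusted([3, 4, 5], [0, 1, 2]))
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

Under this reading, an adjusted AUC of 0 means the two groups are indistinguishable by rank, and the sign indicates which group tends to have larger values.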
A class for managing system-level settings for AAanalysis.
Get the current font size (or axes linewidth).
Get color dictionaries specified for AAanalysis.
Get a manually curated list of 2 to 9 colors or 'husl' palette for more than 9 colors.
Get colormaps specified for AAanalysis.
Set an independently customizable plot legend.
Configure general plot settings.