Welcome to the AAanalysis documentation!
AAanalysis (Amino Acid analysis) is a Python framework for interpretable sequence-based protein prediction. It is built on the following algorithms:
CPP: Comparative Physicochemical Profiling, a feature engineering algorithm comparing two sets of protein sequences to identify the set of most distinctive features.
dPULearn: deterministic Positive-Unlabeled (PU) Learning algorithm to enable training on unbalanced and small datasets.
AAclust: k-optimized clustering wrapper framework to select redundancy-reduced sets of numerical scales (e.g., amino acid scales).
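To make the idea of comparing two sets of protein sequences on a physicochemical scale more concrete, here is a minimal toy sketch. It is not the CPP algorithm and does not use the AAanalysis API: the helper names `mean_scale_value` and `mean_difference` and the example sequences are made up for illustration, and the only real data are the Kyte-Doolittle hydropathy values. It scores one scale over whole sequences, with none of the feature selection that CPP performs.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values, one widely used amino acid scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_scale_value(seq, scale):
    """Average scale value over all residues of one sequence (hypothetical helper)."""
    return mean(scale[residue] for residue in seq)


def mean_difference(test_seqs, ref_seqs, scale):
    """Difference of the two group means; positive if the test set scores higher."""
    test_mean = mean(mean_scale_value(seq, scale) for seq in test_seqs)
    ref_mean = mean(mean_scale_value(seq, scale) for seq in ref_seqs)
    return test_mean - ref_mean


# Made-up sequences, for illustration only.
hydrophobic_set = ["LLIVAL", "VVILLA"]
polar_set = ["KDERNS", "QEDKRS"]

difference = mean_difference(hydrophobic_set, polar_set, KYTE_DOOLITTLE)
print(f"Mean hydropathy difference (test - reference): {difference:.2f}")
```

A large absolute difference suggests that the scale separates the two sets. A real analysis would evaluate many scales and sequence regions and rank the resulting features.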
In addition, AAanalysis provides functions for loading various protein benchmark datasets, amino acid scales, and their two-level classification (AAontology). We combined CPP with the explainable AI framework SHAP to explain sample-level predictions with single-residue resolution.
If you are looking to make publication-ready plots with a few lines of code, see our Plotting Prelude.
You can find the source code of AAanalysis on GitHub.
Find Your Way Around
Pick a section by what you want to do:
Which section do I want?
New here and want a first result → Getting Started.
Learn what one specific tool does, with its parameters and outputs → Tutorials.
Design a valid, end-to-end analysis for a biological question → Protocols.
Adapt a full biological case study to your own data → Use Cases (coming soon).
Look up the exact signature, parameters, or return value → API.
Install
AAanalysis can be installed from PyPI:
pip install aaanalysis
For extended features, including the explainable AI module:
pip install "aaanalysis[pro]"
If you use uv, the equivalent commands are:
uv pip install aaanalysis
uv pip install "aaanalysis[pro]"
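After installing, you may want to confirm that the package is visible in the active environment. The following is a minimal sketch that uses only the Python standard library (`importlib.metadata`). The helper name `installed_version` is hypothetical, and the check assumes the distribution is registered under the name `aaanalysis`, as in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution name taken from the pip command above.
found = installed_version("aaanalysis")
if found is None:
    print("aaanalysis is not installed in this environment")
else:
    print(f"aaanalysis version: {found}")
```

Because the check reads package metadata rather than importing the package, it reports a clean result even when the installation is missing or incomplete.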
Contributing
We appreciate bug reports, feature requests, and improvements to documentation and code. For details, please refer to the Contributing Guidelines, which cover the AAanalysis development conventions and the automated quality gates every change must pass. For further questions or suggestions, please email stephanbreimann@gmail.com.
Cheat Sheet
The cheat sheet distills AAanalysis into a three-page summary: the golden workflow, the main classes grouped by capability, the prediction levels (residue / domain / protein), and the Part × Split × Scale feature ontology.
Click the image to open the cheat sheet in your browser or click here to download the PDF cheat sheet.
The AAanalysis Ecosystem
AAanalysis is the interpretable middle layer between bioinformatics I/O and the downstream machine
learning, explainable AI, and protein-design stack. It consumes upstream representations (sequences,
embeddings, structures) and even competitor descriptor sets, and runs them through its interpretable
core (Part × Split × Scale · AAontology · CPP). Downstream machine-learning and explainable-AI
methods then either consume these features directly or are integrated into AAanalysis through
wrappers or native implementations — for example SHAP via ShapModel, or machine-learning models
such as random forests via TreeModel — so the resulting features, explanations, and design
objectives feed straight into the standard ML / XAI / optimization tools.
Click the diagram to view and download the full map, or open the ecosystem positioning page — a self-contained walkthrough with the map, its introduction, and further background.
Citation
If you use AAanalysis in your work, please cite the respective publication as follows:
- AAclust:
[Breimann24a] Breimann and Frishman (2024a), AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales, Bioinformatics Advances.
- AAontology:
[Breimann24b] Breimann et al. (2024b), AAontology: An ontology of amino acid scales for interpretable machine learning, Journal of Molecular Biology.
- CPP:
[Breimann25] Breimann and Kamp et al. (2025), Charting γ-secretase substrates by explainable AI, Nature Communications.
- dPULearn:
[Breimann25] Breimann and Kamp et al. (2025), Charting γ-secretase substrates by explainable AI, Nature Communications.