Welcome to the AAanalysis documentation!


AAanalysis Model Overview

AAanalysis (Amino Acid analysis) is a Python framework for interpretable sequence-based protein prediction. It is built on the following algorithms:

  • CPP: Comparative Physicochemical Profiling, a feature engineering algorithm comparing two sets of protein sequences to identify the set of most distinctive features.

  • dPULearn: deterministic Positive-Unlabeled (PU) Learning algorithm to enable training on imbalanced and small datasets.

  • AAclust: k-optimized clustering wrapper framework to select redundancy-reduced sets of numerical scales (e.g., amino acid scales).
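To make the CPP idea of "comparing two sets of protein sequences" concrete, here is a deliberately simplified sketch. It is not the CPP algorithm: it assumes each feature is just the average value of one amino acid scale over a whole sequence, and it ranks scales by the absolute difference of that average between a test set and a reference set. The scale names, their values, and the helper functions are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical toy scales: amino acid -> value (made-up numbers, not real AAontology scales)
SCALES = {
    "toy_hydrophobicity": {"A": 0.6, "L": 1.0, "I": 1.0, "V": 0.9, "K": 0.0,
                           "R": 0.1, "D": 0.05, "E": 0.05, "G": 0.5, "S": 0.3},
    "toy_charge": {"A": 0.5, "L": 0.5, "I": 0.5, "V": 0.5, "K": 1.0,
                   "R": 1.0, "D": 0.0, "E": 0.0, "G": 0.5, "S": 0.5},
}


def scale_average(seq, scale):
    """Average scale value over all residues of one sequence."""
    return mean(scale[aa] for aa in seq)


def rank_features(test_seqs, ref_seqs, scales):
    """Rank scales by |mean(test) - mean(ref)| of per-sequence averages (toy comparison)."""
    scores = {}
    for name, scale in scales.items():
        test_mean = mean(scale_average(s, scale) for s in test_seqs)
        ref_mean = mean(scale_average(s, scale) for s in ref_seqs)
        scores[name] = abs(test_mean - ref_mean)
    # Most distinctive feature first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


test = ["LLIVAL", "ILLVVA"]   # hydrophobic toy "test" set
ref = ["KRDEGS", "SKEDRG"]    # charged/polar toy "reference" set
for name, score in rank_features(test, ref, SCALES):
    print(f"{name}: {score:.3f}")
```

In this toy run, toy_hydrophobicity ranks first (0.742), while toy_charge scores 0.000 because the positive and negative toy values cancel in the whole-sequence average. A single global average can hide real differences, which is why splitting sequences into parts and segments is useful.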

In addition, AAanalysis provides functions for loading various protein benchmark datasets, amino acid scales, and their two-level classification (AAontology). We combined CPP with the explainable AI framework SHAP to explain sample-level predictions with single-residue resolution.
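The text above does not spell out how feature-level explanations become residue-level ones. As a hedged sketch of what "single-residue resolution" can mean, assume each feature covers a known span of residue positions and has one SHAP value for a given sample. One simple projection spreads each feature's value evenly over the residues it covers and sums the contributions per position. The even-split rule and the `project_to_residues` helper are assumptions for illustration, not the AAanalysis API.

```python
def project_to_residues(seq_len, feature_spans, shap_values):
    """Spread each feature's SHAP value evenly over the residues it covers (hypothetical helper).

    feature_spans: list of (start, end) residue indices, 0-based, end exclusive, end <= seq_len
    shap_values: one SHAP value per feature for a single sample
    Returns a list of per-residue impact scores of length seq_len.
    """
    if len(feature_spans) != len(shap_values):
        raise ValueError("need one SHAP value per feature span")
    impact = [0.0] * seq_len
    for (start, end), value in zip(feature_spans, shap_values):
        width = end - start
        if width <= 0:
            continue  # skip empty spans
        share = value / width
        for pos in range(start, end):
            impact[pos] += share
    return impact


# Toy sample: a 6-residue sequence explained by two hypothetical, overlapping features
spans = [(0, 4), (2, 6)]
shap = [0.4, -0.2]
print([round(x, 2) for x in project_to_residues(6, spans, shap)])
# [0.1, 0.1, 0.05, 0.05, -0.05, -0.05]
```

Splitting evenly keeps the per-residue scores summing to the total feature attribution (0.2 here), so the residue profile stays consistent with the sample-level explanation.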

If you are looking to make publication-ready plots with a few lines of code, see our Plotting Prelude.

You can find the source code of AAanalysis at GitHub.

Find Your Way Around

The documentation is organized into pillars, each answering one question:

  • Getting Started: How do I install AAanalysis and get a first result? Fast onboarding, no deep theory.

  • Tutorials: How does this tool work? Tool-level teaching of the AAanalysis building blocks: each tool, its parameters, and its outputs. This covers the mechanics; Protocols reuse these tools and link back rather than repeating them.

  • Protocols: How do I design a valid, end-to-end analysis? Concept-level workflow teaching that builds the mental model for when and why to reach for each tool, linking to the Tutorials for the mechanics so that the two never overlap.

  • Use Cases: How do I adapt a full biological analysis? End-to-end biological case studies (coming soon).

  • API: What is the exact signature or parameter? Technical reference only, no teaching narrative.

Which section do I want?

  • You are new and want a first result → Getting Started.

  • You want to learn one specific tool (its parameters and outputs) → Tutorials.

  • You want to design a valid workflow for a biological question → Protocols.

  • You want to adapt a complete biological analysis → Use Cases (coming soon).

  • You want exact technical details of a function or class → API.

You want to… → Go to

  • Install AAanalysis and run a first CPP analysis → Getting Started

  • Learn what a specific function does and how to call it → Tutorials

  • Design a valid, end-to-end analysis for a biological question → Protocols

  • Adapt a full biological case study to your own data → Use Cases (coming soon)

  • Look up the exact signature, parameters, or return value → API

Install

AAanalysis can be installed from PyPI:

pip install aaanalysis

For extended features, including the explainable AI module:

pip install "aaanalysis[pro]"

If you use uv, the equivalent commands are:

uv pip install aaanalysis
uv pip install "aaanalysis[pro]"

Contributing

We appreciate bug reports, feature requests, and improvements to documentation and code. For details, please refer to Contributing Guidelines. These cover AAanalysis development conventions and the automated quality gates every change must pass. For further questions or suggestions, please email stephanbreimann@gmail.com.

Cheat Sheet

The cheat sheet distills AAanalysis into a three-page summary: the golden workflow, the main classes grouped by capability, the prediction levels (residue / domain / protein), and the Part × Split × Scale feature ontology.
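The Part × Split × Scale ontology means that each feature is one combination of a sequence part, a way of splitting that part, and an amino acid scale, so the feature space grows multiplicatively. The snippet below enumerates such combinations with `itertools.product`. The part, split, and scale names are hypothetical placeholders, and the `part-split-scale` identifier format is an assumption, not necessarily the one AAanalysis uses.

```python
from itertools import product

# Hypothetical building blocks (placeholders, not AAanalysis defaults)
parts = ["tmd", "jmd_n", "jmd_c"]
splits = ["Segment(1,1)", "Segment(1,2)", "Segment(2,2)"]  # assumed notation: segment i of n
scales = ["scale_A", "scale_B", "scale_C", "scale_D"]

# Every feature is one (part, split, scale) combination
feature_ids = [f"{part}-{split}-{scale}"
               for part, split, scale in product(parts, splits, scales)]

print(len(feature_ids))  # 3 * 3 * 4 = 36
print(feature_ids[0])    # tmd-Segment(1,1)-scale_A
```

Even with these tiny lists the space already holds 36 features; realistic numbers of scales and splits quickly yield far more, which is why selecting the most distinctive and least redundant features matters.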

Click the image to open the interactive cheat sheet in your browser or click here to download the PDF cheat sheet.

AAanalysis cheat sheet (page 1 of 3)

The AAanalysis Ecosystem

AAanalysis is the interpretable middle layer between bioinformatics I/O and the downstream machine learning, explainable AI, and protein-design stack. It consumes upstream representations (sequences, embeddings, structures) and even competitor descriptor sets, runs them through its interpretable core (Part × Split × Scale · AAontology · CPP · ShapModel), and exposes the resulting features, explanations, and objectives to the standard ML / XAI / optimization tools.

The AAanalysis ecosystem: where AAanalysis fits in the protein-ML stack

Explore the full ecosystem map: per-category packages, the comparison matrix, and where AAanalysis sits in the protein-ML stack. Click the diagram to open it.

Citation

If you use AAanalysis in your work, please cite the respective publication as follows:

AAclust:

[Breimann24a] Breimann and Frishman (2024a), AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales, Bioinformatics Advances.

AAontology:

[Breimann24b] Breimann et al. (2024b), AAontology: An ontology of amino acid scales for interpretable machine learning, Journal of Molecular Biology.

CPP:

[Breimann25] Breimann and Kamp et al. (2025), Charting γ-secretase substrates by explainable AI, Nature Communications.

dPULearn:

[Breimann25] Breimann and Kamp et al. (2025), Charting γ-secretase substrates by explainable AI, Nature Communications.