API

This Application Programming Interface (API) is the public interface for the objects and functions of our AAanalysis Python toolkit, which can be imported with:

import aaanalysis as aa

You can then access all methods and objects via the aa alias, such as aa.load_dataset.

Data Handling

load_dataset([name, n, random, ...])

Load protein benchmarking datasets.

load_scales([name, just_aaindex, ...])

Load amino acid scales or their classification (AAontology).

load_features([name])

Load feature sets for protein benchmarking datasets.

read_fasta(file_path[, col_id, col_seq, ...])

Read a FASTA file into a DataFrame.
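read_fasta returns a pandas DataFrame. The minimal sketch below shows only the parsing step, using the standard library. It keeps the first whitespace-delimited token of each header as the identifier, which is a choice made for this sketch and not necessarily what read_fasta does. The helper parse_fasta and its record keys are hypothetical and not part of AAanalysis.

```python
from typing import Dict, List


def parse_fasta(text: str) -> List[Dict[str, str]]:
    """Parse FASTA text into records with 'entry' and 'sequence' keys.

    Hypothetical helper for illustration; not the read_fasta implementation.
    """
    records: List[Dict[str, str]] = []
    entry, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record
            if entry is not None:
                records.append({"entry": entry, "sequence": "".join(chunks)})
            # Keep only the first header token as the identifier (sketch assumption)
            entry, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if entry is not None:
        records.append({"entry": entry, "sequence": "".join(chunks)})
    return records
```

For example, parse_fasta(">P1 desc\nMKT\nLLV\n>P2\nGGA\n") joins the wrapped lines of P1 into a single sequence "MKTLLV".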

to_fasta([df_seq, file_path, col_id, ...])

Write a sequence DataFrame to a FASTA file.
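The reverse direction can be sketched the same way: each record becomes a ">" header line followed by its sequence wrapped at a fixed width. The width of 60 characters and the helper names format_fasta and write_fasta are assumptions for illustration; they are not the to_fasta signature or defaults.

```python
from typing import Iterable, Tuple


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (identifier, sequence) pairs as FASTA text, wrapping at `width` characters.

    Hypothetical helper for illustration; not the to_fasta implementation.
    """
    if width < 1:
        raise ValueError("width must be positive")
    lines = []
    for entry, seq in records:
        lines.append(f">{entry}")
        # Split the sequence into consecutive chunks of at most `width` characters
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


def write_fasta(records: Iterable[Tuple[str, str]], file_path: str, width: int = 60) -> None:
    """Write the formatted FASTA text to `file_path` (hypothetical helper)."""
    with open(file_path, "w") as f:
        f.write(format_fasta(records, width))
```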

comp_seq_sim([seq1, seq2, df_seq])

Compute pairwise similarity between two or more sequences.
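This page does not state how comp_seq_sim scores similarity. As a swapped-in stand-in, the sketch below uses difflib.SequenceMatcher from the Python standard library, whose ratio() equals 2*M/T (M matched characters, T combined length of both strings). This is not an alignment-based percent identity, and the names seq_similarity and pairwise_similarity are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def seq_similarity(seq1: str, seq2: str) -> float:
    """Similarity in [0, 1] from difflib's ratio (stand-in, not alignment identity)."""
    # autojunk=False avoids difflib's popularity heuristic on long sequences
    return SequenceMatcher(None, seq1, seq2, autojunk=False).ratio()


def pairwise_similarity(seqs: List[str]) -> Dict[Tuple[int, int], float]:
    """Similarity for every unordered pair of sequences, keyed by index pair (i, j), i < j."""
    return {(i, j): seq_similarity(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
```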

filter_seq([df_seq, method, ...])

Reduce sequence redundancy using clustering-based algorithms.
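The algorithms behind the `method` options of filter_seq are not listed on this page. One common clustering-based idea is greedy incremental clustering in the style of CD-HIT: visit sequences from longest to shortest and keep a sequence only if it is not too similar to any representative kept so far. The sketch below assumes difflib similarity as a stand-in measure; greedy_filter and its `threshold` parameter are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def greedy_filter(seqs: List[str], threshold: float = 0.9) -> List[int]:
    """Return indices of representative sequences after greedy redundancy reduction.

    Hypothetical sketch: a sequence is kept only if its difflib similarity to
    every already-kept representative is below `threshold`.
    """
    # Longest sequences first, so they become cluster representatives
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    kept: List[int] = []
    for i in order:
        if all(SequenceMatcher(None, seqs[i], seqs[k], autojunk=False).ratio() < threshold
               for k in kept):
            kept.append(i)
    return sorted(kept)
```

For example, with seqs = ["MKTAYIAKQR", "MKTAYIAKQ", "GGGGGGGG"], the second sequence (similarity 18/19 to the first) is dropped and the indices [0, 2] remain.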

SequencePreprocessor()

Utility data preprocessing class to encode and represent protein sequences.

Sequence Analysis

AAlogo([logo_type])

Amino Acid logo (AAlogo) class for computing sequence logo matrices and conservation scores.
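A sequence logo matrix is typically built from per-position amino acid frequencies, and a standard conservation score is the information content of each position: log2(20) minus the Shannon entropy of that position. The sketch below computes both for equal-length, gap-free sequences without small-sample correction. It illustrates the general technique, not necessarily how AAlogo defines its matrices or scores; logo_matrix is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def logo_matrix(aligned: List[str]) -> Tuple[List[Dict[str, float]], List[float]]:
    """Per-position amino acid frequencies and information content in bits.

    Hypothetical sketch: assumes equal-length, gap-free sequences and uses
    log2(20) - Shannon entropy as the conservation score.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    n = len(aligned)
    max_bits = math.log2(len(AMINO_ACIDS))
    freqs, info = [], []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned)
        # Frequency of each standard amino acid at this position
        freqs.append({aa: counts.get(aa, 0) / n for aa in AMINO_ACIDS})
        # Shannon entropy (bits) over the residues actually observed
        entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
        info.append(max_bits - entropy)
    return freqs, info
```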

AAlogoPlot([logo_type, jmd_n_len, ...])

Amino Acid logo Plot (AAlogoPlot) class for visualizing sequence logos.

comp_seq_sim([seq1, seq2, df_seq])

Compute pairwise similarity between two or more sequences.

filter_seq([df_seq, method, ...])

Reduce sequence redundancy using clustering-based algorithms.

Feature Engineering

AAclust([model_class, model_kwargs, ...])

Amino Acid clustering (AAclust) class: A k-optimized clustering wrapper for selecting redundancy-reduced sets of numerical scales [Breimann24a].

AAclustPlot([model_class, model_kwargs, ...])

Plotting class for AAclust (Amino Acid clustering) results [Breimann24a].

SequenceFeature([verbose])

Utility feature engineering class using sequences to create CPP feature components (Parts, Splits, and Scales) and data structures [Breimann25a].

NumericalFeature()

Utility feature engineering class to process and filter numerical data structures, such as amino acid scales or a feature matrix.

CPP([df_parts, split_kws, df_scales, ...])

Comparative Physicochemical Profiling (CPP) class to create and filter features that are most discriminant between two sets of sequences [Breimann25a].
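CPP features combine a sequence Part, a Split, and a Scale. As a rough illustration, assume a feature value is the average scale value over one contiguous segment of a part, and that a feature's discriminative power can be gauged by the difference of its mean between the test and reference sets. The segmentation rule (floor division into n_split pieces) and the helpers segment_values and mean_difference are hypothetical; CPP itself applies further filtering criteria not shown here.

```python
from statistics import mean
from typing import Dict, List


def segment_values(part: str, scale: Dict[str, float], n_split: int) -> List[float]:
    """Mean scale value in each of n_split contiguous, non-empty segments of a part."""
    if not 1 <= n_split <= len(part):
        raise ValueError("n_split must be between 1 and len(part)")
    # Integer boundaries; consecutive boundaries differ by at least 1
    bounds = [i * len(part) // n_split for i in range(n_split + 1)]
    return [mean(scale[aa] for aa in part[a:b]) for a, b in zip(bounds, bounds[1:])]


def mean_difference(test: List[str], ref: List[str], scale: Dict[str, float],
                    n_split: int, segment: int) -> float:
    """Difference of mean feature values (test minus reference) for one segment."""
    f_test = mean(segment_values(p, scale, n_split)[segment] for p in test)
    f_ref = mean(segment_values(p, scale, n_split)[segment] for p in ref)
    return f_test - f_ref
```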

CPPPlot([df_scales, df_cat, jmd_n_len, ...])

Plotting class for CPP (Comparative Physicochemical Profiling) results [Breimann25a].

PU Learning

dPULearn([model_kwargs, verbose, random_state])

Deterministic Positive-Unlabeled Learning (dPULearn) class for identifying reliable negatives from unlabeled data [Breimann25a].
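This page only states that dPULearn identifies reliable negatives from unlabeled data. A simple deterministic heuristic for that step is to pick the unlabeled samples that lie farthest from the positives. The sketch below ranks unlabeled samples by Euclidean distance to the positive centroid. It is a simplified stand-in, not dPULearn's algorithm (which may operate in a transformed feature space), and reliable_negatives is a hypothetical name.

```python
import math
from typing import List, Sequence


def reliable_negatives(X_pos: Sequence[Sequence[float]],
                       X_unl: Sequence[Sequence[float]],
                       n_neg: int) -> List[int]:
    """Indices of the n_neg unlabeled samples farthest from the positive centroid.

    Hypothetical sketch; ties are broken by original order (stable sort).
    """
    dim = len(X_pos[0])
    centroid = [sum(x[d] for x in X_pos) / len(X_pos) for d in range(dim)]
    dist = [math.dist(x, centroid) for x in X_unl]
    ranked = sorted(range(len(X_unl)), key=lambda i: dist[i], reverse=True)
    return sorted(ranked[:n_neg])
```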

dPULearnPlot()

Plotting class for dPULearn (deterministic Positive-Unlabeled Learning) results [Breimann25a].

Explainable AI

TreeModel([list_model_classes, ...])

Tree Model class: A wrapper for tree-based models to obtain Monte Carlo estimates of feature importance and predictions [Breimann25a].

ShapModel([explainer_class, ...])

SHAP Model class: A wrapper for SHAP (SHapley Additive exPlanations) explainers to obtain Monte Carlo estimates for feature impact [Breimann25a].

Protein Design

AAMut([verbose, df_scales])

UNDER CONSTRUCTION - Amino Acid Mutator (AAMut) class for analyzing the impact of amino acid substitutions on amino acid scales.

AAMutPlot([verbose, df_scales])

UNDER CONSTRUCTION - Plotting class for AAMut (Amino Acid Mutator).

SeqMut([verbose])

UNDER CONSTRUCTION - Sequence Mutator (SeqMut) class for analyzing the impact of amino acid substitutions in protein sequences.

SeqMutPlot([verbose])

UNDER CONSTRUCTION - Plotting class for SeqMut (Sequence Mutator).

Utility Functions

comp_auc_adjusted([X, labels, label_test, ...])

Compute an adjusted Area Under the Curve (AUC) [-0.5, 0.5] for assessing the similarity between two groups.
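Assuming "adjusted" means the ROC AUC shifted by -0.5, which matches the stated [-0.5, 0.5] range, a value of 0 indicates no separation between the groups and ±0.5 indicates perfect separation. The sketch below computes this for a single feature via the Mann-Whitney formulation, using a plain pairwise loop for clarity. comp_auc_adjusted works on a feature matrix X with labels, which this hypothetical auc_adjusted helper does not reproduce.

```python
from typing import Sequence


def auc_adjusted(test: Sequence[float], ref: Sequence[float]) -> float:
    """ROC AUC of test vs. reference values minus 0.5 (range [-0.5, 0.5]).

    AUC = P(test > ref) + 0.5 * P(test == ref) over all test/reference pairs.
    """
    wins = 0.0
    for t in test:
        for r in ref:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5  # ties count as half a win
    return wins / (len(test) * len(ref)) - 0.5
```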

comp_bic_score([X, labels])

Compute an adjusted Bayesian Information Criterion (BIC) (-∞, ∞) for assessing clustering quality.

comp_kld([X, labels, label_test, label_ref])

Compute the Kullback-Leibler Divergence (KLD) [0, ∞) for assessing the similarity between two groups.
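The KLD, D(P || Q) = Σ p_i log(p_i / q_i), is non-negative and asymmetric, which matches the [0, ∞) range. How comp_kld estimates each group's distribution from X and labels is not described here. The sketch below assumes two already-binned discrete distributions (for example, histogram counts over the same bins) and uses a small epsilon to avoid log(0); kld is a hypothetical helper.

```python
import math
from typing import Sequence


def kld(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Kullback-Leibler divergence D(P || Q) in nats for two discrete distributions.

    Both inputs are normalized to sum to 1; eps replaces zero entries of q.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    sp, sq = sum(p), sum(q)
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / sp, qi / sq
        if pi > 0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total
```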

display_df([df, max_width_pct, max_height, ...])

Display a DataFrame with a specific style as HTML output in Jupyter notebooks.

options

A class for managing system-level settings for AAanalysis.

plot_gcfs([option])

Get the current font size (or axes linewidth).

plot_get_cdict([name])

Get color dictionaries specified for AAanalysis.

plot_get_clist([n_colors])

Get a manually curated list of 2 to 9 colors or 'husl' palette for more than 9 colors.

plot_get_cmap([name, n_colors, facecolor_dark])

Get colormaps specified for AAanalysis.

plot_legend([ax, dict_color, list_cat, ...])

Set an independently customizable plot legend.

plot_settings([font_scale, font, ...])

Configure general plot settings.