AnnotationPreprocessor.ingest

AnnotationPreprocessor.ingest(df_user)

Ingest a user / predictor annotation table into df_annot.

Every ingested feature_type is treated as a 'Functional sites' key. Unknown keys are auto-registered (num_dims=1, identity normalization) unless they were previously registered via register_feature().

Added in version 1.1.0.

Parameters:

df_user (pd.DataFrame) – Must contain protein_id, start (1-based position), and feature_type columns. Optional: end (defaults to start), source (defaults to 'user'), score (defaults to 1.0, must lie in [0, 1]), aa (expected residue; '' disables the encode-time guard for that row).

Returns:

df_annot – Canonical per-residue annotation schema, category='Functional sites' for every row.

Return type:

pd.DataFrame

Raises:

ValueError – On missing required columns or out-of-range scores.
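
The two failure modes listed above can be illustrated with plain pandas. This is only a sketch of the documented contract, not the library's implementation: check_user_table and REQUIRED_COLS are hypothetical names, and skipping missing scores (on the assumption that the default fills them in) is a guess rather than documented behavior.

```python
import pandas as pd

# Hypothetical names for illustration; not part of aaanalysis.
REQUIRED_COLS = ("protein_id", "start", "feature_type")


def check_user_table(df_user: pd.DataFrame) -> None:
    """Raise ValueError for the two documented failure modes."""
    # Failure mode 1: a required column is absent.
    missing = [c for c in REQUIRED_COLS if c not in df_user.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Failure mode 2: a score lies outside [0, 1].
    if "score" in df_user.columns:
        # Assumption: missing scores are skipped here and left to the default.
        scores = df_user["score"].dropna()
        bad = ~scores.between(0.0, 1.0)
        if bad.any():
            raise ValueError(f"scores outside [0, 1] at rows: {scores[bad].index.tolist()}")


df_bad = pd.DataFrame({"protein_id": ["AF_TINY"],
                       "start": [3],
                       "feature_type": ["hotspot"],
                       "score": [1.2]})
try:
    check_user_table(df_bad)
except ValueError as err:
    print("rejected:", err)
```

Running a check like this before calling ingest makes it easier to see which rows would be rejected and why.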

Examples

ingest takes a user or predictor table (protein_id, start, feature_type, plus optional end, source, score, and aa) and maps it into the canonical df_annot schema. Every feature_type becomes a 'Functional sites' key; unknown keys are auto-registered (num_dims=1, identity normalization) unless they were previously registered via register_feature().

import warnings
import numpy as np
import pandas as pd
import aaanalysis as aa
import aaanalysis.utils as ut
aa.options['verbose'] = False
warnings.filterwarnings('ignore')

ap = aa.AnnotationPreprocessor(verbose=False)
df_seq = pd.DataFrame({'entry': ['AF_TINY'],
                       'sequence': ['ACDEFGHIKLMNPQRSTVWYACDEFGHIKL']})
# A small user/predictor table -> Functional sites (open vocabulary).
df_user = pd.DataFrame({ut.COL_PROTEIN_ID: ['AF_TINY', 'AF_TINY'],
                        ut.COL_START: [3, 16],
                        ut.COL_FEATURE_TYPE: ['hotspot', 'hotspot'],
                        ut.COL_SCORE: [0.92, 0.40]})
df_annot = ap.ingest(df_user)

print("auto-registered 'hotspot':", 'hotspot' in ap._registry)
df_annot
auto-registered 'hotspot': True

   protein_id  start  end  aa  feature_type          category  source  evidence  score  bond_id
0     AF_TINY      3    3           hotspot  Functional sites    user             0.40     None
1     AF_TINY     16   16           hotspot  Functional sites    user             0.40     None
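
The documented defaults for the optional columns (end defaults to start, source to 'user', score to 1.0) can be mimicked with plain pandas to preview what a sparse input table implies. This is a minimal sketch, not the library's code: fill_optional_columns is a hypothetical helper, and it only handles columns that are missing entirely, not missing values inside an existing column.

```python
import pandas as pd


def fill_optional_columns(df_user: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the documented defaults to a copy of df_user."""
    df = df_user.copy()
    if "end" not in df.columns:
        df["end"] = df["start"]   # single-residue annotation: end == start
    if "source" not in df.columns:
        df["source"] = "user"
    if "score" not in df.columns:
        df["score"] = 1.0
    return df


# Only the three required columns are supplied.
df_minimal = pd.DataFrame({"protein_id": ["AF_TINY", "AF_TINY"],
                           "start": [3, 16],
                           "feature_type": ["hotspot", "hotspot"]})
df_filled = fill_optional_columns(df_minimal)
print(df_filled[["start", "end", "source", "score"]])
```

The helper works on a copy so the caller's table is left unchanged.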