AnnotationPreprocessor.ingest
- AnnotationPreprocessor.ingest(df_user)
Ingest a user / predictor annotation table into df_annot. Every ingested feature_type is treated as a 'Functional sites' key; unknown keys auto-register (num_dims=1, identity normalization) unless previously registered via register_feature().
Added in version 1.1.0.
- Parameters:
df_user (pd.DataFrame) – Must contain protein_id, start (1-based position), and feature_type columns. Optional: end (defaults to start), source (defaults to 'user'), score (defaults to 1.0, must lie in [0, 1]), aa (expected residue; '' disables the encode-time guard for that row).
- Returns:
df_annot – Canonical per-residue annotation schema, category='Functional sites' for every row.
- Return type:
pd.DataFrame
- Raises:
ValueError – On missing required columns or out-of-range scores.
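The two error conditions listed under Raises can be checked before calling ingest. The following is a minimal sketch using plain pandas, not the library's own validation: check_user_table is a hypothetical helper, the column names are taken from the Parameters list above, and ignoring missing scores is an assumption.

```python
import pandas as pd

# Required column names as listed under Parameters.
REQUIRED_COLUMNS = ("protein_id", "start", "feature_type")


def check_user_table(df_user: pd.DataFrame) -> None:
    """Hypothetical pre-check: raise ValueError for the two documented conditions."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df_user.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if "score" in df_user.columns:
        # Assumption: missing scores are skipped here; the library may treat them differently.
        scores = df_user["score"].dropna()
        out_of_range = scores[(scores < 0.0) | (scores > 1.0)]
        if not out_of_range.empty:
            raise ValueError(f"scores outside [0, 1] at rows: {list(out_of_range.index)}")


df_ok = pd.DataFrame({"protein_id": ["AF_TINY"], "start": [3],
                      "feature_type": ["hotspot"], "score": [0.92]})
check_user_table(df_ok)  # valid table: no exception

df_bad = df_ok.assign(score=[1.5])
try:
    check_user_table(df_bad)
except ValueError as err:
    print(err)
```

Running such a check first keeps malformed predictor output from reaching ingest, where the same conditions raise ValueError.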
Examples
ingest takes a user / predictor table (protein_id, start, feature_type, optional end / source / score / aa) and maps it into the canonical df_annot schema. Every feature_type becomes a Functional sites key; unknown keys auto-register (num_dims=1, identity normalization) unless previously set via register_feature.

    import warnings

    import numpy as np
    import pandas as pd

    import aaanalysis as aa
    import aaanalysis.utils as ut

    aa.options['verbose'] = False
    warnings.filterwarnings('ignore')

    ap = aa.AnnotationPreprocessor(verbose=False)
    df_seq = pd.DataFrame({'entry': ['AF_TINY'],
                           'sequence': ['ACDEFGHIKLMNPQRSTVWYACDEFGHIKL']})

    # A small user/predictor table -> Functional sites (open vocabulary).
    df_user = pd.DataFrame({ut.COL_PROTEIN_ID: ['AF_TINY', 'AF_TINY'],
                            ut.COL_START: [3, 16],
                            ut.COL_FEATURE_TYPE: ['hotspot', 'hotspot'],
                            ut.COL_SCORE: [0.92, 0.40]})
    df_annot = ap.ingest(df_user)
    print("auto-registered 'hotspot':", 'hotspot' in ap._registry)
    df_annot

auto-registered 'hotspot': True
  protein_id  start  end aa feature_type          category source evidence  score bond_id
0    AF_TINY      3    3         hotspot  Functional sites   user            0.92    None
1    AF_TINY     16   16         hotspot  Functional sites   user            0.40    None
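
To make the documented defaults for the optional columns concrete, here is a minimal pandas-only sketch that fills them in by hand. It does not reproduce how ingest works internally: fill_optional_columns is a hypothetical helper, it only handles columns that are absent entirely, and the '' default for aa is an assumption, since the parameter description gives no default for it.

```python
import pandas as pd


def fill_optional_columns(df_user: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add absent optional columns with the documented defaults."""
    df = df_user.copy()
    if "end" not in df.columns:
        df["end"] = df["start"]   # end defaults to start (single-residue annotation)
    if "source" not in df.columns:
        df["source"] = "user"     # documented default
    if "score" not in df.columns:
        df["score"] = 1.0         # documented default
    if "aa" not in df.columns:
        df["aa"] = ""             # assumption: no default is documented for aa
    return df


df_user = pd.DataFrame({"protein_id": ["AF_TINY", "AF_TINY"],
                        "start": [3, 16],
                        "feature_type": ["hotspot", "hotspot"],
                        "score": [0.92, 0.40]})
df_filled = fill_optional_columns(df_user)
print(df_filled[["protein_id", "start", "end", "source", "score"]])
```

Because end falls back to start, each row in this example describes a single residue, which is consistent with the start and end values in the output table above.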