AnnotationPreprocessor.build_cat
- AnnotationPreprocessor.build_cat(features, dim_names_override=None)
Build the df_cat metadata frame for features (corpus-free). df_cat[category] is 'PTMs' or 'Functional sites'; df_cat[subcategory] carries the per-key semantic split.
Added in version 1.1.0.
- Parameters:
- Returns:
df_cat – One row per dimension: scale_id, category, subcategory, scale_name, scale_description.
- Return type:
pd.DataFrame, shape (D, 5)
- Raises:
ValueError – On invalid or unregistered feature keys in features.
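The return and error contract described above can be sketched as a small wrapper. This is a minimal sketch, not part of aaanalysis: build_cat_checked and StubPreprocessor are hypothetical names introduced here, and the stub only mimics the documented behavior (five columns, one row per dimension, ValueError on unregistered keys) so the snippet runs without the library. It also assumes the category column is literally named 'category', as listed in the Returns field.

```python
import pandas as pd

# Column names and order as listed in the Returns field.
EXPECTED_COLUMNS = ["scale_id", "category", "subcategory",
                    "scale_name", "scale_description"]
# Category labels as stated in the method description.
ALLOWED_CATEGORIES = {"PTMs", "Functional sites"}


def build_cat_checked(ap, features):
    """Call ap.build_cat and verify the documented return contract.

    `ap` is any object exposing build_cat(features=...). Hypothetical helper.
    """
    try:
        df_cat = ap.build_cat(features=features)
    except ValueError as err:
        # Documented failure mode: invalid or unregistered feature keys.
        raise ValueError(f"build_cat rejected {features!r}: {err}") from err
    if list(df_cat.columns) != EXPECTED_COLUMNS:
        raise RuntimeError(f"unexpected columns: {list(df_cat.columns)}")
    if not set(df_cat["category"]) <= ALLOWED_CATEGORIES:
        raise RuntimeError("unexpected category label in df_cat")
    return df_cat


class StubPreprocessor:
    """Hypothetical stand-in so the sketch runs without aaanalysis."""

    _registered = {"hotspot"}

    def build_cat(self, features):
        unknown = [f for f in features if f not in self._registered]
        if unknown:
            raise ValueError(f"unregistered feature keys: {unknown}")
        rows = [{"scale_id": f,
                 "category": "Functional sites",
                 "subcategory": f"FUNC_{f}",
                 "scale_name": f,
                 "scale_description": f"Functional sites/FUNC_{f}"}
                for f in features]
        return pd.DataFrame(rows, columns=EXPECTED_COLUMNS)


df_checked = build_cat_checked(StubPreprocessor(), ["hotspot"])
print(df_checked.shape)  # (1, 5): D = 1 dimension, 5 metadata columns
```

With the real library, an AnnotationPreprocessor instance would be passed in place of the stub; whether its column order always matches the list above is an assumption taken from this page.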
Examples
build_cat returns the corpus-free df_cat metadata, tagging each annotation dimension with its category (PTMs for the closed UniProt vocabulary, Functional sites for the open one) and locked color. The result is the drop-in df_cat for CPP.run_num.

    import warnings
    import numpy as np
    import pandas as pd
    import aaanalysis as aa
    import aaanalysis.utils as ut

    aa.options['verbose'] = False
    warnings.filterwarnings('ignore')

    ap = aa.AnnotationPreprocessor(verbose=False)
    df_seq = pd.DataFrame({'entry': ['AF_TINY'],
                           'sequence': ['ACDEFGHIKLMNPQRSTVWYACDEFGHIKL']})

    # A small user/predictor table -> Functional sites (open vocabulary).
    df_user = pd.DataFrame({ut.COL_PROTEIN_ID: ['AF_TINY', 'AF_TINY'],
                            ut.COL_START: [3, 16],
                            ut.COL_FEATURE_TYPE: ['hotspot', 'hotspot'],
                            ut.COL_SCORE: [0.92, 0.40]})
    df_annot = ap.ingest(df_user)
    df_cat = ap.build_cat(features=['hotspot'])
    print('category:', df_cat[ut.COL_CAT].iloc[0],
          '| color:', ut.DICT_COLOR_CAT[df_cat[ut.COL_CAT].iloc[0]])
    df_cat

category: Functional sites | color: #2C6E9E
   scale_id          category   subcategory scale_name              scale_description
0   hotspot  Functional sites  FUNC_hotspot    hotspot  Functional sites/FUNC_hotspot

Further parameters.
AnnotationPreprocessor.build_cat also accepts:
dim_names_override – Replacement names for the D columns.
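The page describes dim_names_override only as replacement names for the D columns, so the sketch below rests on assumptions: that the argument is a sequence with exactly one non-empty string per dimension (one per row of df_cat), and that names should be unique. The helper check_dim_names is hypothetical and not part of aaanalysis; it only pre-validates a candidate list before it is handed to build_cat.

```python
def check_dim_names(names, n_dims):
    """Return names as a list if there is one unique, non-empty string per dimension.

    Hypothetical pre-check; the one-name-per-dimension and uniqueness rules
    are assumptions, not documented behavior.
    """
    names = list(names)
    if len(names) != n_dims:
        raise ValueError(f"expected {n_dims} names, got {len(names)}")
    if not all(isinstance(name, str) and name for name in names):
        raise ValueError("every name must be a non-empty string")
    if len(set(names)) != len(names):
        raise ValueError("names must be unique")
    return names


# One dimension ('hotspot') in the example above, so one replacement name.
override = check_dim_names(["my_hotspot"], n_dims=1)

# Assumed call shape, not confirmed by the source:
# df_cat = ap.build_cat(features=['hotspot'], dim_names_override=override)
```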