AnnotationPreprocessor.build_cat

AnnotationPreprocessor.build_cat(features, dim_names_override=None)

Build the df_cat metadata frame for features (corpus-free).

df_cat[category] is 'PTMs' or 'Functional sites'; df_cat[subcategory] carries the per-key semantic split.

Added in version 1.1.0.

Parameters:

features (list of str) – Registry keys, in the order they appear along the D axis.
dim_names_override (list of str, optional) – Replacement names for the D columns.

Returns:

df_cat – One row per dimension: scale_id, category, subcategory, scale_name, scale_description.

Return type:

pd.DataFrame, shape (D, 5)

Raises:

ValueError – On invalid or unregistered feature keys in features.

Examples

build_cat returns the corpus-free df_cat metadata, which tags each annotation dimension with its category (PTMs for the closed UniProt vocabulary, Functional sites for the open one). Each category has a locked color, looked up below via ut.DICT_COLOR_CAT. The result is a drop-in df_cat for CPP.run_num.

import warnings
import numpy as np
import pandas as pd
import aaanalysis as aa
import aaanalysis.utils as ut
aa.options['verbose'] = False
warnings.filterwarnings('ignore')

annp = aa.AnnotationPreprocessor(verbose=False)
df_seq = pd.DataFrame({'entry': ['AF_TINY'],
                       'sequence': ['ACDEFGHIKLMNPQRSTVWYACDEFGHIKL']})
# A small user/predictor table -> Functional sites (open vocabulary).
df_user = pd.DataFrame({ut.COL_PROTEIN_ID: ['AF_TINY', 'AF_TINY'],
                        ut.COL_START: [3, 16],
                        ut.COL_FEATURE_TYPE: ['hotspot', 'hotspot'],
                        ut.COL_SCORE: [0.92, 0.40]})
df_annot = annp.ingest(df_user)

df_cat = annp.build_cat(features=['hotspot'])
print('category:', df_cat[ut.COL_CAT].iloc[0],
      '| color:', ut.DICT_COLOR_CAT[df_cat[ut.COL_CAT].iloc[0]])
df_cat

category: Functional sites | color: #2C6E9E

	scale_id	category	subcategory	scale_name	scale_description
0	hotspot	Functional sites	FUNC_hotspot	hotspot	Functional sites/FUNC_hotspot

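Further parameters. AnnotationPreprocessor.build_cat also accepts dim_names_override: replacement names for the D columns. In the example below, the override replaces scale_id and scale_name, while category and subcategory stay unchanged.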

# Further parameter: rename the D annotation dimensions via ``dim_names_override``
# (length must equal D; here D = 1).
df_cat_named = annp.build_cat(features=['hotspot'],
                              dim_names_override=['hotspot_dim'])
aa.display_df(df_cat_named, n_rows=10, show_shape=True)

DataFrame shape: (1, 5)

	scale_id	category	subcategory	scale_name	scale_description
1	hotspot_dim	Functional sites	FUNC_hotspot	hotspot_dim	Functional sites/FUNC_hotspot
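
The documented contract (one row per feature key, five columns, ValueError for unregistered keys) can be illustrated with a small stand-in written in plain Python. This is only a sketch, not the library's implementation: toy_build_cat, TOY_REGISTRY, and the length check on dim_names_override are hypothetical, and the row values are modeled on the outputs shown above.

```python
# Hypothetical stand-in for the feature registry; not part of aaanalysis.
TOY_REGISTRY = {
    "hotspot": ("Functional sites", "FUNC_hotspot"),
}

# Column layout as listed in the Returns section.
COLUMNS = ["scale_id", "category", "subcategory", "scale_name", "scale_description"]


def toy_build_cat(features, dim_names_override=None):
    """Return one metadata row (dict) per feature, mimicking the df_cat layout."""
    unknown = [key for key in features if key not in TOY_REGISTRY]
    if unknown:
        raise ValueError(f"Unregistered feature keys: {unknown}")

    names = list(features) if dim_names_override is None else list(dim_names_override)
    # Assumption: a length mismatch is treated as an error in this sketch.
    if len(names) != len(features):
        raise ValueError("dim_names_override must have one name per feature")

    rows = []
    for key, name in zip(features, names):
        category, subcategory = TOY_REGISTRY[key]
        rows.append({
            "scale_id": name,
            "category": category,
            "subcategory": subcategory,
            "scale_name": name,
            "scale_description": f"{category}/{subcategory}",
        })
    return rows


rows = toy_build_cat(["hotspot"], dim_names_override=["hotspot_dim"])
print(rows[0]["scale_id"], "|", rows[0]["scale_description"])

try:
    toy_build_cat(["not_registered"])
except ValueError as err:
    print("ValueError:", err)
```

In real use, wrapping the build_cat call in a try/except ValueError block in the same way lets a pipeline report unregistered keys before handing df_cat to downstream steps.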