StructurePreprocessor.build_cat
- StructurePreprocessor.build_cat(features, dim_names_override=None)
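Build the df_cat metadata frame for the given features.
This is a pure registry lookup and is corpus-free: it depends only on the feature keys, not on any sequences or structure files.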
df_cat['category'] is always 'Structure' for every StructurePreprocessor feature. The per-key semantics live in df_cat['subcategory'] (see the feature registry).

Added in version 1.1.0.
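Because build_cat is a pure registry lookup, you can think of it as one dictionary lookup per feature key. The sketch below is a minimal illustration, not the library's implementation. The FEATURE_REGISTRY dict and the build_cat_records helper are hypothetical. The entries for 'plddt', 'contact_count_8A' and 'bfactor' mirror the example output further down. The multi-dimensional 'dssp_ss_3state' entry and its scale_ids are invented to show how one feature can contribute several rows to D_total.

```python
# Minimal sketch of a corpus-free registry lookup.
# HYPOTHETICAL registry and helper; not the aaanalysis implementation.

CATEGORY = "Structure"  # locked top-level bucket for every structure feature

# Hypothetical registry: feature key -> list of (scale_id, subcategory) pairs,
# one pair per output dimension.
FEATURE_REGISTRY = {
    "plddt": [("plddt", "AlphaFold pLDDT (raw)")],
    "contact_count_8A": [("contacts_8A", "CA-CA contacts (8 A)")],
    "bfactor": [("bfactor", "B-factor (CA mean)")],
    # Invented multi-dimensional entry: one feature, three dimensions.
    "dssp_ss_3state": [
        ("dssp_H", "DSSP_SS_3state"),
        ("dssp_E", "DSSP_SS_3state"),
        ("dssp_C", "DSSP_SS_3state"),
    ],
}

COLUMNS = ["scale_id", "category", "subcategory", "scale_name", "scale_description"]


def build_cat_records(features):
    """Return one row dict per dimension; no corpus or structure files needed."""
    rows = []
    for feature in features:
        if feature not in FEATURE_REGISTRY:
            raise KeyError(f"unknown structure feature: {feature!r}")
        for scale_id, subcategory in FEATURE_REGISTRY[feature]:
            rows.append({
                "scale_id": scale_id,
                "category": CATEGORY,
                "subcategory": subcategory,
                "scale_name": scale_id,
                "scale_description": f"{CATEGORY}/{subcategory}",
            })
    return rows


rows = build_cat_records(["plddt", "contact_count_8A", "dssp_ss_3state"])
print(len(rows))                          # 5 dimensions in total
print({row["category"] for row in rows})  # {'Structure'}
```

To get a frame shaped like the documented return value, wrap the records with pandas, for example pd.DataFrame(rows, columns=COLUMNS).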
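- Parameters:
  features – Structure feature keys to describe (for example 'plddt', 'contact_count_8A', 'bfactor'). A single feature may span several dimensions.
  dim_names_override – Optional replacement names for the output dimensions (see Further parameters).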
- Returns:
  df_cat – One row per dimension, with the columns scale_id, category, subcategory, scale_name and scale_description. category is the top-level color / redundancy bucket. subcategory carries the fine-grained semantic split ('DSSP_SS_3state', 'Flexibility_bfactor', etc.).
- Return type:
pd.DataFrame, shape (D_total, 5)
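You can check the return contract with a few assertions: one row per dimension, five columns in the documented order, and category fixed to 'Structure'. This is a hedged, stdlib-only sketch. validate_df_cat_records is a hypothetical helper that operates on a list of row dicts. On a real DataFrame, the equivalent checks are df_cat.shape == (D_total, 5) and list(df_cat.columns).

```python
# HYPOTHETICAL validator for the documented df_cat layout, written against
# a list of row dicts so it runs without pandas.

EXPECTED_COLUMNS = ("scale_id", "category", "subcategory", "scale_name", "scale_description")


def validate_df_cat_records(rows, d_total):
    """Raise ValueError if rows do not match the (D_total, 5) df_cat layout."""
    if len(rows) != d_total:
        raise ValueError(f"expected {d_total} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        # tuple(row) yields the dict keys in insertion order
        if tuple(row) != EXPECTED_COLUMNS:
            raise ValueError(f"row {i}: columns {tuple(row)} != {EXPECTED_COLUMNS}")
        if row["category"] != "Structure":
            raise ValueError(f"row {i}: category is {row['category']!r}, not 'Structure'")
    return True


example = [
    {"scale_id": "plddt", "category": "Structure", "subcategory": "AlphaFold pLDDT (raw)",
     "scale_name": "plddt", "scale_description": "Structure/AlphaFold pLDDT (raw)"},
    {"scale_id": "bfactor", "category": "Structure", "subcategory": "B-factor (CA mean)",
     "scale_name": "bfactor", "scale_description": "Structure/B-factor (CA mean)"},
]
print(validate_df_cat_records(example, d_total=2))  # True
```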
See also
StructurePreprocessor.build_scales(): corpus-derived pseudo-scale companion to df_cat.
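build_scales is described as the corpus-derived companion to df_cat, so the two outputs are expected to share scale_id keys. The sketch below assumes the companion scale table has one column per scale_id, following the usual AAanalysis pairing of df_scales and df_cat. The helper name and the sample column list are hypothetical.

```python
# HYPOTHETICAL consistency check between df_cat scale_ids and the columns of a
# companion scale table (assumed: one column per scale_id).

def check_scale_alignment(cat_scale_ids, scale_columns):
    """Return (missing_in_scales, missing_in_cat) as sorted lists."""
    cat_ids = set(cat_scale_ids)
    scale_ids = set(scale_columns)
    return sorted(cat_ids - scale_ids), sorted(scale_ids - cat_ids)


cat_ids = ["plddt", "contacts_8A", "bfactor"]
scale_cols = ["plddt", "contacts_8A"]  # e.g. a table built for fewer features
missing_in_scales, missing_in_cat = check_scale_alignment(cat_ids, scale_cols)
print(missing_in_scales)  # ['bfactor']
print(missing_in_cat)     # []
```

With pandas objects, you would pass df_cat["scale_id"] and df_scales.columns directly, since both are iterables of strings.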
Examples
build_cat returns the corpus-free df_cat metadata. Each structure dimension gets its category (always Structure, the locked redundancy / color bucket) and a descriptive subcategory, which labels the CPPPlot.feature_map y-axis. The result is the drop-in df_cat for CPP.run_num.

import warnings
from pathlib import Path

import numpy as np
import pandas as pd
import aaanalysis as aa
import aaanalysis.utils as ut

aa.options['verbose'] = False
warnings.filterwarnings('ignore')

# Bundled PDB test fixtures (not needed by build_cat, which is corpus-free)
PDB_FIXTURES = Path(aa.__file__).resolve().parent / '_data' / 'pdb_test'

stp = aa.StructurePreprocessor(verbose=False)
# Toy sequence table (likewise not used by build_cat)
df_seq = pd.DataFrame({'entry': ['AF_TINY'], 'sequence': ['ACDEFGHIKLMNPQRSTVWYACDEFGHIKL']})

# Only the feature keys are needed to build the metadata frame
df_cat = stp.build_cat(features=['plddt', 'contact_count_8A', 'bfactor'])
print('categories:', df_cat[ut.COL_CAT].unique().tolist())
df_cat  # rendered as a table in a notebook
categories: ['Structure']
   scale_id     category   subcategory            scale_name   scale_description
0  plddt        Structure  AlphaFold pLDDT (raw)  plddt        Structure/AlphaFold pLDDT (raw)
1  contacts_8A  Structure  CA-CA contacts (8 A)   contacts_8A  Structure/CA-CA contacts (8 A)
2  bfactor      Structure  B-factor (CA mean)     bfactor      Structure/B-factor (CA mean)

Further parameters.
StructurePreprocessor.build_cat also accepts:
- dim_names_override: Replacement names for the D output dimensions. Its length must equal the total dimensionality across all features (D_total).
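The length rule for dim_names_override is illustrated below with a small stand-alone function. This is a sketch under assumptions. apply_dim_names_override is hypothetical. Which column(s) the real override replaces is not stated here, so the sketch assumes it replaces scale_name only and leaves scale_id untouched.

```python
# HYPOTHETICAL sketch of the dim_names_override length rule.
# ASSUMPTION: the override replaces scale_name; the real target column may differ.

def apply_dim_names_override(rows, dim_names_override=None):
    """Return copies of rows with scale_name replaced, one name per dimension."""
    if dim_names_override is None:
        return [dict(row) for row in rows]
    names = list(dim_names_override)
    if len(names) != len(rows):
        raise ValueError(
            f"dim_names_override has {len(names)} names, "
            f"but the features span {len(rows)} dimensions"
        )
    # dict(row, scale_name=name) copies the row and overwrites one key
    return [dict(row, scale_name=name) for row, name in zip(rows, names)]


rows = [
    {"scale_id": "plddt", "scale_name": "plddt"},
    {"scale_id": "contacts_8A", "scale_name": "contacts_8A"},
]
renamed = apply_dim_names_override(rows, ["confidence", "packing"])
print([row["scale_name"] for row in renamed])  # ['confidence', 'packing']
```

Failing early on a length mismatch keeps every name paired with exactly one dimension, which matches the stated requirement.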