aaanalysis.combine_dict_nums

class aaanalysis.combine_dict_nums(dict_nums=None)

Concatenate multiple per-residue dict_num inputs along the D axis.

Source-agnostic: works with any combination of dicts whose values are (L_entry, D_i) ndarrays — typically the outputs of the three per-residue feature-source preprocessors (EmbeddingPreprocessor, StructurePreprocessor, AnnotationPreprocessor), but also user-supplied per-residue embeddings or other per-residue numerical representations.

Added in version 1.1.0.

Parameters:

dict_nums (list of dict[str, np.ndarray]) – Each input is a {entry: (L_entry, D_i) ndarray}. All inputs must share the same entry set; per entry, all inputs must share the same L. The D axis is concatenated; output’s D equals the sum of input Ds.

Returns:

dict_num – {entry: (L_entry, D_total) ndarray}, ready for NumericalFeature.get_parts().

Return type:

dict[str, np.ndarray]

Raises:

ValueError – If dict_nums is not a non-empty list of dicts, if the entry sets diverge across inputs, if any value is not a 2D np.ndarray, or if the per-entry L differs across inputs.

Examples

combine_dict_nums concatenates several per-residue arrays ({entry: (L, D)}) along the feature axis (D), so embedding, structure, and annotation sources can be fused into one dict_num for CPP.run_num.

import numpy as np
import aaanalysis as aa
aa.options["verbose"] = False

df_seq = aa.load_dataset(name="DOM_GSEC", n=4)
rng = np.random.default_rng(0)
d1 = {e: rng.random((len(s), 3)) for e, s in zip(df_seq["entry"], df_seq["sequence"])}
d2 = {e: rng.random((len(s), 5)) for e, s in zip(df_seq["entry"], df_seq["sequence"])}

combined = aa.combine_dict_nums(dict_nums=[d1, d2])
{e: v.shape for e, v in combined.items()}
{'Q14802': (87, 8),
 'Q86UE4': (582, 8),
 'Q969W9': (287, 8),
 'P53801': (180, 8),
 'P05067': (770, 8),
 'P14925': (976, 8),
 'P70180': (536, 8),
 'Q03157': (654, 8)}
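
The rules listed under Parameters and Raises can be illustrated without the library. The following is a minimal NumPy-only sketch of the documented behavior, not the aaanalysis implementation: the helper name combine_sketch, the entry names P1 and P2, and the error messages are hypothetical, and the real function may validate inputs in a different order or with different messages.

```python
import numpy as np


def combine_sketch(dict_nums):
    """Hypothetical stand-in that mirrors the documented contract only."""
    # dict_nums must be a non-empty list of dicts
    if (not isinstance(dict_nums, list) or len(dict_nums) == 0
            or not all(isinstance(d, dict) for d in dict_nums)):
        raise ValueError("dict_nums must be a non-empty list of dicts")
    # All inputs must share the same entry set
    entries = set(dict_nums[0])
    if any(set(d) != entries for d in dict_nums[1:]):
        raise ValueError("entry sets diverge across inputs")
    combined = {}
    for entry in dict_nums[0]:
        arrays = [d[entry] for d in dict_nums]
        # Every value must be a 2D array of shape (L_entry, D_i)
        if not all(isinstance(a, np.ndarray) and a.ndim == 2 for a in arrays):
            raise ValueError(f"values for '{entry}' must be 2D np.ndarray")
        # Per entry, all inputs must share the same L
        if len({a.shape[0] for a in arrays}) != 1:
            raise ValueError(f"L differs across inputs for '{entry}'")
        # Concatenate along the D axis, so D_total = sum of D_i
        combined[entry] = np.concatenate(arrays, axis=1)
    return combined


# Toy inputs: two sources with D=3 and D=2 for the same two entries
emb = {"P1": np.zeros((4, 3)), "P2": np.zeros((6, 3))}
ann = {"P1": np.ones((4, 2)), "P2": np.ones((6, 2))}
out = combine_sketch([emb, ann])
shapes = {e: v.shape for e, v in out.items()}
print(shapes)  # {'P1': (4, 5), 'P2': (6, 5)}
```

In this sketch the columns keep the order of the input list, so the first three columns of each output array come from emb and the last two from ann.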