aaanalysis.load_scales
- class aaanalysis.load_scales(name='scales', just_aaindex=False, unclassified_out=False, top60_n=None)
Load amino acid scales or their classification (AAontology).
The amino acid scales (name='scales_raw') encompass all scales from AAindex ([Kawashima08]) along with two additional data sources. These scales were min-max normalized (name='scales') and organized in a two-level classification called AAontology (name='scales_cat'), as detailed in [Breimann24b]. The first 20 principal components (PCs) of all compressed scales are provided (name='scales_pc') and were used for an in-depth analysis of redundancy-reduced scale subsets obtained by AAclust ([Breimann24a]). The top 60 scale sets from this analysis are available either collectively (name='top60') or individually (top60_n='1-60'), accompanied by their evaluations in the 'top60_eval' dataset.
Added in version 0.1.0.
- Parameters:
name (str, default='scales') – Name of the loaded dataset:
    scales_raw: All amino acid scales.
    scales: Min-max normalized raw scales.
    scales_cat: Two-level classification (AAontology).
    scales_pc: First 20 PCs of compressed scales.
    top60: Selection of 60 best performing scale sets.
    top60_eval: Evaluation of 60 best performing scale sets.
    Or a number between 1 and 60 to select the i-th top60 dataset.
just_aaindex (bool, default=False) – If True, returns only scales from AAindex. Relevant only for ‘scales’, ‘scales_raw’, or ‘scales_cat’.
unclassified_out (bool, default=False) – If True, excludes unclassified scales. Relevant only for ‘scales’, ‘scales_raw’, or ‘scales_cat’.
top60_n (int or str, optional) – Select the n-th scale set from top60 sets and return it for ‘scales’, ‘scales_raw’, or ‘scales_cat’. Allowed strings are AAclust ids (e.g., ‘AAC01’).
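The two accepted forms of top60_n (integer or AAclust id) can be related with a small, library-free sketch. The helper to_top60_id below is hypothetical and not part of aaanalysis; it assumes that the integer n corresponds to the id 'AAC' followed by the zero-padded number, which matches the ids AAC01 to AAC60 listed in the top60 dataset.

```python
def to_top60_id(top60_n):
    """Normalize an integer (1-60) or an id such as 'AAC01' to the 'AACxx' form.

    Hypothetical helper for illustration only; aaanalysis performs its own
    parameter validation internally.
    """
    if isinstance(top60_n, str):
        if not top60_n.startswith("AAC"):
            raise ValueError("string ids are expected to start with 'AAC'")
        number = int(top60_n[3:])
    else:
        number = int(top60_n)
    if not 1 <= number <= 60:
        raise ValueError("top60_n must be between 1 and 60")
    # Zero-pad to two digits, e.g. 1 -> 'AAC01'
    return f"AAC{number:02d}"


print(to_top60_id(1))        # AAC01
print(to_top60_id("AAC20"))  # AAC20
```

Under this assumption, top60_n=20 and top60_n='AAC20' would refer to the same scale set.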
- Returns:
df – DataFrame containing the chosen dataset. Naming the variable after the dataset suffix is recommended (e.g., df_scales, df_cat).
- Return type:
pandas.DataFrame
Notes
df_cat includes the following columns:
‘scale_id’: ID of scale (from AAindex or following the same naming convention).
‘category’: Category of scale (defined in AAontology).
‘subcategory’: Subcategory of scale (AAontology).
‘scale_name’: Name of scale derived from scale description.
‘scale_description’: Description of scale (derived from AAindex).
Scales under the ‘Others’ category are considered unclassified.
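The rule stated in the last note can be illustrated with a minimal, library-free sketch. It assumes records shaped like rows of df_cat; the three records are hypothetical placeholders (not real AAontology entries), and is_unclassified is an illustrative helper that is not part of aaanalysis.

```python
# Hypothetical records mimicking two of the df_cat columns described above.
# The scale ids are placeholders, not real AAontology entries.
records = [
    {"scale_id": "SCALE_A", "category": "ASA/Volume"},
    {"scale_id": "SCALE_B", "category": "Others"},
    {"scale_id": "SCALE_C", "category": "Conformation"},
]


def is_unclassified(record):
    """Return True if the scale falls under the 'Others' category."""
    return record["category"] == "Others"


classified_ids = [r["scale_id"] for r in records if not is_unclassified(r)]
unclassified_ids = [r["scale_id"] for r in records if is_unclassified(r)]

print(classified_ids)    # ['SCALE_A', 'SCALE_C']
print(unclassified_ids)  # ['SCALE_B']
```

With the real data, the unclassified_out parameter is the documented way to exclude these scales; the sketch only shows the rule it is described as applying.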
See also
Overview of all loading options: Amino Acid Scale Datasets.
AAontology: Categories and Subcategories tables.
Step-by-step guide in the Scale Loading Tutorial.
AAclust for customizing redundancy-reduced scale sets.
Examples
You can load all scales from AAontology (see [Breimann24b]) and their description using the load_scales() function. The min-max normalized numerical scales are provided by default:

    import aaanalysis as aa
    df_scales = aa.load_scales()
    print(len(df_scales.T), "scales exist in AAontology")
    aa.display_df(df_scales, n_cols=5)

586 scales exist in AAontology
AA  ANDN920101  ARGP820101  ARGP820102  ARGP820103  BEGF750101
A     0.494000    0.230000    0.355000    0.504000    1.000000
C     0.864000    0.404000    0.579000    0.387000    0.000000
D     1.000000    0.174000    0.000000    0.000000    0.404000
E     0.420000    0.177000    0.019000    0.032000    0.713000
F     0.877000    0.762000    0.601000    0.670000    0.574000
G     0.025000    0.026000    0.138000    0.170000    0.309000
H     0.840000    0.230000    0.082000    0.053000    0.574000
I     0.000000    0.838000    0.440000    0.543000    0.713000
K     0.506000    0.434000    0.003000    0.004000    0.574000
L     0.272000    0.577000    1.000000    0.989000    1.000000
M     0.704000    0.445000    0.824000    1.000000    1.000000
N     0.988000    0.023000    0.057000    0.046000    0.309000
P     0.605000    0.736000    0.223000    0.220000    0.000000
Q     0.519000    0.000000    0.211000    0.131000    0.404000
R     0.531000    0.226000    0.047000    0.110000    0.489000
S     0.679000    0.019000    0.289000    0.238000    0.309000
T     0.494000    0.019000    0.248000    0.273000    0.404000
V     0.000000    0.498000    0.324000    0.355000    0.809000
W     0.926000    1.000000    0.226000    0.333000    0.713000
Y     0.802000    0.709000    0.107000    0.191000    0.404000

To retrieve the un-normalized scales, set name='scales_raw':

    df_scales = aa.load_scales(name="scales_raw")
    aa.display_df(df_scales, n_cols=5, n_rows=5)

AA  ANDN920101  ARGP820101  ARGP820102  ARGP820103  BEGF750101
A     4.350000    0.610000    1.180000    1.560000    1.000000
C     4.650000    1.070000    1.890000    1.230000    0.060000
D     4.760000    0.460000    0.050000    0.140000    0.440000
E     4.290000    0.470000    0.110000    0.230000    0.730000
F     4.660000    2.020000    1.960000    2.030000    0.600000

The first 20 PCs of all compressed scales can be retrieved by name='scales_pc':

    df_pc = aa.load_scales(name="scales_pc")
    aa.display_df(df_pc, show_shape=True)

DataFrame shape: (20, 20)
AA        PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8        PC9       PC10       PC11       PC12       PC13       PC14       PC15       PC16       PC17       PC18       PC19       PC20
A    0.073190   0.259030  -0.270090  -0.374730  -0.009310   0.159360  -0.090600   0.193800   0.106010   0.165780   0.127210  -0.161270  -0.035360  -0.133400   0.084230  -0.136950  -0.503810  -0.441080  -0.235460  -0.115310
C    0.205620   0.130650  -0.164980   0.302520   0.539970   0.264480   0.488150   0.051980   0.287850  -0.268360  -0.083770  -0.121410  -0.109260  -0.091250  -0.008110   0.108830  -0.001420   0.002640  -0.049160  -0.090410
D   -0.264620   0.245490  -0.055430   0.063120   0.125590   0.255040  -0.203960  -0.491080  -0.010360  -0.073790   0.119410   0.009770  -0.070680   0.337390   0.401750   0.068220   0.266940  -0.350000  -0.008320   0.021170
E   -0.195640   0.341280   0.127870  -0.314750  -0.018020   0.337190  -0.088000  -0.325790   0.083820  -0.160650   0.150770  -0.083790   0.078590  -0.304560  -0.185970   0.050030  -0.122970   0.526190   0.093960  -0.049910
F    0.348890   0.125190   0.034520   0.118960  -0.091320   0.009140  -0.197540  -0.011910  -0.212210   0.005670   0.030470   0.057490  -0.231740  -0.003790   0.022030   0.772250  -0.266000  -0.022190   0.162590   0.046080
G   -0.145180   0.043160  -0.570540   0.098060   0.142450  -0.225710  -0.419920   0.279690   0.152640  -0.258630   0.331190   0.123200   0.159960  -0.040240  -0.082410   0.126100   0.172820   0.143800  -0.021170  -0.013710
H    0.011380   0.269740   0.159750   0.175360   0.140680   0.039330  -0.029280   0.243770  -0.640950  -0.052370   0.149440  -0.454060   0.277910  -0.122510  -0.045280  -0.071300   0.183940  -0.094790   0.008380  -0.069890
I    0.374000   0.121890  -0.126860  -0.065250  -0.155980  -0.140290   0.056690  -0.245830  -0.074940  -0.230000  -0.066920   0.126430   0.278090   0.317410  -0.136170  -0.116940  -0.048480  -0.011570   0.106600  -0.643530
K   -0.202260   0.363130   0.183790  -0.111210  -0.171380  -0.214410   0.039890   0.320010   0.256790  -0.247370  -0.467600  -0.097600   0.259350   0.079820   0.361420   0.200120   0.001610   0.031450   0.022120   0.025380
L    0.329240   0.177160  -0.139820  -0.286860  -0.194360   0.007340  -0.111360   0.074030   0.021080  -0.084450  -0.184180  -0.240430  -0.529920  -0.014800  -0.095810  -0.092210   0.533010   0.065930  -0.124290   0.027420
M    0.289160   0.251510   0.091470  -0.045450   0.115730   0.289220  -0.027170   0.324950  -0.160960   0.102270   0.098870   0.568290   0.015620   0.155680   0.312940  -0.242760   0.031370   0.270270  -0.003700   0.121800
N   -0.204120   0.231170  -0.097670   0.214020   0.165010  -0.073740  -0.200140  -0.064790  -0.228230  -0.089820  -0.412980   0.046050  -0.217960   0.321250  -0.358480  -0.182220  -0.346170   0.128620  -0.247410   0.172590
P   -0.166120   0.037590  -0.276880   0.384140  -0.691610   0.405000   0.245850   0.129850  -0.038030  -0.107960   0.076900  -0.010790  -0.025870   0.046690  -0.030560  -0.027880  -0.038210   0.019500  -0.022540   0.032680
Q   -0.119850   0.334480   0.116780   0.002460  -0.024550   0.011320   0.091330   0.022790   0.054670   0.106180  -0.094670   0.476790   0.092050  -0.269600  -0.491950   0.112910   0.272950  -0.434320   0.026510   0.014460
R   -0.146320   0.295020   0.283520   0.027240  -0.055660  -0.400980   0.290780   0.098030   0.116960  -0.054230   0.562180  -0.022100  -0.359900   0.268070  -0.071670  -0.043500  -0.086680   0.028750   0.001300  -0.049780
S   -0.134520   0.220350  -0.315510   0.084300   0.067610  -0.126640   0.060490  -0.000750  -0.029870   0.390700  -0.158630  -0.068780  -0.175300  -0.085460   0.095610  -0.162100  -0.032190   0.065770   0.729960  -0.085650
T   -0.024920   0.194840  -0.219810   0.062620  -0.014410  -0.171330   0.220850  -0.164790  -0.032160   0.593290  -0.017750  -0.030370   0.163560   0.008680   0.093110   0.238680   0.145260   0.287260  -0.489980  -0.115160
V    0.310250   0.155950  -0.230960  -0.132630  -0.074050  -0.208070   0.242330  -0.272770  -0.034070  -0.123100   0.093210  -0.064920   0.304670   0.058230  -0.044470  -0.078980  -0.046460  -0.060360   0.107190   0.689670
W    0.287410   0.142550   0.249530   0.355490  -0.053760   0.136800  -0.371840   0.031690   0.498030   0.300260   0.064360  -0.257830   0.200040   0.189230  -0.198120  -0.114360   0.021200   0.008150   0.065530   0.076070
Y    0.160400   0.150020   0.081790   0.400500  -0.128640  -0.293600  -0.164880  -0.258810   0.019610  -0.154980  -0.015850   0.121660  -0.130260  -0.567950   0.309210  -0.269350  -0.099280  -0.021080  -0.166870  -0.049600

The AAontology two-level classification can be retrieved by name='scales_cat':

    df_cat = aa.load_scales(name="scales_cat")
    aa.display_df(df_cat, n_rows=50, show_shape=True)

DataFrame shape: (586, 5)
    scale_id    category    subcategory                    scale_name                         scale_description
 1  LINS030110  ASA/Volume  Accessible surface area (ASA)  ASA (folded coil/turn)             Total median ac...s et al., 2003)
 2  LINS030113  ASA/Volume  Accessible surface area (ASA)  ASA (folded coil/turn)             % total accessi...s et al., 2003)
 3  JANJ780101  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)               Average accessi...n et al., 1978)
 4  JANJ780103  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)               Percentage of e...n et al., 1978)
 5  LINS030104  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)               Total median ac...s et al., 2003)
 6  LINS030107  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)               % total accessi...s et al., 2003)
 7  CHOC760102  ASA/Volume  Accessible surface area (ASA)  ASA (folded proteins)              Residue accessi...(Chothia, 1976)
 8  LINS030116  ASA/Volume  Accessible surface area (ASA)  ASA (folded β-strand)              Total median ac...s et al., 2003)
 9  LINS030119  ASA/Volume  Accessible surface area (ASA)  ASA (folded β-strand)              % total accessi...s et al., 2003)
10  LINS030103  ASA/Volume  Accessible surface area (ASA)  Hydrophilic ASA                    Hydrophilic acc...s et al., 2003)
11  LINS030112  ASA/Volume  Accessible surface area (ASA)  Hydrophilic ASA...lded coil/turn)  Hydrophilic med...s et al., 2003)
12  LINS030115  ASA/Volume  Accessible surface area (ASA)  Hydrophilic ASA...lded coil/turn)  % Hydrophilic a...s et al., 2003)
13  LINS030106  ASA/Volume  Accessible surface area (ASA)  Hydrophilic ASA...olded proteins)  Hydrophilic med...s et al., 2003)
14  LINS030109  ASA/Volume  Accessible surface area (ASA)  Hydrophilic ASA...olded proteins)  % Hydrophilic a...s et al., 2003)
15  LINS030118  ASA/Volume  Accessible surface area (ASA)  Hydrophilic ASA...olded β-strand)  Hydrophilic med...s et al., 2003)
16  LINS030121  ASA/Volume  Accessible surface area (ASA)  Hydrophilic ASA...olded β-strand)  % Hydrophilic a...s et al., 2003)
17  LINS030114  ASA/Volume  Accessible surface area (ASA)  Hydrophobic ASA...lded coil/turn)  % Hydrophobic a...s et al., 2003)
18  LINS030108  ASA/Volume  Accessible surface area (ASA)  Hydrophobic ASA...folded protein)  % Hydrophobic a...s et al., 2003)
19  LINS030120  ASA/Volume  Accessible surface area (ASA)  Hydrophobic ASA...olded β-strand)  % Hydrophobic a...s et al., 2003)
20  GUYH850104  ASA/Volume  Accessible surface area (ASA)  Partition energy                   Apparent partit...dex (Guy, 1985)
21  GUYH850105  ASA/Volume  Accessible surface area (ASA)  Partition energy                   Apparent partit...dex (Guy, 1985)
22  RACS770103  ASA/Volume  Accessible surface area (ASA)  Side chain orientation             Side chain orie...Scheraga, 1977)
23  VHEG790101  ASA/Volume  Accessible surface area (ASA)  TFE to lipophilic phase            Transfer free e...Blomberg, 1979)
24  BIOV880101  ASA/Volume  Buried                         Buriability                        Information val...u et al., 1988)
25  BIOV880102  ASA/Volume  Buried                         Buriability                        Information val...u et al., 1988)
26  WERD780101  ASA/Volume  Buried                         Buriability                        Propensity to b...Scheraga, 1978)
27  ZHOH040103  ASA/Volume  Buried                         Buriability                        Buriability (Zhou-Zhou, 2004)
28  ARGP820103  ASA/Volume  Buried                         Buried                             Membrane-buried...s et al., 1982)
29  CHOC760103  ASA/Volume  Buried                         Buried                             Proportion of r...(Chothia, 1976)
30  CHOC760104  ASA/Volume  Buried                         Buried                             Proportion of r...(Chothia, 1976)
31  JANJ780102  ASA/Volume  Buried                         Buried                             Percentage of b...n et al., 1978)
32  JANJ790101  ASA/Volume  Buried                         Buried                             Ratio of buried...s (Janin, 1979)
33  OLSK800101  ASA/Volume  Buried                         Buried                             Average interna...s (Olsen, 1980)
34  NISK800101  ASA/Volume  Buried                         Interactivity                      8 A contact num...kawa-Ooi, 1980)
35  WARP780101  ASA/Volume  Buried                         Interactivity                      Average interac...e-Morgan, 1978)
36  LINS030111  ASA/Volume  Hydrophobic ASA                Hydrophobic ASA...lded coil/turn)  Hydrophobic med...s et al., 2003)
37  LINS030105  ASA/Volume  Hydrophobic ASA                Hydrophobic ASA...olded proteins)  Hydrophobic med...s et al., 2003)
38  LINS030117  ASA/Volume  Hydrophobic ASA                Hydrophobic ASA...olded β-strand)  Hydrophobic med...s et al., 2003)
39  PONP800107  ASA/Volume  Partial specific volume        Accessibility reduction ratio      Accessibility r...y et al., 1980)
40  ZIMJ680102  ASA/Volume  Partial specific volume        Bulkiness                          Bulkiness (Zimm...n et al., 1968)
41  LINS030102  ASA/Volume  Partial specific volume        Hydrophobic ASA                    Hydrophobic acc...s et al., 2003)
42  BASU050101  ASA/Volume  Partial specific volume        Interactivity                      Interactivity s...a et al., 2005)
43  BASU050103  ASA/Volume  Partial specific volume        Interactivity                      Interactivity s...a et al., 2005)
44  NISK860101  ASA/Volume  Partial specific volume        Interactivity                      14 A contact nu...kawa-Ooi, 1986)
45  BULH740102  ASA/Volume  Partial specific volume        Partial specific volume            Apparent partia...l-Breese, 1974)
46  COHE430101  ASA/Volume  Partial specific volume        Partial specific volume            Partial specifi...n-Edsall, 1943)
47  PLIV810101  ASA/Volume  Partial specific volume        Partition coefficient              Partition Coeff...a et al., 1981)
48  CHOC760101  ASA/Volume  Volume                         Accessible surface area (ASA)      Residue accessi...(Chothia, 1976)
49  LINS030101  ASA/Volume  Volume                         Accessible surface area (ASA)      Total accessibl...s et al., 2003)
50  RADA880106  ASA/Volume  Volume                         Accessible surface area (ASA)      Accessible surf...olfenden, 1988)

Additionally, we tested a large number of scale sets (see [Breimann24a]) across 12 different protein benchmarking datasets: six at the sequence level and six at the amino acid level, the latter each evaluated with three amino acid windows. This resulted in 24 evaluations (6 + 6 × 3), for which we identified the top 60 scale sets, referred to as ‘top60’. These can be accessed by setting name='top60':

    df_top60 = aa.load_scales(name="top60")
    aa.display_df(df_top60, n_cols=5, show_shape=True)

DataFrame shape: (60, 586)
top60_id  ANDN920101  ARGP820101  ARGP820102  ARGP820103  BEGF750101
AAC01              0           0           0           1           0
AAC02              1           0           0           1           0
AAC03              1           0           0           1           0
AAC04              1           0           0           1           0
AAC05              1           0           0           1           0
AAC06              1           0           0           1           0
AAC07              0           0           0           1           0
AAC08              1           0           0           1           0
AAC09              1           0           0           1           0
AAC10              1           0           0           1           0
AAC11              1           0           0           1           0
AAC12              1           0           0           1           0
AAC13              1           0           0           1           0
AAC14              0           0           0           1           0
AAC15              1           1           0           1           0
AAC16              1           0           1           0           0
AAC17              0           0           0           0           0
AAC18              1           0           1           0           0
AAC19              1           1           0           1           1
AAC20              0           0           0           0           0
AAC21              1           0           0           1           0
AAC22              1           1           0           0           0
AAC23              1           0           0           0           0
AAC24              1           1           0           1           1
AAC25              0           0           0           0           0
AAC26              0           0           0           0           0
AAC27              1           0           1           0           0
AAC28              0           0           0           0           0
AAC29              1           1           1           1           1
AAC30              1           1           1           1           1
AAC31              1           0           0           0           0
AAC32              1           1           0           1           1
AAC33              0           0           0           0           0
AAC34              1           0           0           0           0
AAC35              1           0           0           0           0
AAC36              1           0           0           0           0
AAC37              1           0           0           0           0
AAC38              1           0           0           0           0
AAC39              1           0           0           1           0
AAC40              1           0           1           0           0
AAC41              0           0           0           0           0
AAC42              1           1           0           0           0
AAC43              1           0           0           0           0
AAC44              1           0           0           1           0
AAC45              1           0           0           0           0
AAC46              0           0           0           0           0
AAC47              0           0           0           0           0
AAC48              0           0           0           0           0
AAC49              0           0           0           1           0
AAC50              0           1           0           0           0
AAC51              0           0           0           0           0
AAC52              0           0           0           0           0
AAC53              0           0           0           0           0
AAC54              0           0           0           0           0
AAC55              0           0           0           0           0
AAC56              0           0           0           0           0
AAC57              0           0           0           0           0
AAC58              0           0           0           0           0
AAC59              0           0           0           0           0
AAC60              0           0           0           0           0

To obtain their evaluation, set name='top60_eval':

    df_top60_eval = aa.load_scales(name="top60_eval")
    aa.display_df(df_top60_eval, show_shape=True)

DataFrame shape: (60, 25)
top60_id  n_scales  SEQ_AMYLO  SEQ_CAPSID  SEQ_DISULFIDE  SEQ_LOCATION  SEQ_SOLUBLE  SEQ_TAIL  AA5_CASPASE3  AA5_FURIN   AA5_LDR  AA5_MMP2  AA5_RNABIND    AA5_SA  AA9_CASPASE3  AA9_FURIN   AA9_LDR  AA9_MMP2  AA9_RNABIND    AA9_SA  AA13_CASPASE3  AA13_FURIN  AA13_LDR  AA13_MMP2  AA13_RNABIND   AA13_SA
AAC01          183   0.761000    0.827000       0.732000      0.746000     0.646000  0.884000      0.862000   0.901000  0.612000  0.680000     0.652000  0.642000      0.816000   0.916000  0.644000  0.703000     0.659000  0.664000       0.790000    0.918000  0.694000   0.681000      0.652000  0.615000
AAC02          170   0.747000    0.830000       0.733000      0.742000     0.653000  0.886000      0.855000   0.907000  0.608000  0.688000     0.660000  0.640000      0.819000   0.915000  0.642000  0.706000     0.657000  0.671000       0.792000    0.916000  0.690000   0.676000      0.656000  0.608000
AAC03          137   0.741000    0.829000       0.734000      0.746000     0.648000  0.884000      0.857000   0.904000  0.601000  0.685000     0.661000  0.640000      0.818000   0.917000  0.636000  0.710000     0.659000  0.670000       0.791000    0.914000  0.695000   0.684000      0.656000  0.613000
AAC04          144   0.747000    0.828000       0.731000      0.747000     0.654000  0.885000      0.859000   0.906000  0.605000  0.686000     0.657000  0.639000      0.822000   0.913000  0.640000  0.714000     0.654000  0.664000       0.790000    0.915000  0.689000   0.680000      0.656000  0.610000
AAC05          138   0.739000    0.830000       0.735000      0.752000     0.646000  0.888000      0.859000   0.906000  0.601000  0.684000     0.655000  0.638000      0.823000   0.916000  0.640000  0.713000     0.658000  0.671000       0.790000    0.918000  0.689000   0.682000      0.649000  0.607000
AAC06          139   0.743000    0.827000       0.736000      0.746000     0.652000  0.883000      0.857000   0.906000  0.608000  0.684000     0.657000  0.640000      0.821000   0.914000  0.642000  0.709000     0.659000  0.665000       0.789000    0.915000  0.691000   0.680000      0.653000  0.611000
AAC07          121   0.742000    0.833000       0.736000      0.747000     0.650000  0.882000      0.858000   0.901000  0.606000  0.688000     0.655000  0.638000      0.820000   0.915000  0.638000  0.711000     0.661000  0.671000       0.789000    0.914000  0.689000   0.682000      0.655000  0.606000
AAC08          142   0.743000    0.831000       0.733000      0.746000     0.650000  0.884000      0.858000   0.903000  0.603000  0.687000     0.657000  0.640000      0.819000   0.916000  0.640000  0.710000     0.658000  0.669000       0.787000    0.916000  0.689000   0.681000      0.654000  0.608000
AAC09          263   0.753000    0.826000       0.736000      0.747000     0.647000  0.882000      0.858000   0.905000  0.608000  0.684000     0.655000  0.648000      0.820000   0.918000  0.642000  0.703000     0.653000  0.664000       0.787000    0.915000  0.690000   0.679000      0.647000  0.614000
AAC10          152   0.750000    0.828000       0.734000      0.748000     0.646000  0.886000      0.860000   0.908000  0.602000  0.684000     0.656000  0.645000      0.819000   0.913000  0.632000  0.711000     0.655000  0.670000       0.787000    0.913000  0.689000   0.676000      0.653000  0.610000
AAC11          150   0.749000    0.832000       0.732000      0.751000     0.647000  0.883000      0.860000   0.904000  0.605000  0.695000     0.650000  0.637000      0.815000   0.919000  0.638000  0.707000     0.654000  0.665000       0.781000    0.913000  0.689000   0.683000      0.652000  0.610000
AAC12          164   0.750000    0.828000       0.739000      0.746000     0.646000  0.889000      0.859000   0.904000  0.608000  0.688000     0.649000  0.637000      0.818000   0.913000  0.638000  0.708000     0.658000  0.666000       0.785000    0.917000  0.688000   0.676000      0.648000  0.613000
AAC13          125   0.744000    0.831000       0.738000      0.749000     0.650000  0.882000      0.856000   0.903000  0.600000  0.682000     0.653000  0.632000      0.818000   0.916000  0.634000  0.716000     0.657000  0.670000       0.790000    0.915000  0.688000   0.680000      0.654000  0.612000
AAC14          108   0.743000    0.831000       0.741000      0.744000     0.645000  0.884000      0.860000   0.902000  0.610000  0.673000     0.655000  0.634000      0.820000   0.914000  0.645000  0.710000     0.654000  0.666000       0.794000    0.914000  0.689000   0.678000      0.655000  0.604000
AAC15          183   0.753000    0.827000       0.732000      0.744000     0.646000  0.883000      0.857000   0.903000  0.617000  0.684000     0.659000  0.637000      0.821000   0.914000  0.632000  0.708000     0.654000  0.660000       0.785000    0.915000  0.687000   0.682000      0.649000  0.611000
AAC16          255   0.755000    0.830000       0.732000      0.741000     0.645000  0.884000      0.858000   0.902000  0.611000  0.684000     0.649000  0.648000      0.821000   0.917000  0.635000  0.702000     0.656000  0.666000       0.789000    0.915000  0.682000   0.681000      0.644000  0.611000
AAC17          101   0.740000    0.835000       0.736000      0.744000     0.644000  0.884000      0.858000   0.904000  0.606000  0.680000     0.646000  0.631000      0.817000   0.916000  0.646000  0.705000     0.651000  0.666000       0.793000    0.915000  0.691000   0.686000      0.654000  0.609000
AAC18          154   0.757000    0.826000       0.730000      0.742000     0.643000  0.881000      0.856000   0.899000  0.602000  0.684000     0.657000  0.635000      0.825000   0.922000  0.636000  0.705000     0.662000  0.658000       0.792000    0.912000  0.685000   0.679000      0.654000  0.612000
AAC19          406   0.764000    0.826000       0.730000      0.745000     0.636000  0.887000      0.860000   0.901000  0.605000  0.683000     0.649000  0.635000      0.819000   0.920000  0.633000  0.707000     0.651000  0.669000       0.789000    0.909000  0.687000   0.679000      0.654000  0.610000
AAC20           90   0.740000    0.833000       0.737000      0.742000     0.646000  0.887000      0.857000   0.901000  0.608000  0.677000     0.654000  0.632000      0.819000   0.915000  0.652000  0.702000     0.656000  0.669000       0.789000    0.908000  0.695000   0.675000      0.644000  0.610000
AAC21          202   0.757000    0.828000       0.730000      0.742000     0.637000  0.883000      0.853000   0.909000  0.607000  0.679000     0.657000  0.633000      0.820000   0.918000  0.640000  0.710000     0.648000  0.657000       0.789000    0.916000  0.688000   0.682000      0.651000  0.612000
AAC22          259   0.757000    0.823000       0.733000      0.739000     0.642000  0.880000      0.859000   0.903000  0.599000  0.682000     0.650000  0.638000      0.826000   0.917000  0.635000  0.700000     0.655000  0.665000       0.788000    0.917000  0.686000   0.682000      0.655000  0.611000
AAC23          219   0.756000    0.826000       0.734000      0.745000     0.644000  0.880000      0.858000   0.899000  0.601000  0.693000     0.650000  0.639000      0.823000   0.917000  0.629000  0.700000     0.655000  0.660000       0.783000    0.913000  0.689000   0.682000      0.655000  0.610000
AAC24          305   0.756000    0.824000       0.740000      0.741000     0.641000  0.880000      0.859000   0.903000  0.601000  0.685000     0.652000  0.635000      0.818000   0.919000  0.634000  0.705000     0.655000  0.663000       0.788000    0.914000  0.683000   0.677000      0.659000  0.607000
AAC25           96   0.764000    0.831000       0.732000      0.744000     0.639000  0.879000      0.862000   0.905000  0.603000  0.678000     0.659000  0.633000      0.816000   0.917000  0.636000  0.707000     0.646000  0.663000       0.790000    0.916000  0.678000   0.676000      0.656000  0.605000
AAC26           96   0.741000    0.828000       0.734000      0.745000     0.642000  0.887000      0.855000   0.904000  0.602000  0.680000     0.655000  0.634000      0.816000   0.914000  0.639000  0.705000     0.650000  0.671000       0.792000    0.914000  0.698000   0.681000      0.646000  0.601000
AAC27          127   0.753000    0.832000       0.731000      0.744000     0.640000  0.880000      0.854000   0.901000  0.607000  0.680000     0.658000  0.638000      0.820000   0.912000  0.638000  0.700000     0.643000  0.663000       0.793000    0.914000  0.683000   0.681000      0.658000  0.611000
AAC28           58   0.737000    0.821000       0.736000      0.745000     0.654000  0.883000      0.858000   0.898000  0.600000  0.675000     0.652000  0.641000      0.822000   0.910000  0.642000  0.703000     0.654000  0.670000       0.789000    0.914000  0.691000   0.679000      0.655000  0.605000
AAC29          514   0.763000    0.824000       0.731000      0.746000     0.636000  0.880000      0.857000   0.900000  0.602000  0.675000     0.646000  0.635000      0.827000   0.914000  0.632000  0.705000     0.662000  0.666000       0.796000    0.908000  0.692000   0.679000      0.647000  0.610000
AAC30          573   0.763000    0.821000       0.732000      0.746000     0.644000  0.879000      0.854000   0.898000  0.606000  0.680000     0.655000  0.632000      0.818000   0.915000  0.633000  0.701000     0.659000  0.656000       0.793000    0.911000  0.682000   0.675000      0.664000  0.616000
AAC31           92   0.757000    0.821000       0.727000      0.747000     0.631000  0.879000      0.858000   0.900000  0.601000  0.685000     0.653000  0.641000      0.819000   0.918000  0.644000  0.711000     0.654000  0.662000       0.784000    0.912000  0.686000   0.678000      0.653000  0.611000
AAC32          431   0.761000    0.827000       0.736000      0.745000     0.637000  0.880000      0.857000   0.898000  0.602000  0.675000     0.651000  0.638000      0.818000   0.917000  0.634000  0.701000     0.670000  0.658000       0.792000    0.911000  0.678000   0.682000      0.650000  0.613000
AAC33          155   0.762000    0.817000       0.732000      0.739000     0.645000  0.877000      0.860000   0.902000  0.606000  0.678000     0.654000  0.631000      0.822000   0.916000  0.635000  0.700000     0.651000  0.661000       0.785000    0.922000  0.688000   0.683000      0.655000  0.608000
AAC34           49   0.755000    0.823000       0.729000      0.741000     0.640000  0.884000      0.859000   0.900000  0.602000  0.682000     0.667000  0.624000      0.818000   0.921000  0.634000  0.707000     0.645000  0.664000       0.787000    0.915000  0.688000   0.685000      0.649000  0.608000
AAC35           91   0.758000    0.822000       0.727000      0.741000     0.639000  0.875000      0.856000   0.904000  0.605000  0.679000     0.656000  0.632000      0.821000   0.912000  0.632000  0.702000     0.658000  0.658000       0.791000    0.914000  0.683000   0.678000      0.663000  0.618000
AAC36          103   0.759000    0.824000       0.734000      0.742000     0.642000  0.874000      0.864000   0.902000  0.600000  0.680000     0.657000  0.630000      0.817000   0.918000  0.634000  0.707000     0.651000  0.664000       0.787000    0.915000  0.688000   0.677000      0.646000  0.609000
AAC37           69   0.748000    0.823000       0.732000      0.738000     0.633000  0.882000      0.858000   0.899000  0.596000  0.684000     0.653000  0.635000      0.816000   0.919000  0.647000  0.706000     0.647000  0.655000       0.789000    0.916000  0.693000   0.682000      0.646000  0.606000
AAC38           46   0.766000    0.809000       0.731000      0.744000     0.620000  0.857000      0.857000   0.898000  0.609000  0.675000     0.650000  0.642000      0.815000   0.916000  0.650000  0.701000     0.661000  0.664000       0.783000    0.921000  0.685000   0.679000      0.652000  0.618000
AAC39          114   0.755000    0.823000       0.731000      0.744000     0.653000  0.874000      0.856000   0.898000  0.600000  0.682000     0.655000  0.636000      0.820000   0.913000  0.634000  0.698000     0.643000  0.660000       0.786000    0.913000  0.687000   0.676000      0.650000  0.615000
AAC40           50   0.756000    0.819000       0.738000      0.742000     0.629000  0.870000      0.851000   0.898000  0.602000  0.685000     0.668000  0.643000      0.814000   0.913000  0.640000  0.706000     0.655000  0.659000       0.775000    0.910000  0.682000   0.680000      0.652000  0.611000
AAC41           51   0.751000    0.822000       0.734000      0.740000     0.637000  0.882000      0.854000   0.896000  0.600000  0.684000     0.655000  0.644000      0.805000   0.914000  0.636000  0.695000     0.653000  0.661000       0.779000    0.912000  0.685000   0.691000      0.653000  0.607000
AAC42           68   0.761000    0.811000       0.731000      0.742000     0.635000  0.868000      0.861000   0.895000  0.604000  0.676000     0.648000  0.633000      0.816000   0.916000  0.636000  0.705000     0.657000  0.662000       0.799000    0.916000  0.681000   0.674000      0.646000  0.612000
AAC43           55   0.767000    0.815000       0.731000      0.740000     0.631000  0.871000      0.848000   0.899000  0.603000  0.681000     0.649000  0.639000      0.807000   0.913000  0.632000  0.701000     0.655000  0.661000       0.782000    0.914000  0.681000   0.676000      0.646000  0.625000
AAC44           65   0.758000    0.825000       0.734000      0.741000     0.621000  0.875000      0.855000   0.902000  0.588000  0.678000     0.657000  0.633000      0.810000   0.913000  0.635000  0.696000     0.658000  0.660000       0.780000    0.923000  0.682000   0.677000      0.653000  0.611000
AAC45           56   0.760000    0.823000       0.733000      0.741000     0.629000  0.873000      0.844000   0.893000  0.610000  0.681000     0.654000  0.635000      0.809000   0.907000  0.632000  0.702000     0.646000  0.672000       0.774000    0.914000  0.684000   0.681000      0.648000  0.612000
AAC46           54   0.759000    0.812000       0.733000      0.739000     0.626000  0.872000      0.843000   0.896000  0.599000  0.679000     0.644000  0.637000      0.810000   0.914000  0.631000  0.696000     0.667000  0.664000       0.779000    0.912000  0.681000   0.676000      0.650000  0.623000
AAC47           31   0.748000    0.817000       0.717000      0.738000     0.624000  0.882000      0.855000   0.896000  0.602000  0.675000     0.648000  0.627000      0.819000   0.910000  0.636000  0.694000     0.652000  0.653000       0.800000    0.916000  0.684000   0.672000      0.649000  0.605000
AAC48           30   0.771000    0.803000       0.723000      0.737000     0.626000  0.857000      0.848000   0.891000  0.603000  0.673000     0.657000  0.631000      0.813000   0.912000  0.638000  0.695000     0.662000  0.658000       0.776000    0.910000  0.674000   0.679000      0.648000  0.619000
AAC49           27   0.758000    0.807000       0.731000      0.751000     0.630000  0.850000      0.853000   0.892000  0.590000  0.675000     0.651000  0.633000      0.815000   0.914000  0.631000  0.696000     0.647000  0.660000       0.784000    0.904000  0.679000   0.677000      0.646000  0.615000
AAC50           32   0.762000    0.802000       0.728000      0.734000     0.620000  0.855000      0.831000   0.903000  0.607000  0.671000     0.645000  0.640000      0.800000   0.910000  0.641000  0.692000     0.650000  0.671000       0.773000    0.908000  0.687000   0.678000      0.649000  0.623000
AAC51           20   0.768000    0.802000       0.727000      0.737000     0.610000  0.865000      0.836000   0.894000  0.605000  0.668000     0.641000  0.632000      0.802000   0.902000  0.626000  0.697000     0.652000  0.660000       0.767000    0.903000  0.680000   0.691000      0.653000  0.614000
AAC52           14   0.755000    0.802000       0.729000      0.740000     0.613000  0.867000      0.814000   0.886000  0.601000  0.672000     0.644000  0.632000      0.796000   0.901000  0.631000  0.691000     0.650000  0.668000       0.755000    0.901000  0.683000   0.667000      0.650000  0.633000
AAC53           21   0.765000    0.798000       0.729000      0.732000     0.629000  0.862000      0.782000   0.885000  0.614000  0.661000     0.647000  0.638000      0.759000   0.905000  0.630000  0.681000     0.640000  0.658000       0.747000    0.899000  0.674000   0.671000      0.643000  0.616000
AAC54           11   0.743000    0.780000       0.699000      0.714000     0.619000  0.844000      0.840000   0.895000  0.587000  0.669000     0.622000  0.624000      0.806000   0.926000  0.628000  0.671000     0.620000  0.656000       0.770000    0.911000  0.685000   0.651000      0.623000  0.620000
AAC55            8   0.747000    0.769000       0.710000      0.726000     0.580000  0.835000      0.732000   0.832000  0.595000  0.640000     0.590000  0.630000      0.745000   0.866000  0.630000  0.667000     0.618000  0.680000       0.723000    0.843000  0.656000   0.637000      0.604000  0.620000
AAC56            4   0.782000    0.738000       0.687000      0.710000     0.570000  0.768000      0.685000   0.857000  0.596000  0.610000     0.584000  0.636000      0.687000   0.886000  0.625000  0.651000     0.606000  0.668000       0.691000    0.859000  0.670000   0.602000      0.601000  0.617000
AAC57            5   0.768000    0.734000       0.684000      0.713000     0.571000  0.774000      0.663000   0.843000  0.600000  0.613000     0.599000  0.639000      0.689000   0.873000  0.631000  0.633000     0.621000  0.668000       0.677000    0.855000  0.670000   0.619000      0.600000  0.629000
AAC58            6   0.776000    0.775000       0.701000      0.715000     0.594000  0.817000      0.662000   0.800000  0.599000  0.628000     0.546000  0.635000      0.675000   0.825000  0.629000  0.653000     0.568000  0.671000       0.666000    0.803000  0.665000   0.633000      0.556000  0.619000
AAC59            3   0.794000    0.725000       0.680000      0.710000     0.570000  0.802000      0.600000   0.797000  0.600000  0.589000     0.565000  0.628000      0.632000   0.859000  0.619000  0.592000     0.562000  0.670000       0.623000    0.815000  0.649000   0.585000      0.568000  0.615000
AAC60            3   0.789000    0.653000       0.656000      0.675000     0.563000  0.772000      0.636000   0.783000  0.590000  0.626000     0.535000  0.635000      0.670000   0.830000  0.637000  0.639000     0.529000  0.677000       0.670000    0.813000  0.645000   0.613000      0.546000  0.613000

You can, for example, select the scale set with the best performance for the ‘SEQ_CAPSID’ dataset as follows:

    # Sort 'df_top60_eval' by 'SEQ_CAPSID' and get the index of the top row
    top_id = df_top60_eval.sort_values(by="SEQ_CAPSID", ascending=False).index[0]
    df_scales_top = aa.load_scales(top60_n=top_id)
    # Select a specific set using integers
    df_cat_top20 = aa.load_scales(name="scales_cat", top60_n=20)
    aa.display_df(df_cat_top20, n_rows=6, show_shape=True)

DataFrame shape: (90, 5)
scale_id category subcategory scale_name scale_description 1 BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974) 2 DIGM050101 Composition AA composition (surface) Hydrostatic pressure Hydrostatic Pre...i Giulio, 2005) 3 FUKS010103 Composition AA composition (surface) Proteins of mesophiles (EXT) Surface composi...ishikawa, 2001) 4 GRAR740101 Composition Unclassified (Composition) Substitution Frequency Composition (Grantham, 1974) 5 CHOP780207 Conformation Coil Non helical reg...on (C-terminal) Normalized freq...-Fasman, 1978b) 6 ROBB760107 Conformation Coil (C-term) Coil (C-terminal) Information mea...n-Suzuki, 1976) Two optional filtering steps are provided by the
just_aaindex and unclassified_out parameters:

    n_all_scales = len(aa.load_scales().T)
    n_just_aaindex = len(aa.load_scales(just_aaindex=True).T)
    n_classified = len(aa.load_scales(unclassified_out=True).T)
    n_both_filter_steps = len(aa.load_scales(just_aaindex=True, unclassified_out=True).T)
    print(n_all_scales, " scales")
    print(n_just_aaindex, " from AAindex")
    print(n_classified, " are classified in AAontology")
    print(n_both_filter_steps, " fulfill both.")

586  scales
553  from AAindex
532  are classified in AAontology
499  fulfill both.
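The four counts above can be cross-checked with simple inclusion-exclusion arithmetic. The sketch below uses only the printed numbers; it assumes that applying both filters removes the union of the scales removed by each filter alone, which is an interpretation of the output rather than something stated in the documentation.

```python
# Counts printed by the filtering example above.
n_all = 586
n_aaindex = 553
n_classified = 532
n_both = 499

# Scales removed by each filter on its own.
n_not_aaindex = n_all - n_aaindex      # removed by just_aaindex=True
n_unclassified = n_all - n_classified  # removed by unclassified_out=True

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|,
# where |A or B| is the number removed when both filters are active.
n_removed_total = n_all - n_both
n_overlap = n_not_aaindex + n_unclassified - n_removed_total

print(n_not_aaindex, n_unclassified, n_overlap)  # 33 54 0
```

Under that assumption, the overlap of zero suggests that none of the 33 scales from the additional data sources is unclassified.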