aaanalysis.load_scales

class aaanalysis.load_scales(name='scales', just_aaindex=False, unclassified_out=False, top60_n=None)

Load amino acid scales or their classification (AAontology).

The amino acid scales (name='scales_raw') encompass all scales from AAindex ([Kawashima08]) plus scales from two additional data sources. These scales were min-max normalized (name='scales') and organized in a two-level classification called AAontology (name='scales_cat'), as detailed in [Breimann24b]. The first 20 principal components (PCs) of all compressed scales are provided (name='scales_pc') and were used for an in-depth analysis of redundancy-reduced scale subsets obtained by AAclust ([Breimann24a]). The top 60 scale sets from this analysis are available either collectively (name='top60') or individually (top60_n set to an integer from 1 to 60 or to an AAclust id such as 'AAC01'), accompanied by their evaluations in the 'top60_eval' dataset.

Added in version 0.1.0.

Parameters:
  • name (str, default='scales') –

    Name of the loaded dataset:

    • scales_raw: All amino acid scales.

    • scales: Min-max normalized raw scales.

    • scales_cat: Two-level classification (AAontology).

    • scales_pc: First 20 PCs of compressed scales.

    • top60: Selection of 60 best performing scale sets.

    • top60_eval: Evaluation of 60 best performing scale sets.

    Or a number between 1 and 60 to select the i-th top60 dataset.

  • just_aaindex (bool, default=False) – If True, returns only scales from AAindex. Relevant only for ‘scales’, ‘scales_raw’, or ‘scales_cat’.

  • unclassified_out (bool, default=False) – If True, excludes unclassified scales (see Notes). Relevant only for ‘scales’, ‘scales_raw’, or ‘scales_cat’.

  • top60_n (int or str, optional) – Selects the n-th of the top60 scale sets and returns only its scales for ‘scales’, ‘scales_raw’, or ‘scales_cat’. Integers must lie between 1 and 60; strings must be AAclust ids (e.g., ‘AAC01’).
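Judging from the examples on this page, an integer n and the AAclust id of the n-th set refer to the same scale set: top60_n=20 returns 90 scales, which matches n_scales for ‘AAC20’ in ‘top60_eval’. A minimal sketch of that mapping, assuming ids follow the ‘AAC’ plus two-digit pattern used in the top60 table. normalize_top60_n is a hypothetical helper for illustration, not part of aaanalysis:

```python
def normalize_top60_n(top60_n):
    """Hypothetical helper: map an int (1-60) or an AAclust id to an id like 'AAC07'.

    Assumes ids follow the 'AAC' + two-digit pattern seen in the top60 table.
    """
    # bool is a subclass of int in Python, so reject it explicitly first
    if isinstance(top60_n, bool):
        raise TypeError("top60_n must be an int or a str, not bool")
    if isinstance(top60_n, int):
        if not 1 <= top60_n <= 60:
            raise ValueError("top60_n must be between 1 and 60")
        return f"AAC{top60_n:02d}"
    if isinstance(top60_n, str):
        valid_ids = {f"AAC{i:02d}" for i in range(1, 61)}
        if top60_n not in valid_ids:
            raise ValueError(f"Unknown AAclust id: {top60_n!r}")
        return top60_n
    raise TypeError("top60_n must be an int or a str")


print(normalize_top60_n(20))       # AAC20
print(normalize_top60_n("AAC01"))  # AAC01
```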

Returns:

df – DataFrame containing the chosen dataset. It is recommended to name it after the dataset name (e.g., df_scales, df_cat).

Return type:

pandas.DataFrame

Notes

  • df_cat includes the following columns:

    • ‘scale_id’: ID of scale (from AAindex or following the same naming convention).

    • ‘category’: Category of scale (defined in AAontology).

    • ‘subcategory’: Subcategory of scale (AAontology).

    • ‘scale_name’: Name of scale derived from scale description.

    • ‘scale_description’: Description of scale (derived from AAindex).

  • Scales under the ‘Others’ category are considered unclassified.
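Following the note above, unclassified_out=True can be pictured as dropping every scale whose ‘category’ is ‘Others’. The sketch below applies that rule to a few df_cat-like records held as plain dicts. The first two records come from the scales_cat example on this page, the third is a made-up placeholder, and the library's own implementation may differ. Subcategories such as ‘Unclassified (Composition)’ may also count as unclassified; this sketch only applies the category rule stated in the note.

```python
# Records shaped like rows of df_cat (subset of columns).
# The first two come from the scales_cat example; the last one is a made-up placeholder.
records = [
    {"scale_id": "LINS030110", "category": "ASA/Volume"},
    {"scale_id": "BIOV880101", "category": "ASA/Volume"},
    {"scale_id": "PLACEHOLDER01", "category": "Others"},
]


def drop_unclassified(rows):
    """Keep only rows whose category is not 'Others' (assumed filtering rule)."""
    return [row for row in rows if row["category"] != "Others"]


classified = drop_unclassified(records)
print([row["scale_id"] for row in classified])  # ['LINS030110', 'BIOV880101']
```

On the DataFrame itself, the same rule would read df_cat[df_cat["category"] != "Others"].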

Examples

You can load all scales from AAontology (see [Breimann24b]), together with their descriptions, using the load_scales() function. By default, the min-max normalized numerical scales are returned:

import aaanalysis as aa
df_scales = aa.load_scales()
print(len(df_scales.T), "scales exist in AAontology")
aa.display_df(df_scales, n_cols=5)
586 scales exist in AAontology
  ANDN920101 ARGP820101 ARGP820102 ARGP820103 BEGF750101
AA          
A 0.494000 0.230000 0.355000 0.504000 1.000000
C 0.864000 0.404000 0.579000 0.387000 0.000000
D 1.000000 0.174000 0.000000 0.000000 0.404000
E 0.420000 0.177000 0.019000 0.032000 0.713000
F 0.877000 0.762000 0.601000 0.670000 0.574000
G 0.025000 0.026000 0.138000 0.170000 0.309000
H 0.840000 0.230000 0.082000 0.053000 0.574000
I 0.000000 0.838000 0.440000 0.543000 0.713000
K 0.506000 0.434000 0.003000 0.004000 0.574000
L 0.272000 0.577000 1.000000 0.989000 1.000000
M 0.704000 0.445000 0.824000 1.000000 1.000000
N 0.988000 0.023000 0.057000 0.046000 0.309000
P 0.605000 0.736000 0.223000 0.220000 0.000000
Q 0.519000 0.000000 0.211000 0.131000 0.404000
R 0.531000 0.226000 0.047000 0.110000 0.489000
S 0.679000 0.019000 0.289000 0.238000 0.309000
T 0.494000 0.019000 0.248000 0.273000 0.404000
V 0.000000 0.498000 0.324000 0.355000 0.809000
W 0.926000 1.000000 0.226000 0.333000 0.713000
Y 0.802000 0.709000 0.107000 0.191000 0.404000

To retrieve the un-normalized scales, set name='scales_raw':

df_scales = aa.load_scales(name="scales_raw")
aa.display_df(df_scales, n_cols=5, n_rows=5)
  ANDN920101 ARGP820101 ARGP820102 ARGP820103 BEGF750101
AA          
A 4.350000 0.610000 1.180000 1.560000 1.000000
C 4.650000 1.070000 1.890000 1.230000 0.060000
D 4.760000 0.460000 0.050000 0.140000 0.440000
E 4.290000 0.470000 0.110000 0.230000 0.730000
F 4.660000 2.020000 1.960000 2.030000 0.600000
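Min-max normalization rescales each scale to [0, 1] via (x - min) / (max - min), with min and max taken over the amino acids of that scale. Consistent with this, D has the largest raw ANDN920101 value among the five rows shown (4.76) and a normalized value of 1.0. A minimal sketch for a single scale, using made-up values rather than a real AAindex entry; how the library treats a constant scale is not documented here, so that branch is an assumption:

```python
def min_max_normalize(scale):
    """Rescale one scale (amino acid -> value) to the range [0, 1]."""
    lo, hi = min(scale.values()), max(scale.values())
    if hi == lo:
        # Assumption: a constant scale carries no information, so map everything to 0.0.
        return {aa: 0.0 for aa in scale}
    return {aa: (value - lo) / (hi - lo) for aa, value in scale.items()}


# Made-up values for four amino acids (a real scale covers all 20).
toy_scale = {"A": 2.0, "C": 3.0, "D": 6.0, "E": 4.0}
print(min_max_normalize(toy_scale))  # {'A': 0.0, 'C': 0.25, 'D': 1.0, 'E': 0.5}
```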

The first 20 PCs of all compressed scales can be retrieved by setting name='scales_pc':

df_pc = aa.load_scales(name="scales_pc")
aa.display_df(df_pc, show_shape=True)
DataFrame shape: (20, 20)
  PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20
AA                                        
A 0.073190 0.259030 -0.270090 -0.374730 -0.009310 0.159360 -0.090600 0.193800 0.106010 0.165780 0.127210 -0.161270 -0.035360 -0.133400 0.084230 -0.136950 -0.503810 -0.441080 -0.235460 -0.115310
C 0.205620 0.130650 -0.164980 0.302520 0.539970 0.264480 0.488150 0.051980 0.287850 -0.268360 -0.083770 -0.121410 -0.109260 -0.091250 -0.008110 0.108830 -0.001420 0.002640 -0.049160 -0.090410
D -0.264620 0.245490 -0.055430 0.063120 0.125590 0.255040 -0.203960 -0.491080 -0.010360 -0.073790 0.119410 0.009770 -0.070680 0.337390 0.401750 0.068220 0.266940 -0.350000 -0.008320 0.021170
E -0.195640 0.341280 0.127870 -0.314750 -0.018020 0.337190 -0.088000 -0.325790 0.083820 -0.160650 0.150770 -0.083790 0.078590 -0.304560 -0.185970 0.050030 -0.122970 0.526190 0.093960 -0.049910
F 0.348890 0.125190 0.034520 0.118960 -0.091320 0.009140 -0.197540 -0.011910 -0.212210 0.005670 0.030470 0.057490 -0.231740 -0.003790 0.022030 0.772250 -0.266000 -0.022190 0.162590 0.046080
G -0.145180 0.043160 -0.570540 0.098060 0.142450 -0.225710 -0.419920 0.279690 0.152640 -0.258630 0.331190 0.123200 0.159960 -0.040240 -0.082410 0.126100 0.172820 0.143800 -0.021170 -0.013710
H 0.011380 0.269740 0.159750 0.175360 0.140680 0.039330 -0.029280 0.243770 -0.640950 -0.052370 0.149440 -0.454060 0.277910 -0.122510 -0.045280 -0.071300 0.183940 -0.094790 0.008380 -0.069890
I 0.374000 0.121890 -0.126860 -0.065250 -0.155980 -0.140290 0.056690 -0.245830 -0.074940 -0.230000 -0.066920 0.126430 0.278090 0.317410 -0.136170 -0.116940 -0.048480 -0.011570 0.106600 -0.643530
K -0.202260 0.363130 0.183790 -0.111210 -0.171380 -0.214410 0.039890 0.320010 0.256790 -0.247370 -0.467600 -0.097600 0.259350 0.079820 0.361420 0.200120 0.001610 0.031450 0.022120 0.025380
L 0.329240 0.177160 -0.139820 -0.286860 -0.194360 0.007340 -0.111360 0.074030 0.021080 -0.084450 -0.184180 -0.240430 -0.529920 -0.014800 -0.095810 -0.092210 0.533010 0.065930 -0.124290 0.027420
M 0.289160 0.251510 0.091470 -0.045450 0.115730 0.289220 -0.027170 0.324950 -0.160960 0.102270 0.098870 0.568290 0.015620 0.155680 0.312940 -0.242760 0.031370 0.270270 -0.003700 0.121800
N -0.204120 0.231170 -0.097670 0.214020 0.165010 -0.073740 -0.200140 -0.064790 -0.228230 -0.089820 -0.412980 0.046050 -0.217960 0.321250 -0.358480 -0.182220 -0.346170 0.128620 -0.247410 0.172590
P -0.166120 0.037590 -0.276880 0.384140 -0.691610 0.405000 0.245850 0.129850 -0.038030 -0.107960 0.076900 -0.010790 -0.025870 0.046690 -0.030560 -0.027880 -0.038210 0.019500 -0.022540 0.032680
Q -0.119850 0.334480 0.116780 0.002460 -0.024550 0.011320 0.091330 0.022790 0.054670 0.106180 -0.094670 0.476790 0.092050 -0.269600 -0.491950 0.112910 0.272950 -0.434320 0.026510 0.014460
R -0.146320 0.295020 0.283520 0.027240 -0.055660 -0.400980 0.290780 0.098030 0.116960 -0.054230 0.562180 -0.022100 -0.359900 0.268070 -0.071670 -0.043500 -0.086680 0.028750 0.001300 -0.049780
S -0.134520 0.220350 -0.315510 0.084300 0.067610 -0.126640 0.060490 -0.000750 -0.029870 0.390700 -0.158630 -0.068780 -0.175300 -0.085460 0.095610 -0.162100 -0.032190 0.065770 0.729960 -0.085650
T -0.024920 0.194840 -0.219810 0.062620 -0.014410 -0.171330 0.220850 -0.164790 -0.032160 0.593290 -0.017750 -0.030370 0.163560 0.008680 0.093110 0.238680 0.145260 0.287260 -0.489980 -0.115160
V 0.310250 0.155950 -0.230960 -0.132630 -0.074050 -0.208070 0.242330 -0.272770 -0.034070 -0.123100 0.093210 -0.064920 0.304670 0.058230 -0.044470 -0.078980 -0.046460 -0.060360 0.107190 0.689670
W 0.287410 0.142550 0.249530 0.355490 -0.053760 0.136800 -0.371840 0.031690 0.498030 0.300260 0.064360 -0.257830 0.200040 0.189230 -0.198120 -0.114360 0.021200 0.008150 0.065530 0.076070
Y 0.160400 0.150020 0.081790 0.400500 -0.128640 -0.293600 -0.164880 -0.258810 0.019610 -0.154980 -0.015850 0.121660 -0.130260 -0.567950 0.309210 -0.269350 -0.099280 -0.021080 -0.166870 -0.049600

The AAontology two-level classification can be retrieved by setting name='scales_cat':

df_cat = aa.load_scales(name="scales_cat")
aa.display_df(df_cat, n_rows=50, show_shape=True)
DataFrame shape: (586, 5)
  scale_id category subcategory scale_name scale_description
1 LINS030110 ASA/Volume Accessible surface area (ASA) ASA (folded coil/turn) Total median ac...s et al., 2003)
2 LINS030113 ASA/Volume Accessible surface area (ASA) ASA (folded coil/turn) % total accessi...s et al., 2003)
3 JANJ780101 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Average accessi...n et al., 1978)
4 JANJ780103 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Percentage of e...n et al., 1978)
5 LINS030104 ASA/Volume Accessible surface area (ASA) ASA (folded protein) Total median ac...s et al., 2003)
6 LINS030107 ASA/Volume Accessible surface area (ASA) ASA (folded protein) % total accessi...s et al., 2003)
7 CHOC760102 ASA/Volume Accessible surface area (ASA) ASA (folded proteins) Residue accessi...(Chothia, 1976)
8 LINS030116 ASA/Volume Accessible surface area (ASA) ASA (folded β-strand) Total median ac...s et al., 2003)
9 LINS030119 ASA/Volume Accessible surface area (ASA) ASA (folded β-strand) % total accessi...s et al., 2003)
10 LINS030103 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA Hydrophilic acc...s et al., 2003)
11 LINS030112 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...lded coil/turn) Hydrophilic med...s et al., 2003)
12 LINS030115 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...lded coil/turn) % Hydrophilic a...s et al., 2003)
13 LINS030106 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) Hydrophilic med...s et al., 2003)
14 LINS030109 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded proteins) % Hydrophilic a...s et al., 2003)
15 LINS030118 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded β-strand) Hydrophilic med...s et al., 2003)
16 LINS030121 ASA/Volume Accessible surface area (ASA) Hydrophilic ASA...olded β-strand) % Hydrophilic a...s et al., 2003)
17 LINS030114 ASA/Volume Accessible surface area (ASA) Hydrophobic ASA...lded coil/turn) % Hydrophobic a...s et al., 2003)
18 LINS030108 ASA/Volume Accessible surface area (ASA) Hydrophobic ASA...folded protein) % Hydrophobic a...s et al., 2003)
19 LINS030120 ASA/Volume Accessible surface area (ASA) Hydrophobic ASA...olded β-strand) % Hydrophobic a...s et al., 2003)
20 GUYH850104 ASA/Volume Accessible surface area (ASA) Partition energy Apparent partit...dex (Guy, 1985)
21 GUYH850105 ASA/Volume Accessible surface area (ASA) Partition energy Apparent partit...dex (Guy, 1985)
22 RACS770103 ASA/Volume Accessible surface area (ASA) Side chain orientation Side chain orie...Scheraga, 1977)
23 VHEG790101 ASA/Volume Accessible surface area (ASA) TFE to lipophilic phase Transfer free e...Blomberg, 1979)
24 BIOV880101 ASA/Volume Buried Buriability Information val...u et al., 1988)
25 BIOV880102 ASA/Volume Buried Buriability Information val...u et al., 1988)
26 WERD780101 ASA/Volume Buried Buriability Propensity to b...Scheraga, 1978)
27 ZHOH040103 ASA/Volume Buried Buriability Buriability (Zhou-Zhou, 2004)
28 ARGP820103 ASA/Volume Buried Buried Membrane-buried...s et al., 1982)
29 CHOC760103 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976)
30 CHOC760104 ASA/Volume Buried Buried Proportion of r...(Chothia, 1976)
31 JANJ780102 ASA/Volume Buried Buried Percentage of b...n et al., 1978)
32 JANJ790101 ASA/Volume Buried Buried Ratio of buried...s (Janin, 1979)
33 OLSK800101 ASA/Volume Buried Buried Average interna...s (Olsen, 1980)
34 NISK800101 ASA/Volume Buried Interactivity 8 A contact num...kawa-Ooi, 1980)
35 WARP780101 ASA/Volume Buried Interactivity Average interac...e-Morgan, 1978)
36 LINS030111 ASA/Volume Hydrophobic ASA Hydrophobic ASA...lded coil/turn) Hydrophobic med...s et al., 2003)
37 LINS030105 ASA/Volume Hydrophobic ASA Hydrophobic ASA...olded proteins) Hydrophobic med...s et al., 2003)
38 LINS030117 ASA/Volume Hydrophobic ASA Hydrophobic ASA...olded β-strand) Hydrophobic med...s et al., 2003)
39 PONP800107 ASA/Volume Partial specific volume Accessibility reduction ratio Accessibility r...y et al., 1980)
40 ZIMJ680102 ASA/Volume Partial specific volume Bulkiness Bulkiness (Zimm...n et al., 1968)
41 LINS030102 ASA/Volume Partial specific volume Hydrophobic ASA Hydrophobic acc...s et al., 2003)
42 BASU050101 ASA/Volume Partial specific volume Interactivity Interactivity s...a et al., 2005)
43 BASU050103 ASA/Volume Partial specific volume Interactivity Interactivity s...a et al., 2005)
44 NISK860101 ASA/Volume Partial specific volume Interactivity 14 A contact nu...kawa-Ooi, 1986)
45 BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974)
46 COHE430101 ASA/Volume Partial specific volume Partial specific volume Partial specifi...n-Edsall, 1943)
47 PLIV810101 ASA/Volume Partial specific volume Partition coefficient Partition Coeff...a et al., 1981)
48 CHOC760101 ASA/Volume Volume Accessible surface area (ASA) Residue accessi...(Chothia, 1976)
49 LINS030101 ASA/Volume Volume Accessible surface area (ASA) Total accessibl...s et al., 2003)
50 RADA880106 ASA/Volume Volume Accessible surface area (ASA) Accessible surf...olfenden, 1988)
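Because df_cat is a flat table, per-group summaries reduce to simple counts. The sketch below tallies the scale names within the ‘Buried’ subcategory, using the 12 ‘Buried’ rows shown above. With pandas, df_cat["subcategory"].value_counts() would give the corresponding counts per subcategory over all 586 rows.

```python
from collections import Counter

# (scale_id, scale_name) pairs of the 'Buried' subcategory, copied from the table above.
buried_rows = [
    ("BIOV880101", "Buriability"), ("BIOV880102", "Buriability"),
    ("WERD780101", "Buriability"), ("ZHOH040103", "Buriability"),
    ("ARGP820103", "Buried"), ("CHOC760103", "Buried"), ("CHOC760104", "Buried"),
    ("JANJ780102", "Buried"), ("JANJ790101", "Buried"), ("OLSK800101", "Buried"),
    ("NISK800101", "Interactivity"), ("WARP780101", "Interactivity"),
]

# Count how often each scale name occurs within the subcategory.
name_counts = Counter(name for _, name in buried_rows)
print(name_counts.most_common())
# [('Buried', 6), ('Buriability', 4), ('Interactivity', 2)]
```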

Additionally, we tested many scale sets (see [Breimann24a]) on 12 protein benchmark datasets: six at the sequence level and six at the amino acid level, with each amino acid level dataset evaluated using three amino acid window sizes. This resulted in 24 evaluations (6 + 6 × 3), based on which we identified the top 60 scale sets, referred to as ‘top60’. These can be accessed by setting name='top60':

df_top60 = aa.load_scales(name="top60")
aa.display_df(df_top60, n_cols=5, show_shape=True)
DataFrame shape: (60, 586)
  ANDN920101 ARGP820101 ARGP820102 ARGP820103 BEGF750101
top60_id          
AAC01 0 0 0 1 0
AAC02 1 0 0 1 0
AAC03 1 0 0 1 0
AAC04 1 0 0 1 0
AAC05 1 0 0 1 0
AAC06 1 0 0 1 0
AAC07 0 0 0 1 0
AAC08 1 0 0 1 0
AAC09 1 0 0 1 0
AAC10 1 0 0 1 0
AAC11 1 0 0 1 0
AAC12 1 0 0 1 0
AAC13 1 0 0 1 0
AAC14 0 0 0 1 0
AAC15 1 1 0 1 0
AAC16 1 0 1 0 0
AAC17 0 0 0 0 0
AAC18 1 0 1 0 0
AAC19 1 1 0 1 1
AAC20 0 0 0 0 0
AAC21 1 0 0 1 0
AAC22 1 1 0 0 0
AAC23 1 0 0 0 0
AAC24 1 1 0 1 1
AAC25 0 0 0 0 0
AAC26 0 0 0 0 0
AAC27 1 0 1 0 0
AAC28 0 0 0 0 0
AAC29 1 1 1 1 1
AAC30 1 1 1 1 1
AAC31 1 0 0 0 0
AAC32 1 1 0 1 1
AAC33 0 0 0 0 0
AAC34 1 0 0 0 0
AAC35 1 0 0 0 0
AAC36 1 0 0 0 0
AAC37 1 0 0 0 0
AAC38 1 0 0 0 0
AAC39 1 0 0 1 0
AAC40 1 0 1 0 0
AAC41 0 0 0 0 0
AAC42 1 1 0 0 0
AAC43 1 0 0 0 0
AAC44 1 0 0 1 0
AAC45 1 0 0 0 0
AAC46 0 0 0 0 0
AAC47 0 0 0 0 0
AAC48 0 0 0 0 0
AAC49 0 0 0 1 0
AAC50 0 1 0 0 0
AAC51 0 0 0 0 0
AAC52 0 0 0 0 0
AAC53 0 0 0 0 0
AAC54 0 0 0 0 0
AAC55 0 0 0 0 0
AAC56 0 0 0 0 0
AAC57 0 0 0 0 0
AAC58 0 0 0 0 0
AAC59 0 0 0 0 0
AAC60 0 0 0 0 0
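Each row of top60 appears to be a 0/1 membership vector over the 586 scales, where 1 marks a scale that belongs to the set. Turning a row into a list of scale ids is therefore a simple filter; the sketch below uses the first five columns of the ‘AAC16’ row shown above. Over all 586 columns, the number of selected scales would be expected to equal n_scales in ‘top60_eval’ (255 for ‘AAC16’), although this is inferred rather than documented.

```python
# First five columns of the 'AAC16' row in the top60 table above (1 = scale is in the set).
columns = ["ANDN920101", "ARGP820101", "ARGP820102", "ARGP820103", "BEGF750101"]
aac16_row = [1, 0, 1, 0, 0]


def selected_scales(scale_ids, membership):
    """Return the scale ids whose membership flag is 1."""
    return [scale_id for scale_id, flag in zip(scale_ids, membership) if flag == 1]


print(selected_scales(columns, aac16_row))  # ['ANDN920101', 'ARGP820102']
```

With pandas, df_top60.columns[df_top60.loc["AAC16"].to_numpy() == 1] would give the full list for that set.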

To obtain their evaluation, set name='top60_eval':

df_top60_eval = aa.load_scales(name="top60_eval")
aa.display_df(df_top60_eval, show_shape=True)
DataFrame shape: (60, 25)
  n_scales SEQ_AMYLO SEQ_CAPSID SEQ_DISULFIDE SEQ_LOCATION SEQ_SOLUBLE SEQ_TAIL AA5_CASPASE3 AA5_FURIN AA5_LDR AA5_MMP2 AA5_RNABIND AA5_SA AA9_CASPASE3 AA9_FURIN AA9_LDR AA9_MMP2 AA9_RNABIND AA9_SA AA13_CASPASE3 AA13_FURIN AA13_LDR AA13_MMP2 AA13_RNABIND AA13_SA
top60_id                                                  
AAC01 183 0.761000 0.827000 0.732000 0.746000 0.646000 0.884000 0.862000 0.901000 0.612000 0.680000 0.652000 0.642000 0.816000 0.916000 0.644000 0.703000 0.659000 0.664000 0.790000 0.918000 0.694000 0.681000 0.652000 0.615000
AAC02 170 0.747000 0.830000 0.733000 0.742000 0.653000 0.886000 0.855000 0.907000 0.608000 0.688000 0.660000 0.640000 0.819000 0.915000 0.642000 0.706000 0.657000 0.671000 0.792000 0.916000 0.690000 0.676000 0.656000 0.608000
AAC03 137 0.741000 0.829000 0.734000 0.746000 0.648000 0.884000 0.857000 0.904000 0.601000 0.685000 0.661000 0.640000 0.818000 0.917000 0.636000 0.710000 0.659000 0.670000 0.791000 0.914000 0.695000 0.684000 0.656000 0.613000
AAC04 144 0.747000 0.828000 0.731000 0.747000 0.654000 0.885000 0.859000 0.906000 0.605000 0.686000 0.657000 0.639000 0.822000 0.913000 0.640000 0.714000 0.654000 0.664000 0.790000 0.915000 0.689000 0.680000 0.656000 0.610000
AAC05 138 0.739000 0.830000 0.735000 0.752000 0.646000 0.888000 0.859000 0.906000 0.601000 0.684000 0.655000 0.638000 0.823000 0.916000 0.640000 0.713000 0.658000 0.671000 0.790000 0.918000 0.689000 0.682000 0.649000 0.607000
AAC06 139 0.743000 0.827000 0.736000 0.746000 0.652000 0.883000 0.857000 0.906000 0.608000 0.684000 0.657000 0.640000 0.821000 0.914000 0.642000 0.709000 0.659000 0.665000 0.789000 0.915000 0.691000 0.680000 0.653000 0.611000
AAC07 121 0.742000 0.833000 0.736000 0.747000 0.650000 0.882000 0.858000 0.901000 0.606000 0.688000 0.655000 0.638000 0.820000 0.915000 0.638000 0.711000 0.661000 0.671000 0.789000 0.914000 0.689000 0.682000 0.655000 0.606000
AAC08 142 0.743000 0.831000 0.733000 0.746000 0.650000 0.884000 0.858000 0.903000 0.603000 0.687000 0.657000 0.640000 0.819000 0.916000 0.640000 0.710000 0.658000 0.669000 0.787000 0.916000 0.689000 0.681000 0.654000 0.608000
AAC09 263 0.753000 0.826000 0.736000 0.747000 0.647000 0.882000 0.858000 0.905000 0.608000 0.684000 0.655000 0.648000 0.820000 0.918000 0.642000 0.703000 0.653000 0.664000 0.787000 0.915000 0.690000 0.679000 0.647000 0.614000
AAC10 152 0.750000 0.828000 0.734000 0.748000 0.646000 0.886000 0.860000 0.908000 0.602000 0.684000 0.656000 0.645000 0.819000 0.913000 0.632000 0.711000 0.655000 0.670000 0.787000 0.913000 0.689000 0.676000 0.653000 0.610000
AAC11 150 0.749000 0.832000 0.732000 0.751000 0.647000 0.883000 0.860000 0.904000 0.605000 0.695000 0.650000 0.637000 0.815000 0.919000 0.638000 0.707000 0.654000 0.665000 0.781000 0.913000 0.689000 0.683000 0.652000 0.610000
AAC12 164 0.750000 0.828000 0.739000 0.746000 0.646000 0.889000 0.859000 0.904000 0.608000 0.688000 0.649000 0.637000 0.818000 0.913000 0.638000 0.708000 0.658000 0.666000 0.785000 0.917000 0.688000 0.676000 0.648000 0.613000
AAC13 125 0.744000 0.831000 0.738000 0.749000 0.650000 0.882000 0.856000 0.903000 0.600000 0.682000 0.653000 0.632000 0.818000 0.916000 0.634000 0.716000 0.657000 0.670000 0.790000 0.915000 0.688000 0.680000 0.654000 0.612000
AAC14 108 0.743000 0.831000 0.741000 0.744000 0.645000 0.884000 0.860000 0.902000 0.610000 0.673000 0.655000 0.634000 0.820000 0.914000 0.645000 0.710000 0.654000 0.666000 0.794000 0.914000 0.689000 0.678000 0.655000 0.604000
AAC15 183 0.753000 0.827000 0.732000 0.744000 0.646000 0.883000 0.857000 0.903000 0.617000 0.684000 0.659000 0.637000 0.821000 0.914000 0.632000 0.708000 0.654000 0.660000 0.785000 0.915000 0.687000 0.682000 0.649000 0.611000
AAC16 255 0.755000 0.830000 0.732000 0.741000 0.645000 0.884000 0.858000 0.902000 0.611000 0.684000 0.649000 0.648000 0.821000 0.917000 0.635000 0.702000 0.656000 0.666000 0.789000 0.915000 0.682000 0.681000 0.644000 0.611000
AAC17 101 0.740000 0.835000 0.736000 0.744000 0.644000 0.884000 0.858000 0.904000 0.606000 0.680000 0.646000 0.631000 0.817000 0.916000 0.646000 0.705000 0.651000 0.666000 0.793000 0.915000 0.691000 0.686000 0.654000 0.609000
AAC18 154 0.757000 0.826000 0.730000 0.742000 0.643000 0.881000 0.856000 0.899000 0.602000 0.684000 0.657000 0.635000 0.825000 0.922000 0.636000 0.705000 0.662000 0.658000 0.792000 0.912000 0.685000 0.679000 0.654000 0.612000
AAC19 406 0.764000 0.826000 0.730000 0.745000 0.636000 0.887000 0.860000 0.901000 0.605000 0.683000 0.649000 0.635000 0.819000 0.920000 0.633000 0.707000 0.651000 0.669000 0.789000 0.909000 0.687000 0.679000 0.654000 0.610000
AAC20 90 0.740000 0.833000 0.737000 0.742000 0.646000 0.887000 0.857000 0.901000 0.608000 0.677000 0.654000 0.632000 0.819000 0.915000 0.652000 0.702000 0.656000 0.669000 0.789000 0.908000 0.695000 0.675000 0.644000 0.610000
AAC21 202 0.757000 0.828000 0.730000 0.742000 0.637000 0.883000 0.853000 0.909000 0.607000 0.679000 0.657000 0.633000 0.820000 0.918000 0.640000 0.710000 0.648000 0.657000 0.789000 0.916000 0.688000 0.682000 0.651000 0.612000
AAC22 259 0.757000 0.823000 0.733000 0.739000 0.642000 0.880000 0.859000 0.903000 0.599000 0.682000 0.650000 0.638000 0.826000 0.917000 0.635000 0.700000 0.655000 0.665000 0.788000 0.917000 0.686000 0.682000 0.655000 0.611000
AAC23 219 0.756000 0.826000 0.734000 0.745000 0.644000 0.880000 0.858000 0.899000 0.601000 0.693000 0.650000 0.639000 0.823000 0.917000 0.629000 0.700000 0.655000 0.660000 0.783000 0.913000 0.689000 0.682000 0.655000 0.610000
AAC24 305 0.756000 0.824000 0.740000 0.741000 0.641000 0.880000 0.859000 0.903000 0.601000 0.685000 0.652000 0.635000 0.818000 0.919000 0.634000 0.705000 0.655000 0.663000 0.788000 0.914000 0.683000 0.677000 0.659000 0.607000
AAC25 96 0.764000 0.831000 0.732000 0.744000 0.639000 0.879000 0.862000 0.905000 0.603000 0.678000 0.659000 0.633000 0.816000 0.917000 0.636000 0.707000 0.646000 0.663000 0.790000 0.916000 0.678000 0.676000 0.656000 0.605000
AAC26 96 0.741000 0.828000 0.734000 0.745000 0.642000 0.887000 0.855000 0.904000 0.602000 0.680000 0.655000 0.634000 0.816000 0.914000 0.639000 0.705000 0.650000 0.671000 0.792000 0.914000 0.698000 0.681000 0.646000 0.601000
AAC27 127 0.753000 0.832000 0.731000 0.744000 0.640000 0.880000 0.854000 0.901000 0.607000 0.680000 0.658000 0.638000 0.820000 0.912000 0.638000 0.700000 0.643000 0.663000 0.793000 0.914000 0.683000 0.681000 0.658000 0.611000
AAC28 58 0.737000 0.821000 0.736000 0.745000 0.654000 0.883000 0.858000 0.898000 0.600000 0.675000 0.652000 0.641000 0.822000 0.910000 0.642000 0.703000 0.654000 0.670000 0.789000 0.914000 0.691000 0.679000 0.655000 0.605000
AAC29 514 0.763000 0.824000 0.731000 0.746000 0.636000 0.880000 0.857000 0.900000 0.602000 0.675000 0.646000 0.635000 0.827000 0.914000 0.632000 0.705000 0.662000 0.666000 0.796000 0.908000 0.692000 0.679000 0.647000 0.610000
AAC30 573 0.763000 0.821000 0.732000 0.746000 0.644000 0.879000 0.854000 0.898000 0.606000 0.680000 0.655000 0.632000 0.818000 0.915000 0.633000 0.701000 0.659000 0.656000 0.793000 0.911000 0.682000 0.675000 0.664000 0.616000
AAC31 92 0.757000 0.821000 0.727000 0.747000 0.631000 0.879000 0.858000 0.900000 0.601000 0.685000 0.653000 0.641000 0.819000 0.918000 0.644000 0.711000 0.654000 0.662000 0.784000 0.912000 0.686000 0.678000 0.653000 0.611000
AAC32 431 0.761000 0.827000 0.736000 0.745000 0.637000 0.880000 0.857000 0.898000 0.602000 0.675000 0.651000 0.638000 0.818000 0.917000 0.634000 0.701000 0.670000 0.658000 0.792000 0.911000 0.678000 0.682000 0.650000 0.613000
AAC33 155 0.762000 0.817000 0.732000 0.739000 0.645000 0.877000 0.860000 0.902000 0.606000 0.678000 0.654000 0.631000 0.822000 0.916000 0.635000 0.700000 0.651000 0.661000 0.785000 0.922000 0.688000 0.683000 0.655000 0.608000
AAC34 49 0.755000 0.823000 0.729000 0.741000 0.640000 0.884000 0.859000 0.900000 0.602000 0.682000 0.667000 0.624000 0.818000 0.921000 0.634000 0.707000 0.645000 0.664000 0.787000 0.915000 0.688000 0.685000 0.649000 0.608000
AAC35 91 0.758000 0.822000 0.727000 0.741000 0.639000 0.875000 0.856000 0.904000 0.605000 0.679000 0.656000 0.632000 0.821000 0.912000 0.632000 0.702000 0.658000 0.658000 0.791000 0.914000 0.683000 0.678000 0.663000 0.618000
AAC36 103 0.759000 0.824000 0.734000 0.742000 0.642000 0.874000 0.864000 0.902000 0.600000 0.680000 0.657000 0.630000 0.817000 0.918000 0.634000 0.707000 0.651000 0.664000 0.787000 0.915000 0.688000 0.677000 0.646000 0.609000
AAC37 69 0.748000 0.823000 0.732000 0.738000 0.633000 0.882000 0.858000 0.899000 0.596000 0.684000 0.653000 0.635000 0.816000 0.919000 0.647000 0.706000 0.647000 0.655000 0.789000 0.916000 0.693000 0.682000 0.646000 0.606000
AAC38 46 0.766000 0.809000 0.731000 0.744000 0.620000 0.857000 0.857000 0.898000 0.609000 0.675000 0.650000 0.642000 0.815000 0.916000 0.650000 0.701000 0.661000 0.664000 0.783000 0.921000 0.685000 0.679000 0.652000 0.618000
AAC39 114 0.755000 0.823000 0.731000 0.744000 0.653000 0.874000 0.856000 0.898000 0.600000 0.682000 0.655000 0.636000 0.820000 0.913000 0.634000 0.698000 0.643000 0.660000 0.786000 0.913000 0.687000 0.676000 0.650000 0.615000
AAC40 50 0.756000 0.819000 0.738000 0.742000 0.629000 0.870000 0.851000 0.898000 0.602000 0.685000 0.668000 0.643000 0.814000 0.913000 0.640000 0.706000 0.655000 0.659000 0.775000 0.910000 0.682000 0.680000 0.652000 0.611000
AAC41 51 0.751000 0.822000 0.734000 0.740000 0.637000 0.882000 0.854000 0.896000 0.600000 0.684000 0.655000 0.644000 0.805000 0.914000 0.636000 0.695000 0.653000 0.661000 0.779000 0.912000 0.685000 0.691000 0.653000 0.607000
AAC42 68 0.761000 0.811000 0.731000 0.742000 0.635000 0.868000 0.861000 0.895000 0.604000 0.676000 0.648000 0.633000 0.816000 0.916000 0.636000 0.705000 0.657000 0.662000 0.799000 0.916000 0.681000 0.674000 0.646000 0.612000
AAC43 55 0.767000 0.815000 0.731000 0.740000 0.631000 0.871000 0.848000 0.899000 0.603000 0.681000 0.649000 0.639000 0.807000 0.913000 0.632000 0.701000 0.655000 0.661000 0.782000 0.914000 0.681000 0.676000 0.646000 0.625000
AAC44 65 0.758000 0.825000 0.734000 0.741000 0.621000 0.875000 0.855000 0.902000 0.588000 0.678000 0.657000 0.633000 0.810000 0.913000 0.635000 0.696000 0.658000 0.660000 0.780000 0.923000 0.682000 0.677000 0.653000 0.611000
AAC45 56 0.760000 0.823000 0.733000 0.741000 0.629000 0.873000 0.844000 0.893000 0.610000 0.681000 0.654000 0.635000 0.809000 0.907000 0.632000 0.702000 0.646000 0.672000 0.774000 0.914000 0.684000 0.681000 0.648000 0.612000
AAC46 54 0.759000 0.812000 0.733000 0.739000 0.626000 0.872000 0.843000 0.896000 0.599000 0.679000 0.644000 0.637000 0.810000 0.914000 0.631000 0.696000 0.667000 0.664000 0.779000 0.912000 0.681000 0.676000 0.650000 0.623000
AAC47 31 0.748000 0.817000 0.717000 0.738000 0.624000 0.882000 0.855000 0.896000 0.602000 0.675000 0.648000 0.627000 0.819000 0.910000 0.636000 0.694000 0.652000 0.653000 0.800000 0.916000 0.684000 0.672000 0.649000 0.605000
AAC48 30 0.771000 0.803000 0.723000 0.737000 0.626000 0.857000 0.848000 0.891000 0.603000 0.673000 0.657000 0.631000 0.813000 0.912000 0.638000 0.695000 0.662000 0.658000 0.776000 0.910000 0.674000 0.679000 0.648000 0.619000
AAC49 27 0.758000 0.807000 0.731000 0.751000 0.630000 0.850000 0.853000 0.892000 0.590000 0.675000 0.651000 0.633000 0.815000 0.914000 0.631000 0.696000 0.647000 0.660000 0.784000 0.904000 0.679000 0.677000 0.646000 0.615000
AAC50 32 0.762000 0.802000 0.728000 0.734000 0.620000 0.855000 0.831000 0.903000 0.607000 0.671000 0.645000 0.640000 0.800000 0.910000 0.641000 0.692000 0.650000 0.671000 0.773000 0.908000 0.687000 0.678000 0.649000 0.623000
AAC51 20 0.768000 0.802000 0.727000 0.737000 0.610000 0.865000 0.836000 0.894000 0.605000 0.668000 0.641000 0.632000 0.802000 0.902000 0.626000 0.697000 0.652000 0.660000 0.767000 0.903000 0.680000 0.691000 0.653000 0.614000
AAC52 14 0.755000 0.802000 0.729000 0.740000 0.613000 0.867000 0.814000 0.886000 0.601000 0.672000 0.644000 0.632000 0.796000 0.901000 0.631000 0.691000 0.650000 0.668000 0.755000 0.901000 0.683000 0.667000 0.650000 0.633000
AAC53 21 0.765000 0.798000 0.729000 0.732000 0.629000 0.862000 0.782000 0.885000 0.614000 0.661000 0.647000 0.638000 0.759000 0.905000 0.630000 0.681000 0.640000 0.658000 0.747000 0.899000 0.674000 0.671000 0.643000 0.616000
AAC54 11 0.743000 0.780000 0.699000 0.714000 0.619000 0.844000 0.840000 0.895000 0.587000 0.669000 0.622000 0.624000 0.806000 0.926000 0.628000 0.671000 0.620000 0.656000 0.770000 0.911000 0.685000 0.651000 0.623000 0.620000
AAC55 8 0.747000 0.769000 0.710000 0.726000 0.580000 0.835000 0.732000 0.832000 0.595000 0.640000 0.590000 0.630000 0.745000 0.866000 0.630000 0.667000 0.618000 0.680000 0.723000 0.843000 0.656000 0.637000 0.604000 0.620000
AAC56 4 0.782000 0.738000 0.687000 0.710000 0.570000 0.768000 0.685000 0.857000 0.596000 0.610000 0.584000 0.636000 0.687000 0.886000 0.625000 0.651000 0.606000 0.668000 0.691000 0.859000 0.670000 0.602000 0.601000 0.617000
AAC57 5 0.768000 0.734000 0.684000 0.713000 0.571000 0.774000 0.663000 0.843000 0.600000 0.613000 0.599000 0.639000 0.689000 0.873000 0.631000 0.633000 0.621000 0.668000 0.677000 0.855000 0.670000 0.619000 0.600000 0.629000
AAC58 6 0.776000 0.775000 0.701000 0.715000 0.594000 0.817000 0.662000 0.800000 0.599000 0.628000 0.546000 0.635000 0.675000 0.825000 0.629000 0.653000 0.568000 0.671000 0.666000 0.803000 0.665000 0.633000 0.556000 0.619000
AAC59 3 0.794000 0.725000 0.680000 0.710000 0.570000 0.802000 0.600000 0.797000 0.600000 0.589000 0.565000 0.628000 0.632000 0.859000 0.619000 0.592000 0.562000 0.670000 0.623000 0.815000 0.649000 0.585000 0.568000 0.615000
AAC60 3 0.789000 0.653000 0.656000 0.675000 0.563000 0.772000 0.636000 0.783000 0.590000 0.626000 0.535000 0.635000 0.670000 0.830000 0.637000 0.639000 0.529000 0.677000 0.670000 0.813000 0.645000 0.613000 0.546000 0.613000
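The 24 score columns combine the six sequence-level datasets with the six amino acid level datasets at three window sizes. The sketch below rebuilds the column names from that pattern and averages the ‘AAC01’ row. The ‘AA5’, ‘AA9’, and ‘AA13’ prefixes are interpreted here as window sizes, and the plain mean is only an illustrative summary, not necessarily how the top 60 sets were ranked in [Breimann24a].

```python
from statistics import mean

# Dataset names as they appear in the top60_eval column headers.
seq_datasets = ["AMYLO", "CAPSID", "DISULFIDE", "LOCATION", "SOLUBLE", "TAIL"]
aa_datasets = ["CASPASE3", "FURIN", "LDR", "MMP2", "RNABIND", "SA"]
windows = [5, 9, 13]  # assumed to be the window sizes behind 'AA5', 'AA9', 'AA13'

# 6 sequence-level columns, then 6 amino acid level columns per window size.
eval_columns = [f"SEQ_{name}" for name in seq_datasets]
eval_columns += [f"AA{w}_{name}" for w in windows for name in aa_datasets]
print(len(eval_columns))  # 24

# Scores of 'AAC01', copied in column order from the table above.
aac01_scores = [
    0.761, 0.827, 0.732, 0.746, 0.646, 0.884,
    0.862, 0.901, 0.612, 0.680, 0.652, 0.642,
    0.816, 0.916, 0.644, 0.703, 0.659, 0.664,
    0.790, 0.918, 0.694, 0.681, 0.652, 0.615,
]
aac01 = dict(zip(eval_columns, aac01_scores))
print(round(mean(aac01.values()), 4))  # 0.7374
```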

For example, you can select the best-performing scale set for the ‘SEQ_CAPSID’ dataset as follows:

# Sort 'df_top60_eval' by 'SEQ_CAPSID' and get the index of the top row
top_id = df_top60_eval.sort_values(by="SEQ_CAPSID", ascending=False).index[0]
df_scales_top = aa.load_scales(top60_n=top_id)
# Alternatively, select a specific set by its number (1-60)
df_cat_top20 = aa.load_scales(name="scales_cat", top60_n=20)
aa.display_df(df_cat_top20, n_rows=6, show_shape=True)
DataFrame shape: (90, 5)
  scale_id category subcategory scale_name scale_description
1 BULH740102 ASA/Volume Partial specific volume Partial specific volume Apparent partia...l-Breese, 1974)
2 DIGM050101 Composition AA composition (surface) Hydrostatic pressure Hydrostatic Pre...i Giulio, 2005)
3 FUKS010103 Composition AA composition (surface) Proteins of mesophiles (EXT) Surface composi...ishikawa, 2001)
4 GRAR740101 Composition Unclassified (Composition) Substitution Frequency Composition (Grantham, 1974)
5 CHOP780207 Conformation Coil Non helical reg...on (C-terminal) Normalized freq...-Fasman, 1978b)
6 ROBB760107 Conformation Coil (C-term) Coil (C-terminal) Information mea...n-Suzuki, 1976)
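The sort-based one-liner above is an argmax over one column. The sketch below does the same in plain Python on a handful of SEQ_CAPSID scores copied from the top60_eval table; among all 60 rows of that table, ‘AAC17’ also has the highest SEQ_CAPSID score (0.835). In pandas, df_top60_eval["SEQ_CAPSID"].idxmax() is a shorter alternative that returns the label of the first maximum.

```python
# SEQ_CAPSID scores for a handful of scale sets, copied from the top60_eval table.
seq_capsid = {
    "AAC01": 0.827, "AAC02": 0.830, "AAC07": 0.833,
    "AAC17": 0.835, "AAC20": 0.833, "AAC60": 0.653,
}

# max() returns the first key with the highest score if several keys tie.
best_id = max(seq_capsid, key=seq_capsid.get)
print(best_id)  # AAC17
```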

Two optional filtering steps are provided by the just_aaindex and unclassified_out parameters:

n_all_scales = len(aa.load_scales().T)
n_just_aaindex = len(aa.load_scales(just_aaindex=True).T)
n_classified = len(aa.load_scales(unclassified_out=True).T)
n_both_filter_steps = len(aa.load_scales(just_aaindex=True, unclassified_out=True).T)
print(n_all_scales, " scales")
print(n_just_aaindex, " from AAindex")
print(n_classified, " are classified in AAontology")
print(n_both_filter_steps, " fulfill both.")
586  scales
553  from AAindex
532  are classified in AAontology
499  fulfill both.
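The four counts above are related through inclusion-exclusion. The sketch below derives, from the printed numbers alone, how many scales each filter removes and how many are removed by both. A result of 0 for the overlap suggests that every scale outside AAindex is classified in AAontology; this is an inference from the counts, not a statement from the library documentation.

```python
n_all, n_aaindex, n_classified, n_both = 586, 553, 532, 499

n_not_aaindex = n_all - n_aaindex       # removed by just_aaindex=True
n_unclassified = n_all - n_classified   # removed by unclassified_out=True
n_removed_by_both = n_all - n_both      # removed when both filters are applied

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_non_aaindex_and_unclassified = n_not_aaindex + n_unclassified - n_removed_by_both

print(n_not_aaindex, n_unclassified, n_removed_by_both, n_non_aaindex_and_unclassified)
# 33 54 87 0
```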