Scale loading

This tutorial covers the load_scales function, which loads sets of amino acid scales, their classification (AAontology), and their evaluation (AAclust top60).

Six datasets can be loaded in total via the name parameter: scales, scales_raw, scales_pc, scales_cat, top60, and top60_eval.

Three sets of numerical amino acid scales:

  • scales_raw: Original amino acid scales sourced from AAindex and two other datasets.
  • scales: Min-max normalized version of the raw scales.
  • scales_pc: Scales compressed using principal component analysis (PCA).
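
The relation between scales_raw and scales can be illustrated on toy data. The following is a minimal sketch, assuming that min-max normalization is applied per scale (column) so that each scale ranges from 0 to 1. The scale ids and values are invented, and the exact procedure used by the package (for example, rounding) may differ.

```python
import pandas as pd

# Toy raw scales (invented values, hypothetical scale ids), one column per scale
df_raw_toy = pd.DataFrame(
    {"TOY_A": [1.8, 2.5, -3.5, -3.5], "TOY_B": [88.6, 108.5, 111.1, 138.4]},
    index=pd.Index(["A", "C", "D", "E"], name="AA"),
)

# Min-max normalization per column: (x - min) / (max - min)
col_min = df_raw_toy.min()
col_max = df_raw_toy.max()
df_norm_toy = (df_raw_toy - col_min) / (col_max - col_min)

print(df_norm_toy.round(3))
```

After normalization, every column has a minimum of 0 and a maximum of 1, which makes scales with different units comparable.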

Each amino acid scale is identified by a unique id (column) and assigns a numerical value to each of the 20 canonical amino acids (rows):

import aaanalysis as aa
aa.options["verbose"] = False

# Load different scale sets
df_scales = aa.load_scales()
df_raw = aa.load_scales(name="scales_raw")
df_pc = aa.load_scales(name="scales_pc")
aa.display_df(df=df_scales, show_shape=True, n_cols=8)
DataFrame shape: (20, 586)
  ANDN920101 ARGP820101 ARGP820102 ARGP820103 BEGF750101 BEGF750102 BEGF750103 BHAR880101
AA                
A 0.494000 0.230000 0.355000 0.504000 1.000000 0.512000 0.000000 0.249000
C 0.864000 0.404000 0.579000 0.387000 0.000000 0.233000 0.783000 0.205000
D 1.000000 0.174000 0.000000 0.000000 0.404000 0.233000 1.000000 0.867000
E 0.420000 0.177000 0.019000 0.032000 0.713000 0.000000 0.267000 0.811000
F 0.877000 0.762000 0.601000 0.670000 0.574000 1.000000 0.267000 0.076000
G 0.025000 0.026000 0.138000 0.170000 0.309000 0.233000 1.000000 1.000000
H 0.840000 0.230000 0.082000 0.053000 0.574000 0.651000 0.633000 0.112000
I 0.000000 0.838000 0.440000 0.543000 0.713000 1.000000 0.000000 0.671000
K 0.506000 0.434000 0.003000 0.004000 0.574000 0.000000 0.633000 0.687000
L 0.272000 0.577000 1.000000 0.989000 1.000000 0.651000 0.267000 0.281000
M 0.704000 0.445000 0.824000 1.000000 1.000000 1.000000 0.450000 0.000000
N 0.988000 0.023000 0.057000 0.046000 0.309000 0.000000 1.000000 0.675000
P 0.605000 0.736000 0.223000 0.220000 0.000000 0.000000 1.000000 0.859000
Q 0.519000 0.000000 0.211000 0.131000 0.404000 0.395000 0.450000 0.795000
R 0.531000 0.226000 0.047000 0.110000 0.489000 0.395000 0.783000 0.940000
S 0.679000 0.019000 0.289000 0.238000 0.309000 0.000000 0.783000 0.851000
T 0.494000 0.019000 0.248000 0.273000 0.404000 0.651000 0.633000 0.598000
V 0.000000 0.498000 0.324000 0.355000 0.809000 1.000000 0.000000 0.365000
W 0.926000 1.000000 0.226000 0.333000 0.713000 0.512000 1.000000 0.040000
Y 0.802000 0.709000 0.107000 0.191000 0.404000 0.651000 0.783000 0.502000
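
Because a scale assigns one value to each amino acid, a sequence can be translated into numbers by a simple lookup. The sketch below uses the ANDN920101 values copied from the table above; the peptide and the helper name encode_sequence are hypothetical and only serve as illustration.

```python
# Values of the ANDN920101 column, copied from the table above
andn920101 = {
    "A": 0.494, "C": 0.864, "D": 1.000, "E": 0.420, "F": 0.877,
    "G": 0.025, "H": 0.840, "I": 0.000, "K": 0.506, "L": 0.272,
    "M": 0.704, "N": 0.988, "P": 0.605, "Q": 0.519, "R": 0.531,
    "S": 0.679, "T": 0.494, "V": 0.000, "W": 0.926, "Y": 0.802,
}


def encode_sequence(seq, scale):
    """Return the per-residue scale values for a sequence of canonical amino acids."""
    return [scale[res] for res in seq]


seq = "ACDE"  # hypothetical example peptide
values = encode_sequence(seq, andn920101)
mean_value = sum(values) / len(values)
print(values, round(mean_value, 4))
```

With the loaded DataFrame, the same lookup can presumably be written with standard pandas indexing as df_scales["ANDN920101"].loc[list(seq)].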

AAontology

  • scales_cat provides a two-level classification for all scales, termed AAontology.

The entries in the scale_id column align with the column names of df_scales and df_raw. Additional columns detail further information from AAontology, such as scale category or description.

# Load AAontology
df_cat = aa.load_scales(name="scales_cat")

aa.display_df(df=df_cat, show_shape=True, n_rows=8)
DataFrame shape: (586, 5)
   scale_id    category    subcategory                    scale_name              scale_description
1  LINS030110  ASA/Volume  Accessible surface area (ASA)  ASA (folded coil/turn)  Total median ac...s et al., 2003)
2  LINS030113  ASA/Volume  Accessible surface area (ASA)  ASA (folded coil/turn)  % total accessi...s et al., 2003)
3  JANJ780101  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)    Average accessi...n et al., 1978)
4  JANJ780103  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)    Percentage of e...n et al., 1978)
5  LINS030104  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)    Total median ac...s et al., 2003)
6  LINS030107  ASA/Volume  Accessible surface area (ASA)  ASA (folded protein)    % total accessi...s et al., 2003)
7  CHOC760102  ASA/Volume  Accessible surface area (ASA)  ASA (folded proteins)   Residue accessi...(Chothia, 1976)
8  LINS030116  ASA/Volume  Accessible surface area (ASA)  ASA (folded β-strand)   Total median ac...s et al., 2003)
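
Since scale_id links the classification to the scale columns, df_cat can serve as a lookup table. The following sketch rebuilds the first four rows shown above as a toy DataFrame (scale_description omitted) and shows two typical operations: mapping ids to names and counting scales per subcategory. The variable names are illustrative only.

```python
import pandas as pd

# First four rows of df_cat as shown above (scale_description omitted)
df_cat_toy = pd.DataFrame(
    {
        "scale_id": ["LINS030110", "LINS030113", "JANJ780101", "JANJ780103"],
        "category": ["ASA/Volume"] * 4,
        "subcategory": ["Accessible surface area (ASA)"] * 4,
        "scale_name": [
            "ASA (folded coil/turn)",
            "ASA (folded coil/turn)",
            "ASA (folded protein)",
            "ASA (folded protein)",
        ],
    }
)

# Map each scale id to its scale name
dict_name = dict(zip(df_cat_toy["scale_id"], df_cat_toy["scale_name"]))

# Count scales per (category, subcategory) pair
counts = df_cat_toy.groupby(["category", "subcategory"]).size()

print(dict_name["JANJ780101"])
print(counts)
```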

AAclustTop60 and filtering

AAclustTop60

The remaining two datasets stem from an in-depth analysis of redundancy-reduced subsets of scales using AAclust.

  • top60 comprises the 60 best-performing scale sets, benchmarked on our protein datasets, which are available via the aa.load_dataset function.

Each scale set has a unique AAclust id (top60_id index, prefixed with ‘AAC’ for AAclust), and the presence (1) or absence (0) of each scale in the set is indicated:

df_top60 = aa.load_scales(name="top60")
aa.display_df(df=df_top60, show_shape=True, n_rows=8)
DataFrame shape: (60, 586)
  ANDN920101 ARGP820101 ARGP820102 ARGP820103 BEGF750101 BEGF750102 BEGF750103 BHAR880101 BIGC670101 BIOV880101 BIOV880102 BROC820101 BROC820102 BULH740101 BULH740102 BUNA790101 BUNA790102 BUNA790103 BURA740101 BURA740102 CHAM810101 CHAM820101 CHAM820102 CHAM830101 CHAM830102 CHAM830103 CHAM830104 CHAM830105 CHAM830106 CHAM830107 CHAM830108 CHOC750101 CHOC760101 CHOC760102 CHOC760103 CHOC760104 CHOP780101 CHOP780201 CHOP780202 CHOP780203 CHOP780204 CHOP780205 CHOP780206 CHOP780207 CHOP780208 CHOP780209 CHOP780210 CHOP780211 CHOP780212 CHOP780213 CHOP780214 CHOP780215 CHOP780216 CIDH920101 CIDH920102 CIDH920103 CIDH920104 CIDH920105 COHE430101 CRAJ730101 CRAJ730102 CRAJ730103 DAWD720101 DAYM780101 DAYM780201 DESM900101 DESM900102 EISD840101 EISD860101 EISD860102 EISD860103 FASG760101 FASG760102 FASG760103 FASG760104 FASG760105 FAUJ830101 FAUJ880101 FAUJ880102 FAUJ880103 FAUJ880104 FAUJ880105 FAUJ880106 FAUJ880107 FAUJ880108 FAUJ880109 FAUJ880110 FAUJ880111 FAUJ880112 FAUJ880113 FINA770101 FINA910101 FINA910102 FINA910103 FINA910104 GARJ730101 GEIM800101 GEIM800102 GEIM800103 GEIM800104 GEIM800105 GEIM800106 GEIM800107 GEIM800108 GEIM800109 GEIM800110 GEIM800111 GOLD730101 GOLD730102 GRAR740101 GRAR740102 GRAR740103 GUYH850101 HOPA770101 HOPT810101 HUTJ700101 HUTJ700102 HUTJ700103 ISOY800101 ISOY800102 ISOY800103 ISOY800104 ISOY800105 ISOY800106 ISOY800107 ISOY800108 JANJ780101 JANJ780102 JANJ780103 JANJ790101 JANJ790102 JOND750101 JOND750102 JOND920101 JOND920102 JUKT750101 JUNJ780101 KANM800101 KANM800102 KANM800103 KANM800104 KARP850101 KARP850102 KARP850103 KHAG800101 KLEP840101 KRIW710101 KRIW790101 KRIW790102 KRIW790103 KYTJ820101 LAWE840101 LEVM760101 LEVM760102 LEVM760103 LEVM760104 LEVM760105 LEVM760106 LEVM760107 LEVM780101 LEVM780102 LEVM780103 LEVM780104 LEVM780105 LEVM780106 LEWP710101 LIFS790101 LIFS790102 LIFS790103 MANP780101 MAXF760101 MAXF760102 MAXF760103 MAXF760104 MAXF760105 MAXF760106 MCMT640101 MEEJ800101 MEEJ800102 MEEJ810101 MEEJ810102 
MEIH800101 MEIH800102 MEIH800103 MIYS850101 NAGK730101 NAGK730102 NAGK730103 NAKH900101 NAKH900102 NAKH900103 NAKH900104 NAKH900105 NAKH900106 NAKH900107 NAKH900108 NAKH900109 NAKH900110 NAKH900111 NAKH900112 NAKH900113 NAKH920101 NAKH920102 NAKH920103 NAKH920104 NAKH920105 NAKH920106 NAKH920107 NAKH920108 NISK800101 NISK860101 NOZY710101 OOBM770101 OOBM770102 OOBM770103 OOBM770104 OOBM770105 OOBM850101 OOBM850102 OOBM850103 OOBM850104 OOBM850105 PALJ810101 PALJ810102 PALJ810103 PALJ810104 PALJ810105 PALJ810106 PALJ810107 PALJ810108 PALJ810109 PALJ810110 PALJ810111 PALJ810112 PALJ810113 PALJ810114 PALJ810115 PALJ810116 PARJ860101 PLIV810101 PONP800101 PONP800102 PONP800103 PONP800104 PONP800105 PONP800106 PONP800107 PONP800108 PRAM820101 PRAM820102 PRAM820103 PRAM900101 PRAM900102 PRAM900103 PRAM900104 PTIO830101 PTIO830102 QIAN880101 QIAN880102 QIAN880103 QIAN880104 QIAN880105 QIAN880106 QIAN880107 QIAN880108 QIAN880109 QIAN880110 QIAN880111 QIAN880112 QIAN880113 QIAN880114 QIAN880115 QIAN880116 QIAN880117 QIAN880118 QIAN880119 QIAN880120 QIAN880121 QIAN880122 QIAN880123 QIAN880124 QIAN880125 QIAN880126 QIAN880127 QIAN880128 QIAN880129 QIAN880130 QIAN880131 QIAN880132 QIAN880133 QIAN880134 QIAN880135 QIAN880136 QIAN880137 QIAN880138 QIAN880139 RACS770101 RACS770102 RACS770103 RACS820101 RACS820102 RACS820103 RACS820104 RACS820105 RACS820106 RACS820107 RACS820108 RACS820109 RACS820110 RACS820111 RACS820112 RACS820113 RACS820114 RADA880101 RADA880102 RADA880103 RADA880104 RADA880105 RADA880106 RADA880107 RADA880108 RICJ880101 RICJ880102 RICJ880103 RICJ880104 RICJ880105 RICJ880106 RICJ880107 RICJ880108 RICJ880109 RICJ880110 RICJ880111 RICJ880112 RICJ880113 RICJ880114 RICJ880115 RICJ880116 RICJ880117 ROBB760101 ROBB760102 ROBB760103 ROBB760104 ROBB760105 ROBB760106 ROBB760107 ROBB760108 ROBB760109 ROBB760110 ROBB760111 ROBB760112 ROBB760113 ROBB790101 ROSG850101 ROSG850102 ROSM880101 ROSM880102 ROSM880103 SIMZ760101 SNEP660101 SNEP660102 SNEP660103 SNEP660104 
SUEM840101 SUEM840102 SWER830101 TANS770101 TANS770102 TANS770103 TANS770104 TANS770105 TANS770106 TANS770107 TANS770108 TANS770109 TANS770110 VASM830101 VASM830102 VASM830103 VELV850101 VENT840101 VHEG790101 WARP780101 WEBA780101 WERD780101 WERD780102 WERD780103 WERD780104 WOEC730101 WOLR810101 WOLS870101 WOLS870102 WOLS870103 YUTK870101 YUTK870102 YUTK870103 YUTK870104 ZASB820101 ZIMJ680101 ZIMJ680102 ZIMJ680103 ZIMJ680104 ZIMJ680105 AURR980101 AURR980102 AURR980103 AURR980104 AURR980105 AURR980106 AURR980107 AURR980108 AURR980109 AURR980110 AURR980111 AURR980112 AURR980113 AURR980114 AURR980115 AURR980116 AURR980117 AURR980118 AURR980119 AURR980120 ONEK900101 ONEK900102 VINM940101 VINM940102 VINM940103 VINM940104 MUNV940101 MUNV940102 MUNV940103 MUNV940104 MUNV940105 WIMW960101 KIMC930101 MONM990101 BLAM930101 PARS000101 PARS000102 KUMS000101 KUMS000102 KUMS000103 KUMS000104 TAKK010101 FODM020101 NADH010101 NADH010102 NADH010103 NADH010104 NADH010105 NADH010106 NADH010107 MONM990201 KOEP990101 KOEP990102 CEDJ970101 CEDJ970102 CEDJ970103 CEDJ970104 CEDJ970105 FUKS010101 FUKS010102 FUKS010103 FUKS010104 FUKS010105 FUKS010106 FUKS010107 FUKS010108 FUKS010109 FUKS010110 FUKS010111 FUKS010112 MITS020101 TSAJ990101 TSAJ990102 COSI940101 PONP930101 WILM950101 WILM950102 WILM950103 WILM950104 KUHL950101 GUOD860101 JURD980101 BASU050101 BASU050102 BASU050103 SUYM030101 PUNT030101 PUNT030102 GEOR030101 GEOR030102 GEOR030103 GEOR030104 GEOR030105 GEOR030106 GEOR030107 GEOR030108 GEOR030109 ZHOH040101 ZHOH040102 ZHOH040103 BAEK050101 HARY940101 PONJ960101 DIGM050101 WOLR790101 OLSK800101 KIDA850101 GUYH850102 GUYH850104 GUYH850105 JACR890101 COWR900101 BLAS910101 CASG920101 CORJ870101 CORJ870102 CORJ870103 CORJ870104 CORJ870105 CORJ870106 CORJ870107 CORJ870108 MIYS990101 MIYS990102 MIYS990103 MIYS990104 MIYS990105 ENGD860101 FASG890101 KARS160101 KARS160102 KARS160103 KARS160104 KARS160105 KARS160106 KARS160107 KARS160108 KARS160109 KARS160110 KARS160111 KARS160112 
KARS160113 KARS160114 KARS160115 KARS160116 KARS160117 KARS160118 KARS160119 KARS160120 KARS160121 KARS160122 LINS030101 LINS030102 LINS030103 LINS030104 LINS030105 LINS030106 LINS030107 LINS030108 LINS030109 LINS030110 LINS030111 LINS030112 LINS030113 LINS030114 LINS030115 LINS030116 LINS030117 LINS030118 LINS030119 LINS030120 LINS030121 KOEH090101 KOEH090102 KOEH090103 KOEH090104 KOEH090105 KOEH090106 KOEH090107 KOEH090108 KOEH090109 KOEH090110 KOEH090111 KOEH090112
top60_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
AAC01 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 1 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 1 1 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 0 1 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1
AAC02 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 1 1 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 1 0 1 0 0 1 0 1 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
AAC03 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
AAC04 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
AAC05 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
AAC06 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
AAC07 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 0 1 1 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
AAC08 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

  • top60_eval shows the average accuracy of each scale subset, identified by its id (index), across all tested protein benchmarks (columns):

df_eval = aa.load_scales(name="top60_eval")
df_eval.drop(columns="n_scales").mean(axis=1)  # Overall average accuracy per scale set (n_scales excluded), used for ranking
aa.display_df(df=df_eval, show_shape=True, n_rows=8)
DataFrame shape: (60, 25)
  n_scales SEQ_AMYLO SEQ_CAPSID SEQ_DISULFIDE SEQ_LOCATION SEQ_SOLUBLE SEQ_TAIL AA5_CASPASE3 AA5_FURIN AA5_LDR AA5_MMP2 AA5_RNABIND AA5_SA AA9_CASPASE3 AA9_FURIN AA9_LDR AA9_MMP2 AA9_RNABIND AA9_SA AA13_CASPASE3 AA13_FURIN AA13_LDR AA13_MMP2 AA13_RNABIND AA13_SA
top60_id                                                  
AAC01 183 0.761000 0.827000 0.732000 0.746000 0.646000 0.884000 0.862000 0.901000 0.612000 0.680000 0.652000 0.642000 0.816000 0.916000 0.644000 0.703000 0.659000 0.664000 0.790000 0.918000 0.694000 0.681000 0.652000 0.615000
AAC02 170 0.747000 0.830000 0.733000 0.742000 0.653000 0.886000 0.855000 0.907000 0.608000 0.688000 0.660000 0.640000 0.819000 0.915000 0.642000 0.706000 0.657000 0.671000 0.792000 0.916000 0.690000 0.676000 0.656000 0.608000
AAC03 137 0.741000 0.829000 0.734000 0.746000 0.648000 0.884000 0.857000 0.904000 0.601000 0.685000 0.661000 0.640000 0.818000 0.917000 0.636000 0.710000 0.659000 0.670000 0.791000 0.914000 0.695000 0.684000 0.656000 0.613000
AAC04 144 0.747000 0.828000 0.731000 0.747000 0.654000 0.885000 0.859000 0.906000 0.605000 0.686000 0.657000 0.639000 0.822000 0.913000 0.640000 0.714000 0.654000 0.664000 0.790000 0.915000 0.689000 0.680000 0.656000 0.610000
AAC05 138 0.739000 0.830000 0.735000 0.752000 0.646000 0.888000 0.859000 0.906000 0.601000 0.684000 0.655000 0.638000 0.823000 0.916000 0.640000 0.713000 0.658000 0.671000 0.790000 0.918000 0.689000 0.682000 0.649000 0.607000
AAC06 139 0.743000 0.827000 0.736000 0.746000 0.652000 0.883000 0.857000 0.906000 0.608000 0.684000 0.657000 0.640000 0.821000 0.914000 0.642000 0.709000 0.659000 0.665000 0.789000 0.915000 0.691000 0.680000 0.653000 0.611000
AAC07 121 0.742000 0.833000 0.736000 0.747000 0.650000 0.882000 0.858000 0.901000 0.606000 0.688000 0.655000 0.638000 0.820000 0.915000 0.638000 0.711000 0.661000 0.671000 0.789000 0.914000 0.689000 0.682000 0.655000 0.606000
AAC08 142 0.743000 0.831000 0.733000 0.746000 0.650000 0.884000 0.858000 0.903000 0.603000 0.687000 0.657000 0.640000 0.819000 0.916000 0.640000 0.710000 0.658000 0.669000 0.787000 0.916000 0.689000 0.681000 0.654000 0.608000
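
The ranking by average accuracy can be reproduced on an excerpt of the table above. This is a minimal sketch using only three scale sets and the first three benchmark columns, so the resulting order is illustrative and need not match the full ranking. Note that n_scales is a count rather than an accuracy and is therefore dropped before averaging.

```python
import pandas as pd

# Excerpt of df_eval copied from the table above (first three benchmark columns only)
df_eval_toy = pd.DataFrame(
    {
        "n_scales": [183, 170, 137],
        "SEQ_AMYLO": [0.761, 0.747, 0.741],
        "SEQ_CAPSID": [0.827, 0.830, 0.829],
        "SEQ_DISULFIDE": [0.732, 0.733, 0.734],
    },
    index=pd.Index(["AAC01", "AAC02", "AAC03"], name="top60_id"),
)

# Exclude n_scales so that only accuracy columns enter the average
avg_acc = df_eval_toy.drop(columns="n_scales").mean(axis=1)

# Highest average accuracy first
ranking = avg_acc.sort_values(ascending=False)
print(ranking.round(3))
```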

Use the top60_n parameter to select the n-th best scale set, returned as scales, scales_raw, or scales_cat:

df_cat_1 = aa.load_scales(name="scales_cat", top60_n=1)
df_raw_1 = aa.load_scales(name="scales_raw", top60_n=1)
df_scales_1 = aa.load_scales(top60_n=1)

# Which is the same as
df_top60 = aa.load_scales(name="top60")
selected_scales = df_top60.columns[df_top60.loc["AAC01"] == 1].tolist()
df_aac1 = df_scales[selected_scales]
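
The manual selection above generalizes to any scale set id. The sketch below wraps it in a small helper and runs it on toy tables; the helper name select_scale_set, the scale ids, and all values are hypothetical and not part of the package.

```python
import pandas as pd

# Toy presence (1) / absence (0) table in the layout of df_top60 (hypothetical ids)
df_top60_toy = pd.DataFrame(
    [[1, 0, 1], [0, 1, 1]],
    index=pd.Index(["AAC01", "AAC02"], name="top60_id"),
    columns=["SCALE_1", "SCALE_2", "SCALE_3"],
)

# Toy scales table: amino acids as rows, scales as columns (invented values)
df_scales_toy = pd.DataFrame(
    {"SCALE_1": [0.0, 1.0], "SCALE_2": [0.5, 0.2], "SCALE_3": [1.0, 0.0]},
    index=pd.Index(["A", "C"], name="AA"),
)


def select_scale_set(df_scales, df_top60, top60_id):
    """Return the columns of df_scales marked with 1 for the given scale set id."""
    mask = df_top60.loc[top60_id] == 1
    return df_scales[df_top60.columns[mask]]


df_sel = select_scale_set(df_scales_toy, df_top60_toy, "AAC01")
print(list(df_sel.columns))
```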

Filtering of scale sets

Two parameters are provided to filter df_scales, df_cat, and df_raw. Setting just_aaindex=True (disabled by default) excludes scales from the other two data sources, i.e., scales not contained in AAindex. AAontology also comprises scales that were not assigned to any subcategory (‘unclassified’ scales); these can be excluded by setting unclassified_out=True:

df_scales = aa.load_scales(just_aaindex=False, unclassified_out=False)
df_raw = aa.load_scales(name="scales_raw", just_aaindex=True, unclassified_out=False)
df_cat = aa.load_scales(name="scales_cat", just_aaindex=False, unclassified_out=True)

# Print the number of filtered scales
print(f"Number of all min-max scales: {len(df_scales.T)}")
print(f"Number of raw scales from AAindex: {len(df_raw.T)}")
print(f"Number of classified scales from AAontology: {len(df_cat)}")
Number of all min-max scales: 586
Number of raw scales from AAindex: 553
Number of classified scales from AAontology: 532
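
To see which scales a filter removed, the column sets of an unfiltered and a filtered table can be compared. The following sketch uses toy stand-ins with hypothetical scale ids; with real data, the two tables would come from load_scales calls with and without the respective filter.

```python
import pandas as pd

# Toy stand-ins for an unfiltered and a filtered scales table (hypothetical ids and values)
df_all_toy = pd.DataFrame(
    {"SCALE_1": [0.0, 1.0], "SCALE_2": [0.5, 0.2], "SCALE_3": [1.0, 0.0]},
    index=pd.Index(["A", "C"], name="AA"),
)
df_filtered_toy = df_all_toy[["SCALE_1", "SCALE_3"]]

# Scale ids present before but not after filtering
removed_ids = sorted(set(df_all_toy.columns) - set(df_filtered_toy.columns))
print(f"Removed {len(removed_ids)} scale(s): {removed_ids}")
```

Going by the counts printed above, the same comparison for just_aaindex=True would be expected to list 586 - 553 = 33 scales.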

Further information is available in the Data Handling API and in the Tables section.