References
AAanalysis Algorithms
Breimann and Frishman (2024a), AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales, Bioinformatics Advances.
Breimann et al. (2024b), AAontology: An ontology of amino acid scales for interpretable machine learning, Journal of Molecular Biology.
Breimann and Kamp et al. (2025), Charting γ-secretase substrates by explainable AI, Nature Communications.
Sequence Algorithms
Li and Godzik (2006), Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics.
Steinegger and Söding (2017), MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets, Nature Biotechnology.
Machine Learning
Hastie, Tibshirani, and Friedman (2009), The Elements of Statistical Learning, Springer.
Positive-Unlabeled Learning
Bekker and Davis (2020), Learning from positive and unlabeled data: a survey, Machine Learning.
Explainable AI
Lundberg et al. (2020), From local explanations to global understanding with explainable AI for trees, Nature Machine Intelligence.
Datasets and Benchmarks
Cheng et al. (2006), Large-scale prediction of disulphide bridges using kernel methods, two-dimensional recursive neural networks, and weighted graph matching, Proteins: Structure, Function, and Bioinformatics.
Kawashima et al. (2008), AAindex: Amino acid index database, progress report 2008, Nucleic Acids Research.
Magnan, Randall, and Baldi (2009), SOLpro: Accurate sequence-based prediction of protein solubility, Bioinformatics.
Galiez et al. (2016), VIRALpro: A tool to identify viral capsid and tail sequences, Bioinformatics.
Song et al. (2018), PROSPERous: High-throughput prediction of substrate cleavage sites for 90 proteases with improved accuracy, Bioinformatics.
Shen et al. (2019), Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou’s general PseAAC, Journal of Theoretical Biology.
Tang et al. (2020), IDP-Seq2Seq: Identification of intrinsically disordered regions based on sequence to sequence learning, Bioinformatics.
Teng et al. (2021), ReRF-Pred: Predicting amyloidogenic regions of proteins based on pseudo amino acid composition and tripeptide composition, BMC Bioinformatics.
Yang et al. (2021), Granular multiple kernel learning for identifying RNA-binding protein residues via integrating sequence and structure information, Neural Computing and Applications.