aaanalysis.comp_seq_sim
- class aaanalysis.comp_seq_sim(seq1=None, seq2=None, df_seq=None)
Compute pairwise similarity between two or more sequences.
The normalized sequence similarity score between two sequences is computed as the ratio of the alignment score to the length of the longer sequence. The alignment score is obtained using Bio.Align.PairwiseAligner from BioPython (https://biopython.org/) with default settings.
Added in version 1.0.0.
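The normalization described above can be illustrated without Biopython. The following is a minimal sketch, not the library's implementation. It assumes the aligner scores a match as 1 and mismatches and gaps as 0 (an assumption about the defaults, which may differ between Biopython releases), in which case the global alignment score equals the length of the longest common subsequence. The percent scaling is also an assumption, inferred from the example output on this page. The names lcs_length and seq_sim_sketch are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one character from either sequence
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def seq_sim_sketch(seq1: str, seq2: str) -> float:
    """Hypothetical stand-in: alignment score over the longer length, in percent."""
    if not seq1 or not seq2:
        raise ValueError("both sequences must be non-empty")
    # Assumption: match = 1, mismatch = 0, gap = 0, so score == LCS length
    score = lcs_length(seq1, seq2)
    # Assumption: result is scaled to percent
    return 100.0 * score / max(len(seq1), len(seq2))


print(seq_sim_sketch("GTGGHWWW", "GTGGHFALSE"))
```

The two sequences share the subsequence GTGGH (5 residues) and the longer sequence has 10 residues, so the sketch yields 5 / 10, or 50 percent.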
- Parameters:
- Returns:
seq_sim – If seq1 and seq2 are provided, returns the sequence similarity between both sequences. If df_seq is provided, returns a DataFrame containing pairwise sequence similarity scores.
- Return type:
See also
Bio.Align.PairwiseAligner for details on the similarity computation.
Warning
This function requires biopython, which is automatically installed via pip install aaanalysis[pro].
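To check for the dependency before calling the function, something like the following sketch may help. It relies only on the standard library and on Biopython being imported under the package name Bio. The helper name has_biopython is hypothetical.

```python
import importlib.util


def has_biopython() -> bool:
    """Return True if the Bio package (Biopython) can be imported."""
    return importlib.util.find_spec("Bio") is not None


if not has_biopython():
    print("biopython not found; install it with: pip install aaanalysis[pro]")
```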
Examples
You can compute the pairwise sequence similarity between two sequences using the comp_seq_sim() function:

    import aaanalysis as aa
    aa.options["verbose"] = False

    seq1 = "GTGGHWWW"
    seq2 = "GTGGHFALSE"

    # Compute similarity between two sequences
    seq_sim = aa.comp_seq_sim(seq1=seq1, seq2=seq2)
    print(seq_sim)
    50.0

To compute the similarity between all pairs of sequences from a DataFrame, you can use the df_seq parameter:

           Q14802     Q86UE4     Q969W9     P53801     Q8IUW5     P05067     P14925     P70180     Q03157     Q06481
Q14802   1.000000  11.340200  17.073200  22.777800  18.450200   8.701300   7.991800  12.313400  10.091700   9.043300
Q86UE4  11.340200   1.000000  23.195900  18.213100  23.539500  29.740300  28.381100  34.020600  32.568800  29.882000
Q969W9  17.073200  23.195900   1.000000  24.390200  34.494800  19.480500  18.545100  26.119400  22.935800  20.314500
P53801  22.777800  18.213100  24.390200   1.000000  30.258300  15.194800  13.217200  19.776100  17.584100  15.465300
Q8IUW5  18.450200  23.539500  34.494800  30.258300   1.000000  20.259700  17.930300  25.373100  22.782900  20.314500
P05067   8.701300  29.740300  19.480500  15.194800  20.259700   1.000000  31.864800  29.740300  45.324700  58.701300
P14925   7.991800  28.381100  18.545100  13.217200  17.930300  31.864800   1.000000  28.176200  29.508200  32.274600
P70180  12.313400  34.020600  26.119400  19.776100  25.373100  29.740300  28.176200   1.000000  34.556600  29.357800
Q03157  10.091700  32.568800  22.935800  17.584100  22.782900  45.324700  29.508200  34.556600   1.000000  46.920100
Q06481   9.043300  29.882000  20.314500  15.465300  20.314500  58.701300  32.274600  29.357800  46.920100   1.000000
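As an illustration of how such a symmetric matrix can be read, the sketch below finds the most similar pair of distinct entries. It uses a plain nested dictionary holding three rows and columns copied from the matrix above rather than the DataFrame returned by the function. The helper name most_similar_pair is hypothetical.

```python
from itertools import combinations

# Subset of the pairwise matrix shown above (values copied as printed)
sim = {
    "P05067": {"P05067": 1.0, "Q03157": 45.3247, "Q06481": 58.7013},
    "Q03157": {"P05067": 45.3247, "Q03157": 1.0, "Q06481": 46.9201},
    "Q06481": {"P05067": 58.7013, "Q03157": 46.9201, "Q06481": 1.0},
}


def most_similar_pair(matrix):
    """Return the pair of distinct entries with the highest similarity."""
    # combinations() visits each unordered pair once and skips the diagonal
    pairs = combinations(sorted(matrix), 2)
    return max(pairs, key=lambda pair: matrix[pair[0]][pair[1]])


best = most_similar_pair(sim)
print(best, sim[best[0]][best[1]])
```

Because the matrix is symmetric, visiting each unordered pair once is sufficient, and the diagonal entries are never compared.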