aaanalysis.AAlogo.get_conservation
- static AAlogo.get_conservation(df_logo_info=None, value_type='mean')
Summarize per-position information content into a single conservation score.
Aggregates the per-position information content from AAlogo.get_df_logo_info() into a single scalar value representing overall sequence conservation.
- Parameters:
df_logo_info (pd.Series, shape (n_positions,)) – Per-position information content with index name ‘pos’, as returned by AAlogo.get_df_logo_info().
value_type ({'min', 'mean', 'median', 'max'}, default='mean') – Aggregation method:
  min: Minimum conservation across all positions.
  mean: Average conservation across all positions.
  median: Median conservation across all positions.
  max: Maximum conservation at any single position.
- Returns:
cons_val – Conservation score in bits, ranging from 0 (no conservation) to ~4.322 (fully conserved).
- Return type:
float
Notes
The maximum theoretical information content per position is log2(20) ≈ 4.322 bits, corresponding to a completely conserved amino acid.
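The upper bound can be checked with a short calculation. This is a minimal sketch that assumes the common sequence-logo definition of information content (log2 of the alphabet size minus the Shannon entropy of the column, with a uniform background and no small-sample correction); `information_content` is a hypothetical helper, and AAlogo may compute the quantity differently.

```python
import math

N_AMINO_ACIDS = 20  # size of the canonical amino acid alphabet


def information_content(freqs):
    """Hypothetical helper: log2(20) minus the Shannon entropy of one column."""
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return math.log2(N_AMINO_ACIDS) - entropy


conserved = [1.0] + [0.0] * 19   # the same residue in every sequence
uniform = [1.0 / 20] * 20        # all residues equally frequent

max_bits = information_content(conserved)
min_bits = information_content(uniform)
print(f"fully conserved: {max_bits:.3f} bits")
print(f"uniform column:  {min_bits:.3f} bits")
```

Under these assumptions, a fully conserved column reaches the theoretical maximum and a uniform column carries essentially no information.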
Use value_type='mean' for an overall conservation estimate and value_type='max' to obtain the score of the most conserved position.
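Note that value_type='max' yields the highest score rather than its location. A minimal sketch for recovering the position itself, assuming df_logo_info behaves like an ordinary pd.Series indexed by 'pos'; the stand-in Series `info` below is hypothetical and reuses the five head() values printed in the Examples section.

```python
import pandas as pd

# Stand-in for df_logo_info: the first five values shown in the Examples output.
info = pd.Series(
    [0.250782, 0.297012, 0.358037, 0.322563, 0.383782],
    index=pd.RangeIndex(5, name="pos"),
)

best_score = info.max()    # the value a 'max' aggregation reports for this Series
best_pos = info.idxmax()   # index label ('pos') at which that score occurs

print(f"Most conserved position: {best_pos} ({best_score:.4f} bits)")
```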
See also
AAlogo.get_df_logo_info(): to compute the per-position information content.
Examples
The AAlogo.get_conservation static method aggregates the per-position information content from get_df_logo_info into a single float using min, mean, median, or max. It takes a pd.Series with index name 'pos' as input.
import warnings
warnings.filterwarnings('ignore')
import aaanalysis as aa

# Load an example dataset and split each sequence into its parts
sf = aa.SequenceFeature()
df_seq = aa.load_dataset(name="DOM_GSEC", n=100)
labels = df_seq["label"].values
df_parts = sf.get_df_parts(df_seq=df_seq, list_parts=["jmd_n", "tmd", "jmd_c"])

# Compute the per-position information content
aalogo = aa.AAlogo()
df_logo_info = aalogo.get_df_logo_info(df_parts=df_parts)
df_logo_info.head()
pos
0    0.250782
1    0.297012
2    0.358037
3    0.322563
4    0.383782
dtype: float64
The df_logo_info argument is a pd.Series returned by get_df_logo_info, with index name 'pos'. The method simply aggregates the values: it is equivalent to calling df_logo_info.mean() (or min/median/max) directly.
cons = aa.AAlogo.get_conservation(df_logo_info=df_logo_info)
print(f"Conservation score: {cons:.6f} bits")
print(f"df_logo_info.mean(): {df_logo_info.mean():.6f} bits")
print(f"Are equal: {cons == df_logo_info.mean()}")
Conservation score: 0.800787 bits
df_logo_info.mean(): 0.800787 bits
Are equal: True
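The exact == comparison above holds because both sides perform the same reduction on the same data. As a hedged aside, when a score is compared against a value computed along a different path, the two results may or may not agree bit for bit, so a tolerance-based check is the safer habit. The numbers below are illustrative only and are not taken from the dataset.

```python
import math

values = [0.1, 0.2, 0.3]
mean_a = sum(values) / len(values)              # sum first, then divide
mean_b = sum(v / len(values) for v in values)   # divide first, then sum

exactly_equal = mean_a == mean_b                           # may depend on rounding
close_enough = math.isclose(mean_a, mean_b, rel_tol=1e-9)  # robust to rounding

print(f"exactly equal: {exactly_equal}, close enough: {close_enough}")
```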
The value_type argument controls which aggregation is applied to df_logo_info:
- 'mean' (default): df_logo_info.mean(), the overall average conservation
- 'min': df_logo_info.min(), the least conserved position
- 'median': df_logo_info.median(), the median position conservation
- 'max': df_logo_info.max(), the most conserved position
for value_type in ["min", "mean", "median", "max"]:
    score = aa.AAlogo.get_conservation(df_logo_info=df_logo_info, value_type=value_type)
    # Confirm it matches direct pandas call
    expected = getattr(df_logo_info, value_type)()
    print(f"{value_type:>8}: {score:.4f} bits (== df_logo_info.{value_type}(): {score == expected})")
     min: 0.2508 bits (== df_logo_info.min(): True)
    mean: 0.8008 bits (== df_logo_info.mean(): True)
  median: 0.8627 bits (== df_logo_info.median(): True)
     max: 1.5553 bits (== df_logo_info.max(): True)
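The name-to-aggregation mapping can be sketched without pandas. This is a hypothetical stand-in, not the library's implementation: `aggregate_conservation` and `_AGGREGATORS` are illustrative names, and raising ValueError for an unknown value_type is an assumption, since this page does not show how the library handles invalid input. The input list reuses the five head() values printed earlier.

```python
import statistics

# Hypothetical stand-in, not the library's implementation.
_AGGREGATORS = {
    "min": min,
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
}


def aggregate_conservation(values, value_type="mean"):
    """Reduce per-position scores to one float; reject unknown value_type."""
    if value_type not in _AGGREGATORS:
        raise ValueError(
            f"value_type must be one of {sorted(_AGGREGATORS)}, got {value_type!r}"
        )
    return float(_AGGREGATORS[value_type](values))


head = [0.250782, 0.297012, 0.358037, 0.322563, 0.383782]  # head() values
for vt in ["min", "mean", "median", "max"]:
    print(f"{vt:>8}: {aggregate_conservation(head, vt):.4f} bits")
```

A dictionary lookup keeps the set of accepted names in one place, so validation and dispatch cannot drift apart.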
get_conservation is most useful for reducing get_df_logo_info output to a single comparable value per group or dataset.
df_info_pos = aalogo.get_df_logo_info(df_parts=df_parts, labels=labels, label_test=1)
df_info_neg = aalogo.get_df_logo_info(df_parts=df_parts, labels=labels, label_test=0)
print(f"{'':12} {'min':>8} {'mean':>8} {'median':>8} {'max':>8}")
print("-" * 48)
for group, df_info in [("All", df_logo_info), ("Positive", df_info_pos), ("Negative", df_info_neg)]:
    scores = {vt: aa.AAlogo.get_conservation(df_logo_info=df_info, value_type=vt)
              for vt in ["min", "mean", "median", "max"]}
    print(f"{group:<12} {scores['min']:>8.4f} {scores['mean']:>8.4f} "
          f"{scores['median']:>8.4f} {scores['max']:>8.4f}")
                  min     mean   median      max
------------------------------------------------
All            0.2508   0.8008   0.8627   1.5553
Positive       0.4080   1.1192   1.1839   1.8965
Negative       0.1741   0.7748   0.7218   1.5067
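One way to read the comparison is as a per-aggregation difference between groups. The sketch below only post-processes the printed scores for the Positive and Negative groups (copied by hand from the output above); the `scores` and `diff` names are illustrative, and no significance is implied by the differences.

```python
# Scores copied from the printed output above (bits).
scores = {
    "Positive": {"min": 0.4080, "mean": 1.1192, "median": 1.1839, "max": 1.8965},
    "Negative": {"min": 0.1741, "mean": 0.7748, "median": 0.7218, "max": 1.5067},
}

# Positive-minus-negative difference for each aggregation.
diff = {
    vt: scores["Positive"][vt] - scores["Negative"][vt]
    for vt in scores["Positive"]
}

for vt, d in diff.items():
    print(f"{vt:>8}: {d:+.4f} bits")
```

In this example all four aggregates are higher for the positive group than for the negative group.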