Charting γ-secretase substrates by explainable AI

Reproduce a published study with AAanalysis. Where a tutorial teaches one tool and a protocol teaches one workflow, a use case re-runs a real study end-to-end from bundled data only (no downloads) — showing that a published result drops out of the standard AAanalysis pipeline, and serving as a template you adapt to your own data.

This use case reproduces:

Breimann and Kamp et al. (2025), Charting γ-secretase substrates by explainable AI, Nature Communications 16, 5428 <https://www.nature.com/articles/s41467-025-60638-z>__.

Biological motivation. γ-secretase is an intramembrane-cleaving protease that cuts the transmembrane domain of single-span membrane proteins, releasing fragments that drive signalling — and it is central to Alzheimer’s disease (it generates the amyloid-β peptide from the amyloid precursor protein, APP) and to cancer (it activates Notch receptors). Yet out of hundreds of single-span membrane proteins, γ-secretase cleaves only a subset, and no consensus sequence motif marks which ones. Worse, the data is weakly labelled: a few dozen expert-curated substrates, only a handful of confirmed non-substrates, and hundreds of proteins of unknown status. The study asks what physicochemically defines a substrate, and how to predict substrates despite this.

The AAanalysis pipeline. Comparative Physicochemical Profiling (CPP) builds an interpretable, position-resolved signature of the substrates over a redundancy-reduced set of amino-acid scales (curated with AAclust); deterministic Positive-Unlabelled learning (dPULearn) mines reliable negatives from the unlabelled pool to balance the data; a tree model predicts substrate status; and SHAP explains individual predictions at single-residue resolution.

What this reproduces (key steps, simplified). Sequence logos of the three protein groups · an AAclust redundancy-reduced scale set · the CPP signature and feature map with importances · dPULearn reliable-negative mining · a prediction benchmark (feature engineering × data expansion) and feature-number optimization · single-residue SHAP explanations for individual substrates.

Simplifications (so it runs in seconds). The study works over the full human N-out proteome across ten model types with leave-one-out CV and three transmembrane annotations. Here we use the bundled balanced DOM_GSEC set (63 substrates + 63 reliable non-substrates) and the unlabelled DOM_GSEC_PU set (63 substrates + 631 others) with their single embedded annotation, one tree ensemble, and 5-fold cross-validation. The biology and the headline numbers reproduce; scaling up is the Protocols (P1, P4, P7-P10).

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import balanced_accuracy_score

import aaanalysis as aa

aa.options["verbose"] = False
aa.options["random_state"] = 42

sf = aa.SequenceFeature()

# Shared evaluator: 5-fold cross-validated balanced accuracy of a tree ensemble.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
def balanced_acc(X, y):
    y = np.asarray(y)
    y_pred = cross_val_predict(RandomForestClassifier(n_estimators=200, random_state=42), X, y, cv=cv)
    return balanced_accuracy_score(y, y_pred) * 100

1. Load the data

Two bundled domain-level (DOM_) datasets, one row per protein with the transmembrane geometry and the pre-cut jmd_n / tmd / jmd_c parts:

  • ``DOM_GSEC`` — the curated balanced set: 63 substrates (label=1) + 63 reliable non-substrates (label=0).

  • ``DOM_GSEC_PU`` — the positive-unlabelled set: the same 63 substrates + 631 unlabelled proteins of unknown status (label=2, the “others”).

df_dom = aa.load_dataset(name="DOM_GSEC")        # curated: substrates + non-substrates
df_pu = aa.load_dataset(name="DOM_GSEC_PU")      # positive-unlabelled: substrates + others
y_dom = df_dom["label"].to_numpy()
y_pu = df_pu["label"].to_numpy()

print(f"DOM_GSEC (curated)  : {(y_dom == 1).sum()} substrates + {(y_dom == 0).sum()} non-substrates")
print(f"DOM_GSEC_PU (PU set): {(y_pu == 1).sum()} substrates + {(y_pu == 2).sum()} unlabelled 'others'")
aa.display_df(df=df_pu, n_rows=10, show_shape=True)
DOM_GSEC (curated)  : 63 substrates + 63 non-substrates
DOM_GSEC_PU (PU set): 63 substrates + 631 unlabelled 'others'
DataFrame shape: (694, 8)
  entry sequence label tmd_start tmd_stop jmd_n tmd jmd_c
1 P05067 MLPGLALLLLAAWTA...GYENPTYKFFEQMQN 1 701 723 FAEDVGSNKG AIIGLMVGGVVIATVIVITLVML KKKQYTSIHH
2 P14925 MAGRARSGLLLLLLG...EEEYSAPLPKPAPSS 1 868 890 KLSTEPGSGV SVVLITTLLVIPVLVLLAIVMFI RWKKSRAFGD
3 P70180 MRSLLLFTFSACVLL...RELREDSIRSHFSVA 1 477 499 PCKSSGGLEE SAVTGIVVGALLGAGLLMAFYFF RKKYRITIER
4 Q03157 MGPTSPAARGQGRRW...HGYENPTYRFLEERP 1 585 607 APSGTGVSRE ALSGLLIMGAGGGSLIVLSLLLL RKKKPYGTIS
5 Q06481 MAATGTAAAAATGRL...GYENPTYKYLEQMQI 1 694 716 LREDFSLSSS ALIGLLVIAVAIATVIVISLVML RKRQYGTISH
6 P35613 MAAALFVLLGFALLG...HQNDKGKNVRQRNSS 1 323 345 IITLRVRSHL AALWPFLGIVAEVLVLVTIIFIY EKRRKPEDVL
7 P35070 MDRAARCSGASSLPL...DITPINEDIEETNIA 1 119 141 LFYLRGDRGQ ILVICLIAVMVVFIILVIGVCTC CHPLRKRRKR
8 P09803 MGARCRSFSALLLLL...RFKKLADMYGGGEDD 1 711 733 GIVAAGLQVP AILGILGGILALLILILLLLLFL RRRTVVKEPL
9 P19022 MCRIAGALRTLLPLL...PRFKKLADMYGGGDD 1 724 746 RIVGAGLGTG AIIAILLCIIILLILVLMFVVWM KRRDKERQAK
10 P16070 MDKFWWHAAWGLCLV...DETRNLQNVDMKIGV 1 650 672 GPIRTPQIPE WLIILASLLALALILAVCIAVNS RRRCGQKKKL

2. Sequence logos of the three groups

Build a probability sequence logo for the substrates, the non-substrates, and the unlabelled others. If a simple consensus motif marked substrates, it would jump out here.

parts_dom_logo = sf.get_df_parts(df_seq=df_dom, list_parts=["jmd_n", "tmd", "jmd_c"])
parts_pu_logo = sf.get_df_parts(df_seq=df_pu, list_parts=["jmd_n", "tmd", "jmd_c"])

aal = aa.AAlogo(logo_type="probability")
TMD_LEN = 20
df_logo_sub = aal.get_df_logo(df_parts=parts_dom_logo, labels=y_dom, label_test=1, tmd_len=TMD_LEN)
df_logo_non = aal.get_df_logo(df_parts=parts_dom_logo, labels=y_dom, label_test=0, tmd_len=TMD_LEN)
df_logo_oth = aal.get_df_logo(df_parts=parts_pu_logo, labels=y_pu, label_test=2, tmd_len=TMD_LEN)

aa.plot_settings()
aal_plot = aa.AAlogoPlot(logo_type="probability", jmd_n_len=10, jmd_c_len=10)
aal_plot.multi_logo(
    list_df_logo=[df_logo_sub, df_logo_non, df_logo_oth],
    list_name_data=["Substrates (n=63)", "Non-substrates (n=63)", "Others / unlabelled (n=631)"],
    list_name_data_color=["tab:green", "tab:gray", "tab:blue"],
    figsize_per_logo=(9, 1.8),
)
plt.tight_layout()
plt.show()
../_images/use_case1_gamma_secretase_1_output_6_0.png

All three logos look essentially the same — a hydrophobic TMD flanked by a lysine/arginine-rich JMD-C. There is no consensus motif that picks out substrates; the discriminating signal is physicochemical and position-dependent, which is what CPP reads.

3. AAclust: a redundancy-reduced scale set

CPP describes each position by amino-acid scales (physicochemical property indices). The bundled set has 586 scales, but many are near-duplicates. AAclust clusters them and keeps one representative per cluster — here with agglomerative, complete-linkage clustering, as in the study. We compare a few set sizes and keep 133 representative scales (the study’s main set).

df_scales = aa.load_scales()   # 20 amino acids x 586 scales
X_scales = df_scales.T

aac = aa.AAclust(model_class=AgglomerativeClustering, model_kwargs={"linkage": "complete"})
sizes = [50, 100, 133, 200]
list_labels = [aac.fit(X_scales, n_clusters=k).labels_ for k in sizes]
df_eval = aac.eval(X_scales, list_labels=list_labels,
                   names_datasets=[f"Set {i+1} (n={k})" for i, k in enumerate(sizes)])
aa.display_df(df=df_eval, n_rows=10, show_shape=True)
DataFrame shape: (4, 5)
  name n_clusters BIC CH SC
1 Set 1 (n=50) 50 36.326619 31.762686 0.146554
2 Set 2 (n=100) 100 -1328.884793 24.023842 0.150068
3 Set 3 (n=133) 133 -2382.239699 22.269659 0.165686
4 Set 4 (n=200) 200 -4604.826550 21.556305 0.174962
aa.plot_settings()
aa.AAclustPlot().eval(df_eval=df_eval)
plt.tight_layout()
plt.show()
../_images/use_case1_gamma_secretase_2_output_10_0.png
# Keep the 133-scale representative set for all downstream feature engineering.
df_scales_red = aac.select_scales(df_scales, n_clusters=133)
print(f"reduced scale set: {df_scales_red.shape[1]} scales (from {df_scales.shape[1]})")
reduced scale set: 133 scales (from 586)

4. CPP + TreeModel: the physicochemical signature

CPP contrasts the test group (substrates, label=1) against the reference group (non-substrates, label=0), splitting each sequence into parts, applying splits, and averaging a scale over the selected positions; it keeps the Part-Split-Scale features that separate the groups best — the signature. A TreeModel then ranks them by importance. (Mechanics: the CPP and TreeModel tutorials.)

parts_dom = sf.get_df_parts(df_seq=df_dom)   # default TMD-centric composite parts
df_feat = aa.CPP(df_parts=parts_dom, df_scales=df_scales_red).run(labels=list(y_dom), n_filter=100, n_jobs=1)

X_cpp_dom = sf.feature_matrix(features=df_feat["feature"], df_parts=parts_dom, df_scales=df_scales_red)
df_feat = aa.TreeModel().fit(X_cpp_dom, labels=y_dom).add_feat_importance(df_feat=df_feat)
print(f"signature: {df_feat.shape[0]} features")
aa.display_df(
    df=df_feat[["feature", "category", "subcategory", "mean_dif", "abs_auc", "feat_importance"]],
    n_rows=8, show_shape=True,
)
signature: 100 features
DataFrame shape: (100, 6)
  feature category subcategory mean_dif abs_auc feat_importance
1 TMD_C_JMD_C-Pat...,15)-ROBB760103 Conformation α-helix 0.107000 0.354000 3.368000
2 TMD_C_JMD_C-Seg...6,9)-ZIMJ680104 Energy Isoelectric point 0.268000 0.352000 4.456000
3 TMD_C_JMD_C-Seg...4,9)-ROBB760113 Conformation β-turn -0.337000 0.349000 1.955000
4 TMD_C_JMD_C-Seg...2,3)-KANM800103 Conformation α-helix 0.113000 0.342000 1.668000
5 TMD_C_JMD_C-Seg...4,5)-ZIMJ680104 Energy Isoelectric point 0.205000 0.336000 1.846000
6 TMD_C_JMD_C-Seg...2,3)-VASM830101 Conformation Unclassified (Conformation) 0.139000 0.335000 1.172000
7 TMD_C_JMD_C-Seg...6,9)-WERD780103 Energy Unclassified (Energy) -0.122000 0.329000 2.084000
8 TMD_C_JMD_C-Seg...4,8)-ROBB760113 Conformation β-turn -0.261000 0.327000 2.214000

The feature map shows the whole signature at once: rows = scale subcategories, columns = positions along the parts, colour = direction & strength of the substrate-vs-non- substrate difference, top bars = cumulative importance. The signal concentrates in the cleavage region (C-terminal TMD into JMD-C) and is dominated by conformational properties — the study’s central biological readout.

cpp_plot = aa.CPPPlot()
aa.plot_settings(font_scale=0.65, weight_bold=False)
cpp_plot.feature_map(df_feat=df_feat, name_test="substrate", name_ref="non-substrate")
plt.tight_layout()
plt.show()
../_images/use_case1_gamma_secretase_3_output_15_0.png

5. dPULearn: mine reliable negatives

The 63 curated non-substrates were not all known upfront: only 14 were experimentally confirmed; the other 49 were predicted by dPULearn from the unlabelled pool. The bundled data still encodes that split — the curated negatives that never appear among the unlabelled DOM_GSEC_PU “others” are exactly the 14 experimentally-known ones. dPULearn projects the proteins into the CPP feature space and labels the unlabelled points most distant from the positives as reliable negatives, deterministically, extending the 14 to a balanced 63.

known_neg = sorted(set(df_dom.loc[y_dom == 0, "entry"]) - set(df_pu.loc[y_pu == 2, "entry"]))
df_known = df_dom[df_dom["entry"].isin(known_neg)]
df_pos = df_pu[df_pu["label"] == 1]
df_others = df_pu[df_pu["label"] == 2]

def cpp_X(df):
    return sf.feature_matrix(features=df_feat["feature"], df_parts=sf.get_df_parts(df_seq=df),
                             df_scales=df_scales_red)
X_cpp_pos, X_cpp_known, X_cpp_oth = cpp_X(df_pos), cpp_X(df_known), cpp_X(df_others)

n_mine = 63 - len(known_neg)
X_pool = np.vstack([X_cpp_pos, X_cpp_oth])
y_pool = np.array([1] * len(X_cpp_pos) + [2] * len(X_cpp_oth))
dpul = aa.dPULearn(random_state=42)
dpul.fit(X=X_pool, labels=y_pool, n_unl_to_neg=n_mine)
mined_others = np.asarray(dpul.labels_)[len(X_cpp_pos):] == 0
print(f"positives: {len(df_pos)} | experimentally-known negatives: {len(known_neg)} "
      f"| dPULearn-mined reliable negatives: {int(mined_others.sum())}")
positives: 63 | experimentally-known negatives: 14 | dPULearn-mined reliable negatives: 49
aa.plot_settings(font_scale=0.85)
aa.dPULearnPlot().pca(df_pu=dpul.df_pu_, labels=np.asarray(dpul.labels_))
plt.tight_layout()
plt.show()
../_images/use_case1_gamma_secretase_4_output_18_0.png

The positives (green) cluster on one side of the CPP feature space; dPULearn picks the unlabelled points (grey) farthest from them as reliable negatives (gold).

6. Sequence logo of the reliable negatives

What do the mined negatives look like? Their logo, beside the substrates’, shows dPULearn selected coherent membrane proteins (not noise) — yet still with no single distinguishing motif, reinforcing that the difference is physicochemical.

df_mined = df_others.iloc[np.where(mined_others)[0]].reset_index(drop=True)
parts_mined = sf.get_df_parts(df_seq=df_mined, list_parts=["jmd_n", "tmd", "jmd_c"])
df_logo_rel = aal.get_df_logo(df_parts=parts_mined, tmd_len=TMD_LEN)

aa.plot_settings()
aal_plot.multi_logo(
    list_df_logo=[df_logo_sub, df_logo_rel],
    list_name_data=["Substrates (n=63)", "dPULearn reliable negatives (n=49)"],
    list_name_data_color=["tab:green", "goldenrod"],
    figsize_per_logo=(9, 1.8),
)
plt.tight_layout()
plt.show()
../_images/use_case1_gamma_secretase_5_output_21_0.png

7. Prediction evaluation

Optimizing the feature and negative counts

How many CPP features and how many dPULearn negatives are enough? Sweep both and read 5-fold balanced accuracy as a heatmap (always 63 positives + the 14 known negatives, extended by the top dPULearn-mined negatives).

feat_list = df_feat["feature"]
n_feats = [25, 50, 75, 100]
n_negs = [20, 35, 49, 63]
heat = np.zeros((len(n_feats), len(n_negs)))
for i, nf in enumerate(n_feats):
    feats = feat_list.iloc[:nf]
    Xp = sf.feature_matrix(features=feats, df_parts=sf.get_df_parts(df_seq=df_pos), df_scales=df_scales_red)
    Xk = sf.feature_matrix(features=feats, df_parts=sf.get_df_parts(df_seq=df_known), df_scales=df_scales_red)
    Xo = sf.feature_matrix(features=feats, df_parts=sf.get_df_parts(df_seq=df_others), df_scales=df_scales_red)
    d = aa.dPULearn(random_state=42)
    d.fit(X=np.vstack([Xp, Xo]), labels=np.array([1]*len(Xp) + [2]*len(Xo)), n_unl_to_neg=max(n_negs))
    order = np.where(np.asarray(d.labels_)[len(Xp):] == 0)[0]
    for j, nn in enumerate(n_negs):
        take = min(nn, len(order))
        X = np.vstack([Xp, Xk, Xo[order[:take]]])
        heat[i, j] = balanced_acc(X, [1]*len(Xp) + [0]*(len(Xk) + take))
print("best balanced accuracy: %.1f%%" % heat.max())
best balanced accuracy: 92.1%
aa.plot_settings()
fig, ax = plt.subplots(figsize=(5.5, 4.5))
im = ax.imshow(heat, cmap="viridis", aspect="auto", vmin=50, vmax=100)
ax.set_xticks(range(len(n_negs))); ax.set_xticklabels(n_negs)
ax.set_yticks(range(len(n_feats))); ax.set_yticklabels(n_feats)
ax.set_xlabel("Number of dPULearn negatives"); ax.set_ylabel("Number of top CPP features")
ax.set_title("CPP / dPULearn optimization")
for i in range(len(n_feats)):
    for j in range(len(n_negs)):
        ax.text(j, i, f"{heat[i, j]:.0f}", ha="center", va="center",
                color="white" if heat[i, j] < 80 else "black", fontsize=10, weight="bold")
fig.colorbar(im, ax=ax, label="Balanced accuracy [%]")
plt.tight_layout()
plt.show()
../_images/use_case1_gamma_secretase_6_output_24_0.png

Comparing against the baseline

Two axes, as in the study. Always train on the 63 positives; vary the feature method (a scale-average baseline — each scale averaged over the whole sequence, no positional splits — vs. the CPP signature) and the data expansion (14 known negatives only / +49 random others / +49 dPULearn-mined reliable negatives). CPP beats the baseline at every column, and dPULearn beats no and random expansion: it is interpretable features and reliable negatives, not just more data, that reach the study’s accuracy.

def scale_X(df):
    seqs = (df["jmd_n"] + df["tmd"] + df["jmd_c"]).to_list()
    return np.array([df_scales_red.loc[[a for a in s if a in df_scales_red.index]].mean(axis=0).values
                     for s in seqs])
X_sc_pos, X_sc_known, X_sc_oth = scale_X(df_pos), scale_X(df_known), scale_X(df_others)

def bench(X_pos, X_known, X_oth):
    none = balanced_acc(np.vstack([X_pos, X_known]), [1]*len(X_pos) + [0]*len(X_known))
    rnd = []
    for seed in range(3):
        pick = np.random.default_rng(seed).choice(len(X_oth), size=n_mine, replace=False)
        rnd.append(balanced_acc(np.vstack([X_pos, X_known, X_oth[pick]]),
                                [1]*len(X_pos) + [0]*(len(X_known) + n_mine)))
    dpu = balanced_acc(np.vstack([X_pos, X_known, X_oth[mined_others]]),
                       [1]*len(X_pos) + [0]*(len(X_known) + int(mined_others.sum())))
    return [none, float(np.mean(rnd)), dpu]

res_scale = bench(X_sc_pos, X_sc_known, X_sc_oth)
res_cpp = bench(X_cpp_pos, X_cpp_known, X_cpp_oth)
cols = ["No expansion\n(14 known)", "Random\nexpansion", "dPULearn\nexpansion"]
print("Scale-based :  " + "  ".join(f"{v:5.1f}%" for v in res_scale))
print("CPP         :  " + "  ".join(f"{v:5.1f}%" for v in res_cpp))
Scale-based :   61.1%   59.8%   74.6%
CPP         :   70.6%   73.5%   93.7%
aa.plot_settings()
fig, ax = plt.subplots(figsize=(7, 4.2))
x = np.arange(len(cols)); w = 0.38
b1 = ax.bar(x - w/2, res_scale, w, label="Scale-based", color="tab:gray")
b2 = ax.bar(x + w/2, res_cpp, w, label="CPP", color="tab:red")
ax.axhline(50, ls="--", color="black", lw=1); ax.text(2.5, 51, "random (50%)", ha="right", fontsize=9)
for bars in (b1, b2):
    for b in bars:
        ax.text(b.get_x() + b.get_width()/2, b.get_height() + 1, f"{b.get_height():.0f}",
                ha="center", fontsize=10, weight="bold")
ax.set_xticks(x); ax.set_xticklabels(cols)
ax.set_ylabel("Balanced accuracy [%]"); ax.set_ylim(0, 108)
ax.set_title("Feature engineering x data expansion")
ax.legend(title="Feature method", loc="upper left", ncol=2, fontsize=9, framealpha=0.9)
plt.tight_layout()
plt.show()
../_images/use_case1_gamma_secretase_7_output_27_0.png

8. Explaining single substrates with SHAP

CPP describes the group; SHAP explains one protein. Fitting a ShapModel on the CPP features and reading out per-residue feature impact shows where in a given substrate the substrate-defining signal sits. We do this for APP (the amyloid precursor protein, the Alzheimer-relevant substrate) and N-cadherin (CDH2), both in the bundled set.

(Notch — the canonical cancer-relevant substrate — is not in the bundled ``DOM_GSEC`` subset; it lives in the study’s full proteome dataset.)

X_shap = sf.feature_matrix(features=df_feat["feature"], df_parts=parts_dom, df_scales=df_scales_red)
sm = aa.ShapModel(random_state=42)
sm.fit(X_shap, labels=list(y_dom))

samples = {"APP": "P05067", "CDH2": "P19022"}
positions = [int(np.where(df_dom["entry"].values == acc)[0][0]) for acc in samples.values()]
names = list(samples)
df_feat = sm.add_sample_mean_dif(X_shap, labels=list(y_dom), df_feat=df_feat,
                                 samples=positions, names=names)
df_feat = sm.add_feat_impact(df_feat=df_feat, samples=positions, names=names)

for name, acc in samples.items():
    seq_kws = sf.get_seq_kws(df_seq=df_dom, df_parts=parts_dom, sample=acc)
    aa.plot_settings(font_scale=0.9)
    aa.CPPPlot().profile(df_feat=df_feat, shap_plot=True, col_imp=f"feat_impact_{name}", **seq_kws)
    plt.title(f"CPP-SHAP profile for {name} ({acc})")
    plt.tight_layout()
    plt.show()
../_images/use_case1_gamma_secretase_8_output_29_0.png ../_images/use_case1_gamma_secretase_9_output_29_1.png

For both substrates the per-residue impact peaks in the cleavage region (C-terminal TMD into the JMD-C), pinpointing the residues that make each protein a substrate — the single-residue interpretability that motivates the “explainable AI” in the study’s title.

Summary

Reproduced from bundled data, in seconds, the full arc of the study:

  • No motif — substrate, non-substrate and “other” logos are indistinguishable; the signal is physicochemical, not a letter pattern.

  • AAclust reduces 586 redundant scales to a representative set of 133.

  • CPP + TreeModel yield an interpretable signature concentrated in the cleavage region and dominated by conformational properties.

  • dPULearn mines reliable negatives from the unlabelled pool to balance the data.

  • Prediction — CPP beats a scale-average baseline at every setting, dPULearn beats no- and random-expansion, and the optimized CPP + dPULearn combination reaches ~90%+ balanced accuracy, matching the study.

  • SHAP explains individual substrates (APP, CDH2) at single-residue resolution.

Scale it up. To approach the full study — the imbalanced proteome, multi-model leave-one-out CV, and proteome-wide prediction — follow the Protocols: P1 (CPP signature), P4 (prediction tasks), P7-P8 (selection & prediction), and P9-P10 (interpretability & validation).

Breimann and Kamp et al. (2025), Charting γ-secretase substrates by explainable AI, Nature Communications 16, 5428 <https://www.nature.com/articles/s41467-025-60638-z>__.