Zhenyan Miao et al., Plant Communications, 2024

Published: 2024-06-19

Title: Dual-Extraction Modeling: A multimodal deep learning architecture for phenotypic prediction and functional gene mining of complex traits

Authors: Yanlin Ren#, Chenhua Wu#, He Zhou#, Xiaona Hu*, Zhenyan Miao*

Abstract:

Despite considerable advancements in extracting crucial insights from bio-omics data to unravel the intricate mechanisms underlying complex traits, the absence of a universal multimodal computational tool with robust interpretability for accurate phenotype prediction and identification of trait-associated genes remains a challenge. This study introduces the Dual-Extraction Modeling (DEM) approach, a multimodal deep learning architecture designed to extract representative features from heterogeneous omics datasets, enabling the prediction of complex trait phenotypes. Through comprehensive benchmarking experiments, we demonstrate DEM's efficacy in classification and regression prediction of complex traits. DEM consistently exhibits superior accuracy, robustness, generalizability, and flexibility. Notably, we establish its effectiveness in predicting pleiotropic genes influencing both flowering time and rosette leaf number, underscoring its commendable interpretability. Additionally, user-friendly software has been developed to facilitate the seamless utilization of DEM's functions. In summary, this study presents a state-of-the-art approach with the capability to effectively predict qualitative and quantitative traits, as well as identify functional genes, affirming its potential as a valuable tool in exploring the genetic basis of complex traits. Source code and software of DEM are available at https://github.com/cma2015/DEM/.

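Phenotypic variation in complex traits is closely tied to genetic and epigenetic variation across many molecular processes. Advances in high-throughput sequencing now give researchers access to large volumes of multi-omics data spanning the genome, transcriptome, proteome, and epigenome. Yet integrating these multimodal data effectively and turning them into meaningful biological insight has remained a major challenge in complex-trait research.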

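The study developed a new deep learning model, Dual-Extraction Modeling (DEM). DEM extracts representative features from multiple heterogeneous omics datasets, uses them to predict the phenotypes of complex traits, and identifies the functional genes that influence those traits; it is expected to play an important role in trait improvement and disease prediction. The architecture follows a dual-extraction strategy, modeling in both a high-dimensional and a low-dimensional feature space. Using multi-head self-attention networks, DEM extracts global attention feature vectors from the individual omics feature matrices and from the joint matrix, then optimizes the model weights and outputs the final prediction. This approach improves the accuracy of phenotype prediction and, through a post-hoc interpretation strategy, strengthens the model's ability to identify trait-associated functional genes. In a subsequent series of comprehensive benchmarking experiments, DEM showed superior accuracy, robustness, generalizability, and flexibility in predicting both quantitative and qualitative traits in plants. DEM also performed well in human disease prediction. Notably, DEM showed strong interpretability when predicting pleiotropic genes that affect multiple traits. Building on this, the research team developed easy-to-use software (https://github.com/cma2015/DEM/) that lets researchers apply DEM's functions to the analysis and interpretation of multi-omics data. The work provides a powerful tool for complex-trait research and, through efficient feature extraction, phenotype prediction, and post-hoc interpretation, advances the understanding of trait mechanisms and disease mechanisms. DEM's high performance and ease of use are expected to accelerate research in intelligent breeding and precision medicine.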
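
To make the dual-extraction idea more concrete, the following is a minimal NumPy sketch of multi-head self-attention applied first to each omics matrix on its own and then to the joint (concatenated) matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names, the mean-pooling step, the token layout (one row per embedded feature), and the random, untrained weights are all hypothetical. The actual DEM code is in the repository linked above.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, n_heads, rng):
    """Scaled dot-product self-attention over a (n_tokens, d_model) matrix.

    Projection weights are random and untrained (an assumption made purely
    for illustration); a real model would learn them.
    """
    n_tokens, d_model = tokens.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads
    w_q, w_k, w_v, w_o = (
        rng.normal(scale=d_model ** -0.5, size=(d_model, d_model))
        for _ in range(4)
    )

    def split(x):
        # (n_tokens, d_model) -> (n_heads, n_tokens, d_head)
        return x.reshape(n_tokens, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)      # (n_heads, n_tokens, n_tokens)
    heads = weights @ v                     # (n_heads, n_tokens, d_head)
    merged = heads.transpose(1, 0, 2).reshape(n_tokens, d_model)
    return merged @ w_o, weights


def dual_extraction_features(omics, n_heads=2, seed=0):
    """Hypothetical helper: attend within each omics layer, then jointly."""
    rng = np.random.default_rng(seed)
    per_modality = []
    for tokens in omics.values():
        attended, _ = multi_head_self_attention(tokens, n_heads, rng)
        per_modality.append(attended.mean(axis=0))   # pool to one vector
    joint = np.concatenate(list(omics.values()), axis=0)
    joint_attended, joint_weights = multi_head_self_attention(joint, n_heads, rng)
    fused = np.concatenate(per_modality + [joint_attended.mean(axis=0)])
    return fused, joint_weights


# Toy input for one individual: 5 genomic and 7 transcriptomic features,
# each already embedded into 8 dimensions (random placeholder values).
rng = np.random.default_rng(42)
omics = {
    "genome": rng.normal(size=(5, 8)),
    "transcriptome": rng.normal(size=(7, 8)),
}
fused, joint_weights = dual_extraction_features(omics)
print(fused.shape)          # (24,)
print(joint_weights.shape)  # (2, 12, 12)
```

In this sketch the fused vector would feed a downstream classifier or regressor, and the joint attention weights show how strongly each feature attends to every other feature across omics layers. Whether DEM pools, fuses, or interprets attention this way is not stated in the summary above.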

Paper link: https://doi.org/10.1016/j.xplc.2024.101002