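Abstract: To address the high cost and low efficiency of current approaches to predicting protein phosphorylation sites in crops, a deep-learning-based computational method is proposed. Gated units are added to the encoder, protein intrinsic disorder scores are introduced as features, and the way training samples are drawn is optimized. Compared with Transformer-based methods, the proposed method reaches the same accuracy with a significantly smaller amount of computation. Compared with existing methods such as DeepIPs, TabNet, and TransPhos, it also performs better, raising AUC by more than 2% under five-fold cross-validation. In addition, the features it uses can be extracted from the sequence alone, which simplifies operation while improving prediction and provides an important reference for research on crop protein phosphorylation.
Keywords: deep learning; bioinformatics; protein phosphorylation; computational biology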
CLC number: TP389.1
Document code: A
A Study on the Prediction Model of Protein Phosphorylation in Crops Based on Gated Units
DUAN Xufu1, LI Zhong1,2
(1. School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China; 2. School of Information Engineering, Huzhou University, Huzhou 313000, China)
202130504072@mails.zstu.edu.cn; lizhong@zjhu.edu.cn
Abstract: To address the high cost and low efficiency of predicting protein phosphorylation sites in crops, this paper proposes a computational method based on deep learning. Gated units are incorporated into the encoder, intrinsic disorder scores of proteins are introduced as features, and the sampling of training samples is optimized. Compared with Transformer-based methods, the method achieves the same accuracy at a significantly lower computational cost. Compared with existing methods such as DeepIPs, TabNet, and TransPhos, it also performs better, with an AUC increase of more than 2% under five-fold cross-validation. Furthermore, the features used can be extracted from sequences alone, which simplifies operation while improving prediction performance and provides an important reference for the study of protein phosphorylation in crops.
Keywords: deep learning; bioinformatics; protein phosphorylation; computational biology