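Abstract: Accurate prediction of TCR (T cell receptor)-peptide binding sites is of great significance for immunotherapy and related drug discovery. This paper compiles a TCR-peptide binding site dataset from multiple publications and databases and introduces Propep-TCR, a prediction method based on convolutional neural networks. The method combines the sequence and structural features of the input TCR and extracts a feature vector for each target residue using a residue-variable sliding window. To address the imbalance between positive and negative samples in the dataset, an improved loss function and oversampling are also used. Experimental results show that Propep-TCR successfully predicts potential binding sites in TCR sequences and outperforms traditional algorithms, reaching a prediction accuracy of 0.98 and an AUROC of 0.95.
Keywords: convolutional neural network; binding site prediction; TCR-peptide interaction; deep learning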
CLC number: TP311.5
Document code: A
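Funding: Collaborative Innovation Cluster Research Project of the Shanghai Municipal Commission of Health and Family Planning (2019CXJQ02)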
Prediction of TCR-peptide Binding Sites Based on Dual-Module Convolutional Neural Network
GAO Yuan1,2, LU Manman2, LIN Yong1, XIE Lu2
(1. School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China; 2. Shanghai-MOST Key Laboratory of Health and Disease Genomics, Shanghai Institute for Biomedical and Pharmaceutical Technologies, Shanghai 200237, China)
gaoyuan-99@qq.com; 15512468229@163.com; yong_lynn@163.com; xielu@sibpt.com
Abstract: Accurate prediction of TCR (T cell receptor)-peptide binding sites is of great significance for immunotherapy and related drug discovery. This paper compiles a TCR-peptide binding site dataset from multiple publications and databases and introduces Propep-TCR, a prediction method based on convolutional neural networks. The method considers both the sequence and structural features of the input TCR and extracts the feature vector of each target residue using a residue-variable sliding window. To address the imbalance between positive and negative samples in the dataset, an improved loss function and oversampling are also employed. Experimental results show that Propep-TCR successfully predicts potential binding sites in TCR sequences and outperforms traditional algorithms, with a prediction accuracy of 0.98 and an AUROC of 0.95.
Keywords: convolutional neural network; binding site prediction; TCR-peptide interaction; deep learning
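
The per-residue feature extraction and the oversampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the Propep-TCR implementation: the abstract does not define the residue-variable window, so the code assumes a fixed-width window centred on each residue, one-hot sequence features only (no structural features), and plain random duplication of the minority class. The names `one_hot`, `window_features`, `oversample`, and `half_width` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(residue):
    """Return a 20-dimensional one-hot vector; unknown residues map to all zeros."""
    return [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]


def window_features(sequence, half_width=2):
    """Build one flattened feature vector per residue from a centred window.

    Assumption: positions outside the sequence are zero-padded, so every
    residue (including those at the ends) gets a vector of equal length.
    """
    pad = [0.0] * len(AMINO_ACIDS)
    encoded = [one_hot(r) for r in sequence]
    features = []
    for i in range(len(sequence)):
        vec = []
        for j in range(i - half_width, i + half_width + 1):
            vec.extend(encoded[j] if 0 <= j < len(sequence) else pad)
        features.append(vec)
    return features


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    if not minority:
        # Nothing to duplicate; return copies unchanged.
        return list(samples), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(samples) + extra, list(labels) + [minority_label] * len(extra)
```

With `half_width=1`, each residue of a CDR3-like string such as `"CASSL"` yields a 60-dimensional vector (3 positions × 20 residue types). In an imbalanced binding-site dataset the binding residues would typically be the class that `oversample` duplicates.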