Introduction

Pupylation, a crucial post-translational modification in prokaryotes, plays a key role in regulating various protein functions. To understand the molecular mechanism of pupylation, it is important to identify pupylation substrates and sites accurately. Because traditional experimental methods are time-consuming and labor-intensive, several computational methods have been developed to identify pupylation sites. The existing computational methods use the experimentally annotated pupylation sites as the positive training set and the remaining non-annotated lysine residues as the negative training set, and build classifiers on these two sets to predict new pupylation sites in unknown proteins. However, the remaining non-annotated lysine residues may contain pupylation sites that have not yet been experimentally validated, so treating all of them as negatives can mislabel true sites. Unlike previous methods, this study uses the experimentally annotated pupylation sites as the positive training set and the remaining non-annotated lysine residues as the unlabeled training set. A novel method named PUL-PUP is proposed to predict pupylation sites by using a positive-unlabeled learning technique. Our experimental results indicate that PUL-PUP significantly outperforms the other methods in predicting pupylation sites. As an application, PUL-PUP was also used to predict the most likely pupylation sites among the non-annotated lysine sites.
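The labeling scheme described above can be sketched in a few lines. The following Python snippet is only an illustration and is not part of the PUL-PUP MATLAB package: the function name split_lysine_sites, the toy sequence, and the annotated position are all hypothetical. It shows how the lysine (K) residues of one protein would be divided into a positive set (annotated sites) and an unlabeled set (all other lysines), rather than a positive set and a negative set.

```python
# Illustrative sketch only; not part of the PUL-PUP MATLAB package.

def split_lysine_sites(sequence, annotated_positions):
    """Split lysine (K) positions into positive and unlabeled sets.

    Positions are 1-based. Annotated lysines go to the positive set;
    every other lysine is treated as unlabeled, not as a negative.
    """
    annotated = set(annotated_positions)
    positive, unlabeled = [], []
    for index, residue in enumerate(sequence, start=1):
        if residue != "K":
            continue  # only lysine residues are candidate sites
        if index in annotated:
            positive.append(index)
        else:
            unlabeled.append(index)
    return positive, unlabeled


# Hypothetical toy sequence and annotation, made up for illustration.
toy_sequence = "MKTAKLLKQ"
toy_annotated = [5]
positive_sites, unlabeled_sites = split_lysine_sites(toy_sequence, toy_annotated)
print(positive_sites, unlabeled_sites)
```

How the unlabeled set is then used during training is specific to the positive-unlabeled learning algorithm and is not shown here.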

PUL-PUP software MATLAB code

The complete MATLAB code of PUL-PUP is available by clicking here.

Usage of this MATLAB software package:

  1. Prepare your sequence(s) file in fasta format and name it "Xinput.fasta"
  2. Run the program "PUL_PUP.m"
  3. Get the result file "Pre_result.xls"
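For step 1, the input file can be written by hand or generated by a script. The following Python helper is a minimal sketch and is not part of the PUL-PUP package: the function name format_fasta, the record identifier, and the sequence are hypothetical, and it assumes that PUL_PUP.m accepts standard FASTA text (a ">" header line followed by sequence lines). Only the file name "Xinput.fasta" comes from the instructions above.

```python
# Illustrative helper; not part of the PUL-PUP MATLAB package.

def format_fasta(records, line_width=60):
    """Return FASTA text for an iterable of (identifier, sequence) pairs."""
    lines = []
    for identifier, sequence in records:
        lines.append(">" + identifier)
        # Wrap the sequence to at most line_width residues per line.
        for start in range(0, len(sequence), line_width):
            lines.append(sequence[start:start + line_width])
    return "\n".join(lines) + "\n"


# Hypothetical identifier and sequence, made up for illustration.
records = [("example_protein_1", "MKTAKLLKQ")]
with open("Xinput.fasta", "w") as handle:
    handle.write(format_fasta(records))
```

Place the resulting "Xinput.fasta" where "PUL_PUP.m" expects to find it before running the program in MATLAB.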

Please note that this software package supports only 32-bit versions of the Windows operating system.

Dataset

Tung's training set and Tung's independent testing set are available by clicking here.

Reference

Ming Jiang, Jun-Zhe Cao, Positive-unlabeled learning for pupylation sites prediction.

If you have any questions, please contact Ming Jiang.