Welcome to the CEG database!

Citation: Ye YN, Hua ZG, Huang J, Rao N, Guo FB*. (2013) CEG: a database of essential gene clusters. BMC Genomics. 14:769. [Full Text]

Introduction

CEG (Cluster of Essential Genes) is a database of clusters of orthologous essential genes, developed by the CEFG Group at UESTC. If you have any suggestions, please contact us by e-mail at webmaster@cefg.cn or fbguo@uestc.edu.cn.

The original data used to generate CEG are derived from the DEG database, which was published in NAR in 2004 and 2014. Unlike DEG, the CEG database stores essential genes as orthologous groups rather than as single genes.

That is to say, all essential genes from different bacterial organisms are placed in one CEG cluster if they have the same function. With this adaptation from DEG, users can easily determine whether an essential gene is conserved across multiple bacterial pathogens or is species-specific. This property (named persistence) is an important index for evolutionary research, drug design and other fields. Another advantage of clustering essential genes by function is that the false positive predictions made when predicting essential genes by similarity alignment can be greatly reduced. Each CEG cluster belongs to one COG cluster (COG is a well-known database provided by NCBI).

However, the two are not equivalent. A COG group contains all orthologous genes, whereas a CEG group contains only those orthologous genes that are essential for their bacterial hosts. The most important feature of the CEG database is that it provides similarity alignment results for every cluster against human proteins and genes. It is therefore a convenient resource for selecting targets of innocuous antibacterial drugs. CEG currently contains 1565 clusters with two or more essential genes, as well as 2856 pseudo-clusters with only one essential gene.

Compared with version 1.0, this version adds information related to drug design and related fields. For example, we have added structure, pathway, virulence, protein-ligand and drug information for the essential genes. We believe this information will be helpful for drug design and the discovery of drug targets.

This version contains 29 strains:

Bacillus subtilis 168
Staphylococcus aureus N315
Vibrio cholerae N16961
Escherichia coli MG1655
Haemophilus influenzae Rd KW20
Mycoplasma genitalium G37
Streptococcus pneumoniae
Helicobacter pylori 26695
Mycobacterium tuberculosis H37Rv
Salmonella typhimurium LT2
Francisella novicida U112
Acinetobacter baylyi ADP1
Mycoplasma pulmonis UAB CTIP
Pseudomonas aeruginosa UCBPP-PA14
Salmonella enterica serovar Typhi
Staphylococcus aureus NCTC 8325
Caulobacter crescentus
Streptococcus sanguinis
Porphyromonas gingivalis ATCC 33277
Bacteroides thetaiotaomicron VPI-5482
Burkholderia thailandensis E264
Sphingomonas wittichii RW1
Shewanella oneidensis MR-1
Salmonella enterica serovar Typhimurium SL1344
Bacteroides fragilis 638R
Burkholderia pseudomallei K96243
Salmonella enterica subsp. enterica serovar Typhimurium str. 14028S
Pseudomonas aeruginosa PAO1
Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819

 

User guide

A detailed description of the table column names used on all pages.

Column name      Description
CEG id           ID of the CEG cluster
Symbol           Names of the genes in each CEG cluster
ESAHG            E-value of the similarity alignment (blastn) of each cluster against human genes
ESAHP            E-value of the similarity alignment (blastp) of each cluster against human proteins
Cluster size     Number of genes in the cluster
Strains          Number of species covered by the cluster
Drug size        Number of genes in the cluster that have drug information
Virulence size   Number of genes in the cluster that have virulence information
Struct size      Number of genes in the cluster that have structure information
Pathway size     Number of genes in the cluster that have pathway information
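
As an illustration of how these columns might be combined when screening for drug targets, the following Python sketch keeps clusters that cover many species and have no significant blastp hit against human proteins. It is only a sketch under stated assumptions: the Cluster record, its field names, the example rows and both thresholds are hypothetical and are not part of CEG, and a missing ESAHP value (None) is assumed to mean that no hit was reported.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    """Hypothetical in-memory record for one row of a CEG result table."""
    ceg_id: str              # "CEG id" column (placeholder values below)
    cluster_size: int        # "Cluster size" column
    strains: int             # "Strains" column
    esahp: Optional[float]   # "ESAHP" column; None = no hit reported (assumption)


def candidate_targets(
    clusters: List[Cluster],
    min_strains: int = 10,         # assumed cutoff for "widely conserved"
    evalue_cutoff: float = 1e-5,   # assumed cutoff for a significant human hit
) -> List[Cluster]:
    """Keep widely conserved clusters without a significant human homolog."""
    picked = [
        c for c in clusters
        if c.strains >= min_strains
        and (c.esahp is None or c.esahp > evalue_cutoff)
    ]
    # Most widely conserved clusters first.
    return sorted(picked, key=lambda c: c.strains, reverse=True)


# Invented placeholder rows, not real CEG data.
clusters = [
    Cluster("example_A", cluster_size=25, strains=20, esahp=None),
    Cluster("example_B", cluster_size=30, strains=22, esahp=1e-40),
    Cluster("example_C", cluster_size=1, strains=1, esahp=None),
    Cluster("example_D", cluster_size=12, strains=11, esahp=0.5),
]

for c in candidate_targets(clusters):
    print(c.ceg_id, c.strains, c.esahp)
```

With these placeholder rows, example_B is dropped because of its strong human hit and example_C because it is a single-gene pseudo-cluster; suitable cutoffs depend on the study and would need to be chosen by the user.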