MILANO--custom annotation of microarray results using automatic literature searches

BMC Bioinformatics. 2005 Jan 20:6:12. doi: 10.1186/1471-2105-6-12.

Abstract

Background: High-throughput genomic research tools are becoming standard in the biologist's toolbox. After processing the genomic data with one of the many available statistical algorithms to identify statistically significant genes, these genes need to be further analyzed for biological significance in light of all the existing knowledge. Literature mining, the process of representing literature data in a fashion that is easy to relate to genomic data, is one solution to this problem.

Results: We present a web-based tool, MILANO (Microarray Literature-based Annotation), that annotates lists of genes derived from microarray results with user-defined terms. Our annotation strategy is based on counting the number of literature co-occurrences of each gene on the list with a user-defined term. This strategy allows the customization of the annotation procedure and thus overcomes one of the major limitations of the functional annotations usually provided with microarray results. MILANO expands the gene names to include all their informative synonyms while filtering out gene symbols that are likely to be less informative as literature searching terms. MILANO supports searching two literature databases, GeneRIF and Medline (through PubMed), allowing retrieval of both quick and comprehensive results. We demonstrate MILANO's ability to improve microarray analysis by analyzing a list of 150 genes that were affected by p53 overproduction. This analysis reveals that MILANO enables immediate identification of known p53 target genes on this list and assists in sorting the list into genes known to be involved in p53-related pathways, apoptosis, and cell cycle arrest.
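
The co-occurrence strategy described above can be illustrated with a minimal Python sketch. It is not MILANO's implementation: the synonym table, the filter rule (drop very short symbols and symbols that are common English words), the toy corpus sentences, and all function names are assumptions made up for illustration. A real run would query GeneRIF or Medline rather than an in-memory list.

```python
import re

# Toy synonym table (illustrative only; not MILANO's actual gene-name resource).
SYNONYMS = {
    "TP53": ["TP53", "p53", "LFS1"],
    "CDKN1A": ["CDKN1A", "p21", "WAF1", "CIP1"],
    "CAT": ["CAT", "catalase"],
}

# Assumed filter rule: a symbol is treated as uninformative if it is very
# short or is also a common English word. The abstract does not state the rule.
AMBIGUOUS_WORDS = {"cat", "was", "set", "rest"}
MIN_SYMBOL_LENGTH = 3

# Invented sentences standing in for literature records.
TOY_CORPUS = [
    "p53 induces apoptosis through transcriptional activation of target genes.",
    "WAF1 mediates cell cycle arrest downstream of TP53.",
    "Loss of p53 impairs both apoptosis and cell cycle arrest.",
    "A cat model was used to study apoptosis.",
    "Catalase protects cells from oxidative stress-induced apoptosis.",
]


def informative_synonyms(gene):
    """Return the gene's synonyms that pass the assumed informativeness filter."""
    return [
        name
        for name in SYNONYMS.get(gene, [gene])
        if len(name) >= MIN_SYMBOL_LENGTH and name.lower() not in AMBIGUOUS_WORDS
    ]


def mentions(text, phrase):
    """Case-insensitive whole-word (or whole-phrase) match of phrase in text."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def cooccurrence_count(gene, term, corpus):
    """Count records that mention the term and at least one informative synonym."""
    names = informative_synonyms(gene)
    return sum(
        1
        for text in corpus
        if mentions(text, term) and any(mentions(text, name) for name in names)
    )


def annotate(genes, terms, corpus):
    """Build a gene -> {term: co-occurrence count} annotation table."""
    return {
        gene: {term: cooccurrence_count(gene, term, corpus) for term in terms}
        for gene in genes
    }


if __name__ == "__main__":
    table = annotate(
        ["TP53", "CDKN1A", "CAT"], ["apoptosis", "cell cycle arrest"], TOY_CORPUS
    )
    for gene, counts in table.items():
        print(gene, counts)
```

In this toy corpus the sentence about a cat model does not count toward the CAT gene, because the symbol "CAT" is filtered out and only "catalase" is searched. This is the kind of false hit that synonym filtering is meant to avoid.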

Conclusions: MILANO provides a useful tool for the automatic custom annotation of microarray results, based on all the available literature. MILANO has two major advantages over similar tools: the ability to expand gene names to include all their informative synonyms while removing uninformative ones, and access to the GeneRIF database, which provides short summaries of curated articles relevant to known genes. MILANO is available at http://milano.md.huji.ac.il.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Apoptosis
  • Automation
  • Computational Biology / methods*
  • Computer Graphics
  • Databases, Bibliographic*
  • Databases, Factual
  • Databases, Genetic
  • Gene Expression Profiling
  • Gene Expression Regulation
  • Genomics
  • Humans
  • Information Storage and Retrieval
  • Internet
  • Literature
  • MEDLINE
  • Microarray Analysis
  • Models, Biological
  • Oligonucleotide Array Sequence Analysis
  • Programming Languages
  • Proteomics
  • PubMed
  • RNA, Messenger / metabolism
  • Software Design
  • Software*
  • Systems Integration
  • Tumor Suppressor Protein p53 / metabolism
  • User-Computer Interface
  • Vocabulary, Controlled

Substances

  • RNA, Messenger
  • Tumor Suppressor Protein p53