Author :
Chakraborty, Nabarun ; Hammamieh, Rasha ; Wang, Yan ; Laing, Mark ; Liu, Zaigang ; Mulligan, John ; Jett, Marti
Author_Institution :
Molecular Pathology, Walter Reed Army Inst. of Res., Silver Spring, MD
Abstract :
GeneCite, a stand-alone Java search application, provides an efficient method for high-throughput processing and handling of batch queries in PubMed and UniSTS, the database of citations and the database of sequence-tagged sites (STSs), respectively. It is capable of carrying out high-throughput data mining, as many as 200 queries at a time, on the user's personal server using several unique time-saving features. Unlike some available internet-based data-mining tools, GeneCite lets the user interconnect two input files via any of the three Boolean operators available at the NCBI Webdomain, i.e. AND, OR and NOT. For instance, GeneCite could return an array of citations or a list of possible STSs, depending on the chosen search base, for a single array of search keys. This kind of query is defined as a one-dimensional search. A two-dimensional search, by definition, connects two files together. Careful selection of the Boolean operator could be the key factor in obtaining a precise and effective result. For instance, in order to search for unique STSs of a genome, a two-dimensional self vs. self query of that genome connected by 'NOT' could be run. The output would list those entries in which only the gene in the vertical column is mentioned, not the gene in the horizontal column. Similarly, a genome and a set of gene functions connected by 'AND' would return the entries where both keys are cited together. The search could be customized further by selecting a key from a set of search-limiting markers provided by GeneCite, namely the date of publication, the names of authors or journals, and the organism under study. Once the search begins, GeneCite can handle the rest of the process, including the documentation of the resultant files, without further assistance. After completion of a given search, GeneCite provides a summary of the results (the total number of hits, etc.) and two output files. The first file provides literature citation counts for each given search key, while the other file offers hyperlinks for each query, connecting to the appropriate result page of the data source. By sorting the citation numbers listed in the first file, the user can easily identify the important query cells or reject false positives, while through the second file they can directly reach the result pages of the queries of interest. The result files can be stored on the hard drive, and users can work on them at their convenience. In short, in today's fast-growing field of bioinformatics, where retrieving precise information with minimum effort and time has become a major concern, this software could provide a fitting solution. The software is free and currently available at http://www.biospice.org
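The two-dimensional search described in the abstract can be pictured as a grid of query strings, one per pair of keys taken from the two input files. The following is a minimal Python sketch of that idea, not GeneCite's actual implementation (GeneCite itself is a Java application). The function name `build_query_grid`, the parenthesized query format, the sample keys, and the choice to skip self-pairs in a NOT search are all assumptions made for illustration.

```python
from itertools import product

# The three Boolean operators named in the abstract.
ALLOWED_OPERATORS = {"AND", "OR", "NOT"}


def build_query_grid(row_keys, column_keys, operator):
    """Return {(row_key, column_key): query_string} for every pair of keys.

    Hypothetical helper: it only builds query strings and does not
    contact PubMed or UniSTS.
    """
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    grid = {}
    for row_key, column_key in product(row_keys, column_keys):
        # Assumption: in a self vs. self NOT search, "X NOT X" can match
        # nothing, so the diagonal cells are skipped.
        if operator == "NOT" and row_key == column_key:
            continue
        grid[(row_key, column_key)] = f"({row_key}) {operator} ({column_key})"
    return grid


# Illustrative keys only; real input would come from the two key files.
genes = ["BRCA1", "TP53"]
functions = ["apoptosis", "DNA repair"]

and_grid = build_query_grid(genes, functions, "AND")  # genes x functions
not_grid = build_query_grid(genes, genes, "NOT")      # self vs. self

for cell, query in and_grid.items():
    print(cell, "->", query)
```

A one-dimensional search would correspond to a single list of keys with no operator. Each cell of the grid would then map to one citation count in the first output file and one hyperlink in the second.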
Keywords :
Boolean functions; Java; biology computing; citation analysis; data mining; database management systems; query processing; Boolean operators; GeneCite; Java application searching tool; NCBI Webdomain; PubMed literature mining; UniSTS literature mining; batch queries; bioinformatics; citations database; documentation; gene functions; information retrieval; literature citation counts; sequenced tagged sites; two-dimensional search; users personal server; Bioinformatics; Data mining; Documentation; Genomics; Internet; Java; Organisms; Spatial databases; Throughput; Web server;