Title :
Structure induction by lossless graph compression
Author_Institution :
Center for Biomed. Inf., Harvard Med. Sch., Boston, MA
Abstract :
This work is motivated by the necessity to automate the discovery of structure in vast and ever-growing collection of relational data commonly represented as graphs, for example genomic networks. A novel algorithm, dubbed Graphitour, for structure induction by lossless graph compression is presented and illustrated by a clear and broadly known case of nested structure in a DNA molecule. This work extends to graphs some well established approaches to grammatical inference previously applied only to strings. The bottom-up graph compression problem is related to the maximum cardinality (non-bipartite) maximum cardinality matching problem. The algorithm accepts a variety of graph types including directed graphs and graphs with labeled nodes and arcs. The resulting structure could be used for representation and classification of graphs
Keywords :
DNA; biology computing; data compression; directed graphs; molecular biophysics; DNA molecule; Graphitour; bottom-up graph compression problem; directed graphs; genomic networks; grammatical inference; lossless graph compression; maximum cardinality maximum cardinality matching problem; structure induction; Bioinformatics; Biomedical informatics; Computer science; DNA; Explosives; Genomics; Humans; Inference algorithms; Production; Proteins;
Conference_Titel :
Data Compression Conference, 2007. DCC '07
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-2791-4
DOI :
10.1109/DCC.2007.73