Title :
An axiomatic inspection of the behavior of topic models with data aggregation
Author :
Deolalikar, Vinay
Abstract :
Topic modeling has various applications in organizing and retrieving textual data in document collections. In enterprises, such collections are often distributed across various sites and collaboratively aggregated at the time of processing. Therefore, the problem of topic modeling over aggregations of data is important. We study the behavior of a standard topic modeling technique, the hierarchical Dirichlet process (HDP), as the underlying data is aggregated. We formulate three axioms that reflect the assumptions that users frequently make when dealing with aggregated data. We empirically demonstrate that HDP does not necessarily satisfy these axioms. We discuss the ramifications of this finding for applications in enterprise settings.
Keywords :
document handling; information retrieval; HDP; axiomatic inspection; data aggregation; document collections; enterprise settings; hierarchical Dirichlet process; standard topic modeling technique; textual data retrieval; topic model behavior; Analytical models; Computational modeling; Data mining; Data models; Distributed databases; Information management; Standards
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
DOI :
10.1109/BigData.2014.7111660