DocumentCode :
658366
Title :
Economic Development through Business Profiling: A Text Analysis Based Approach
Author :
Parimi, Rohit ; Caragea, Doina ; Wunderlich, Dirk
Author_Institution :
Comput. & Inf. Sci., Kansas State Univ., Manhattan, KS, USA
Volume :
1
fYear :
2013
fDate :
17-20 Nov. 2013
Firstpage :
315
Lastpage :
320
Abstract :
The tremendous improvements in the field of web technologies have contributed to the accumulation of large amounts of text data, particularly in the form of websites. Most businesses, small or large, present themselves to the world through their websites. Economic development analysts could use the information available on business websites to identify ways in which businesses in a region can be grouped into clusters, and possibly partner for mutual benefit and thus for the economic gain of that region. Automated clustering of businesses is especially useful, as the existing NAICS codes-based clustering is not very accurate, according to domain experts, and does not scale up well. Text analysis is a blooming field whose goal is to automatically extract useful information from natural language text. In this work, we perform a preliminary text analysis of business websites to build business profiles and to organize businesses into clusters. Our approach is based on representing businesses as a mixture of "topics" using a technique called Latent Dirichlet Allocation (LDA). Given the business profiles represented as the topic distributions obtained with LDA, we construct preliminary clusters of businesses. Manual analysis of the results shows that the proposed approach has the potential to yield interesting and useful clusters that could replace the existing NAICS codes-based clusters. We identify further challenges associated with the existing business data, and several ideas for future work.
Keywords :
Web sites; business data processing; economics; text analysis; LDA technique; NAICS codes-based clustering; Web technologies; business Web sites; business profiling; economic development; information extraction; latent Dirichlet allocation; natural language text; text analysis based approach; text data accumulation; topic distributions; Analytical models; Companies; Data models; Industries; Manufacturing; Materials; Business Profiling; Text Analysis; Topic Models;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4799-2902-3
Type :
conf
DOI :
10.1109/WI-IAT.2013.45
Filename :
6690031