DocumentCode :
3263600
Title :
Machine learning approaches to Information Retrieval and its applications to the web, medical informatics and health care
Author :
Huang, Xiangji
Author_Institution :
York Univ., Toronto, ON
fYear :
2008
fDate :
26-28 Aug. 2008
Firstpage :
39
Lastpage :
40
Abstract :
With the ever-increasing amount of digital information available, the need for advanced information retrieval (IR) systems grows. This wealth of digital information presents a major data-analysis challenge. How to manipulate, analyze and understand large quantities of complex data becomes extremely important. Over the past decades, significant progress has been made in IR. However, many challenges remain. First, most Web search engines take a short text query as input and output a ranked list of documents. The retrieval decision is made primarily based on the current query and document collection. Web search engines generally treat search requests in isolation. The results for a given query are identical, independent of the user or the context in which the user makes the request. However, it is unlikely that different users are so similar in their interests that one standardized way of retrieving information fits all needs. Different users may have different information needs. They may use the same query to search for different kinds of information. Moreover, even the same user may use identical queries to express different information needs. For example, a person may use "IRIX" to mean information retrieval in context at one time, but the IRIX operating system at another time. Current Web search engines cannot distinguish these two cases because the user's search context is not considered. Second, IR is, in general, an interactive process. A user's information need is rarely satisfied with just one iteration of search. With the current document-centered retrieval paradigm, interactive retrieval is treated as a sequence of independent simple retrieval decision-making steps. The information about search history is ignored, which makes the retrieval performance of existing IR systems inherently non-optimal.
However, it has been observed that analysis of task-oriented user sessions provides useful insight into the query behavior of users. Third, most present IR systems, including general search engines (e.g. Google and Yahoo) and scientific literature search engines (e.g. PubMed and ACM Digital Library), use keywords to query and index documents. However, this traditional keyword-based IR model provides little semantic context for understanding user information needs. For example, a keyword usually has several senses, and its meaning is ambiguous without context. In addition, one meaning can be expressed by many keywords. Thus, integrating into IR systems the semantic context of the user's information need and of the user's understanding of the documents in the collection will definitely improve IR performance.
Keywords :
data analysis; health care; learning (artificial intelligence); medical information systems; query formulation; search engines; Web search engines; data analysis; decision-making steps; document collection; document-centered retrieval paradigm; health care; information retrieval systems; literature search engines; machine learning; medical informatics; Biomedical informatics; Decision making; History; Information retrieval; Machine learning; Medical services; Operating systems; Search engines; Software libraries; Web search;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Granular Computing, 2008. GrC 2008. IEEE International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-4244-2512-9
Electronic_ISBN :
978-1-4244-2513-6
Type :
conf
DOI :
10.1109/GRC.2008.4664796
Filename :
4664796