Title :
Statistical analysis of Web documents: a proposal and a case study
Author :
Vittorini, Pierpaolo ; Di Felice, Paolino
Author_Institution :
Dipt. di Ingegneria Elettrica, L'Aquila Univ., Italy
Abstract :
The quality metrics adopted so far for Web document analysis suffer from a serious limitation: they take into account single documents, disregarding the specific context to which the Web pages belong. As a formal tool suitable for overcoming this limitation, we introduce new metrics which take as input sets of Web pages and return statistical distributions of the number of paragraphs of text, the area covered by the images, and the number of (internal/external) hyperlinks. The strategy for the practical evaluation of the quality of the organization of a generic set of Web pages requires comparing their statistical distributions against reference distributions computed by applying the metrics to a "selected" set of Web documents. The paper reports on an experiment in which the general strategy is instantiated to the specific domain of courseware; in numbers: seven thousand pages make up the reference set, and two pieces of courseware, totalling about 250 pages, make up the actual case study. The experiment showed that our measures correspond to the kind of quality we might expect.
Keywords :
hypermedia markup languages; information resources; log normal distribution; statistical analysis; text analysis; Web document analysis; courseware; hyperlinks; quality metrics; statistical distributions; Computer aided software engineering; Distributed computing; HTML; Performance evaluation; Proposals; Web pages;
Conference_Titel :
Database and Expert Systems Applications, 2001. Proceedings. 12th International Workshop on
Conference_Location :
Munich
Print_ISBN :
0-7695-1230-5
DOI :
10.1109/DEXA.2001.953075