• DocumentCode
    3693962
  • Title
    Design of local web content observatory system
  • Author
    Gashaw Tsegaye;Solomon Atnafu
  • Author_Institution
    Department of Computer Science, Addis Ababa University, Addis Ababa, Ethiopia
  • fYear
    2015
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The amount of information on the web is growing rapidly. However, for a particular group or country, it is very difficult to know how much relevant web content is published, in which languages, and on which subjects. Knowing the status of the local web content of a country or culture is critically important for policy and strategy decisions on the development of a multilingual and multicultural web. This research therefore designs a model for a local web content observatory system that measures the qualitative and quantitative content of different domains. The system consists of six components: the crawler, content extractor, statistical tracker, language identifier, web document categorizer, and report generator. Although the model is generic and can be applied to any country or culture, we tested and evaluated the system on all domains hosted under the .et domain. About two thousand seed URLs under the .et domain were used, and the crawler collected around 263,031 web documents. The language identifier achieved an accuracy of 98.67%. To demonstrate the effectiveness of the local web content categorizer, precision, recall, and F-measure tests were conducted: for English documents, the categorizer obtained an average precision of 91.7%, a recall of 97.2%, and an F-measure of 94.25%; for Amharic documents, it obtained a precision of 91.7%, a recall of 87.85%, and an F-measure of 86.65%. The average accuracy of the statistical tracker is 98.72%.
  • Keywords
    "Crawlers","Observatories","Training","Search engines","Service-oriented architecture","Web pages","Accuracy"
  • Publisher
    ieee
  • Conference_Titel
    AFRICON, 2015
  • Electronic_ISBN
    2153-0033
  • Type
    conf
  • DOI
    10.1109/AFRCON.2015.7331964
  • Filename
    7331964