• DocumentCode
    420205
  • Title

    Machine learning for information architecture in a large governmental Web site

  • Author

    Efron, Miles ; Elsas, Jonathan ; Marchionini, Gary ; Zhang, Junliang

  • Author_Institution
    Sch. of Inf. & Libr. Sci., North Carolina Univ., Chapel Hill, NC, USA
  • fYear
    2004
  • fDate
    7-11 June 2004
  • Firstpage
    151
  • Lastpage
    159
  • Abstract
    We describe ongoing research into the application of machine learning techniques for improving access to governmental information in complex digital libraries. Under the auspices of the GovStat Project, our goal is to identify a small number of semantically valid concepts that adequately span the intellectual domain of a collection. The goal of this discovery is twofold. First, we desire a practical aid for information architects. Second, automatically derived document-concept relationships are a necessary precondition for real-world deployment of many dynamic interfaces. The current study compares concept learning strategies based on three document representations: keywords, titles, and full text. In statistical and user-based studies, human-created keywords provide significant improvements in concept learning over both title-only and full-text representations.
  • Keywords
    Web sites; digital libraries; full-text databases; information retrieval; learning (artificial intelligence); user interfaces; GovStat Project; digital library; document representation; document-concept relationship; dynamic interface; full-text representation; information architecture; large governmental Web site; machine learning; statistical study; title-only representation; user-based study; Data mining; Data models; Information retrieval; Machine learning; Permission; Software libraries; Statistical analysis; US Government; USA Councils; User interfaces;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries, 2004. Proceedings of the 2004 Joint ACM/IEEE Conference on
  • Print_ISBN
    1-58113-832-6
  • Type

    conf

  • DOI
    10.1109/JCDL.2004.1336112
  • Filename
    1336112