By Lisa Schiff, eScholarship Publishing Program Technical Lead
The California Digital Library (CDL) is pleased to announce the availability of an extensive self-guided tutorial for its eXtensible Text Framework (XTF) application. XTF is an open source, highly customizable piece of software supporting the search, browse, and display of heterogeneous digital content and offering efficient and practical methods for creating customized end-user interfaces for distinct digital collections. The tutorial provides guidance for implementing and customizing XTF, from core functionality to overall look and feel. Downloads for the Mac and Windows operating systems are available from the XTF Project page on SourceForge, along with the complete distribution and documentation.
The tutorial comes with a complete XTF package that is ready to run when uncompressed; no other installation is required. It contains nine modules spanning the most powerful and popular features, including how to:
Add new content
Change metadata
Change logo and colors
Increase significance of titles in ranking hits
Customize and enable default status of advanced search
Change fields displayed in search results
Enable structural searching
Create a hierarchical facet
Change footnote behavior
XTF Background and Overview
Since first developing and deploying this indexing and display technology in 2005, the CDL has worked to build and maintain XTF as a highly customizable application built upon tested components already in use by the digital library and search communities - in particular the Lucene text search engine, Java, XML, and XSLT. By coordinating these pieces in a single platform that can be used to create multiple unique applications, the CDL has succeeded in dramatically reducing the investment in infrastructure, staff training, and development for new digital content projects.
XTF offers the following core features out of the box:
Easy to deploy: Drops directly into a Java application server such as Tomcat or Resin; has been tested on Solaris, Mac, Linux, and Windows operating systems
Easy to configure: Can create indexes on any XML element or attribute; entire presentation layer is customizable via XSLT
Robust: Optimized to perform well on large documents (e.g., text that exceeds 10MB of encoded text); scales to perform well on collections of millions of documents; provides full Unicode support
Extensible:
Works well with a variety of authentication systems (e.g., IP address lists, LDAP, Shibboleth)
Provides an interface for external data lookups to support thesaurus-based term expansion, recommender systems, etc.
Can power other digital library services (e.g., XTF contains an OAI-PMH data provider that allows others to harvest metadata, and an SRU interface that exposes searches to federated search engines)
Can be deployed as separate, modular pieces of a third-party system
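As a rough illustration of how a harvester might address the OAI-PMH data provider mentioned above, the sketch below builds standard OAI-PMH request URLs. The verbs and the oai_dc metadata prefix come from the OAI-PMH protocol itself; the base URL and the helper name oai_request_url are assumptions made for this example and will differ for any real XTF deployment.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = {"verb": verb}
    query.update(params)
    return f"{base_url}?{urlencode(query)}"


# Hypothetical endpoint: substitute the OAI base URL of your own deployment.
BASE_URL = "http://localhost:8080/xtf/oai"

# Ask the repository to describe itself.
print(oai_request_url(BASE_URL, "Identify"))

# Harvest records in unqualified Dublin Core, the format every
# OAI-PMH repository is required to support.
print(oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A harvester would fetch these URLs over HTTP and parse the XML responses; that step is omitted here.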
By Lisa Schiff, Technical Lead for CDL Publishing Services
The California Digital Library (CDL) is pleased to announce a new release of its search and display technology, the eXtensible Text Framework (XTF) Version 2.1. XTF is an open source, highly flexible software application that supports the search, browse and display of heterogeneous digital content. XTF offers efficient and practical methods for creating customized end-user interfaces for distinct digital content collections.
Highlights from the 2.1 release include:
Extensive interface improvements, including new search forms, built-in faceted browsing, and new look and feel.
Increased support for document and information exchange formats.
XHTML and OAI-PMH output
NLM article format indexing and output
Microsoft Word indexing
Streamlined XSLT stylesheets for simpler deployment and adaptation.
Updated documentation that has been moved to the XTF project wiki, allowing XTF implementers to share solutions with the entire user community.
"Freeform" Boolean query language, offered as an experimental feature.
Backward compatibility with existing XTF implementations.
Since the first deployment of XTF in 2005, the development strategy has been to build and maintain an indexing and display technology that is not only customizable, but also draws upon tested components already in use by the digital library and search communities - in particular the Lucene text search engine, Java, XML, and XSLT. By coordinating these pieces in a single platform that can be used to create multiple unique applications, CDL has succeeded in dramatically reducing the investment in infrastructure, staff training and development for new digital content projects.
XTF offers a suite of customizable features that support diverse intellectual access to content. Interfaces can be designed to support the distinct tools and presentations that are useful and meaningful to specific audiences. In addition, XTF offers the following core features:
Easy to deploy: Drops directly into a Java application server such as Tomcat or Resin; has been tested on Solaris, Mac, Linux, and Windows operating systems.
Easy to configure: Can create indexes on any XML element or attribute; entire presentation layer is customizable via XSLT.
Robust: Optimized to perform well on large documents (e.g., a single text that exceeds 10MB of encoded text); scales to perform well on collections of millions of documents; provides full Unicode support.
Extensible:
Works well with a variety of authentication systems (e.g., IP address lists, LDAP, Shibboleth).
Provides an interface for external data lookups to support thesaurus-based term expansion, recommender systems, etc.
Can power other digital library services (e.g., XTF contains an OAI-PMH data provider that allows others to harvest metadata, and an SRU interface that exposes searches to federated search engines).
Can be deployed as separate, modular pieces of a third-party system (e.g., the module that displays snippets of matching text).
Powerful for the end user:
Spell checking of queries
Faceted displays for browsing
Dynamically updated browse lists
Session-based bookbags
These basic features can be tuned and modified. For instance, the same bookbag feature that allows users to store links to entire books can also store links to citable elements of an object, such as a note or other reference.
A sampling of XTF-based applications, both within and outside of the CDL, includes:
Mark Twain Project Online (http://www.marktwainproject.org), developed by the Mark Twain Papers Project, the CDL and the University of California Press.
Calisphere (http://www.calisphere.org/), a curated collection of primary sources keyed to the curriculum standards of California’s K-12 community, developed by the CDL.
The Encyclopedia of Chicago (http://www.encyclopedia.chicagohistory.org/), developed by the Chicago History Museum, The Newberry Library, and Northwestern University.
The "CDL Guidelines for Digital Objects, Version 2.0" (CDL GDO) has been updated to include specifications for use of the METS File <file> element. You can find the updated version at http://www.cdlib.org/inside/diglib/guidelines/ .
The revision applies to Sections 2.1, 2.2.2, 3.1, and 3.2.4 only:
To support the orderly transmission and ingest of digital objects, the CDL recommends the inclusion of checksum (MD5, SHA-1, or CRC32) and byte size values in the METS File <file> element. Note that this information is preferred, but not required.
The subheadings within Sections 2.1 and 3.1 have been relabeled, and are now consistently based on METS element names.
Please contact the CDL at oacops@ucop.edu if you have any questions.
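For contributors preparing objects by script, the checksum and byte size recommendation above could be met along the following lines. This is a minimal sketch, not a CDL tool: CHECKSUM, CHECKSUMTYPE, SIZE, ID, and MIMETYPE are attributes of the METS <file> element, while the function name mets_file_element, the file ID, and the sample content are hypothetical. MD5 is used here; SHA-1 or CRC32 would be equally acceptable under the guidelines.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets_file_element(path, file_id, mimetype):
    """Return a METS <file> element carrying MD5 checksum and byte size."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large content files are not loaded into memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
    return ET.Element(
        f"{{{METS_NS}}}file",
        {
            "ID": file_id,
            "MIMETYPE": mimetype,
            "SIZE": str(os.path.getsize(path)),
            "CHECKSUM": md5.hexdigest(),
            "CHECKSUMTYPE": "MD5",
        },
    )


if __name__ == "__main__":
    import tempfile

    # Hypothetical content file created only for the demonstration.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    elem = mets_file_element(tmp.name, "FILE001", "text/plain")
    print(ET.tostring(elem, encoding="unicode"))
    os.remove(tmp.name)
```

A complete METS document would also need the <FLocat> child that points to the file, which is left out of this sketch.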
By Adrian Turner, CDL Data Acquisitions consultant
The "CDL Guidelines for Digital Objects, Version 2.0" (CDL GDO) has been updated to reflect modified requirements for METS unique identifiers. You can find the updated version at http://www.cdlib.org/inside/diglib/guidelines/ .
The revision applies to Section 3.1 only, and pertains to objects submitted for the CDL’s “Enhanced Service Level”. This service level encompasses the presentation of digital assets via CDL websites. It is also sufficient for increased preservation services in the UC Libraries Digital Preservation Repository.
The METS top-level <mets> tag must contain an OBJID attribute containing an ARK identifier for the digital object. Previously, the CDL GDO indicated that the OBJID attribute could contain a unique local identifier in lieu of an ARK identifier. However, CDL systems do not support this scenario for objects submitted for the Enhanced Service Level.
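Contributors who want to check their files before submission could test the OBJID requirement with something like the sketch below. It is an illustration only: the regular expression is a deliberately loose assumption about ARK syntax (the "ark:/" label, a numeric name assigning authority number, then a name), the function objid_is_ark is hypothetical, and the sample identifiers are made up. CDL's own ingest checks may be stricter.

```python
import re
import xml.etree.ElementTree as ET

# Loose, illustrative pattern: "ark:/" + numeric authority number + "/" + name.
# This is an assumption for the example, not the full ARK specification.
ARK_PATTERN = re.compile(r"^ark:/\d{5,9}/\S+$")


def objid_is_ark(mets_xml):
    """Return True if the root element's OBJID attribute looks like an ARK."""
    root = ET.fromstring(mets_xml)
    objid = root.get("OBJID", "")
    return bool(ARK_PATTERN.match(objid))


# Made-up sample documents, reduced to the top-level tag.
with_ark = '<mets xmlns="http://www.loc.gov/METS/" OBJID="ark:/13030/kt0000000x"/>'
with_local_id = '<mets xmlns="http://www.loc.gov/METS/" OBJID="local-0001"/>'

print(objid_is_ark(with_ark))
print(objid_is_ark(with_local_id))
```

The first sample is accepted and the second, which carries only a local identifier, is rejected.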
Position title: Manager, Infrastructure and Applications Support (Req. 20070286)
Position location: Oakland, California
Closing date: July 19, 2007
Reporting to the University Librarian and Executive Director, the Manager of Infrastructure and Applications Support is responsible for the technical design, implementation, maintenance, and operations of the common technology enterprise services that support all program and service areas. The Manager is responsible for the Computing and Storage Resource Center comprising a distributed network of CDL-owned resources at three physical locations and for managing the overall integration architecture for computing systems, database management systems, storage systems and network infrastructure. The Manager also provides support to application developers in the CDL’s program areas for the development, staging and production environments, and for collaboration tools supporting the work of CDL and its partners.
Welcome to the new and improved CDLINFO. Written primarily for University of California librarians and staff, this electronic newsletter provides updates about California Digital Library projects, initiatives, and newly available electronic resources.
Beginning in 2007, CDLINFO was reformatted in order to help CDL’s latest news reach users speedily via RSS feeds for either all items that appear in any month or for items in specific areas only. See Subscribe to CDLINFO RSS for more information. CDLINFO will still be sent to subscribers via email, as usual; however, it will arrive monthly instead of biweekly.
If you would like to contribute to CDLINFO, please send email to Robin Davis-White at CDLINFO-SUBMISSIONS-L-Request@UCOP.EDU. If you would like to contact the CDL Web Production Team, email Eric Satzman at esatzman@ucop.edu.
The CDL and Digital Library Services Advisory Group (DLSAG) are pleased to announce the release of the final version of the CDL Guidelines for Digital Objects (CDL GDO), Version 2.0. The guidelines are available in HTML and PDF format at the following URL:
Digital materials of ever-increasing variety and complexity are seen to be worth collecting and preserving by memory organizations — libraries, archives, museums, etc. Materials include objects converted into digital form from existing collections such as manuscripts, maps, visual images, and sound files, as well as “born digital” materials such as web sites.
In order for the CDL to provide effective preservation and access services, these materials need to be represented in a uniform manner. The CDL GDO provides specifications for all new digital objects prepared by institutions for submission to the CDL. It is based upon and supersedes the “CDL Digital Object Standard, Version 1.0” (May 2001) and the “OAC Best Practice Guidelines for Digital Objects, Version 1.1” (January 2004).
The CDL GDO includes the following features:
Establishes “sliding scale” requirements, i.e., the more a digital object conforms to the guidelines, the more preservation and access services can be provided for it.
Provides specifications for preparing digital objects, comprising metadata and content files (e.g., digital images, text) packaged using the Metadata Encoding and Transmission Standard (METS) format.
Includes updated recommendations for digital image files.
A draft version of the guidelines was prepared from the fall of 2004 through the winter of 2005. Feedback received from CDL contributing institutions was incorporated into this final version of the guidelines.
Thursday, December 14th, 2006 | Category: Technology
CDL announces the latest XTF release: version 1.9. The main feature of the new XTF release is greatly improved documentation. Almost all features are now fully documented, allowing users to take better advantage of the system.
Users of XTF version 1.9 will also find the following new features:
Stylesheets with a real HTTP redirect, to send the user’s browser to a different URL
Improved full-text scoring, file handling and numeric data searching
New query operator: multi-field AND, that requires *all* terms to be present, but in *any* of the listed fields. Default stylesheets now use this for a basic “keyword” search.
New query operator: orNear. This is like a typical OR query, except that when multiple terms are present in the same metadata field, their proximity is taken into account when scoring.
Experimental dynamic FRBR mode… see docs/experimental.html for details.
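The difference between the multi-field AND operator described above and an ordinary per-field AND can be shown with a toy model. The sketch below is not XTF code or XTF query syntax; the function names, the dictionary-based document, and the whitespace tokenization are all assumptions chosen to illustrate the matching rule only.

```python
def field_words(doc, field):
    """Lower-cased set of whitespace-separated words in one field."""
    return set(doc.get(field, "").lower().split())


def single_field_and(doc, terms, fields):
    """True when some single field contains every term."""
    return any(
        all(t.lower() in field_words(doc, f) for t in terms) for f in fields
    )


def multi_field_and(doc, terms, fields):
    """True when every term occurs in at least one of the listed fields."""
    return all(
        any(t.lower() in field_words(doc, f) for f in fields) for t in terms
    )


# Made-up record with two metadata fields.
doc = {"title": "Adventures of Huckleberry Finn", "creator": "Mark Twain"}
fields = ["title", "creator"]

# "twain" is in creator and "finn" is in title: no single field has both,
# but together the fields cover all terms.
print(single_field_and(doc, ["twain", "finn"], fields))
print(multi_field_and(doc, ["twain", "finn"], fields))

# "sawyer" appears nowhere, so even the multi-field form fails.
print(multi_field_and(doc, ["twain", "sawyer"], fields))
```

This is why the operator suits a basic keyword search: a user can mix an author's name and a title word in one box without knowing which field holds which term.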
As always, please let us know if you encounter any bugs or problems with this release.
We are also pleased to report that XTF continues to attract users across the globe. Most recently, it has been deployed at the Grupo de Estudos em Dereito das Telicomunicacoes, where it is being used to run searches on Brazilian acts of telecommunication law.
View the site at: http://www.gds.nmi.unb.br:8080/xtf/search
The California Digital Library (CDL) is pleased to announce the release of “7train”, an XSLT 2.0-based tool for generating Metadata Encoding and Transmission Standard (METS) files from standardized XML inputs. Version 1 of the open-source, platform-independent tool is available via Sourceforge at http://seventrain.sourceforge.net.
7train was designed to transform XML documents into METS files conforming to a specific METS profile. This initial implementation was designed with the goal of transforming exports from the CONTENTdm digital asset management tool (Version 4.0 and higher) into the CDL 7train METS profile, available at http://www.loc.gov/mets/profiles/00000010.xml, which is suitable for inclusion in CDL repositories. However, the tool can be customized to produce METS files from any kind of standardized XML document (e.g., OAI records).
The tool was developed through the CDL’s work in the “California Local History Digital Resources Project”, a multi-year LSTA grant-funded project that explores a model to aggregate, preserve, and provide permanent public access to digitized local history content via a statewide online access point.
The CDL invites METS implementors and CONTENTdm users to utilize and comment on the toolkit. The Sourceforge web page contains contact information for feedback.
Thursday, February 9th, 2006 | Category: Technology
The CDL Common Framework is an open, services-oriented technical architecture that provides an integrating framework for services related to digital libraries. As a layered architecture, it aims to separate front-end tools from back-end services and from underlying data storage so that different components can be reused in multiple applications, reducing the time and money it takes to develop and maintain code.