University of California eScholarship® Repository Exceeds 5 Million Full-Text Downloads; 20,000 Papers

Tuesday, January 15th, 2008 | Category: Digital Publishing

By Catherine Mitchell, CDL Acting Director of Publishing Services

The University of California announced this week that its widely-used eScholarship® Repository has surpassed the 5 million mark for full-text downloads of its open access scholarly content.  This major milestone reflects the impressive adoption and usage rate the Repository has enjoyed since its inception in 2002, with University of California academic units and departments from its 10 campuses publishing or depositing over 20,000 papers and works.

The eScholarship Repository, a service of the California Digital Library, provides a robust full-spectrum, open access publishing platform for pre-prints, post-prints, peer-reviewed articles, edited volumes and peer-reviewed journals.  The Repository houses a broad range of scholarly content from disciplines across the Humanities, Social Sciences, Mathematics and Sciences.

The rate of usage of these materials has grown exponentially in the past 5 years, now often exceeding 55,000 full-text downloads per week.

As evidenced by this rate of activity, the eScholarship Repository represents one of the University of California’s most successful and sustained efforts to improve and provide innovative alternatives to the troubled scholarly publishing system – a system that increasingly struggles to serve the needs and requirements of the academic community.

“We’re very excited about the uptake and use of the eScholarship Repository at the University of California,” says Catherine Candee, Executive Director, Strategic Publishing and Broadcast Services at UC’s Office of the President.  “Our open access publishing platform represents a critical component of UC’s broader effort to strengthen university-based publishing services and integrate them into the research, teaching and public service mission of the University.”

Part of a suite of innovative publishing services developed by the CDL in recent years, the eScholarship Repository serves the scholarly publishing needs of individual faculty and academic departments, laboratories and research units across the University of California system.  It is also a central mechanism in the collaborative publishing efforts between the CDL and the University of California Press.

University of California launches Mark Twain Project Online

Thursday, November 8th, 2007 | Category: General, Digital Publishing

Access to texts, notes, and facsimiles available online at no charge to institutions or individuals

The University of California is pleased to announce the launch of the beta version of the Mark Twain Project Online (www.marktwainproject.org), a digital critical edition of the writings of Mark Twain.

The Mark Twain Project Online (MTPO) applies innovative technology to more than four decades of archival research by expert editors at the Mark Twain Project.  It offers unfettered, intuitive access to reliable texts, accurate and exhaustive notes, and the most recently discovered letters and documents.

MTPO is a joint undertaking of the Mark Twain Papers and Project, the California Digital Library, and University of California Press.  It is funded in part by a generous grant from the National Endowment for the Humanities to the Mark Twain Project, and is supported by a number of institutions and individuals.  The Mark Twain Foundation, a perpetual charitable trust that possesses the publication rights to all of Mark Twain’s writings, has given UC Press and the Mark Twain Project Online exclusive rights to publish copyright-protected writings by Mark Twain, both in print and electronically.

At beta launch, the site will include more than twenty-three hundred letters written between 1853 and 1880, including nearly 100 facsimiles of originals.  Users will also be able to search for information about Mark Twain’s complete correspondence across his entire life, including letters to him and his family.  In future years, the site will release more of the nearly ten thousand known letters, including many never before published; electronic editions of many of Mark Twain’s most famous literary works; the most complete catalog of Mark Twain’s writings currently available; and, in 2010, Mark Twain’s Autobiography, never before published in its complete form.

"The Mark Twain Project Online is an extraordinary resource for scholars, teachers, and ordinary readers.  Materials that previously could be examined only by scholars fortunate enough to be able to visit the Mark Twain Project in The Bancroft Library at UC Berkeley will now be available worldwide to anyone with an interest in Mark Twain—and that’s a cause for celebration," Shelley Fisher Fishkin, author of Lighting Out for the Territory: Reflections on Mark Twain and American Culture, said.

The customizable interface provides a powerful reading and research experience.  The site offers users unprecedented access to authoritative transcriptions of Mark Twain’s writings and the ability to compare those transcriptions side by side with facsimiles when available.  Researchers can gather and store digital citations and links to selected documents, images, and other resources.  These features are supported, in large part, by the California Digital Library’s eXtensible Text Framework (XTF) and the ongoing work of the Text Encoding Initiative (TEI).

The Mark Twain Project Online demonstrates the great advantages of digital presentation and will be a model for future digital scholarly work.  “The Mark Twain Project Online is an exciting initiative that will make a fundamental literary and biographical archive available to scholars and students.  MTPO offers easy access through a sophisticated web interface and a growing, comprehensive scope.  This project has the potential to become a model for Web accessibility to foundational scholarly resources,” Richard Terdiman, author of Body and Story: The Ethics and Practice of Theoretical Conflict, said.

To view the Mark Twain Project Online and learn about the making of this landmark online publication, visit http://www.marktwainproject.org.  For additional information, you can also contact Catherine Mitchell (Catherine.Mitchell@ucop.edu; 510.587.6132), Acting Director of CDL’s eScholarship Publishing Group.

Digital Preservation News

Wednesday, October 17th, 2007 | Category: Digital Preservation, Digital Publishing

By Trisha Cruse, CDL Director of Digital Preservation

The CDL Digital Preservation Group has been busy with a variety of exciting activities, reported below.

Release 4 of the Web Archiving Service
On September 18th the Web Archiving Group released a new version of the Web Archiving Service – special thanks to Tracy Seneca, Scott Fisher, Margaret Low, Erik Hetzner, Mark Reyes, and Mike Wooldridge for getting this release out the door.  So far the group has received very positive feedback from users on the service’s functionality and the user interface.  We are also extremely pleased with the performance; we are up to 500 captures with relatively few hiccups.

We have also put together an overview of the service that is available on YouTube <http://tinyurl.com/2tdrwq>.  This brief overview explains why the content targeted for this project is at risk, how we plan to address this in the Web Archiving Service, and which collections our curators are working on.  Warning: the YouTube video quality is a bit sketchy, so we have also made this presentation available in a high-quality video format; contact tracy.seneca at ucop dot edu for further information.

A kinder and gentler ARK page
Thanks to Kirsten Neilsen and John Kunze there is now a kinder, gentler introduction to ARK identifiers on Inside CDL <http://www.cdlib.org/inside/diglib/ark/>.  Don’t know what that is?  Then definitely take a look.  Our hope is that this will help others recognize and appreciate the true beauty and splendor of ARKs.  The new page has already been re-purposed in a German "technology watch" newsletter <http://www.kim-forum.org/techwatch/kim-dini-technology-watch-report1_2007.pdf>, which is the very first edition of a bi-annual publication from the Interoperable Metadata Center for Excellence and the German Networked Information Initiative.

Tidal wave of web data knocking on our door
For the past several years the Digital Preservation group has been working with Andreas Paepcke and Hector Garcia-Molina at Stanford University on web crawling activities.  Their research group has a wealth of experience collecting web data, and while CDL’s Digital Preservation group was getting its “web crawling sea legs” it asked Stanford’s group to collect data on our behalf.  Over the years Stanford has collected over 100 TB of data, including dot.gov sites, election data, and data on Hurricane Katrina and the Virginia Tech tragedy.  However, they have been using a different crawler than Heritrix, the crawler used by the Web Archiving Service (WAS).  As a consequence, their crawler output is incompatible with most web archiving services, including ours.  The good news is that they have recently created a tool that will turn their crawler’s output into something that CDL’s service can understand.  Erik Hetzner, Mike Wooldridge, and Scott Fisher are just beginning to experiment with this, but we are hoping for a positive outcome.

Contributing to the community by documenting Heritrix
As mentioned above, our Web Archiving Service uses Heritrix, the Internet Archive’s (IA) open-source, extensible, web-scale, archival-quality web crawler project.  "Heritrix" (often misspelled heretrix, heratrix, heritix, etc.) is an archaic word for "heiress", which the IA chose because the project seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations.  One of the challenges of using Heritrix is that there is a dearth of documentation.  Over the next several months Hunter Stern, CDL’s technical writer, will be working with Heritrix programmers at CDL and IA to better document the crawler.  This collaboration will help us tremendously and benefit the crawler community as well.

Moving big data: Mass Transit Project
Over the past couple of years the Digital Preservation Group has been working with the campuses to move large chunks of content into the Digital Preservation Repository (DPR).  In the process we have encountered a few speed bumps.  The issues are two-fold but related: the files are large and the network transfer rates have been unaccountably slow.  Though we have worked towards resolving this, we have more work to do in understanding the best transfer tools and in monitoring our networks to make sure there are no log jams and that they can be used at their full potential bandwidth.  The goal is to make sure we’re making the best use of our Internet2 pathways to/from the campuses and the data centers for the benefit of all CDL projects.
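
To give a rough sense of why large files and slow transfer rates compound each other, here is a minimal back-of-the-envelope sketch.  The file size and throughput figures are hypothetical illustrations, not measurements from the DPR or the campus networks, and the function name is ours.

```python
def transfer_time_hours(size_bytes: float, throughput_bits_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained throughput (bits per second)."""
    if throughput_bits_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_bytes * 8 / throughput_bits_per_s  # 8 bits per byte
    return seconds / 3600


TB = 10**12  # decimal terabyte, in bytes

# Hypothetical sustained rates; real links rarely sustain their nominal speed.
SCENARIOS = [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]

for label, rate in SCENARIOS:
    hours = transfer_time_hours(1 * TB, rate)
    print(f"1 TB at {label}: {hours:.1f} hours")
```

Under these assumptions, the same terabyte takes a couple of hours at a sustained 1 Gbit/s but more than nine days at 10 Mbit/s, which is why achieved throughput matters more than the nominal capacity of the link.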

The Digital Preservation group has embarked on two efforts to speed up movement of large files into the DPR.  First, we are collaborating with the San Diego Supercomputer Center (SDSC) to understand how to transfer data across the network more quickly and efficiently.  Second, we are implementing (on a trial basis) a method of pulling large numbers of external data objects into a kind of preservation holding tank in order to reduce the impact of network speed and latency on the overall DPR ingest process.  We are very excited about the collaboration with SDSC, and Kirsten Neilsen will be leading the project for CDL.  We’re calling the project “Mass Transit” and there is a project Wiki <http://masstransit.sdsc.edu/>.

If you want any additional information on any of these projects please contact Trisha Cruse (patricia.cruse@ucop.edu).
