DataONE to Deal with Data Deluge

Friday, November 20th, 2009 | Category: Digital Preservation

By Patricia Cruse, Director, University of California Curation Center

Researchers at the University of California have partnered with dozens of other universities and agencies to create DataONE (http://dataone.org), a global data access and preservation network for earth and environmental scientists that will support breakthroughs in environmental research. DataONE (Data Observation Network for Earth) is one of two $20 million awards made this year as part of the National Science Foundation’s (NSF) DataNet program. The collaboration of universities and government agencies coalesced to address the mounting need for organizing and serving up vast amounts of highly diverse and inter-related but often incompatible scientific data. Resulting studies will range from research that illuminates fundamental environmental processes to work that identifies environmental problems and potential solutions.

The National Center for Ecological Analysis and Synthesis (NCEAS) at UC Santa Barbara, the Department of Computer Science and Genome Center at UC Davis, and the California Digital Library at the UC Office of the President are integrally involved in the NSF DataONE initiative. Across these UC partners, the multimillion-dollar award will drive advanced research and data acquisition, storage, mining, integration, and visualization for DataONE. The resulting computing and processing “cyberinfrastructure” will be made permanently available for use by the broader UC community and international science communities. DataONE is led by the University of New Mexico, and includes additional partner organizations across the United States as well as from Europe, Africa, South America, Asia, and Australia.

Read more at the UC Newsroom, which sent out a press release on 11-18-2009.

The press release can also be found at http://www.cdlib.org/ and http://www.cdlib.org/news/index.html.

HathiTrust Large Scale Search

Friday, November 20th, 2009 | Category: Collection Development

By Heather Christenson, CDL Mass Digitization Project Manager

As of November 18th, the HathiTrust Digital Library provides full-text searching across its entire collection of 4.6 million volumes (1.6 billion pages). Researchers can now search public domain and in-copyright works by keyword or phrase.

Based on open source Solr/Lucene technology, the service expands on an experimental search of public domain volumes introduced in November 2008. The CDL Discovery & Delivery team participated in testing the full-text search ahead of this release.

Full-text search will continue to be supported across the repository as it grows at a rate of hundreds of thousands of volumes every month. The UC Libraries currently have over 750,000 digital volumes in the HathiTrust, and the number continues to grow.

UC is a founding member of the HathiTrust, a collaborative enterprise of 25 leading research libraries. UC participation is coordinated by the California Digital Library (CDL), which brings its extensive experience in digital curation and shared online services to the HathiTrust.

The HathiTrust large scale search is available at: http://catalog.hathitrust.org.

For more information, please see the official press release: http://www.hathitrust.org/press.

Follow John Muir on Twitter and Facebook

Thursday, November 19th, 2009 | Category: Digital Special Collections

By Sherri Berger, Digital Special Collections Program Coordinator

This December, hear renowned California writer and naturalist John Muir (1838-1914) in his own words as he travels to California, encounters Yosemite for the first time, and works to preserve the open land he calls home.

To raise awareness of Muir’s newly digitized letters, Digital Special Collections will be quoting portions of them on Calisphere’s Twitter and Facebook pages.  Each installment or “tweet” will contain a segment of Muir’s stirring prose and a link to the original document and transcript.  The story will unfold over one week, starting December 1.

To hear Muir’s story, become a fan on Facebook (www.facebook.com/calisphere) or follow us on Twitter (www.twitter.com/calisphere).  Not a member of either network?  No problem—both accounts are open for viewing by all.

After the event, check back in on Calisphere’s social networking pages to stay up-to-date on new content and developments, as well as learn about related news, tools, and resources scouted on the Web.  We also welcome your questions and comments in these new forums.

This online event aims to engage students, educators, and the general public with the recent online publication of more than 6,500 of Muir’s letters—a collaborative achievement of CDL, The Bancroft Library at the University of California Berkeley, and the University of the Pacific Library (Learn more).

Meet Stephen Abrams

Thursday, November 19th, 2009 | Category: Staff News

By Ellen Meltzer, Information Services Manager; Photo by Craig Thompson, Web Producer


How extraordinary to have an undergraduate senior thesis portend the themes throughout one’s career! That’s the case for Stephen Abrams, CDL’s Senior Manager for Digital Preservation Technology, who arrived at CDL in February of 2008.  (Stephen is a member of the University of California Curation Center, or UC3, previously known as the Digital Preservation Program. Its other members are Patricia Cruse, Scott Fisher, Erik Hetzner, John Kunze, Margaret Low, David Loy, Mark Reyes, Tracy Seneca, Marisa Strong and Perry Willet.)

Stephen provides leadership in guiding UC3 in three primary areas:

First, the Digital Preservation Repository (DPR).  The DPR is the primary technical infrastructure that manages long-term retention of digital objects.  The DPR is moving to a new generation of software; the earlier software was originally designed nearly 6 years ago.  Stephen points out that in the intervening years we’ve learned a great deal about the best way to provide preservation services, and we are at the beginning of a major project to re-conceive and re-implement the repository.  One of the main goals we’re trying to accomplish is to ensure the new repository will be more responsive to the needs of customers, especially as our customers are becoming more varied, both in the types of units that contribute to the repository and the types of content we’re preserving.  Traditionally we have worked closely with campus libraries to preserve cultural heritage texts and images.  More recently, we’ve expanded our scope to include new campus constituencies interested in data sets in the social and experimental sciences.

Stephen states that we need to expand our capacity to deal with new content types and an increasingly diverse set of users while still continuing to support our traditional users.  One way to do this is by a new conceptualization of the repository.  Previously, we thought of the repository as a large monolithic system or place, managed centrally.  That concept breaks down when dealing with diverse sets of content with diverse sets of requirements.  CDL is now working on devolving our preservation functions into a set of independent, but interoperable micro-services.  Since each is small and self-contained, they are collectively easier to develop, maintain, and enhance.  Although each is narrow-scoped in function, complex behavior can nevertheless emerge through the strategic combination of the services.

Second, Stephen oversees the Web Archiving Service (WAS), keeping an eye on it to ensure that it remains consistent with our other initiatives. The Web Archiving Service, ably run by Web Archiving Coordinator Tracy Seneca, has been in operation for about a year; recently, we began providing public access to web resources (see http://cdlinfo.cdlib.org/blog/2009/07/08/public-access-to-web-archiving-service-goes-live/).

Third, Stephen serves as lead on the multi-year, multi-institutional, NDIIPP-funded JHOVE2 initiative.  In this project, the CDL is collaborating with Stanford and Portico to develop a next-generation open source format-aware characterization system.  (At this point, I needed to ask what that was.)

Stephen explained that characterization is an automated process of determining the significant properties of digital objects.  Any digital object is a representation governed by rules of format that specify syntactic and semantic requirements.  During characterization we can examine an object and, by being cognizant of the underlying format rules, we can extract the significant properties.  In a digital document, for example, we want to know the fonts used to be able to ensure that we can properly continue to display the text in the future. For digital images, we need to understand the way in which color is represented to ensure accurate reproduction.

JHOVE1, which Stephen helped create, was widely used in the preservation community; now it’s 5-6 years old and has some inadequacies.  One of the goals of JHOVE2 is to remedy that, and to provide new features.

Characterization becomes important when operating a Preservation Repository.  Sometimes it’s clear what format you’re expecting to receive—depositors can tell you in great detail; other times you don’t know what you have until it arrives.  It’s useful, still, to verify what you did actually receive; people and systems make mistakes. Sometimes you get things you don’t expect.  Characterization also helps to categorize items in order to take advantage of efficiencies by automating processes.  This can only be done effectively if parallel workflows are properly classified.  Characterization is a way to decide which workflow something goes into.  Audio files are different from documents; color images are different from bi-tonal ones.  This is far more than you may want to know on these subjects, but Stephen is someone who is passionate about what he does and I felt he could have continued to speak rapturously about these subjects.
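
To make the idea concrete, here is a minimal sketch in Python of how a repository might check what it actually received and pick a workflow for it. This is not how JHOVE2 works and does not use its API; the signature table, format labels, and workflow names are all hypothetical, and real characterization extracts far more than a format name (fonts, color representation, and so on).

```python
from typing import Optional

# Hypothetical table of leading-byte signatures; a real tool knows many more formats.
SIGNATURES = [
    (b"%PDF-", "document/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]

# Hypothetical workflow names, keyed by broad content family.
WORKFLOWS = {
    "document": "text-workflow",
    "image": "image-workflow",
    "audio": "audio-workflow",
}


def identify(data: bytes) -> str:
    """Guess a format label from the first bytes of an object."""
    # WAV files are RIFF containers whose form type (bytes 8-11) is "WAVE".
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown/unknown"


def route(data: bytes, declared: Optional[str] = None) -> dict:
    """Compare the detected format with the depositor's claim and choose a workflow."""
    detected = identify(data)
    family = detected.split("/")[0]
    return {
        "detected": detected,
        "matches_declaration": declared is None or declared == detected,
        # Anything we cannot classify goes to a person rather than an automated path.
        "workflow": WORKFLOWS.get(family, "manual-review"),
    }
```

Calling `route(data, declared="image/png")` on a file that really starts with `%PDF-` would report the mismatch and still send the object to the text workflow, which is the "verify what you did actually receive" step described above.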

Immediately before arriving at CDL, Stephen served as Digital Library Program Manager at Harvard University Library. Prior to his work at Harvard, he spent 9 years at MIT as a research engineer in the Department of Ocean Engineering, where he worked on grant-funded software for the design and manufacture of naval vessels.  His expertise was in scientific and engineering visualization, where he turned numbers into pictures.  As the Cold War wound down in the late eighties, there were fewer funding sources for these projects.  He began working on information retrieval problems for the Departments of Commerce and the Interior.  The information retrieval problems led Stephen to the world of digital libraries.

It was hard for me to imagine that even before this, Stephen spent 9 years at a small company in Pennsylvania: Swanson Analysis Systems, a leading developer of finite element analysis software used in structural analysis.  There he also worked on the development of engineering visualization solutions.

Now, back to where we began.  Stephen’s undergraduate thesis was on a problem in celestial mechanics: the three-body problem (I encourage you to look this up in Wikipedia, or elsewhere). One aspect of his research was to develop a graphics display system, for which he had to program both the math involved and the visualization.  With an undergraduate degree in mathematics from Boston University and a Master’s Degree in art and architecture from Harvard, Stephen went looking for work on the scientific side of the two choices.  “It pays better,” he quipped.  The themes that interested him in his undergraduate thesis have followed him throughout his career.

Stephen was aware for some time of the interesting and innovative work going on at the CDL, the University of California, and partner institutions. Coming here provided Stephen with the opportunity to apply himself more deeply to the “incredibly important” problems in digital preservation.  Of course, transplanted easterners always are drawn by the weather, but there were many things professionally and personally that drew him here.

The challenges are real.  There is more useful work that could be done than time to do it.  The main thing is to prioritize appropriately: put together a multi-year road map so that we can be where we need to be at the end of the day, approaching larger problems through small incremental steps.  In addition, he finds there’s such a broad constituency at UC, with people working on amazingly innovative things.  Attempting to come up with comprehensive and effective solutions for any one thing can be a great challenge; just trying to ensure our services remain responsive to users’ needs, both as they are known now and as they change, is daunting.  We’re so glad Stephen is on board to help tackle these demanding issues.

Next Generation Melvyl Pilot Enhancements - November 8, 2009

Thursday, November 12th, 2009 | Category: Bibliographic Services

By Ellen Meltzer, CDL Information Services Manager

Several enhancements were made to Next Generation Melvyl with OCLC’s Sunday, November 8, install.
The changes include:

  • Parenthetical (Boolean) support in search queries.  You can now use parentheses to create more precise searches.  A search on dog (walking OR feeding OR grooming) will return results for dog walking OR dog feeding OR dog grooming.
  • Additional custom web links.  With the approval of UC Heads of Public Services (HOPS), two additional links will soon be added for each campus in the dropdown menu under the library name (e.g., UCR Libraries).  These will point to the campus Article Database and E-journal pages, among the most heavily used links in current Melvyl.
  • Improved ability to treat some item types as a different item type, configured through certain tags, subfields, and/or values contained in the data.
  • Improvements to “Browse similar items” in the “carousel” on the “Similar items” section of the detailed record.
  • Changes to Details section for remote database records.
  • Upon saving a search, users will now receive confirmation that their search has been saved and will see a link to their profile page to view their saved searches.
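
For readers curious how the parenthetical search in the first item above behaves, here is a rough illustration in Python. It is only a sketch of the expansion described in that item, not OCLC’s implementation: the function name `expand_or_group` is made up for this example, and it handles a single, un-nested group of OR terms.

```python
import re
from typing import List


def expand_or_group(query: str) -> List[str]:
    """Distribute the first parenthesized OR-group over the rest of the query.

    Hypothetical helper for illustration; nested groups and AND/NOT are not handled.
    """
    match = re.search(r"\(([^()]*)\)", query)
    if match is None:
        # No parentheses: the query stands as a single phrase.
        return [" ".join(query.split())]
    alternatives = [alt.strip() for alt in re.split(r"\s+OR\s+", match.group(1))]
    before, after = query[:match.start()], query[match.end():]
    # Rebuild one phrase per alternative and normalize whitespace.
    return [" ".join((before + alt + after).split()) for alt in alternatives]


print(expand_or_group("dog (walking OR feeding OR grooming)"))
# ['dog walking', 'dog feeding', 'dog grooming']
```

A result matching any one of the expanded phrases satisfies the original query.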

Please see the PDF for more details.

Mellon Planning Grant Awarded to UC Libraries for a Western Regional Storage Trust

Tuesday, November 3rd, 2009 | Category: Collection Development

By Emily Stambaugh, CDL Manager of Shared Print

The Andrew W. Mellon Foundation has awarded the University of California Libraries a nine-month planning grant to organize the “Western Regional Storage Trust (WEST),” a shared print repository service focused initially on retrospective journal archives. UC Libraries, in collaboration with regional library partners, will band together to prepare service models for consolidating print journal holdings in responsible ways that provide efficiencies throughout the libraries. WEST is envisioned as a robust partnership that will allow libraries to build and manage a cooperative regional archive at the network level.  The proposal calls for library leaders in the Western Region of the United States to convene in Oakland, CA to (1) design operating, governance, and business models to support cooperative print archives among diverse partners; (2) establish standards for low-level validation and disclosure; and (3) develop selection criteria incorporating risk-management principles to ensure persistence within the broader context of similarly intentioned national and international efforts.

Library leaders from UC, Orbis-Cascade, GWLA, SCELC, Stanford, CalTech, Occidental College and more will band together to design the Western Regional Storage Trust incorporating sustainable models for participation amongst a wide variety of partners.  We are pleased to announce that Lizanne Payne, Director of the Washington Research Library Consortium, will serve as the consultant for this process.

For more information, please contact Emily Stambaugh, CDL Shared Print Manager.

The Bibliographic Services Team has a new name

Tuesday, November 3rd, 2009 | Category: Bibliographic Services

By Patricia Martin, Director, Discovery & Delivery

Do you know what the name Bibliographic Services means? We couldn’t agree on it either, so we decided to change the name of our team from Bibliographic Services to a name that better describes the services we provide as a CDL team.  The team that brings you Melvyl®, Next Generation Melvyl®, UC-eLinks, and Request is now the Discovery & Delivery team.

How is this name more relevant to what we’re doing? We’ve seen a shift in the way scholars do research. Discovery and delivery are tightly aligned services — researchers expect access to publications at the same time as they find them. The core library services we provide extend beyond managing bibliographic data — we’re connecting people to what they want. UC-eLinks, for example, is a popular web application that provides UC faculty and students with a quick and reliable way to link directly to articles from the library catalog or other sites like PubMed or Google Scholar.

What can you expect from the Discovery & Delivery team going forward? We’re building on our years of metadata expertise and expanding further into the delivery realm. We’re exploring new territory in collaborative initiatives like HathiTrust, where members of our team recently implemented an open-source page turner for the mass digitized books on the HathiTrust website.

Where will you see our team’s new name? You will see Discovery & Delivery on the CDL website early next year. Our name is 100% acronym-free, but you can call us the D&D team for short. We have already garnered several nicknames, including “disco-tech” for our technical team.
