DataONE to Deal with Data Deluge

Friday, November 20th, 2009 | Category: Digital Preservation

By Patricia Cruse, Director, University of California Curation Center

Researchers at the University of California have partnered with dozens of other universities and agencies to create DataONE (http://dataone.org), a global data access and preservation network for earth and environmental scientists that will support breakthroughs in environmental research. DataONE (Data Observation Network for Earth) is one of two $20 million awards made this year as part of the National Science Foundation’s (NSF) DataNet program. The collaboration of universities and government agencies coalesced to address the mounting need for organizing and serving up vast amounts of highly diverse and inter-related but often incompatible scientific data. Resulting studies will range from research that illuminates fundamental environmental processes to identifying environmental problems and potential solutions.

The National Center for Ecological Analysis and Synthesis (NCEAS) at UC Santa Barbara, the Department of Computer Science and Genome Center at UC Davis, and the California Digital Library at the UC Office of the President are integrally involved in the NSF DataONE initiative. Across these UC partners, the several-million-dollar award will drive advanced research and data acquisition, storage, mining, integration, and visualization for DataONE. The resulting computing and processing “cyberinfrastructure” will be made permanently available for use by the broader UC community and international science communities. DataONE is led by the University of New Mexico, and includes additional partner organizations across the United States as well as from Europe, Africa, South America, Asia, and Australia.

Read more at the UC Newsroom, which sent out a press release on 11-18-2009.

The press release can also be found at http://www.cdlib.org/ and http://www.cdlib.org/news/index.html.

iPRES 2009 hosted by CDL

Thursday, October 15th, 2009 | Category: General, Digital Preservation

By Perry Willett, CDL Digital Preservation Services Manager

On October 5-6, 2009, over 300 people from 22 countries attended iPRES 2009 at the Mission Bay Conference Center on the UCSF Mission Bay campus. iPRES 2009 was the sixth in an annual series of conferences devoted to digital preservation and, with more than 300 attendees, the largest ever. This year’s conference was coordinated by the UC Curation Center (the new name of CDL’s Digital Preservation Program), with the program committee chaired by Trisha Cruse. Perry Willett was the project manager for the conference; Beaumont Yung and Rondy Epting-Day provided significant administrative support, as did Megan Amaral, CDL’s student intern from SJSU.

The theme of this year’s conference was "Moving into the mainstream.  Enabling our digital future."  The program was packed with thoughtful and thought-provoking presentations on all aspects of digital preservation. Some notable presentations included keynote addresses by David Kirsch (University of Maryland) on public interest in corporations’ business archives; Micah Altman (Harvard University) on public archives for scientific data; and a panel discussion by members of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.

The conference was extremely successful, and we’ve received much "iPraise." Beyond the hosting role, the conference was an important opportunity for CDL to showcase our recent work on curation micro-services and web archiving, and to speak with current and potential partners interested in working with us. The paper by Stephen Abrams, John Kunze and David Loy (delivered by Stephen) on curation micro-services was particularly well-received, with several highly positive tweets and blog posts during and after the conference.

CDL staff participated in many ways, with Stephen Abrams, Trisha Cruse, John Kunze, Tracy Seneca, and Perry Willett serving on the program committee. Stephen and Trisha were also presenters at the conference, and Tracy and Heather Christenson gave poster sessions. Many people served as reviewers for the program, including (in addition to the CDL staff members already mentioned) Scott Fisher, Martin Haye, Erik Hetzner, John Ober and Lisa Schiff. Many people from UC campuses also served as reviewers and helped with the local arrangements. Thanks to all of them for their contributions.

In addition to the scholarly program, we had a full slate of social events including the conference reception at the California Academy of Sciences on Monday evening.  During the conference, Rick Prelinger presented "Lost Landscapes of San Francisco," a film that includes rare footage of the city from newsreels, industrial documentaries and amateur films.  A special "après iPRES" event was held at the Hi Dive in San Francisco on Tuesday evening after the conference.  The conference was part of what was informally called "digital preservation week in San Francisco," with additional events and meetings later in the week sponsored by the International Internet Preservation Consortium, the JHOVE2 project, and Sun Microsystems.

The conference website at http://www.cdlib.org/iPres contains the complete program, along with an archive of tweets by conference attendees, photographs from the conference and social events on Flickr, and (eventually) video, PowerPoint slides and full papers from the presentations. See the Amplified Conference page for photos, blog posts and tweets.

Special thanks go to the vendors who supported the conference: Sun Microsystems, Isilon Systems, ExLibris, Institute of Museum and Library Services (IMLS), Tessella, the Joint Information Systems Committee (JISC), NetApp, FileTek, Library of Congress and the National Digital Information Infrastructure and Preservation Program (NDIIPP), and DuraSpace.  Their support went a long way toward underwriting the costs of the conference.

Public Access to Web Archiving Service Goes Live

Wednesday, July 8th, 2009 | Category: General, Digital Preservation

By Tracy Seneca, Web Archiving Coordinator

The California Digital Library is pleased to announce public access to its Web Archives.  CDL’s Web Archives are built and published using its Web Archiving Service (WAS), which enables librarians to capture, curate, and preserve websites for the benefit of researchers and the general public.  New archives are continually being built and published, and will appear along with the current archives available at http://webarchives.cdlib.org/.

This first set of archives includes materials from California state government agencies and from local government agencies in Orange County, San Diego, Los Angeles and more. Also included are archives of Middle Eastern political organizations, American left-wing organizations, and web content related to events such as the 2007 Southern California Wildfires and the 2003 California Recall Election.

As government agencies and public policy organizations increasingly turn to the web as a primary means of publication, libraries are challenged to provide lasting access to the budgets, studies and reports that they have long collected for the benefit of the research community. WAS also allows libraries to expand their collecting scope to more ephemeral materials such as press releases, local commission meeting minutes, public forum sites, and blogs. All of these provide a glimpse of history in the making for future researchers. The value of this service is described in a recent Chronicle of Higher Education article, "Scholars Race to Preserve Guantánamo Records," which focuses on an archive currently being built by New York University using CDL’s Web Archiving Service.

The Web Archiving Service also enables the University of California (UC) libraries to work collaboratively on a monumental task: archiving the web sites of the State of California.   The State of California web domain (.ca.gov) represents the third largest subdomain of the U.S. government web presence.  The UC campuses have worked individually to capture and archive local information, and collectively to archive state publications.  Together, these archives represent a major achievement and a series of rich resources for California researchers.  These archives can also provide lasting access to the individual state publications that are catalogued and made available via UC’s Melvyl catalog.

The archives represent the culmination of the Web-at-Risk grant, funded by the Library of Congress’ National Digital Information Infrastructure and Preservation Program, and led by the California Digital Library. With our grant partners at the University of North Texas and New York University, and our curatorial partners at the UC campuses, NYU and Stanford University, we are embarking on a new era in collection building for libraries.

If you have any questions about the archives or about the Web Archiving Service, please contact washelp@ucop.edu.

CDLers in Print

Wednesday, May 27th, 2009 | Category: Digital Preservation

By Ellen Meltzer, Manager, Information Services

Patricia Cruse, director of the CDL Digital Preservation Program, and Beth Sandore, associate university librarian for information technology planning and associate dean of libraries at the University of Illinois Library at Urbana-Champaign (who completed a rotation to the CDL in 2007), have edited a special issue of Library Trends. The issue comprises sixteen articles that tell fascinating stories about the ground-breaking efforts of numerous partners within the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP). Also included is an article by CDL’s Web Archiving Service Manager, Tracy Seneca, entitled "The Web-at-Risk at Three: Overview of an NDIIPP Web Archiving Initiative."

Since its inception in 2004, NDIIPP has grown from an experimental program into a true partnership of concerned organizations working together to sustain access to digital information critical to scholarship and cultural heritage nationwide.

Congratulations, Trisha and Tracy!

End of Bush’s Term: Will It Disappear from the Web?

Friday, September 5th, 2008 | Category: General, Digital Preservation

By Hunter Stern, CDL Technical Writer

Will the Homeland Security and No Child Left Behind websites disappear on January 20th 2009?  The answer might surprise you.  January 20th 2009 will mark the beginning of a new presidential administration and the coincident end of the current administration, putting much of the online material related to its policies and initiatives at risk.  According to the Washington Post, “Many federal agency records exist only in digital form and are in danger of disappearing when the administration changes” (August 20, 2008).

The University of California community, not to mention scholars the world over, requires perpetual access to these online materials in the normal conduct of research, teaching, and learning. Even without a change in administration, government records stored in digital form are notoriously volatile. Web pages on government sites have an average life span of only 44 days.

To ensure that the historical record of the current administration is not lost, a partnership of government and nonprofit agencies has taken responsibility for its preservation. The University of California – California Digital Library (CDL), in partnership with the Library of Congress, the Government Printing Office (GPO), the Internet Archive (IA), and the University of North Texas Libraries (UNTL), is planning the harvest and archival storage of more than 100 million US government web pages from the second George W. Bush administration. This effort will involve the comprehensive harvest of the .gov domain as well as focused Web harvests of specific government agencies. The goal is to conduct a broad capture of all Federal government Web sites, and a deep capture of specific high-priority sites that have been chosen by the project’s curators. Each partner plays a critical role in the project.

The California Digital Library, a recipient of a Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) grant, leads the Web-at-Risk project, a goal of which is “to develop tools that enable librarians and archivists to capture, curate, preserve, and provide access to web-based government and political information.”  These tools will be put to use doing deep crawls of specific government agencies ranked as priority sites by the project’s curators.  In addition to CDL, UNTL will be responsible for conducting deep crawls.

The broad crawl will be the responsibility of the Internet Archive, a non-profit group providing universal and permanent access to digital information for educators, researchers, and the general public.  IA will use its advanced Web-crawling software, called Heritrix, to capture the intended sites.

In order to prioritize the vast list of URLs included in the scope of the crawl, the University of North Texas has designed a software tool that allows curators to nominate URLs for harvest and tag them with numeric rankings.

The Library of Congress, which has preserved congressional Web sites since December 2003, will focus on developing the overall harvesting plan.  The GPO and the libraries in its Federal Depository Library Program will assist in the curation process.

For more information on the End of Term project, contact Patricia Cruse (patricia.cruse@ucop.edu), Director, Digital Preservation Program.

CDL Staff at IFLA Conference

Thursday, September 4th, 2008 | Category: Digital Preservation

By Stephen Abrams, CDL Manager for Digital Preservation Technology

Margaret Low, software engineer in Digital Preservation, recently returned from the IFLA (International Federation of Library Associations and Institutions) Annual Conference in Quebec, where she presented a very well-received paper on the CDL preservation infrastructure.  The paper is available for download at: http://www.ifla.org/IV/ifla74/papers/084-Low-en.pdf.

CDL Staff in Print

Tuesday, July 15th, 2008 | Category: Digital Preservation

By John Kunze, CDL Preservation Technologies Architect

CDL’s Preservation Program staff member Erik Hetzner presented a paper at JCDL (Joint Conference on Digital Libraries) in Pittsburgh, Pennsylvania last month.  His paper is entitled, "A simple method for citation metadata extraction using hidden Markov models" and is available at http://gales.cdlib.org/~egh/hmm-citation-extractor/sp181-hetzner.pdf.

CDL recruiting for Digital Preservation Services Manager

Friday, July 11th, 2008 | Category: General, Digital Preservation

By Patricia Cruse, CDL Director of the Digital Preservation Program

UNIVERSITY OF CALIFORNIA, CALIFORNIA DIGITAL LIBRARY

TITLE: Digital Preservation Services Manager

CATEGORY: Full-Time

SALARY: Salary commensurate with qualifications and experience.
Excellent benefits.

TO APPLY:  http://jobs.ucop.edu/applicants/Central?quickFind=52447

POSITION DESCRIPTION:

Want to be part of a dynamic team that is working to preserve digital information for future generations? At the California Digital Library (CDL), we’ve developed a world-class program to preserve digital material that supports the University of California’s research, teaching, and learning mission, and you can be a part of it. A key member of the team is the Digital Preservation Services Manager. Reporting to the Director of the Digital Preservation Program, the Manager is responsible for the day-to-day management of digital preservation services (production and development) through project management, the provision of support services (whether offered in person or online), and liaison with digital preservation service providers and support staff. In addition, the Services Manager will be responsible for translating users’ needs and their perceptions of system capabilities into input that informs further refinement and extension of the digital preservation technology and service infrastructure.

This is an ideal opportunity for someone with solid people skills and a passion for working in a collaborative and dynamic environment.

The California Digital Library (CDL) supports the assembly and creative use of the world’s scholarship and knowledge for the UC libraries and the communities they serve.  In partnership with the UC libraries, the California Digital Library established the digital preservation program to ensure long-term access to the digital information that supports and results from research, teaching and learning at UC.

JOB REQUIREMENTS:

Bachelor’s degree in the social sciences, public administration, library and information science or a related field and at least three years’ relevant experience with development or delivery of online information services in educational, digital preservation, library, research, and/or cultural heritage settings or an equivalent combination of education and experience.

Demonstrated ability to plan, evaluate, budget for and manage complex projects from their inception through to their final delivery.

Plans projects and assignments and monitors performance according to priorities as demonstrated by regularly meeting established deadlines in an environment of multiple projects and changing priorities.

Strong logic and quantitative reasoning skills as demonstrated by ability to review and assess a range of variables to define key issues, evaluate reasonable alternatives and translate findings into recommended changes, actions or strategies.

Proven experience with and general understanding of the academic user community and the digital library/scholarly information services domain.

Demonstrated experience working with user community and technology/programming staff to build use cases, functional requirements and user interface design.

Excellent written and verbal communication skills as demonstrated by the ability to understand and articulate technical ideas and issues at a conceptual level and explain them clearly and concisely to non-technical staff.

Demonstrated ability to operate under general direction, able to develop creative solutions to problems, and tackle issues in a self-motivated manner in a service-oriented geographically distributed team environment.


BagIt: Transferring Digital Content

Wednesday, July 2nd, 2008 | Category: Digital Preservation

By Trisha Cruse, Director of Digital Preservation

The CDL Digital Preservation Group, under the leadership of John Kunze, has co-developed with the Library of Congress a format for transferring digital content.  “The BagIt format specification is based on the concept of ‘bag it and tag it,’ where digital content is packaged (the bag) along with a small amount of machine-readable text (the tag) to help automate the content’s receipt, storage and retrieval.  There is no software to install.  BagIt is an attempt to simplify large-scale data transfers between cultural institutions.”
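As a rough illustration of the "bag it and tag it" idea, the sketch below writes a payload directory together with two small machine-readable text files, then checks the payload against its manifest on receipt. The file names (bagit.txt, data/, manifest-sha256.txt) and the version string follow the later published BagIt specification (RFC 8493) rather than anything stated in this post, and the helper names make_bag and verify_bag are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict[str, bytes]) -> Path:
    """Write payload files under data/ plus a declaration and a manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: checksum, then its path in the bag.
        manifest_lines.append(f"{digest}  data/{name}")

    # The "tag": a bag declaration and a checksum manifest (assumed layout).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "example_bag", {"report.txt": b"hello\n"})
        print(verify_bag(bag))
```

Because the receiver needs only the text files inside the bag to confirm that the transfer arrived intact, a check like this can run with nothing beyond a standard scripting environment.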

Find out more in the Library of Congress press release:
http://www.digitalpreservation.gov/news/2008/20080602news_article_bagit.html

The full BagIt specification is available at http://www.cdlib.org/inside/diglib/bagit/bagitspec.html

CDL Guidelines for Digital Objects, Version 2.0: Updated for METS File element

Thursday, November 15th, 2007 | Category: Digital Preservation, Technology, Digital Special Collections

By Adrian Turner, CDL Data Acquisitions

The "CDL Guidelines for Digital Objects, Version 2.0" (CDL GDO) has been updated to include specifications for use of the METS File <file> element.  You can find the updated version at http://www.cdlib.org/inside/diglib/guidelines/.

The revision applies to Sections 2.1, 2.2.2, 3.1, and 3.2.4 only:

  • To support the orderly transmission and ingest of digital objects, the CDL recommends the inclusion of checksum (MD5, SHA-1, or CRC32) and byte size values in the METS File <file> element.  Note that this information is preferred, but not required.
  • The subheadings within Sections 2.1 and 3.1 have been relabeled, and are now consistently based on METS element names.
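To make the checksum and byte size recommendation concrete, here is a minimal sketch that computes an MD5 checksum and size for a file's content and records them on a METS <file> element. The attribute names (SIZE, CHECKSUM, CHECKSUMTYPE) and the FLocat child are taken from the METS schema rather than from this post, and the file ID, path, and helper name mets_file_element are hypothetical; consult the guidelines themselves for the exact encoding CDL expects.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_file_element(file_id: str, href: str, content: bytes) -> ET.Element:
    """Build a METS <file> element carrying byte size and an MD5 checksum."""
    file_el = ET.Element(
        f"{{{METS_NS}}}file",
        {
            "ID": file_id,
            "SIZE": str(len(content)),  # byte size of the content
            "CHECKSUM": hashlib.md5(content).hexdigest(),
            "CHECKSUMTYPE": "MD5",
        },
    )
    # Pointer to where the content file lives (illustrative relative path).
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href},
    )
    return file_el


element = mets_file_element("FID1", "images/page001.tif", b"example bytes")
print(ET.tostring(element, encoding="unicode"))
```

On ingest, the receiving system can recompute the same two values from the transmitted file and compare them with the recorded attributes to detect truncation or corruption.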

Please contact the CDL at oacops@ucop.edu if you have any questions.
