Digital Libraries Discussion: 2009

Tuesday, November 24, 2009

Muddiest Point

I could use some clarification regarding the design report accompanying our final projects. I am not sure what information to include in some of the categories, particularly user requirements, conceptual design, data collections, and sample information access scenarios. I have a general idea of what these might mean, but examples would be very helpful. Is there a finished report we could look at?

Tuesday, November 3, 2009

Muddiest Point: Digital Preservation

There has been a lot of talk about what would happen to a library's licensed digital resources if the vendor went under -- since nothing is physically on the shelves, access to all those materials could be lost as though it never was. But I have heard that some vendors offer guaranteed continued access through third-party permanent repositories, and that some institutions are partnering with each other to build and maintain the same. Can you point to any examples of these types of ventures?

Friday, October 30, 2009

Preservation in Digital Libraries

Rieger
I am glad to see digital preservation issues being addressed so thoroughly. Though it’s more of a minor point, I’m particularly intrigued by the concept of shared print storage facilities – it strongly suggests that digital is becoming the default, with print just the backup/alternative. It also has interesting implications for the way competing institutions might find themselves collaborating. I think one of the crucial points raised in this report (though maybe not directly related to digital preservation) is the call for libraries to develop more “scaleable and flexible infrastructures” and to rethink their collection development policies. Digital resources are truly changing the landscape of traditional resource management practices. The most important issue raised, however, was the fact that so many standards and best practices are so far out of date. That’s a field that requires a lot of development very quickly, especially with the advent of these large-scale ventures.

Jones and Beagrie
Although digitization has been touted by some as the solution to preservation of print resources, it has become clear that digital objects require just as much if not more effort and funds to maintain. Paper can sit on shelves untouched for decades and still fulfill its intended function; it would be very difficult, at this point, to access a digital object from forty years ago. This handbook emphasizes the fragility of digital media, and the advance planning and quick action necessary to keep it intact and functional. It also identifies the key stakeholders for long-term preservation efforts. Unfortunately the section on cost is probably all too accurate when it says, essentially, that there is no way to estimate it.

Littman
This is a useful reflection from someone who clearly has much experience in the digital resource field, and has faced many of the problems that digital libraries have and will continue to face. The four categories of failures he describes -- media, hardware, software, human -- are ever-present risks in any digital library, and it is important to keep those various risks in mind when designing and managing a DL. Littman suggests that human error was the most serious threat to his own project, which is a lesson that any DL team should consider very seriously.

Monday, October 26, 2009

Muddiest Point (Access in Digital Libraries II)

Point 1: OAI-PMH seems to rely on consensus amongst all participants as to which metadata standard to use (i.e. Dublin Core). However, I imagine it is difficult to convince people to provide DC metadata if they are using a different standard for their own purposes -- it's just more work. What I wonder is whether it is easier to try to get people to agree on one standard, or whether it is possible to develop a new protocol that is capable of recognizing and harvesting multiple metadata schemas.

Point 2: If Z39.50 is not designed for full-document retrieval, is there a different standard in the works that addresses that growing need? Is there any institution that has successfully implemented a different system?
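
On Point 1, as far as I understand the protocol, each OAI-PMH request names the metadata format it wants through a metadataPrefix argument, and a ListMetadataFormats request asks a repository which formats it offers. Below is a minimal sketch of how such request URLs are put together. The endpoint (repository.example.org) and the helper name (oai_request_url) are my own placeholders, and the marc21 prefix is only an example of a non-DC format a repository might advertise, not something every repository supports.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return BASE_URL + "?" + urlencode(params)


# Ask the repository which metadata formats it can disseminate.
list_formats_url = oai_request_url("ListMetadataFormats")

# Harvest records as unqualified Dublin Core (the oai_dc prefix).
dc_records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")

# Harvest the same records in another format, assuming the repository
# lists a prefix called "marc21" in its ListMetadataFormats response.
marc_records_url = oai_request_url("ListRecords", metadataPrefix="marc21")

for url in (list_formats_url, dc_records_url, marc_records_url):
    print(url)
```

This only builds the URLs; it does not contact a server or parse the XML that comes back.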

Wednesday, September 30, 2009

Unit 5 Readings

Witten
2.2
I like Witten's emphasis on presenting an image of stability and continuity to the user, despite the impermanence and fluidity of digital documents. I hadn't previously considered the comparison between different editions of a physical work and different modifications of a digital document. The discussion on authority control is not new information, but I wonder if it is more or less relevant/necessary in the context of digital libraries with sophisticated search functions. The LCSH information is also not new, though it's a good overview of basic cataloging practices. It is interesting to contrast the linear arrangement of books on shelves with the endless rearrangement possible in a digital collection.
5.4-5.7
Good overview of MARC and DC. BibTeX was new to me; I like that it seems compatible with XML. Refer seems, on the surface, a little more comprehensible to me, but maybe I'm just drawn to the alphabetical ordering of categories. The section on TIFF was enlightening; I had no idea the format carried so much metadata. I'd like to explore that capability further. MPEG-7 seems extremely useful for digital libraries; the projected capabilities mentioned are pretty amazing, although the technical stuff was a little over my head.
Automatic extraction of metadata makes me a little nervous, perhaps because I am a firm believer in authority control. Many of the methods discussed seemed iffy, though key phrase assignment/extraction seems useful. With the volume of information out there it's useful to have some automated methods, but there will always be a need for manual assignment of metadata. Phrase hierarchies actually seem pretty interesting, though all the talk about complex algorithms is beyond me.

Gilliland
The table on typology of data standards is a really useful way of thinking about this stuff (e.g. LCSH vs. AACR, MARC fields vs. MARC21). Tables 2 and 3 are useful as well. She makes an important point that metadata for digital objects needs to exist independent of the current storage/retrieval system in order to survive migration.

Weibel
It's encouraging to read the reflections of someone who was involved with Dublin Core from its very first development. I laughed a little when he mentioned that "participants spent an hour of scarce plenary time talking about Type before realizing that the librarians and the computer scientists had been talking about completely different concepts" -- but the important thing is that they produced a functional standard in the end. Weibel questions the possible future effect of "folksonomies" on the metadata landscape, and I am both interested and discomfited by the notion. I liked his anecdote about the China-Mongolia railroad gauges.

Wednesday, September 23, 2009

Week Three Muddiest Point

I was really interested in the discussion of different file types and their different features and uses. I'm wondering if there's a good, clear, concise resource online somewhere that outlines all of them? I'd love to have a quick comparison chart or something to refer to when choosing which to use.

Assignment 2: Flickr

I chose to scan the front pages of various national newspapers from the 2008 election. They can be found here.

Friday, September 18, 2009

Week Two Muddiest Point

I could still use some practical examples/demonstrations of interoperability between separate digital libraries to understand how that functions.

Week Three Readings

Lesk
It's interesting to read an author's musings about how to retain control of the layout of a document while still allowing for flexibility of viewing options, when said musings are being displayed as a PDF. Likewise the discussion of different scanning options when reading a scanned page. The discussion of how OCR technology has progressed is encouraging, and it seems like the ease and cost-effectiveness of converting texts to sophisticated digital format with high functionality will only increase with time. Pitt subscribes to a historical newspaper collection (can't recall the name) that solves beautifully the display problems mentioned in 3.4. I didn't realize that CMU had such a large book-scanning project underway. Lesk reiterates the crucial point, however, that most people still prefer reading on paper to reading from a screen.

Arms
This chapter deals with a lot of the same issues I picked up in Lesk. I appreciate the in-depth discussion of Unicode, which I know is a vital standard for displaying diverse languages but which I don't know much about (e.g. UTF-8 encoding). Ditto the explanation of DTDs and SGML. The section on XML is extremely helpful, as it's a concept I've had trouble grasping in the past.
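
To make the UTF-8 part concrete for myself, here is a small sketch showing that UTF-8 stores each Unicode code point in one to four bytes. The sample characters and the helper name (utf8_bytes) are my own choices for illustration, not taken from the chapter.

```python
def utf8_bytes(character):
    """Return the UTF-8 encoding of a single character as bytes."""
    return character.encode("utf-8")


# Sample characters written as escapes so the file itself stays plain ASCII:
# Latin capital A, e with acute accent, a CJK ideograph, and a musical symbol.
samples = ["A", "\u00e9", "\u4e2d", "\U0001d11e"]

for character in samples:
    encoded = utf8_bytes(character)
    # Show the code point, how many bytes UTF-8 needs, and the bytes in hex.
    print(f"U+{ord(character):04X}: {len(encoded)} byte(s), hex {encoded.hex()}")
```

Plain ASCII characters stay one byte each, which is why older English-language text files are already valid UTF-8.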

Lynch
Identifier systems are a critically important aspect of digital libraries--and really any digital/networked collection--that I haven't given much thought. I think laypeople (like myself) tend to think that you have a URL or filename and that's all you need, but Lynch identifies several contexts and usages that require a different approach. Also, though I've seen them before (I believe through the Government Printing Office), I didn't realize that PURLs were an OCLC creation.

Tuesday, September 8, 2009

Week Two Readings

Suleman and Fox
Not having a background in computer science, I found some of this a little hard to follow, but it seems to be advocating an open, simple, and customizable protocol for use in DLs, which seems like a great thing. Interoperability and extensibility are going to be increasingly desirable to DL user populations, and it seems like that's what this system is attempting to provide.

Arms, Blanchi, Overly ("Architecture")
There's a lot of discussion of client services in this article, and I think I need clarification about what they are and how they function--I'm not quite grasping it at this point. I like the point about how the organization of information should not be biased by expectations about how users will approach the material. I've also never really considered some of the levels of complexity discussed here--like illustrations within a text being created as separate digital objects, or a meta-object constituting various resolutions of a single photo.

Payette, Blanchi, Lagoze, Overly ("Interoperability")
To rephrase for my own reference, the key to extensibility is clean separation of object structure, extensible interfaces, and mechanisms that implement extended functionality. The concept of Disseminator Types seems to make a lot of sense for adding new functionality to an object. My problem with this article, however, is that I could use a clearer illustration of what the authors mean by "interoperable."

Saturday, September 5, 2009

Week One Readings

Candela et al.
The authors make a legitimate point about the "terminological imprecision" in the literature. Their differentiation between Digital Libraries, Digital Library Systems, and Digital Library Management Systems is a useful one; so too is their discussion of the interaction between the four categories of actors (end-users, designers, system administrators, and application developers).

Borgman
Borgman highlights the problematic nature of the term "digital library," which "obscures the complex relationship between electronic information collections and libraries as institutions." I think this is an important distinction/relationship to keep in mind, and efforts to create digital libraries should strive to bridge the gap between them. I also agree with Borgman that librarians tend to--or, I think, should--"take a broad view of the concept of a library." It will ultimately make them more useful and more relevant to a wider range of users. Borgman also traces back to one of the earliest definitions of a digital (or "electronic") library in 1992, which included the key elements of services, architecture, content, enabling technologies, and users. I think it is still useful and applicable to consider all those features when talking about DLs.

Paepcke et al.
Digital libraries being a conduit for funds that libraries normally wouldn't have access to is a point I hadn't considered before, though I wonder if this is the case in practice. This is a good discussion of the tensions between the library science and computer science fields, and the way that relationship has been impacted by technological developments (e.g. the advent of the Internet). I appreciate the insistence that "the core function of librarianship remains" in a world where people ask, "Aren't libraries kind of irrelevant since Google?" (Someone really said that to me. I was speechless, but I guess I should have a ready answer for the next time...)

Wednesday, September 2, 2009

Week One Muddiest Point

First post for the class, and my muddiest point is a question that may not have an answer. What is the functional difference between a digital library and a digital collection? Is there a difference, or are they two terms describing essentially the same thing? If a digital collection is one aspect of a larger physical library, can it be considered a digital library? I guess the first assignment is intended to address this ambiguity.
