Anatomy of Aggregate Collections: The Example of Google

Brian Lavoie
Senior Research Scientist, OCLC Research
OCLC Members Council Meeting, October 2005

Aggregate collections

  • Boundaries between local and external collections are increasingly blurred
  • Resource sharing (digital/network technologies)
  • Cooperative collection management (resource allocation)
  • Shift in focus to the resources of the system (or subsets of the system), rather than individual collections
  • Need data to support/illuminate a system-wide perspective
  • Characterize/analyze aggregate collections
  • WorldCat: largest aggregate collection
  • Aggregate holdings of >20,000 libraries
  • Bridge from local to system-wide perspective

The system-wide print book collection as represented in WorldCat (January 2005)

[Bar chart: number of WorldCat records, vertical axis from 0 to 60,000,000]

  • Total WorldCat records: ~55 million
  • Language-based monographs: ~41 million
  • Language-based monographs, excluding government documents and theses/dissertations: ~35 million
  • Language-based monographs, excluding government documents and theses/dissertations, in print format only: ~32 million print books

Google Print for Libraries

  • Aggregate collection of print books
  • Focus on copyright issues; very little discussion of Google Print for Libraries as an aggregate collection
  • Aggregate print book holdings of five major research libraries (Harvard, Michigan, Oxford, NYPL, and Stanford)
  • What are the characteristics of this aggregate collection?
  • How does it relate to the system-wide collection?
  • WorldCat: useful data source for analysis

More information:

Lavoie, Connaway, Dempsey: Anatomy of Aggregate Collections: The Example of Google Print for Libraries, D-Lib (September 2005)

G5 coverage of system-wide print book collection

  • Held by at least one G5 library: 33%
  • Not held: 67%
  • 10.5 million unique books

Holdings overlap

  • Held by 1: 61%
  • Held by 2: 20%
  • Held by 3: 10%
  • Held by 4: 6%
  • Held by 5: 3%
  • Potential redundancy rate of 40 percent

Language distribution

  Language      Google 5    System-wide
  English       0.49        0.52
  German        0.10        0.08
  French        0.08        0.08
  Spanish       0.05        0.06
  Chinese       0.04        0.04
  Russian       0.04        0.03
  Italian       0.03        0.03
  Japanese      0.02        0.04
  Hebrew        0.02        0.01
  Arabic        0.01        0.01
  Portuguese    0.01        0.01
  Polish        0.01        0.01
  Dutch         0.01        0.01
  Latin         0.01        0.01
  Korean        0.01        0.01
  Swedish       0.01        < 0.01
  All others    0.07        0.08

  • More than 430 languages in Google 5 collection

Cumulative age distribution of G5 holdings

[Line chart: x-axis, years; y-axis, proportion published during or prior to current year (0 to 1)]

  • > 80 percent of Google 5 collection still in copyright

Works

[Bar chart: Google 5 vs. system-wide, vertical axis from 0 to 35,000,000]

                    Google 5        System-wide
  Manifestations    10.5 million    32 million
  Works             9.1 million     26.1 million

  • Coverage slightly higher (35%)
  • Holdings overlap slightly greater (56% held uniquely)

Some speculation

  • What results would have been obtained if a different group of libraries had been selected?
  • What incremental extensions to coverage can be obtained by adding additional library collections to the original Google 5?
  • Chose 5 new libraries:
      • Small US liberal arts college
      • Large US public university
      • Large US private university
      • Large US metropolitan library
      • Large Canadian university

Beyond the Google 5

                         New Google 5    Original Google 5
  Total holdings:        ~8 million      ~18 million
  Total unique books:    5.9 million     10.5 million
  % of system-wide:      18 percent      33 percent
  Redundant holdings:    26 percent      42 percent

Impact by library type: % of holdings unique relative to original G5 collection

  • Large US metropolitan library: 39 percent (most unlike G5)
  • Large US private university: 25 percent
  • Large Canadian university: 23 percent
  • Large US public university: 21 percent
  • Small US liberal arts college: 13 percent (most like G5)

The Google 10

  • Original Google 5 (10.5 million books)
  • + 1.8 million (17%)
  • Google 10 collection: 12.3 million books
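The percentages on these two slides appear to follow from the reported totals by simple arithmetic. The sketch below is a consistency check under stated assumptions: the redundant share is taken to be (total holdings minus unique books) divided by total holdings, which is an inferred definition rather than one stated in the slides, and the inputs are the rounded figures in millions, so results agree only to the nearest percent. The function and variable names are illustrative.

```python
def redundant_share(total_holdings, unique_books):
    """Assumed definition: share of holdings beyond the first copy of each book."""
    return (total_holdings - unique_books) / total_holdings


def pct(x):
    """Express a proportion as a whole-number percentage."""
    return round(100 * x)


# Figures in millions, as reported on the slides (rounded).
SYSTEM_WIDE = 32.0  # system-wide print books in WorldCat
groups = {
    "original_g5": (18.0, 10.5),  # (total holdings, unique books)
    "new_g5": (8.0, 5.9),
}

summary = {
    name: {
        "pct_system_wide": pct(unique / SYSTEM_WIDE),
        "pct_redundant": pct(redundant_share(total, unique)),
    }
    for name, (total, unique) in groups.items()
}

# Incremental gain from adding the five new libraries to the original five.
google10_unique = 12.3
gain = google10_unique - 10.5
gain_pct = pct(gain / 10.5)

print(summary)
print(round(gain, 1), gain_pct)
```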

Diminishing returns?

  • Original G5: ~18 million holdings, 58% unique
  • New G5: ~8 million holdings, 22% unique

Anatomy of aggregate collections

  • Mass digitization programs and other aggregate collections are increasingly common features of the library landscape
  • Effective decision-making/planning is aided by convergence on a set of standard questions that help map out the anatomy of aggregate collections
  • Example: mass digitization programs
      • What are the characteristics of the overarching population of materials that is the target of the digitization effort?
      • How much of the population will the digitization effort cover?
      • What is the potential degree of redundancy?
      • What bibliographic unit is the focus of digitization (e.g., manifestations, expressions, works)?
      • What number of participants and combination of institution types is optimal for obtaining maximum benefit with minimum cost?
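The question about the number and mix of participants, and the diminishing returns seen with the Google 10, can be explored by tracking how many new books each added collection contributes. The following is a minimal sketch under the assumption that holdings are available as sets of identifiers; the data and the function name `marginal_gains` are hypothetical, and the order in which libraries are added is an input, not an optimized choice.

```python
def marginal_gains(ordered_holdings):
    """For each (name, books) pair added in order, report the number of
    books not already in the aggregate and their share of that library's
    holdings."""
    seen = set()
    rows = []
    for name, books in ordered_holdings:
        new = books - seen  # books no earlier participant holds
        share_new = len(new) / len(books) if books else 0.0
        rows.append((name, len(new), share_new))
        seen |= new
    return rows


# Hypothetical toy collections, listed in the order they join.
libraries = [
    ("first", {1, 2, 3, 4, 5, 6}),
    ("second", {4, 5, 6, 7, 8}),
    ("third", {1, 2, 7, 8, 9}),
]

rows = marginal_gains(libraries)
for name, new_count, share_new in rows:
    print(f"{name}: {new_count} new books ({share_new:.0%} of its holdings)")
```

A falling share of new books per added participant is the pattern the "Diminishing returns?" slide describes; comparing different orderings or institution types would be one way to approach the cost-benefit question.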

Aggregate collections and WorldCat

  • WorldCat is more than a tool for cataloging and reference; it is also a strategic resource for managing aggregate collections
  • OCLC Group Services
  • OCLC WorldCat Collection Analysis Service
  • OCLC Research data-mining activities
  • Web site:
