Dan Scott, Heather Matheson, Holly Eggleston, Scott Nickerson
The Problem
Collection analysis reports are required for internal and external program reviews on a regular basis
These reports are very important, have relatively well-defined requirements, and tend to require tedious amounts of manual work to put together
A common approach is to use LC call number ranges to identify items within a given subject discipline
Some resources (e-journals) may not have LC call number ranges, so subject heading keywords would be a useful alternative
As program reviews occur on a relatively infrequent basis, the call number / subject heading criteria are easily lost
The Dream
Materialized views of normalized data to provide an ILS-independent source from which to generate reports:
Holdings information
Serials status
Article abstracts and indexes
Institutional repository content
Ad hoc reporting based on facets of bibliographic records and holdings
Default configuration includes discipline-specific criteria to identify resources in a consistent way
Generates reports for each discipline on a regular basis, stored in a repository to provide historical snapshots of the state of the collection
Codify the many "standard" collection analysis methods
The Hack
Work backwards from an actual collection analysis report and generate the data for that particular report, including WIBNIs ("wouldn't it be nice if" items) that aren't satisfied by the current process
Dump a set of raw MARC records, including local fields, from a subject institution
Define analysis criteria for a single discipline: History
Call number ranges
Subject heading keywords
Codify those criteria in an ini file:
[History]
General History (including history of civilization)=C81.A1--C8482.Z9
KEYWORD=histor
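As a rough illustration of how such an ini file could be read, here is a minimal sketch using Python's standard configparser module. The function name load_criteria and the shape of the returned dictionary are assumptions made for this example, not part of the original hack. It also assumes one KEYWORD entry per section, since configparser rejects duplicate keys by default.

```python
import configparser

# Sample criteria, copied from the ini fragment above.
SAMPLE_INI = """\
[History]
General History (including history of civilization)=C81.A1--C8482.Z9
KEYWORD=histor
"""


def load_criteria(ini_text):
    """Return {discipline: {"ranges": {title: (start, end)}, "keywords": [...]}}.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    # Only "=" is a delimiter, so a colon inside a title would not split the key.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # preserve the case of titles and of KEYWORD
    parser.read_string(ini_text)

    criteria = {}
    for discipline in parser.sections():
        ranges, keywords = {}, []
        for name, value in parser.items(discipline):
            if name == "KEYWORD":
                keywords.append(value.strip().lower())
            else:
                # "START--END" becomes a (start, end) pair of call numbers.
                start, end = (part.strip() for part in value.split("--", 1))
                ranges[name] = (start, end)
        criteria[discipline] = {"ranges": ranges, "keywords": keywords}
    return criteria


criteria = load_criteria(SAMPLE_INI)
print(criteria["History"])
```

Keeping each range keyed by its title means a later tally can report counts under the same labels used in the configuration file.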
The Algorithm
Search record by record
Does it match a call number range?
Yes - this is a primary resource; increment the tally for the associated title from the conf file
No - check for a subject keyword match
If it matches, this is a secondary resource; grab the first term of the first subject heading in the bibliographic record and increment the tally for that term
Differentiate matches by type of resource:
monograph, serial, database
electronic or print resource
Generate a simple list of primary and secondary resources with tallies for each category by resource type
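The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code written for the hack. Records are represented as plain dictionaries with hypothetical keys (call_number, subjects, kind, format) standing in for values already extracted from MARC. The call number comparison is deliberately simplified (class letters, then class number, then the remainder compared as text) and does not implement full LC shelf-order rules. Subject headings are assumed to use "--" between subdivisions.

```python
import re
from collections import Counter

# Criteria as they might be loaded from the ini file (values from the example).
RANGES = {
    "General History (including history of civilization)": ("C81.A1", "C8482.Z9"),
}
KEYWORDS = ["histor"]


def call_number_key(call_number):
    """Simplified sort key: (class letters, class number, remainder as text)."""
    match = re.match(r"([A-Z]+)\s*(\d+(?:\.\d+)?)?\s*(.*)", call_number.strip().upper())
    if match is None:
        return None  # no usable call number, e.g. many e-journals
    letters, number, rest = match.groups()
    return (letters, float(number) if number else 0.0, rest)


def classify(record, ranges, keywords):
    """Return ("primary", title), ("secondary", term), or None for no match."""
    key = call_number_key(record.get("call_number", ""))
    if key is not None:
        for title, (start, end) in ranges.items():
            if call_number_key(start) <= key <= call_number_key(end):
                return "primary", title
    subjects = record.get("subjects", [])
    if any(kw in heading.lower() for heading in subjects for kw in keywords):
        # First term of the first subject heading in the record.
        return "secondary", subjects[0].split("--")[0].strip()
    return None


def tally(records, ranges, keywords):
    """Count matches by (level, label, resource kind, print/electronic)."""
    counts = Counter()
    for record in records:
        match = classify(record, ranges, keywords)
        if match is not None:
            level, label = match
            counts[(level, label, record["kind"], record["format"])] += 1
    return counts


# Made-up sample records, for illustration only.
records = [
    {"call_number": "C100.B5 2001", "subjects": ["Civilization--History"],
     "kind": "monograph", "format": "print"},
    {"call_number": "HC79.E5", "subjects": ["Economic history--Periodicals"],
     "kind": "serial", "format": "electronic"},
    {"call_number": "", "subjects": ["Chemistry, Organic"],
     "kind": "database", "format": "electronic"},
]

counts = tally(records, RANGES, KEYWORDS)
for (level, label, kind, fmt), n in sorted(counts.items()):
    print(f"{level}\t{label}\t{kind}\t{fmt}\t{n}")
```

Checking the call number first and falling back to subject keywords mirrors the primary/secondary distinction: a record is only counted once, under the strongest criterion it satisfies.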