6 HARNESSING THE POWER OF GOOGLE
procedures, and other medical terms. Staff at the National Library of Medi-
cine with subject expertise carefully assign MeSH headings for precision in
retrieval of results.
In the legal realm, the proprietary West Key Number System is a long-
standing authority for indexing precedent-setting cases, law reviews, legal
encyclopedias, and other materials. These legal subjects are assigned by
attorney-editors who are specialists in various areas of law. It is important
for legal researchers to be able to find “all and only” relevant legal cases, as
well as their disposition (whether a law has been overturned or still stands).
In the realms of medicine and law it seems that strict adherence to con-
trolled vocabularies is here to stay for generations to come. However, in
many other disciplines we are seeing a paradigm shift. Users tend not to take
the time to look up subjects or descriptors; instead they simply search the
full text with tools like Google Web, Google Scholar, and Google Books.
Several observations should be made about controlled vocabularies and
their use by various database vendors. The first observation is that it is
difficult to decide whether to use a controlled vocabulary at all, and then
which controlled vocabulary to use. As we saw
with the three social science disciplines of psychology, education, and sociol-
ogy, there are sometimes great differences in nomenclature within these dis-
ciplines. What about contexts that are more universal? Library catalogs, for
example, are collections of books and other materials across all disciplines:
arts and humanities, social sciences, and science and engineering. Usually one
controlled vocabulary is used in academic libraries to capture the “aboutness”
of these works: the Library of Congress Subject Headings. But a generalized
nomenclature set does not capture subject-specific nuances.
The important point in this observation is that controlled vocabularies are
numerous, they vary in scope and applicability, and they are not applied in
every context.
The second observation we need to make is that many databases, through
“smoke and mirrors,” create the impression that they are using a principled
thesaurus and applying it consistently throughout their product, but in fact
this is not the case. This is not meant as criticism, for they are doing the
best they can with what they have. But we need to be aware of what is really
happening in aggregated databases such as those produced by EBSCO, ProQuest,
and Gale.
Unlike databases such as PubMed and Westlaw, which get their records from
a single stream that they control, aggregators draw their content from many
sources, over some of which they have more control than others. To make it
appear that they have a semblance of vocabulary control, vendors first
capture subjects from individual index records, then “back-generate” them into
a master index and “normalize” them (bring them into a uniform style) to the extent
possible. Rather than individual index records being carefully examined (an
impossible idea when you consider the scale of records vendors deal with),