6 HARNESSING THE POWER OF GOOGLE
procedures, and other medical terms. Staff at the National Library of
Medicine with subject expertise carefully assign MeSH headings for
precision in retrieval of results.

In the legal realm, the proprietary West Key Number System is a
long-standing authority for indexing precedent-setting cases, law reviews,
legal encyclopedias, and other materials. These legal subjects are assigned
by attorney-editors who are specialists in various areas of law. It is
important for legal researchers to be able to find “all and only” relevant
legal cases, as well as their disposition (whether a law has been
overturned or still stands).

In the realms of medicine and law, strict adherence to controlled
vocabularies seems here to stay for generations to come. In many other
disciplines, however, we are seeing a paradigm shift. Users tend not to
take the time to look up subjects or descriptors; instead, they simply
search the full text with tools like Google Web, Google Scholar, and
Google Books.

Several observations should be made about controlled vocabularies and
their use by various database vendors. The first observation is that it is
very challenging to decide whether to use a controlled vocabulary at all,
and then which one to use. As we saw with the three social science
disciplines of psychology, education, and sociology, there are sometimes
great differences in nomenclature within these disciplines. What about
contexts that are more universal? Library catalogs, for example, are
collections of books and other materials across all disciplines: arts and
humanities, social sciences, and science and engineering. Usually one
controlled vocabulary is used in academic libraries to capture the
“aboutness” of these works: the Library of Congress Subject Headings. But
a generalized nomenclature set does not capture subject-specific nuances.
The important point in this observation is that controlled vocabularies
are many, they vary in scope and applicability, and they are not applied
in every context.

The second observation we need to make is that many databases, through
“smoke and mirrors,” create the impression that they are using a principled
thesaurus and applying it consistently throughout their product, when in
fact they are not. This is not meant to criticize them, for they are doing
the best they can with what they have to work with. But we need to be aware
of what is really happening in aggregated databases such as those produced
by EBSCO, ProQuest, and Gale.

Unlike databases such as PubMed and Westlaw, which get their records from
a single stream that they control, aggregators draw their content from many
sources, over some of which they have more control than others. To give a
semblance of vocabulary control, aggregators first capture subjects from
individual index records, then “back-generate” them into a master index and
“normalize” them (bring them into a uniform style) to the extent possible.
Rather than individual index records being carefully examined (an
impossible idea when you consider the scale of records vendors deal with),