December 6, 2006

Google Librarian Newsletter on Google Scholar

The latest Google Librarian Newsletter has a couple of articles on Google Scholar. One of them is a series of profiles on the various people on the GS team. Another is a video of the overview presentation they did at ALA.

Most interesting, however, is an interview with Anurag Acharya, Google Scholar's founding engineer.

Some fun bits:

TH: What is your vision for Google Scholar?

AA: I have a simple goal -- or, rather, a simple-to-state goal. I would like Google Scholar to be a place that you can go to find all scholarly literature -- across all areas, all languages, all the way back in time. Of course, this is easy to say and not quite as easy to achieve. I believe it is crucial for researchers everywhere to be able to find research done anywhere. As Vannevar Bush said in his prescient essay "As We May Think" (The Atlantic Monthly, July 1945), "Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential."
Yes, they do want to take over the world -- A&I databases, look out. (More on that in the next day or two, he foreshadows.)
TH: Why don't you provide a list of journals and/or publishers included in Google Scholar? Without such information, it's hard for librarians to provide guidance to users about how or when to use Google Scholar.

AA: Since we automatically extract citations from articles, we cover a wide range of journals and publishers, including even articles that are not yet online. While this approach allows us to include popular articles from all sources, it makes it difficult to create a succinct description of coverage. For example, while we include Einstein's articles from 1905 (the “miracle year” in which he published seminal articles on special relativity, matter and energy equivalence, Brownian motion and the photoelectric effect), we don't yet include all articles published in that year.

That said, I’m not quite sure that a coverage description, if available, would help provide guidance about how or when to use Google Scholar. In general, this is hard to do when considering large search indices with broad coverage. For example, the notes and comparisons I have seen about other large scholarly search indices (for which detailed coverage information is already available) provide little guidance about when to use each of them, and instead recommend searching all of them.
He dissembles a little here. We know that they index publisher-provided metadata. Just tell us what that is. I can understand that it's hard to figure out what's what in the stuff they crawl on the free web, but they should know what deals they've made with publishers -- that's what we want to know. Who's in and who's out. I suspect that they don't want us to know that there are some pretty significant publishers that aren't covered.
TH: Some librarians consider Google Scholar's interface too limited for sophisticated researchers. Do you plan to provide more options for manipulating or narrowing search results?

AA: Our experience as well as user feedback indicates that Google Scholar is widely used by researchers of all levels of sophistication -- from laypersons to leading experts. This is not surprising. LibQual's study of use of search habits of undergrads, graduate students and faculty members (presentation available here) shows that all three groups prefer general search engines with broad coverage and do so roughly with the same frequency.

Regarding options for narrowing and manipulating results, we do provide some on the advanced search page. However, we have found that other than time-based restrictions (to search papers from the last few years), none of these options see much use. More generally, we refine the user interface for Google Scholar based on how people actually use it. Instead of considering a laundry-list of features we may add, we consider a list of frequently-performed operations and see how well we support them. A long list of unrelated features wouldn’t be of much use. This is not surprising. For example, few of the tools in a full-featured Swiss Army knife see much use over its entire lifetime.
In other words, "good enough" is good enough for the vast majority of real researchers doing their day-to-day work, with only librarians doing the complaining. I'm actually pretty sympathetic to this point of view, with more to come on that front. (Once again, foreshadowing...)
