May 22, 2008

Google Scholar and the future of A&I databases

This is a bit of a "told you so" post inspired by something I saw on Open Access News the other day: Google indexes 90% of recent engineering research. The post mentions a recent article in The Journal of Academic Librarianship, May 2008: John J. Meier and Thomas W. Conkling, Google Scholar’s Coverage of the Engineering Literature: An Empirical Study. The abstract says it all:

Google Scholar’s coverage of the engineering literature is analyzed by comparing its contents with those of Compendex, the premier engineering database. Records retrieved from Compendex were searched in Google Scholar, and a decade by decade comparison was done from the 1950s through 2007. The results show that the percentage of records appearing in Google Scholar increased over time, approaching a 90 percent matching rate for materials published after 1990.

My first Google Scholar post was way back in 2004 and I think what I said then is just as valid today:
Winners & losers:

  • Loser: the A&I industry. Big time. Google Scholar is free, their products are definitely not. Can they add enough value to the data they have to make it worth our (i.e. libraries') while to subscribe to their services? No one's cancelling all those indexes this year, or even next, but what about five years from now? The key here is adding value. Google's product will be one-size-fits-all, always a bit overwhelming. Also, it will probably be limiting itself to stuff online-only. Will Google get the metadata for journal backruns that aren't online and refer users to their local academic library?

  • Winner: students. Big time. Students want to use simple interfaces, easy searches with highly relevant results. If Google can deliver that with this product like with their regular search engine, this will be a hugely popular tool amongst students.

  • Loser: non-OA journals. More and more, if a journal's content is not online for free, it will not exist for the new generation of scholars. Why use journal A behind some weird pay-money-or-else screen when journal B has their articles right here? I know that you can get to A via your friendly neighbourhood proxy server/academic library, but really, at 3 am with the paper due tomorrow and a student who doesn't even know where the library is on campus, that's not going to happen. Also, anyone not affiliated with a subscribing institution will automatically choose B. It's only a matter of time before Google puts a "Free full text only" check box on the screen. Open Access will mean survival for journals in the Google world. Not this year, not next year, but maybe in five or ten.

  • Winner: academic libraries & librarians. Yes. We're winners. Think of what this could do for our budgets! Finally we can demo tools in the classroom that the students will think are relevant! No more blank stares & sneers! But seriously, the advantages of basically using one interface are huge in terms of teaching students how to get the most out of their search experience. Google will continue to be overwhelming for many and confusing to some, so we will still have the role of helping students navigate. Oh yeah, we'll actually be able to spend more time on concepts like critical thinking, scholarly communication and all those information literacy standards we talk about but rarely have time to actually teach.

  • Loser: vendors of federated searching products. One search is here. This is it. The real challenge, of course, will be figuring out how to get link resolver products like SFX to work with Google Academic. Also, for us Ontario universities, all our content is on a central server. How do we get our students using Google Scholar to find the content on our platform rather than automatically going to the publisher's site? An interesting challenge.

  • Winner: the general public all over the world. Obviously, this will bring together a lot of information and make it accessible to everyone. As more and more stuff becomes OA, more and more scholarly content will become easily accessible to everyone. This is a good thing.

Musings on the future of A&I indexes also played a very important part in my My Job in 10 Years series, with a whole post devoted to the issue -- one of my all-time most read posts, if that has any meaning. I won't quote here, but my main point was that in a Google Scholar world, A&I providers will have to struggle to figure out how to add enough value to the bibliographic, citation and indexing data to make it worth our while as librarians to license those databases. The evidence from the study cited above would seem to indicate that we're getting closer to the day where we can start doing other things with that money. Sure, there are still quite a few cases where the vendors add tons of value to the data (SciFinder, Illustrata, Web of Science...), but for how much longer is it going to be worth the huge investment on our part? Personally, I'd much rather be spending the money on acquiring full text content, digitizing our own unique collections and new services to reach out to our patrons.

Some of the places I've talked about this (and related issues) before:


tompasley said...

Hmmm... yes, there are reasons to use Google, but you need to choose tools suitable for the subject area.

I know others have tackled the A&I added-value question, so I'll mention the differences between subject areas and the impacts this might have on openness... some subjects will be bigger winners, and others will lag behind... I'm not sure how the humanities will do here. Physics, computer sciences, engineering are at one end of the spectrum, I can't help but wonder who will be at the other end, and what sort of time-lag there will be in between.

Thomson Reuters are still a big business - not as big as Microsoft and Google, but not as ubiquitous either (although the Reuters part probably is?). Microsoft and Google always will be bigger... most computers on average use a Microsoft product, although it's interesting they've closed down the book part of their Live search. We're also seeing mergers elsewhere - Wiley-Blackwell, for example. I think there will be more to come, and with it, increased strength on the publisher side (they're not going to go down easily).

Interesting times ahead - personally, it will be great when things are more open, but I suspect subscription fees paid by libraries will be moved to the authors (normally within the same institution) who will pay to make things free...

John Dupuis said...

Hi Tom, thanks for the comment. I agree that the GS coverage will vary enormously by discipline with H/SS faring the worst. I do usually concentrate on the scitech areas here but I am aware of the differences and do recognize the long-term implications.

Long-term, however, I still see the A&I business as being severely challenged in all areas as they struggle to add value to justify their very high sub fees. As academic libraries struggle to transform themselves into 21st century organizations, they will need resources to fund that transformation. Personally, I would prefer to use my collections money to pay for truly important and unique content that our patrons will recognize as valuable, so that they also recognize the library's contribution in sponsoring and providing access to it.

As for today, I do agree with the authors of the article that GS is still probably not the best place for faculty and grad students who need more targeted and comprehensive searching capabilities. Of course, the challenge is convincing them that they truly need those capabilities and that GS is not "good enough."