April 12, 2007

Is there a future for bibliographic databases?

As promised, I'm reposting here the full text of the guest post I did on Michael Cairns' PersonaNonData blog. I'd like to thank Michael again for the opportunity, one that I think turned out pretty well for both of us.

The post itself turned out to be fairly popular (certainly one of the most widely linked posts I've ever done) and was linked to from several places.



Here goes:


A week or so ago, Michael asked me to do a guest post here on Persona Non Data about bibliographic databases, based on some of the speculations I've made on my own blog, Confessions of a Science Librarian, about the future of Abstracting and Indexing databases.

Here's how he put it in his email:
I have read your posts on the future of information databases and bibliographies etc. over the past several months and I was wondering whether you had a specific opinion of the future of bibliographic databases such as worldcat and booksinprint? ... [O]n my blog I have skirted around the idea that the basic logic of these types of databases is beginning to erode as base level metadata is more readily available and of sufficient quality to reduce the need for these types of bibliographic databases. Assuming that is increasingly the case then these providers need to determine new value propositions for their customers. So what are they?


How could I resist? I'm not sure I exactly answered his questions or even talked about what he'd hoped I'd talk about, but at least I've probably provoked a few more questions.

In my blog post on the future of A&I databases, I basically came to the conclusion that in the face of competition from Google Scholar and its ilk, the traditional Abstracting & Indexing databases would be increasingly hard-pressed to make a case for their usefulness to academic institutions. Students want ease of use; they concentrate on what's "good enough," not what's perfect. Over time, academic libraries will find it harder and harder to justify spending loads of money on search and discovery tools when plenty of free alternatives exist. Unless, of course, the vendors can find some way to add enough value to the data to make themselves indispensable. I used SciFinder Scholar as an example of a tool that adds a lot of value to data. I think we'll definitely start to see this transition from fee to free in the next 10 years, with considerable acceleration after that.

Now, I didn't really talk about bibliographic/collections tools like Books in Print (BiP), WorldCat (WC), Ulrich's or the Serials Directory (SD). Why not? I think it's because those tools are aimed at experts, not end users. Professionals, not civilians. A freshman may only want a couple of quick articles to quote for a paper due in a couple of hours, but surely we librarians and publishing professionals are looking for good, solid, quality information, and we're willing to pay for it. This distinction seems quite important to me, and it leads to quite a different kind of analysis, one I wasn't really aiming at originally. So, I didn't really think about it at the time.

So, now it's time to put the thinking cap back on and see what my crystal ball tells me.

In my professional work as a collections librarian, I am a frequent user of all the tools I mention above. I think that BiP is the one I use the most. Over the last 5 or 6 years I've built up a specialized engineering collection mostly from scratch so I've needed a lot of help and BiP has been an enormously useful tool. I use keyword searches. I also use the subject links on the item records a lot to take me to lists of similar books.

WC I use less frequently, mostly only when I want to look beyond books that are in print and want to identify older and rarer items that I'll end up having to get on the used book market. I've used this to build up various aspects of our Science and Technology Studies collection on topics like women in science. On the other hand, WC seems to have already found a big part of its value proposition with non-experts. Look at its partnership with Google Book Search. Also look at the really innovative things it's doing with products like WorldCat Identities. It's not perfect by any means but you can see the innovative spirit working.

Ulrich's and SD I mostly use to identify pricing issues for journals I might want to subscribe to, so I don't use them that often. With the ease of finding journal homepages, this function is probably falling fast in its usefulness. As for identifying the journals in a particular subject area, that's still a useful function but I wonder what the future is if that's all they offer.

For our purposes here, I'll concentrate on the one I use most: BiP. I presume a lot of what I have to say will also more or less apply to the other specialized tools aimed at pros.

So, I definitely need quality information on books to do my job, now and in the future. But if I need quality information, what will the source be? Although of course I use BiP, I also use Amazon quite a lot to find information on books I want to order; the features that they have that I like best and use most come out of the kind of data mining they can do with their ordering and access logs. When I'm looking at an interesting item, Amazon can quickly tell me what other books are similar and what else people who purchased the one I'm looking at have also bought. I find this to be an extremely important tool for finding books, a great time saver and an incredibly accurate way of finding relevant items. Also, when I search Amazon, I'm actually searching the full text of a lot of books in their database. This feature gets me inside books and unleashes their contents in a way that can't be duplicated by being able to view or even search tables of contents. I also very much like the user-generated lists and reviews. On more than one occasion I've appreciated multiple user reviews of highly technical books, especially when there are negative reviews to warn me away from bad ones. The "Listmania" and "So you'd like to..." lists are great sources of recommendations. On the other hand, Amazon has some significant problems that keep me from going to it exclusively. For example, most any search returns reams of irrelevant hits. I also find the subject classifications that Amazon displays at the bottom of the page next to useless, as they are often far too broad.

For BiP, the features I appreciate the most, the ones that draw me back from Amazon, include very good linkable subject classification and good coverage of non-US imprints. When I do keyword searches, the results seem more focused and less cluttered with irrelevant items. I also like that it gives me very complete bibliographic information, including at least part of a call number. While Amazon isn't geared to let you mark and then print out a bunch of items (why would they want you to be able to do this?), I appreciate being able to generate lists and print them out using BiP. On the other hand, BiP has been slow to make its interface as quick and easy to use as Google or Amazon, to make use of the tons of data it has, to mine that data to find connections, and to harness user input and reviews in a massive way to compete with the Amazon juggernaut. When for-fee is competing with for-free, the one that costs money has to be very clearly the best.

Another threat to BiP is Google Book Search. As I've recounted in a story on my blog, Google Book Search is an incredible tool for research, reference and even collections. Once again, the ability to search the entire text of books is an incredible tool for revealing what they're really about, to surface them and make me want to buy them. As Cory Doctorow has said, the greatest enemy of authors (and publishers) is not piracy, it's obscurity. Google Book Search is an amazing tool for a book to get known and, ultimately, to get bought. As more and more publishers realize this (and even book publishers are smart enough to realize this eventually), they'll make darn sure all their new books are full text searchable by Google (and, presumably, Amazon and others). How can BiP compete with that?

I think it's safe to say, it wouldn't take much for me to completely abandon the use of BiP and only use free tools such as Amazon and Google. What could BiP do to keep in the game? What is their value proposition for me? What is the value proposition for all bibliographic tools hoping to market themselves to library professionals now and in the future?

Some issues I've been thinking about.

  • The changing nature of publishing. What's a book? What's a journal? What does "in print" mean? Print journals vs. online? Ebooks vs. paper books? Fee vs. free. Open Access publishing. Wikis. Blogs. To say that bibliographic databases have to be ahead of the curve on all the revolutionary changes going on today in publishing is an understatement. Look at all the trouble newspapers are in, the trouble they're having adjusting to a new business model. Well, the book world is changing as well, especially for academic customers. The needs of academic users are quite different from those of regular users. They don't necessarily need to read an entire book, just key sections. Search and discovery are incredibly important to these users, almost more important than the content. They also really don't care about the source of their content; what they really care about is having as few barriers as possible between the content and themselves. How will BiP and other bibliographic databases help professionals like me navigate this mess? Easy. By continuing to provide one-stop-shopping, only for a much wider range of items. Paper books from traditional publishers, for sure, but how about all those Print on Demand publishers? Sifting through the chaff to get the rare kernel of wheat is an important task, one I know that they're already doing to some degree. But how about digital document publishers like Morgan & Claypool? O'Reilly's Digital PDFs? White papers and other documents from all kinds of publishers? How about the incredible number of free ebooks out there? And other useful digital documents and document collections, both free and for sale (The Einstein Archives is an example)? And how about breaking down the digital availability of the component parts of collections like Knovel, Safari, Books 24x7 and all the others? Any tool that could help me evaluate the pros and cons of those repositories would be greatly appreciated.
The landscape out there for useful information is clearly far larger than it used to be.

  • Changing nature of metadata. Never underestimate the value of good metadata; never underestimate the value of the people that produce that metadata. It seems to me that one of the core issues is who should create metadata for books and other documents and how should that metadata be distributed to the people that want it, be it commercial search engines or library/bookstore catalogues. It would be great if all content publishers created their own metadata and that it was of the highest quality and free to everyone. There's a role for bibliographic databases to collect and distribute that metadata, maybe even to create it. The library world has a good history of sharing that kind of data, but I'm not sure how that model scales to a bigger world. It seems to me that there's an opportunity here.

  • Changing nature of customers. I've publicly predicted that I will hardly be buying any more print books for my library in 10 years. Libraries are changing, bookstores are changing. Our patrons and customers are the ones driving this change. As my patrons want more digital content, as they use print collections less, as they rely on free search and discovery tools rather than expensive specialized tools, I must change too. As my patrons' needs and habits change, the nature of the collections I will acquire for them will follow those changes -- or I will find myself in big trouble. Anybody that can make my life easier is certainly going to be welcome. And that will be the challenge for the various bibliographic tools -- making it easier for me to respond to the changes sweeping my world. A good bibliographic service should be able to help me populate the catalogue with the stuff I want and my patrons need. I think a lot of progress has been made on this front in products like WC, but I think to stay in the game the progress will have to be transformative. There's lots of opportunity here.

  • What's worth paying for. In other words, BiP, WC and their ilk have to be better than the free alternatives. And not just a little better. And not just better in an abstruse, theoretical way; if it takes you 20 minutes to explain why you're better, the margin may be too slim. Better as in way better on 80% of my usage rather than just somewhat better on 20%. Better as in saving time, saving effort, saving more money than they cost, making my life easier.


To conclude, I can only say one thing. In times of intense change and uncertainty, evolutionary pressure is extreme. Only those products and services that can find an ecological niche, a way to satisfy enough customers, will survive. To thrive is another story. To thrive requires a redefinition of products and services, a way to jump ahead of competitors and to win new markets with something new and exciting. It's hard to tell where bibliographic databases will find their place: will they be dodo birds, or will they find a way to survive or even thrive in the coming decade? There's certainly a window to change. Nobody is going to cancel any of these core tools any time soon. But the window will close sooner rather than later.
