The Library of Congress has been busy building an online collection of what are known as open access e-books. The effort accelerated when the pandemic hit and people had more access to online books than to physical libraries. For more on this effort, Digital Collections Development Coordinator Rashi Joshi and Digital Collections Specialist Kristy Darby spoke to the Federal Drive with Tom Temin.
Tom Temin: First of all, tell us what is an open access e-book? These are titles that you might not find in your, top of the list, say, at Amazon.
Rashi Joshi: Sure. So an open access e-book is an e-book that lives on the web. Open access content is material that’s licensed for free and open use and redistribution. So this could be a work under an open license like Creative Commons. These are content creators and publishers publishing their works under an open license, which allows the library to acquire the content much more easily and redistribute it widely on our website, LOC.gov.
Tom Temin: So I’m guessing this is mostly nonfiction type work, or academic and peer reviewed types of material, Kristy?
Kristy Darby: That’s right. So they are high quality, they’re peer reviewed titles, they span myriad subjects. We have books on history, philosophy, music, life sciences, mathematics, religion, economics, so most of them are nonfiction. We do have about 50 works of fiction in the collection. Some of these are new editions of classics that have been republished. And we also have some contemporary fiction in there. So there are authors who are publishing their fiction works under these licenses.
Tom Temin: And what about, you mentioned the classics, just out of curiosity? Is there a point in history at which some of the classics become public domain, like Don Quixote, for example?
Rashi Joshi: Yes, so when an item falls out of copyright, if it’s not protected by copyright law, then it becomes public domain. A lot of U.S. government works are in the public domain. So we have hosts of acquisition specialists that are trained in identifying works that fall under the public domain. And those are in scope for this open access books collection.
Tom Temin: And in some ways, that’s a greater responsibility than something that has been published contemporarily, because you want to make sure that the text of a Don Quixote, which I read decades ago, is preserved in a way that is sacrosanct, so that someone can’t kind of reissue it under some crazy label and change the text for whatever political bias they might want to introduce.
Rashi Joshi: Sure, so everything that’s going into this open access books collection is being reviewed by acquisition specialists to make sure that the terms of redistribution are appropriate, and that we can provide wide access to the content on our website.
Tom Temin: Now, the library also had a link to the catalog of these public domain e-books. So what is the value added of the library publishing these in a collection available to the public versus just going to that catalog?
Kristy Darby: That’s a good question. We think about it in terms of enduring access. So when we take these files, we put them into managed storage, we have them available in whatever digital perpetuity means. They’re always available. And then we have our catalog records, to which we add these links. They’re persistent links, so they won’t change. So anybody who has Library of Congress records will have the link to this content. It will always be there, and it will always be available. So the web is a shifting, changing thing. We don’t always know where things are going to end up. But we feel very comfortable that we are providing that enduring access for this content.
Rashi Joshi: Yeah, and there’s so much unique and high research value and ephemeral open access content on the web. By acquiring the files for this content and hosting them on library platforms, we are making a commitment to preserving and providing enduring access to this content to the American public.
Tom Temin: We’re speaking with Rashi Joshi, she’s a Digital Collections development coordinator, and with Kristy Darby, Digital Collections specialist, both in the digital content management section at the Library of Congress. And what does it require? What kind of effort is needed? Is it simply transferring the file from the catalog and putting it under an LOC URL or does it take more than that?
Kristy Darby: It’s a surprising amount of work behind the scenes. So we definitely have to acquire those files. We have to create thumbnails so that on our website, people will be able to see at a glance what we have. We work on those catalog records with our cataloging specialists in the Acquisitions and Bibliographic Access Directorate. We work with Rashi’s division, so it takes a lot of coordination across the Library, a lot of people doing their part to bring it online. We work with our Office of the Chief Information Officer to support our technical infrastructure. So it’s a lot of moving parts, but we have worked over the past several years to build this workflow from the ground up. And then over the course of the pandemic, we’ve really been able to kind of refine it, and now it’s a well-oiled machine.
Tom Temin: Got it, and is there any vetting of the material? I mean, suppose someone got something into the catalog that really is false or known to be contrary to, I don’t know, say that the world is dipped in ether or something like that. I mean, there are still people that believe there’s ether out in space. You wouldn’t probably want to support that, or do you just put it all on and let people make their own judgment?
Rashi Joshi: So we have a very broad collecting mission. The Library’s mission is to engage, inspire and inform Congress and the American people with a universal and enduring source of knowledge and creativity. So this is a broad collecting mission. We cover all subjects except for clinical medicine and technical agriculture, which are covered by the National Library of Medicine and the USDA National Agricultural Library. So as I mentioned, the collecting mission is broad. The collection is universal in terms of subject, but it’s not comprehensive. There’s a lot more content out there than we can collect. So to help our subject matter experts do selection of content, we have something we call collection policy statements. We have over 70 of these. These are subject- and format-focused. So when a subject expert is assessing content for potential inclusion in our permanent collections, they’re referring to these collection policy statements. These are collaboratively developed by subject experts. They’re revised periodically, and they’re all available on our website for the public to access.
Tom Temin: And what has been the take-up of this collection so far? I mean, do you measure the success of it, whether anyone’s reading this stuff?
Kristy Darby: We do, every month we get a report to see what has been downloaded, what has been viewed, where the folks are who are viewing it. And it’s always really exciting to see. It’s been growing every month. Right now we have about 10,000 downloads a month. And the users, we have lots of users from the United States, of course, but we also have lots of users from Western Europe, we have lots of users in South America, lots of users in India. So it’s been really interesting to see sort of how this has grown. Every month, it seems we get about 1,000 more viewers, 1,000 more users. So we’ve been tracking that over time, we’ll continue to do that.
Tom Temin: Do you get downloads, say to China or Russia or North Korea?
Kristy Darby: We definitely get them to China and Russia. I don’t know that I have seen North Korea on our list. But yeah, absolutely. They are downloading our content.
Tom Temin: And if there are 10,000 downloads in a month, is it 9,000 of one title? Or is the appeal of the collection kind of across the board?
Kristy Darby: It’s a little bit across the board. We always have a top five, the ones that really rise to the top. Very often they will be educational titles. We can tell that it looks as if a lot of educators have been hitting the collection, especially over the pandemic, which is really exciting. We also have, as part of this collection, a collection of children’s books from South Africa. So these are born-digital children’s books. They were not digitized; they were totally created online. And those get a lot of use too, which is really exciting because it sort of points to this collection being for everyone. It’s not just career scholars who are using this. Teachers, parents and children are also using this, so that’s always really exciting.
Tom Temin: If someone wanted to create a book digitally to put in the public domain, does it have to be in the catalog from which you drew it? Or can they send it directly to the Library of Congress?
Rashi Joshi: Certainly not. So we started by looking at the Directory of Open Access Books, because it was a large repository of peer-reviewed academic open access books. And certainly not all books in the directory are in scope for our collection. So we are aiming big. We’re not only looking at specific repositories, but at any open access and openly available e-books on the web that are in scope for collecting as per our collection policy statements.
Tom Temin: So someone who wants to get in on those 10,000 downloads needs to first check out the policy to see if it’s even something you’ll accept.
Rashi Joshi: Yes, so there is a donations form on the Library’s website. The public can also connect with a reference librarian. So the reference librarian is the real subject matter expert who will be assessing content to see if it’s in scope for the permanent collection or not.
Tom Temin: And I’m just wondering, do you have a sense of how the publishing landscape is changing? I mean, there have always been self-published books, since there were books as opposed to the trade titles, the Knopfs and so forth of the world. Is more and more coming into the reading public via not the standard publishers that have their editors that vet and have their policies, but this kind of new way of self publishing, that’s also not print, and also not the famous publishers?
Rashi Joshi: Sure, the landscape of e-books and publishers is continuously evolving. And we are new to collecting open access e-books. So we are learning as we go. So there’s a whole diversity of publishers represented. And right now we have about 3,400 open access e-books in this collection, with 50 languages and 100 countries of publication represented. We expect these numbers to only keep growing, and not just in terms of the languages and countries of publication represented but also the types of publishers: individual content creators and other types of publishers.
Tom Temin: And on the just technical front, do the formats that you have allow for mobile reading and mobile device reading?
Kristy Darby: Yes, absolutely. So we have PDFs for a lot of them. A PDF is a very common format. And we have e-pubs as well, which is an increasingly common format. It’s very common for e-books. And those can be downloaded. You can read them right on the Library’s website, but anybody is free to download those and then read them at their leisure on their e-book reader, on their phone, on their computer. All of those are available for download.
Tom Temin: And what about the accessibility questions? Is there a read, verbal version available? Or is that a technology you’re thinking about?
Kristy Darby: It is, and e-pubs are a great way because those are essentially kind of big HTML files, so you can read the text within them. A PDF is really an image. So we do have a growing number of e-pubs available.
Tom Temin: Because I was thinking for South African children’s novels, you could probably get famous actors to read them gratis, versus having some sort of robotic voice generation read them.
Kristy Darby: That would be lovely.
Tom Temin: There’s a free idea for you, see what Hollywood thinks about that. And a final tech question, is all of this searchable and keyword findable? Because you have 3,400, and that catalog has thousands and thousands more, so there’s really no limit to how big this can get.
Kristy Darby: One thing that we pride ourselves on at the Library of Congress is really great bibliographic description. And these books are cataloged. They are available with subject headings, and they’re fully searchable in the catalog and on the website. So people can search them by title, by subject, by author, by publisher, and they’re integrated into the Library’s catalog as well. So they’re there with everything else.
Tom Temin: Well, I’m going to check them out myself. Kristy Darby is Digital Collections specialist and Rashi Joshi is Digital Collections development coordinator, both in the digital content management section at the Library of Congress. Thanks so much for joining me.
Rashi Joshi: Thank you so much. It’s been a pleasure.