
Information for Publishers

How do we make our service work with CiteULike?


Why would we want to do that?

If you're not sold on the idea of making your service work with CiteULike, then think of it this way: you'll get endless advertising for your site - for free!

CiteULike is effectively just a large collection of links to articles on external publishers' sites. If your business model involves selling content, and you make more money when more people read your articles, then it's almost certainly something you ought to be interested in. If your site works with CiteULike, then our users can bookmark your articles and share them with other specialists in their fields. Marketing people enthuse about viral marketing. You can have it - for free.

Even better, what we're doing doesn't compete with you in any way. We're interested in building communities and in spotting patterns in what people are reading. If we can provide a useful service to academics by pointing them towards important papers in their fields, then we're happy. We're just as happy to work with conventional publishers as with open access publishers. We've got no open access evangelism agenda.

Although we're in no immediate hurry to make money from CiteULike, our revenue may eventually come from two sources: targeted advertising on the site ("good" adverts, as Google calls them, such as an advert for something you actually need, like a job in your field), and licensing the server software to research companies that want their employees to collaborate but don't want the details of what they're reading to go into the public domain.

The bottom line is that we're interested in linking to your content as it makes our site more useful, and you're probably interested in having your readers implicitly recommend your articles to other people. Simple. And there's not even a cryptic symbiotic buzzword in there.

OK, so how does CiteULike actually get the details of articles?

There are two ways. The first is probably more important:

Posting via the bookmarklet

If you've experimented with CiteULike as a user, you'll know that you can install a JavaScript bookmarklet in your bookmarks bar, surf to a "supported" site, find an article, and click the "Post to CiteULike" bookmarklet.

What happens behind the scenes is this:

  1. The bookmarklet extracts the URL and the title of the web page the user was looking at.
  2. It sends that information to CiteULike.
  3. CiteULike checks to see if anyone's already submitted that URL before. If they have, we already know the details and we simply fetch them from our database.
  4. If we don't know what the page is, our server makes an HTTP request to fetch it.
  5. We route the contents of the page to the appropriate CiteULike plugin. A plugin is a script which is responsible for parsing the HTML and extracting the citation details (author, title, journal, ...) by whatever means it sees fit.
  6. We store the details of the article, and information about how to link back to the original article (like a DOI or a PubMed ID, for example) and ask the user where he'd like to file it in his library.
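To make steps 3 to 6 concrete, here is a minimal Python sketch of that server-side flow. It is not CiteULike's actual code (real plugins are written in Tcl): the `PostHandler` class, the idea of choosing a plugin by the page's hostname, and the injected `fetch` function are all hypothetical, chosen only to illustrate the cache-then-fetch-then-parse order.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlparse

# Hypothetical: a "plugin" is any function that takes raw HTML and
# returns citation fields (author, title, journal, DOI, ...).
Plugin = Callable[[str], Dict[str, str]]


class PostHandler:
    """Illustrative sketch of steps 3-6; not CiteULike's real implementation."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self.fetch = fetch                               # HTTP fetcher (step 4), injected
        self.plugins: Dict[str, Plugin] = {}             # hostname -> parser (step 5), assumed routing rule
        self.known: Dict[str, Dict[str, str]] = {}       # URL -> stored details (steps 3 and 6)

    def register(self, host: str, plugin: Plugin) -> None:
        self.plugins[host] = plugin

    def post(self, url: str, title: str) -> Optional[Dict[str, str]]:
        # Step 3: if someone has already posted this URL, reuse the stored record.
        if url in self.known:
            return self.known[url]
        # Step 5 needs a plugin for this site; unsupported sites yield None.
        plugin = self.plugins.get(urlparse(url).hostname or "")
        if plugin is None:
            return None
        # Step 4: fetch the page on the server side.
        html = self.fetch(url)
        details = plugin(html)
        # Fall back to the page title the bookmarklet sent in step 1.
        details.setdefault("title", title)
        # Step 6: store the record, including any DOI or PubMed ID the plugin found.
        self.known[url] = details
        return details
```

The point of checking the store before fetching is that a popular article is fetched and parsed only once, no matter how many users post it.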

Publishing via RSS

As well as storing articles that our users have actually chosen to post, we're also interested in maintaining a collection of articles which they might want to read. Our users can then search within CiteULike for these, and post them to their libraries if they find them interesting.

We do this by subscribing to RSS feeds from publishers. There's a standard, PRISM, which allows you to embed all the useful metadata (enough for a proper bibliographic citation) in the feed, and that's what we need.
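As an illustration of what "metadata in the feed" means, the Python sketch below reads a stripped-down RSS 1.0 item carrying Dublin Core and PRISM elements. The namespace URIs are the RSS 1.0, Dublin Core and PRISM 1.2 ones; the sample journal, DOI and the choice of which fields to extract are made up for the example, and this is not how CiteULike's own feed reader is written.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}

# Citation field -> element that carries it in each feed item.
FIELDS = {
    "title": "dc:title",
    "identifier": "dc:identifier",        # e.g. "doi:10.1000/..."
    "journal": "prism:publicationName",
    "volume": "prism:volume",
    "issue": "prism:number",
    "start_page": "prism:startingPage",
    "abstract": "rss:description",
}


def parse_prism_feed(xml_text: str) -> List[Dict[str, object]]:
    """Return one dict of citation details per <item> in an RSS 1.0 feed."""
    root = ET.fromstring(xml_text)        # root element is rdf:RDF
    articles = []
    for item in root.findall("rss:item", NS):
        rec: Dict[str, object] = {
            "link": item.findtext("rss:link", default="", namespaces=NS)
        }
        for key, path in FIELDS.items():
            value = item.findtext(path, namespaces=NS)
            if value:
                rec[key] = value.strip()
        # An article can have several authors, one dc:creator each.
        rec["authors"] = [c.text.strip() for c in item.findall("dc:creator", NS) if c.text]
        articles.append(rec)
    return articles


# Hypothetical sample item (no <channel>, for brevity).
SAMPLE_FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <item rdf:about="http://www.bigpublisher.com/view_article?id=1234">
    <title>An example article</title>
    <link>http://www.bigpublisher.com/view_article?id=1234</link>
    <description>An abstract.</description>
    <dc:title>An example article</dc:title>
    <dc:creator>Smith, J.</dc:creator>
    <dc:creator>Jones, K.</dc:creator>
    <dc:identifier>doi:10.1000/example.1234</dc:identifier>
    <prism:publicationName>Journal of Examples</prism:publicationName>
    <prism:volume>12</prism:volume>
    <prism:number>3</prism:number>
    <prism:startingPage>45</prism:startingPage>
  </item>
</rdf:RDF>"""
```

Compared with scraping an HTML page, every field here sits in a named element, so a reader needs no site-specific parsing rules.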

If you have multiple RSS feeds (one for each journal, say), then we can parse an OPML file which contains a list of where they are.
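An OPML file is just an XML outline whose entries point at feeds through an `xmlUrl` attribute. A minimal Python sketch of collecting those URLs follows; the sample file, its journal names and URLs are hypothetical, and nested outlines (for example, journals grouped by subject) are handled by walking the whole tree.

```python
import xml.etree.ElementTree as ET
from typing import List


def feeds_from_opml(opml_text: str) -> List[str]:
    """Return every feed URL (xmlUrl) in an OPML document, in order, without duplicates."""
    root = ET.fromstring(opml_text)
    urls: List[str] = []
    # iter() visits nested <outline> elements too, so grouped feeds are found.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url and url not in urls:
            urls.append(url)
    return urls


# Hypothetical OPML listing one feed per journal.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>BigPublisher journals</title></head>
  <body>
    <outline text="Journal A" type="rss" xmlUrl="http://www.bigpublisher.com/rss/journal-a.xml"/>
    <outline text="Physics">
      <outline text="Journal B" type="rss" xmlUrl="http://www.bigpublisher.com/rss/journal-b.xml"/>
    </outline>
  </body>
</opml>"""
```

Each URL returned would then be polled and parsed as an ordinary RSS feed.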

We already have feeds from Nature and from IngentaConnect. These let us keep the latest issue of all these journals "on the shelf", as it were, so that our users can be told about new articles as soon as they're published.

So, what do we actually need to do?

There are a few options. Please get in touch and we can have a chat about them. However, you'll probably want to do at least some of these things:

Help us write a plugin for your site

If you're a major publisher, then we have developer resources to write a plugin for you. If you're not such a major publisher, then you might be on your own to write the plugin, or you'll have to really persuade us to do it! We've got a finite amount of time, and there are plenty of sites we need to build support for. Either way, there are some things you can do to make this easy:

  1. Provide sensible URLs for your pages. Suppose you've got an internal identifier which you use for all your documents. Maybe it's an integer? Something like http://www.bigpublisher.com/view_article?id=1234 is much nicer than, say, http://www.bigpublisher.com/search?query=something_the_user_searched_for&page=24&item=7&random_important_magic_number=12. We're interested in storing the raw ingredients to be able to construct a stable URL for your article. Anything you can do to make this easier will help us.
  2. Include DOIs where possible. These really help us provide proper identifiers for the documents, which means we can produce stable links and spot duplicates easily.
  3. Expose your document metadata on the site. It's much harder to extract metadata from an ordinary HTML page than to parse something designed to be machine-readable. Does your site have a link to export the citation details in RIS format? Maybe BibTeX? That's always really helpful, and it means we can reuse code from existing plugins. Can you embed the metadata in META tags in the HTML? That's sometimes easier to parse. Can you provide a link to the data in RSS/PRISM format?
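To show why these three suggestions help, here is a minimal Python sketch of what a plugin can do when a page carries its metadata in META tags. The tag names (`citation_title`, `citation_author`, `citation_doi`) are an assumption for illustration; use whatever names your site actually emits. The `stable_link` helper is also hypothetical: it prefers a DOI, and otherwise keeps only the `id` parameter of a URL shaped like the `view_article?id=1234` example above, dropping search and session parameters.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs; repeated names become lists."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, List[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> tags.
        if tag != "meta":
            return
        a = dict(attrs)
        name, content = a.get("name"), a.get("content")
        if name and content:
            self.meta.setdefault(name.lower(), []).append(content)


def stable_link(page_url: str, meta: Dict[str, List[str]]) -> Optional[str]:
    """Build a link that will still work later, or None if we can't."""
    # A DOI is the best identifier: it resolves via doi.org and makes duplicates easy to spot.
    dois = meta.get("citation_doi")          # assumed tag name
    if dois:
        return "https://doi.org/" + dois[0]
    # Otherwise keep only the document id and drop everything else in the query string.
    parsed = urlparse(page_url)
    ids = parse_qs(parsed.query).get("id")   # assumed parameter name
    if ids:
        return f"{parsed.scheme}://{parsed.netloc}{parsed.path}?id={ids[0]}"
    return None
```

A search-results URL like the second example above carries no document identifier, so the sketch can only give up on it, which is exactly the problem sensible URLs avoid.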

Write a CiteULike plugin yourself

You can do this if you want. You'll need to write it in a language called "Tcl". Maybe things will change in later versions of the plugin handler, but that's the requirement for the moment.

The choice of language might sound slightly annoying, but Tcl has a trivial syntax which a good programmer can pick up in an afternoon. It's something like Perl, although a little bit less insane in terms of language design.

There's some provisional documentation, and that should give you an idea of how much work is involved. If you do want to go down this route, please let us know and we can provide help.

Provide an RSS feed

If you can provide an RSS feed with PRISM metadata in it then that would be extremely helpful. We'll poll the feed for new articles and we'll alert our users when they're published. If you've got lots of RSS feeds then create an OPML file and send us the link. If you've really got loads of them, then remember that we'll need to poll each and every one regularly, so think about aggregating them together into a "new articles" feed if you're worried about us creating too much load on your server.
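If you do aggregate your journals into a single "new articles" feed, the merging step can be as simple as the Python sketch below. It assumes (hypothetically) that each article is a dict with a `link`, an ISO `date` (YYYY-MM-DD) and optionally a `doi`; it removes duplicates, preferring the DOI as the key, and remembers what has already been reported so that each poll only returns articles that are genuinely new.

```python
from datetime import date
from typing import Dict, Iterable, List, Set

Article = Dict[str, str]


def aggregate_new_articles(feeds: Iterable[List[Article]], seen: Set[str]) -> List[Article]:
    """Merge per-journal feeds into one list of unseen articles, newest first.

    `seen` holds keys reported by earlier polls and is updated in place.
    """
    fresh: Dict[str, Article] = {}
    for feed in feeds:
        for article in feed:
            # The same paper may appear in several feeds; a DOI identifies it reliably.
            key = article.get("doi") or article["link"]
            if key in seen or key in fresh:
                continue
            fresh[key] = article
    seen.update(fresh)  # adds the new keys
    return sorted(fresh.values(), key=lambda a: date.fromisoformat(a["date"]), reverse=True)
```

Serving one merged feed like this means a consumer polls a single URL instead of one per journal, which is the load saving suggested above.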

Creating an RSS feed is probably a good thing to do anyway. Librarians seem to be increasingly interested in this technology, so it might be of general benefit to your site.

If you can, provide abstracts in the feed. We'll display them on CiteULike and you might pique the interest of some of our readers if they can see them.

Tell us about any copyright issues

We take the view that it is reasonable to display citation details and abstracts on our website where they are publicly available anyway (in bibliographies in the literature, for example). If you have any specific requirements, then let us know. For example, do you require us to display copyright information if you don't hold the copyright to your abstracts?

Provide contact details for a technical member of staff

Although we can and do add support for certain publishers without talking to them, it's always good to have someone we can talk to at your end if we encounter problems. In fact, because CiteULike parses pages on a regular basis, we sometimes discover bugs in publishers' sites before they do, and as a heavy consumer of RSS feeds we can sometimes spot feed problems that you might miss. Some publishers find it useful to hear from us anyway, as we can generally supply proper technical bug reports, unlike some of the ones that come in from the general public.

Ask questions

We're a small organization, and the site's growing fast. We've got time to answer your questions and - amazingly - we actually use human beings rather than computer programs designed solely to "Thank you for your email and interest in CiteULike." If there's anything you need to know, then please email us.
