
I have about 50 papers as PDFs (from Nature, Science, PNAS, etc.) on my local hard disk, and want to integrate them into my CUL library. I wonder what the optimal workflow is. The obvious way would be:
1) open each PDF on my laptop
2) look up the title online
3) "post from URL" in CUL
4) upload the local copy of the PDF to CUL
5) Rinse and repeat 50 times
>>> I wonder if there is an easier and more time-saving way of doing this?
I tried using Mendeley to generate a list of citations in .bib format and then import that into CUL. However, the automatic extraction of bibliographic information from the PDFs in Mendeley was not satisfactory (many errors, no abstracts(!)), and it automatically generated many unnecessary and wrong tags. Also, I would still have the problem of how to upload the PDFs to CUL afterwards. For this, I thought I could use SyncUThink, but that was not satisfactory either: I have to disclose my password to the SyncUThink system, and SyncUThink would try to fetch the PDFs online. The point is that it should use the PDFs which I have saved locally, not the online ones, because many of the PDFs are not online and/or I do not have access to the online sources, whereas I already have the PDFs themselves.
Posted by antonkratz on 2009-10-25 05:50:35.
19 replies.
> I have to disclose my password to the SyncUThink system
It's not a "system" - it's just a standalone Java applet. It's quite safe from this point of view. But if you're still worried, you can also download it and run it directly.
> It should use the PDFs which I have saved locally not the online ones
I'm pretty sure that you can work around this. Let it do its job, then add your PDFs using the same naming convention. I'll email the author and ask for his input.
Posted by thegoose on 2009-10-25 08:15:11.
Hi thegoose, thanks for your input! OK, you convinced me to give SyncUThink a shot. However, I already ran into problems during the first step, automatic extraction of citation data. As I wrote, I tried using Mendeley for this step. But the automatic extraction of citation data from local PDFs is full of errors, it does not extract the abstracts (which is important to me), and it also generated tags which I don't want and which make no sense anyway. So first I need to find a way to get the citation data! Only after that can I proceed with uploading the PDFs.
Posted by antonkratz on 2009-10-25 14:46:17.
Maybe a direct link to the search and post page on citeulike might be helpful?
If this page is available, you would just need to copy the titles and paste them in the search box without having to go elsewhere (pubmed/publisher/google etc.) and import the articles into your library directly from the citeulike page (right-click on the + button and open in another tab without leaving the search interface). This merges steps 2) and 3) in the 5-step process you described.
Posted by Zephyrus on 2009-10-25 15:24:30.
I've generally taken the view that Google Scholar (or one of the site-specific searches) is your friend. Click on an article and then on our bookmarklet.
As I've said elsewhere, we have most of the API written for automating this and we're hoping some clever person will step up to the plate. I already have a "toy" GUI for uploading which I'll release as opensource.
Posted by thegoose on 2009-10-25 16:23:25.
> Maybe a direct link to the search and post page on citeulike might be helpful?
This was a recent addition and I'm not terribly comfortable about it - as I said above, Google do this sort of thing so much better than we ever could. Most of Google has an API but, for some reason, Scholar doesn't. As soon as they do, we'll (probably) integrate their search into CiteULike.
Posted by thegoose on 2009-10-25 16:28:39.
> ....give SyncUThink a shot. However I already run into problems during the first step, automatic extraction of citation data.
I'm a bit unclear. SyncUThink doesn't do that. It downloads BibTeX from CiteULike and then tries to auto-get PDFs.
Posted by thegoose on 2009-10-25 16:32:36.
> SyncUThink doesn't do that. It downloads BibTeX from CiteULike and then tries to auto-get PDFs.
Yes, but how do you get the citations into CUL in the first place? That's why I tried using Mendeley to first get the *citations* into CUL from the PDFs (1); then I can use SUT to get the PDFs. The problem is that (1) is already so buggy that I'd probably best do it all manually.
Posted by antonkratz on 2009-10-26 05:23:33.
It's tricky. One way to do it would be to extract the DOIs from your PDFs, and then automate submission of those to CiteULike, then upload the PDFs. That's not easy, but Mendeley does the DOI extraction reasonably well.
So, I see a couple of possible routes:
1) use Mendeley Desktop (no Mendeley login required for what we'll use it for here), then have an additional utility which reads the Mendeley database (it's straightforward sqlite), grabs the DOIs and their mapping to PDFs, and then does the submission to CiteULike.
2) write a program to parse PDFs and pull out the DOIs, then do the submission to CiteULike.
It would certainly be nice to have (2), as an open-source Mendeley-Desktop substitute. But I suspect it's no small job. That'll be why Mendeley pay people to write it, I guess ;-)
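For what it's worth, the DOI-spotting part of route (2) can be sketched in a few lines of Python. This is only a rough illustration, not what Mendeley does: it assumes the PDF text has already been extracted by some other tool, the name `find_doi` is made up for this example, the regular expression is a loose approximation of the usual "10.registrant/suffix" shape, and the sample DOI is invented.

```python
import re

# Loose approximation of the usual DOI shape: "10.", a 4-9 digit
# registrant code, a slash, then a suffix with no spaces or quotes.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string in text, or None (hypothetical helper)."""
    match = DOI_RE.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    # This is a heuristic and may clip a DOI that really ends in ")".
    return match.group(0).rstrip('.,;:)]')


# Made-up text standing in for the first page of an extracted PDF.
sample = "Some Journal 12, 34-56 (2009). doi:10.1234/example.2009.001."
print(find_doi(sample))
```

Submitting the DOI to CiteULike and uploading the PDF would still have to be done on top of this.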
Posted by LondonAnalytics on 2009-10-26 08:03:43.
I don't think it would be very difficult to do as a semi-automated application. Mendeley & Papers try to do it fully automatically, and that's hard. I can imagine a little app that shows you the first few pages of a PDF and tries to locate a DOI. If that's not available, allow the user to select the title and then search for that (PubMed/CrossRef/etc). Then post the URL and upload the PDF.
On the server side most of the APIs either already exist or could easily be added.
Will no-one answer the call?
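To make the idea concrete, here is a minimal Python sketch of just the decision step: use a DOI if one turns up in the first pages, otherwise fall back to a title the user has selected. Everything named here (`lookup_plan`, the return values) is hypothetical, the `https://doi.org/` resolver prefix is an assumption on my part rather than something CiteULike requires, and the GUI, the search itself, and the posting/uploading calls are left out.

```python
import re
from urllib.parse import quote

# Loose approximation of the usual DOI shape (an assumption, not a spec).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def lookup_plan(first_pages_text, selected_title=None):
    """Decide how to look an article up (hypothetical helper).

    Returns ("doi", url) if a DOI is found, ("title", query) if the
    user supplied a title, and ("manual", None) otherwise.
    """
    match = DOI_RE.search(first_pages_text)
    if match:
        doi = match.group(0).rstrip('.,;:)]')
        # Assumed resolver prefix; the resulting URL could then be posted.
        return ("doi", "https://doi.org/" + quote(doi, safe="/"))
    if selected_title:
        # Text selected in a PDF often carries line breaks; collapse them.
        return ("title", " ".join(selected_title.split()))
    return ("manual", None)


# Made-up inputs for illustration only.
print(lookup_plan("doi:10.1234/example.2009.001"))
print(lookup_plan("no identifier here", "A made-up\n   article title"))
```

Keeping the user in the loop for the title fallback is the point of the semi-automated approach: the program only has to be right about the easy case.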
Posted by thegoose on 2009-10-26 09:51:20.
Interesting discussion here. This really gets at the core problem of connecting old folders of PDFs to nicely structured metadata. I'm convinced that the way to go is through DOIs and title/author extraction, which *can* work reasonably well if you are smart about it.
We've got something in the pipeline that should do pretty well in this arena (open-source Mendeley-Desktop substitute). Gonna be some time before it's done, though!
Contact me directly if you run Linux and might be interested in alpha testing...
greg
Posted by gjuggler on 2009-11-09 15:05:31.
I spent all day yesterday uploading ~100 PDFs the manual way:
1. Open PDF
2. Search PubMed with title
3. Get PubMed ID
4. Paste into CUL
5. Upload PDF
So I would be happy to help automate this process, or test something (I use Linux). Can you give some more details?
Posted by mmwoodman on 2009-11-10 10:36:42.
Hi mmwoodman,
We're still a couple weeks away from having a decent alpha-quality release, but we've got a Google Group to act as an announcement mailing list of sorts: http://groups.google.com/group/paperpile-alpha . Sign up there if you want to get updates on the project and try out an early release when it's ready.
Posted by gjuggler on 2009-11-13 09:49:54.
I just got screwed over, once again, by Mendeley's deeply unclever syncing from CiteULike, giving me another few dozen duplicates to weed out. So, go on then - where's the documentation for the CUL API? I'll see what I can do.
Posted by LondonAnalytics on 2009-12-06 14:28:30.
Sorry, I meant: where are the API docs for uploading PDFs to CiteULike? I know gjuggler's taking a crack at an open-source Mendeley-Desktop substitute - I'll take a crack at part of the problem too, and we'll see where we get to.
Posted by LondonAnalytics on 2009-12-07 07:28:56.
Oh, I missed this. Is there an API? Where are the docs? Perhaps I can tidy up my horrendous Perl scripts.
Posted by tnhh on 2009-12-07 09:13:40.
It was "announced" here http://www.citeulike.org/news
Only 2 calls currently: "login" and "upload_pdf", but awaiting feedback before extending. I'll email you the link.
Posted by thegoose on 2009-12-07 09:17:11.
I'll take a crack at it using Excel for starters. I'm a prevert like that.
Before that, I had a look at the Acrobat SDK, and considered making an Acrobat plugin. Then I saw they wanted a grand a year for a licence for it!
Posted by LondonAnalytics on 2009-12-08 14:42:01.