
Group: CiteULike-discussion - Forum Thread

Topic: General

Automatic PDF retrieval, upload and download

Hi all,


I thought I'd share a small tool I wrote to make working with CiteULike more user-friendly. It's called SyncUThink, and it aims to do two things:


  1. Search for, and upload to CiteULike, PDFs for all citations in your library.
  2. Download all available PDFs to your computer.

An early version is available online here (note that it uses a signed Java applet, so you'll need to accept a couple of Java security warnings): SyncUThink


I prefer to keep PDF copies of my papers of interest, so I can read them anywhere without printing them out. Without an automatic tool for fetching these PDFs, I was looking at a whole lot of clicking and "Saving as..." over the next three years. Hopefully making this tool available online will allow others to benefit as well!


Cheers,

greg

Posted by gjuggler on 2008-01-09 23:29:08.

8 replies.

Greg, this tool is a wonderful idea.

I have it running on my library at the moment. So far it has made only one detectable error, retrieving the incorrect PDF for this citation:

Barker SA et al. 1995

A couple of initial observations:

  • PDF download to NFS-mounted drive failed (a drive on the local machine was fine)
  • Tagging subfolders results in the PDF being downloaded once for each tagged subfolder; seems wasteful?

Queries/improvements:

  • A "stop" button for both the upload and download processes would be a useful feature.
  • Does SyncUThink skip citations that already have a PDF or crawl the whole lot every time?

Great work.

Posted by neils on 2008-01-10 02:41:31.

Hi Neils,

Thanks for your comments! Regarding your points:

  • I don't know how to deal with the NFS issue.
  • I fixed the tagged subfolders to only download a given PDF once, then copy the file to the remaining subfolders. I think this is the most economical way of handling that situation.
  • Stop, pause and resume are implemented now, though perhaps a bit buggy...
  • It does (ahem--*should*) skip citations that already have PDFs.
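
For anyone curious, the "download once, then copy" idea combined with "skip what is already there" could be sketched roughly as follows in modern Java. This is not SyncUThink's actual code: the class name `TagFolderCopy`, the method `placeInTagFolders`, and the folder layout (one subfolder per tag under a common root) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of SyncUThink.
class TagFolderCopy {
    /**
     * Places one already-downloaded PDF into every tag subfolder under root.
     * A subfolder that already holds a file with the same name is skipped,
     * so repeated runs do no extra work.
     * Returns the number of copies actually written.
     */
    static int placeInTagFolders(Path downloaded, Path root, List<String> tags)
            throws IOException {
        int written = 0;
        for (String tag : tags) {
            Path dir = root.resolve(tag);
            Files.createDirectories(dir);          // no-op if the folder exists
            Path target = dir.resolve(downloaded.getFileName());
            if (Files.exists(target)) {
                continue;                          // already present: skip
            }
            Files.copy(downloaded, target);        // local copy, no second download
            written++;
        }
        return written;
    }
}
```

If the first download lands in `tag1/`, calling `placeInTagFolders` with tags `tag1`, `tag2` and `tag3` would write two copies, and a second call with the same arguments would write none.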

--greg

Posted by gjuggler on 2008-01-10 16:59:46.

Thanks for the info. I don't know if the NFS issue is really an issue; I just tried the download once, then moved to a local drive. It may have been a temporary network glitch. I see no reason why the drive being NFS-mounted should make any difference.

Posted by neils on 2008-01-10 23:26:17.

About the tagged subfolders issue, why not create a shortcut? It would save space, wouldn't it? For example, if my citation is tagged tag1 and tag2, I should get this:

tag1/citation1.pdf

tag2/shortcut to citation1.pdf.lnk

Anyway, thanks for your work!
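
As a rough illustration of the shortcut idea: as far as I know, Java's standard library has no API for writing Windows `.lnk` files, but a file-system hard link gives a similar space saving. The sketch below tries a hard link and falls back to a plain copy where links are not supported. The class name `TagFolderLink` and the method `linkOrCopy` are hypothetical, and nothing here reflects how SyncUThink actually works.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of SyncUThink.
class TagFolderLink {
    /**
     * Makes 'link' refer to the same PDF as 'existing'.
     * Tries a hard link first (no extra disk space); if the file system
     * refuses, falls back to an ordinary copy.
     * Returns true if a hard link was created, false if a copy was made.
     * Throws if 'link' already exists.
     */
    static boolean linkOrCopy(Path existing, Path link) throws IOException {
        Files.createDirectories(link.getParent());
        try {
            Files.createLink(link, existing);   // hard link: same data, second name
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            Files.copy(existing, link);         // fallback: duplicate the bytes
            return false;
        }
    }
}
```

Hard links only work within a single volume, so the fallback copy still matters when tag folders are spread across drives.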

Y

Posted by yoanjacquemin on 2008-01-17 15:50:03.

Hi,

This is exactly what is needed, but unfortunately it did not work for me: it didn't upload a single file (my articles are mostly from Physical Review, Biophys. J., and the like, and have DOIs and/or URLs). The message said either that the PDF already exists or that the request timed out (not too often, though).

Hope you'll have time to take a look at this problem.

Posted by softsimu on 2008-01-11 00:51:59.

Hi softsimu,

Try the latest test version, online at http://www.andrewberman.org/projects/sync/test/ . I fixed the bug that was erroneously making the script think that PDFs already existed, and also added a few new rules that should help in general with finding PDFs in some of the journals from which you're gathering citations. Let me know how it helps things.


I suppose I could create some sort of system that allows people to easily send me logs of how things went wrong, but then again I could (should) also start working on my PhD :-) .

Posted by gjuggler on 2008-01-11 11:04:00.

Hi Greg,

Seems like the problem persists (I tried the test version as you suggested). Now I get either 'Error', 'Timed out' or 'No link found for citation'. I don't think I got a single success, and most of my entries have links. Would it be possible to have a form for retrieving a single entry, to see what the problem is and whether the problems are associated with certain journals? Oh, and does the retrieval work for arxiv.org preprints?

Thanks for your efforts, highly appreciated!

Posted by softsimu on 2008-01-11 14:47:35.

Hi softsimu,


Something strange is going on here. I grabbed the first five citations from your library, copied them to my account, and got all 5 PDFs right off the bat (I'm getting the articles from Cambridge University). This means there's some difference in our computer or network set-ups that is preventing the scripts from working for you.


What version of Java are you running? Do you access articles through something like EZProxy? I'm interested in finding the root of this problem, but this discussion board is probably not the place for digging up specific technical problems like this... Shoot me an e-mail at the address listed on the SyncUThink website.


Cheers,

greg

Posted by gjuggler on 2008-01-11 21:11:51.
