When I import my BibTeX database, the German umlauts vanish from all tags, cf. http://www.citeulike.org/user/joachim_tr/article/4964080. I tried converting my UTF-8 database to TeX notation, but to no avail; it only showed that the umlauts are also wrong in certain places in the list view (though correct in the details view).
This is strange because otherwise, umlauts are no problem.
Another problem in tags seems to be the slash: I have a tag "9/11" which loses the / in between, but again, not in titles etc. Cf. http://www.citeulike.org/user/joachim_tr/article/4964297
Thanks for your fine work, anyway.
Joachim
Posted by joachim_tr on 2009-06-26 00:32:26.
13 replies.
We should be able to sort out the umlauts u" -> ue, etc. We don't allow slashes in tags, but could translate to "-".
Posted by thegoose on 2009-06-26 07:38:20.
Maybe I can flesh out my problem further: let's have a look at http://www.citeulike.org/user/joachim_tr/tag/rumnien, which in my original database is tagged "Rumänien" (I hope the umlaut is making it through): The 'booktitle' reads "Im Dialog: Rum{ä}nistik im deutschsprachigen Raum" with curly braces around the umlaut, whereas in the 'title' all is as it should be.
When I go to the detailed view, the booktitle is correct, but the tags are mangled: "bersetzung" should be "übersetzung" and "rumnien" as mentioned above "rumänien".
Thanks again
Joachim
Posted by joachim_tr on 2009-06-26 11:02:48.
the tags are mangled: "bersetzung" should be "übersetzung" and "rumnien" as mentioned above "rumänien".
Yes, until my change we only allowed a-z,0-9,_,- in tags and all other chars were stripped out. Those characters are still the only ones allowed but we do some "translation" of accented characters so that
übersetzung -> ubersetzung
rumänien -> rumanien
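For anyone who wants to preview what their tags will turn into before importing, the rule described above can be approximated as follows. This is only a sketch, not CiteULike's actual code: the function name `normalize_tag`, the lowercasing step, the "/" to "-" translation and the `SPECIAL` table for letters without a Unicode decomposition are all assumptions.

```python
import re
import unicodedata

# Assumption: letters that have no accent decomposition get explicit ASCII stand-ins.
SPECIAL = {"ß": "ss", "ł": "l"}


def normalize_tag(tag: str) -> str:
    """Reduce a tag to the characters a-z, 0-9, _ and -."""
    # Lowercase first, and translate the slash instead of dropping it (assumed).
    tag = tag.lower().replace("/", "-")
    tag = "".join(SPECIAL.get(ch, ch) for ch in tag)
    # NFKD splits "ü" into "u" plus a combining diaeresis; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", tag)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside the allowed set is removed.
    return re.sub(r"[^a-z0-9_-]", "", stripped)


for raw in ["Übersetzung", "Rumänien", "9/11"]:
    print(raw, "->", normalize_tag(raw))
```

Without the accent-stripping step, the final `re.sub` alone reproduces the old behaviour ("rumänien" becomes "rumnien").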
You'll have to manually change anything you added previously, but you can globally rename each tag (e.g., einfhrung -> einfuhrung) using the following page
http://www.citeulike.org/profile/joachim_tr/tag-rename
If you're a "techie" and want to do things in batch, you can call the rename function directly, e.g.,
GET -H 'Cookie: login=XXXXX' 'http://www.citeulike.org/profile/joachim_tr/tag-rename-do?tag=einfhrung&new_tags=einfuhrung'
XXXXX is the value of the login cookie, copied from your browser.
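If there are many tags to fix, the same call can be scripted. The sketch below only builds the requests and prints their URLs; it assumes that `tag` and `new_tags` are passed as an ordinary query string, and the helper name `build_rename_request` and the rename table are made up for illustration.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://www.citeulike.org/profile"


def build_rename_request(user, old_tag, new_tag, login_cookie):
    """Build (but do not send) one tag-rename request."""
    # Assumption: the endpoint reads 'tag' and 'new_tags' from the query string.
    query = urlencode({"tag": old_tag, "new_tags": new_tag})
    url = f"{BASE}/{quote(user)}/tag-rename-do?{query}"
    return Request(url, headers={"Cookie": f"login={login_cookie}"})


# Hypothetical rename table: mangled tag -> repaired tag.
renames = {
    "einfhrung": "einfuhrung",
    "bersetzung": "ubersetzung",
    "rumnien": "rumanien",
}

requests_to_send = [
    build_rename_request("joachim_tr", old, new, "XXXXX")
    for old, new in renames.items()
]

for req in requests_to_send:
    print(req.full_url)
    # To actually perform the rename: urllib.request.urlopen(req)
```

Printing first and sending later makes it easy to check the list of renames before anything is changed.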
Posted by thegoose on 2009-06-26 12:09:33.
OK, I've done
Übersetzung -> ubersetzung
9/11 -> 9-11
I decided not to do Ü -> Ue, etc., because (a) that's a German-specific thing and (b) most non-German speakers won't know about it and so might be confused.
Posted by thegoose on 2009-06-26 10:27:02.
Thanks, this is a (rather poor) workaround (sorry, presumably I don't quite understand the technical problems in putting umlauts in tags), but it works – but alas, only when editing my tags in CiteULike, not when importing.
However, the problem with emerging curly braces is still there, cf. now http://www.citeulike.org/user/joachim_tr/tag/rumanien.
Thanks a lot anyway.
All the best
Joachim
Posted by joachim_tr on 2009-06-26 14:28:48.
but alas, only when editing my tags in CiteULike, not when importing.
My bad - I forgot about that. I'll fix it straight away.
I'll also look into the {} problem. Unfortunately, all (or most) of your articles are marked "private" so I can't see them - would you consider making the problematic ones public?
presumably I don't quite understand the technical problems in putting umlauts in tags
No technical problem. We made the decision to restrict to ASCII only. It makes sense as a common "baseset" of characters - i.e., almost everything can be written in ASCII, but it's hard for other cases. How would you type in the (Polish) "Ł", for example? One of the most powerful uses of tags is for sharing information amongst users (one of the key design goals of CiteULike) so we feel it's important to have tags in a common character set.
Posted by thegoose on 2009-06-26 16:00:13.
My bad - I forgot about that. I'll fix it straight away.
Fixed. (I hope!)
Posted by thegoose on 2009-06-26 16:19:04.
However, the problem with emerging curly braces is still there,
Can you post (either here or email to support09@....) the BibTeX you used to upload that article?
It's a very messy problem, but generally we try to preserve the actual text of what you uploaded. I expect that, if anything, the title (with the extra {}) is right and there should be extra {} in the booktitle. BibTeX is very weird.
Posted by thegoose on 2009-06-26 16:39:54.
However, the problem with emerging curly braces is still there,
I think one problem is that the input BibTeX is wrong (from my reading of the BibTeX manual). You supplied:
Darija {\v S}imunovi{ć}
But it should be (I think):
Darija {\v{S}}imunovi{\'{c}}
The extra space in {\v S} is messing up our parser.
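For reference, the fully braced form can be generated mechanically from the Unicode spelling. This is a rough sketch, not what BibDesk or our importer does: the function name `to_bibtex` and the small accent table are assumptions, it expects precomposed (NFC) input, and it ignores special cases such as the dotless i.

```python
import unicodedata

# Combining mark -> TeX accent command (a deliberately small, assumed subset).
ACCENT_COMMANDS = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis / umlaut
    "\u030c": "v",   # caron (hacek)
}


def to_bibtex(text: str) -> str:
    """Rewrite accented letters as {\\cmd{base}}, leaving everything else alone."""
    out = []
    for ch in text:
        # NFD splits e.g. "Š" into "S" followed by a combining caron.
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENT_COMMANDS:
            base, mark = decomposed
            out.append("{\\" + ACCENT_COMMANDS[mark] + "{" + base + "}}")
        else:
            out.append(ch)
    return "".join(out)


print(to_bibtex("Darija Šimunović"))
print(to_bibtex("Rumänien"))
```

Each accented letter gets its own brace group with no space after the command, which avoids the `{\v S}` form.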
Posted by thegoose on 2009-06-27 11:01:13.
Sorry, I don't know how to make a private article public (I didn't want to make the whole database public before having experimented a bit and before the tag problem was solved).
Anyway, you've got it right. I deleted all articles and reimported them (in UTF8), and the curly braces are gone. The faulty code you cited derived from a bug in BibDesk when I experimented with TeX-coded import.
We made the decision to restrict to ASCII only. It makes sense as a common "baseset" of characters - i.e., almost everything can be written in ASCII, but it's hard for other cases. How would you type in the (Polish) "Ł", for example? One of the most powerful uses of tags is for sharing information amongst users (one of the key design goals of CiteULike) so we feel it's important to have tags in a common character set.
OK, your decision, but I think that's a serious restriction for German users, and I think no one would search for "rumanien" when looking for "Rumänien" or "ubersetzung" when looking for "Übersetzung" ...
Posted by joachim_tr on 2009-06-28 18:31:25.
I think no one would search for "rumanien" when looking for "Rumänien" or "ubersetzung" when looking for "Übersetzung" ...
Actually, we'll be updating the search quite soon to cope with that, i.e., "rumanien" will match both "rumanien" and "rumänien" (though not "rumaenien"). And vice versa. Actually, as a special case, author searching already works like that, so searching for "author:Hass", "author:Häss" or "author:Häß" will match all three cases.
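The matching behaviour described here amounts to comparing both sides under the same "folded" key. A minimal sketch of the idea, assuming Unicode case folding plus accent stripping (the names `fold` and `matches` are illustrative, not the real search code):

```python
import unicodedata


def fold(text: str) -> str:
    """Return an accent- and case-insensitive comparison key."""
    # casefold() lowercases and also expands "ß" to "ss".
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    # Drop combining marks so "ä" compares equal to "a".
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, candidate: str) -> bool:
    """True if query and candidate are equal once both are folded."""
    return fold(query) == fold(candidate)


print(matches("rumanien", "Rumänien"))
print(matches("rumaenien", "Rumänien"))
print(matches("Hass", "Häß"))
```

Because the key only removes accents, "rumaenien" keeps its "e" and does not match "rumänien".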
Posted by thegoose on 2009-06-28 19:17:29.
I don't know how to make a private article public
That's an easy one. View the article, and look for
(x) Your posting privacy settings: private
(3 down from tags). Open this and change from "Private to your library" to "Publicly visible".
Posted by thegoose on 2009-06-28 19:34:49.