Archive for February 2010

Blog sites and search engines

2010-02-03 Comments off

Blog sites and the number of their pages indexed by particular search engines:
Google is queried with site:domainname, Yahoo with linkdomain:domainname, and Yandex by entering the domain in the advanced search form.

Blogspot (blogger.com) is listed by Alexa as the world's 7th most popular site, and 8th in the USA. WordPress has an Alexa rank of 19. LiveJournal is 82nd in Alexa's global ranking, but it ranks very high in Russia: no. 11 (the most popular blog site in Russia, Ukraine and Belarus).

wordpress.com
google – 223 mln
yahoo – 64 mln
yandex – 280 thousand

livejournal.com
google – 70 mln
yahoo – 53 mln
yandex – 49 mln

blogspot.com
google – 834 mln
yahoo – 188 mln
yandex – 21 mln

The results are quite interesting. The number of LiveJournal pages indexed by Yandex is almost as high as the numbers for Google and Yahoo, whereas for Blogspot and WordPress Yandex is far behind. According to Yandex, 47 mln of the 49 mln LiveJournal pages are in Russian, which explains the results quite well.
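To make "almost as high" and "far behind" concrete, here is a small sketch that expresses each Yandex count as a percentage of the Google count. The class and method names are mine, made up for illustration; the numbers are the ones listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexRatios {
    /** Yandex page count expressed as a percentage of the Google page count. */
    static double yandexShare(double googleMln, double yandexMln) {
        return 100.0 * yandexMln / googleMln;
    }

    public static void main(String[] args) {
        // {google, yandex} counts in millions, copied from the lists above
        // (280 thousand is written as 0.28 mln).
        Map<String, double[]> counts = new LinkedHashMap<>();
        counts.put("wordpress.com", new double[] {223, 0.28});
        counts.put("livejournal.com", new double[] {70, 49});
        counts.put("blogspot.com", new double[] {834, 21});

        for (Map.Entry<String, double[]> e : counts.entrySet()) {
            double[] c = e.getValue();
            System.out.printf("%s: Yandex has %.2f%% of Google's count%n",
                    e.getKey(), yandexShare(c[0], c[1]));
        }
    }
}
```

That works out to roughly 70% for LiveJournal, against about 2.5% for Blogspot and about 0.13% for WordPress.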

Rather few WordPress blogs are indexed by Yandex, and a Google search restricted to a given language doesn't return a count. There is also the question of how good the search engines are at guessing a blog's language. I suppose that a Russian-language (or partially Russian-language) blog has a much better chance of being indexed by Yandex when it is hosted on LiveJournal rather than on WordPress. It is another thing on my list to check in my free time.

There are some tricks for giving search engines language hints. When defining the page language via the meta content-language tag is not applicable (e.g. you can't specify custom meta tags for particular pages), you can set the lang attribute on a span element. This tip is described in the submit-site tips article on seo expert services.
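As a rough illustration of the span approach, the sketch below wraps a piece of post text in a span carrying a lang attribute. The LangSpan class and its methods are hypothetical helpers of mine, not part of any blogging platform, and whether a given search engine actually honours the attribute is not something this code can show.

```java
final class LangSpan {
    /** Wraps text in a span element that declares its language, e.g. "ru". */
    static String wrap(String languageCode, String text) {
        return "<span lang=\"" + escapeHtml(languageCode) + "\">"
                + escapeHtml(text) + "</span>";
    }

    /** Minimal HTML escaping so the text cannot break the markup. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, LangSpan.wrap("ru", "Привет") produces `<span lang="ru">Привет</span>`, which can be pasted into a post body even when the page head is out of reach.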

RtfTemplate and character encoding

2010-02-03 Comments off

RtfTemplate is quite an interesting tool that processes MS Office mail merge (serial correspondence) templates in Java code. Both the source and the output are in RTF format. It is a quick and easy way to prepare printouts in an application. What is more, the templates can be prepared by non-technical users trained only in MS Office, which is a common skill among office workers.

However, there is one serious problem with the tool: it lacks the ability to properly encode characters. Data put into a template is not displayed properly unless it is 7-bit ASCII only. That was probably the authors' test case, so the result is a tool that is very useful, but only for a small part of the world.

The solution is not elegant, but quite simple. It requires another tool, which is free: iText. iText has built-in support for writing RTF documents, and its class RtfDocument has a method for encoding characters according to the RTF standard. All you have to do is use it to encode the data you are going to print. The downside is that you have to do it for all the data you send to the template printer, no matter how deep in the structure it sits. The function to call is quite simple. Here is a sample of how to use it:

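import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import com.lowagie.text.rtf.document.RtfDocument; // package name as in iText 2.x; adjust for your version

    // Escapes a string so that non-ASCII characters survive in the RTF output.
    public static String escape(String sentence) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            new RtfDocument().filterSpecialChar(baos, sentence, true, true);
        } catch (IOException e) {
            // will never happen for ByteArrayOutputStream
        }
        // decode byte for byte, independent of the platform default charset
        return new String(baos.toByteArray(), StandardCharsets.ISO_8859_1);
    }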
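Since every string has to be escaped no matter how deep it sits, one option is to walk the data before handing it to the template. The sketch below assumes the data is built from nested Map and List objects, which may not match how your application feeds RtfTemplate. DeepEscape, escapeDeep and rtfEscape are hypothetical names of mine. The rtfEscape method is only a simplified stand-in so the example runs without iText on the classpath; it is not iText's implementation, and in real code you would pass the iText-based escape method instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class DeepEscape {
    /** Returns a copy of value with every String, at any depth, run through escaper. */
    static Object escapeDeep(Object value, UnaryOperator<String> escaper) {
        if (value instanceof String) {
            return escaper.apply((String) value);
        }
        if (value instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                // keys are field names, so only the values are escaped
                copy.put(e.getKey(), escapeDeep(e.getValue(), escaper));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(escapeDeep(item, escaper));
            }
            return copy;
        }
        return value; // numbers, dates, null and so on pass through unchanged
    }

    /** Demo-only stand-in escaper (an assumption, not what iText does internally). */
    static String rtfEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                sb.append('\\').append(c);          // RTF control characters
            } else if (c < 128) {
                sb.append(c);                        // 7-bit ASCII is safe as-is
            } else {
                // RTF Unicode control word: signed 16-bit decimal code, then a fallback char
                sb.append("\\u").append((int) (short) c).append('?');
            }
        }
        return sb.toString();
    }
}
```

Usage would look like `Object safe = DeepEscape.escapeDeep(model, DeepEscape::rtfEscape);`, with the method reference swapped for the iText-based escape once iText is available. Returning a copy rather than modifying the data in place keeps the original values usable elsewhere in the application.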