Tuesday, October 19, 2010

Why is jQuery popular ? prototype.js smokes it in every way

I think the title is pretty much self-explanatory, isn't it ? I had to use jQuery for a project (company policy), and I was very disappointed by it. I reached a point where I was really asking myself why it is the most popular JavaScript framework. I see just two reasons for it:
  • They had better communication than the other frameworks.
  • The same "easy to use out of the box" tutorials as Ruby on Rails at the time; you can make a blog in 30 minutes, but you can't make a CRM or a high-availability application (cf. the quick migration of Twitter from Ruby to Scala/Java).
And it is actually with the second point that I have an issue. To illustrate it, I will go over the four main domains where JavaScript is used nowadays:
  •  Event listener
  •  DOM scripting or DHTML (for the 0ld5k00l3r ;))
  •  Ajax
  •  Object Oriented Programming (creating classes)

1. Event listeners:
It's a matter of fact that all modern browsers implement the DOM Level 2 Event Model. As you can read there, there are five event types: HTML, key, mouse, mutation and UI.
When I check the jQuery event page, they barely support the key, mouse and HTML event types.
At the same time, the prototype.js page says that it supports them all ! I know that for a designer or a beginner .click(callback) looks better than .observe('click', callback), but the result is that beyond a certain level you cannot use jQuery for event listening. So don't use jQuery for event-driven programming unless all you handle is click events.
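To make the point concrete, here is a toy, browser-free sketch (Emitter, observe and fire are my own illustrative names, not an actual prototype.js or jQuery API) of why one generic observe(type, callback) interface scales to any event type, mutation events included, while per-event sugar like .click() only covers the handful of events that got a shortcut:

```javascript
// Toy event dispatcher: a generic observe(type, cb) works for ANY event
// name, including mutation events like 'DOMNodeInserted', with no new API.
function Emitter() { this.handlers = {}; }
Emitter.prototype.observe = function (type, cb) {
  (this.handlers[type] = this.handlers[type] || []).push(cb);
};
Emitter.prototype.fire = function (type, payload) {
  return (this.handlers[type] || []).map(function (cb) { return cb(payload); });
};

var node = new Emitter();
// the exact same code path, whether the event got syntactic sugar or not
node.observe('click', function (e) { return 'clicked ' + e; });
node.observe('DOMNodeInserted', function (e) { return 'inserted ' + e; });
```

With sugar-only APIs, every new event type needs a new method; with a generic observe, it's just another string.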

2. DOM scripting
That was the cherry on the cake of my disappointment. To create a new node, you have to call the jQuery object and pass it the element as a string, like $('<a href="test.html">test</a>'). To be honest, I laughed when I read it. First, the Element class in prototype.js makes the code look really legitimate and helps the developer get used to the upcoming JS syntax (like the Image object). Second, it is confusing, and so are those fake functions like attr and css: they are getters and setters at the same time ! I would love to discuss with the architect to know what motivated this decision.

Then you have to insert the node into the document. I was disappointed to find one set of methods for inserting content as an object (append, insertBefore, etc.) and another for strings (text(), html()), instead of simple prototype.js-style ones like update and insert.

Same criticism for the jQuery object itself: I understand the appeal of CSS selectors, but by relying on them you lose the ability to build dynamic features (for a dynamic language, that is a problem). For example, I cannot have a function like "function foo(bar){return $(bar);}" that accepts both an id string and a DOM object, because in one case the syntax would be "$('#'+bar)" and in the other it would be "$(bar)". You see what I mean ?
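Here is a sketch of the workaround jQuery forces you into (toSelector is a hypothetical helper name of mine, not a jQuery API): every polymorphic function has to branch on the argument type itself, where prototype.js's $() accepts an id string or an element out of the box:

```javascript
// Hypothetical helper: normalize an argument that may be an element id
// string or an already-resolved DOM node, before handing it to jQuery.
function toSelector(bar) {
  // an id string must be turned into a '#id' CSS selector by hand;
  // a DOM node can be passed through unchanged
  return (typeof bar === 'string') ? '#' + bar : bar;
}
```

Every API that wants to accept both shapes needs this boilerplate, which is exactly the dynamism the library should have provided.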

3. AJAX
There is not much to say: look at the jQuery feature list and the prototype.js one. And for the record, I hate the fact that jQuery evals JavaScript code when it comes back as the result of an Ajax call.

4. OOP
It is simply nonexistent in jQuery, whereas in prototype.js you have access to all of these features.
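As a rough, plain-JavaScript sketch of the kind of class system prototype.js ships with and jQuery does not (createClass is my stand-in name, not the actual Class.create implementation):

```javascript
// Minimal emulation of a Class.create-style helper: optional parent class,
// an initialize() constructor, and method inheritance via the prototype chain.
function createClass(parent, methods) {
  var klass = function () {
    if (this.initialize) this.initialize.apply(this, arguments);
  };
  if (parent) {
    var Tmp = function () {};
    Tmp.prototype = parent.prototype;
    klass.prototype = new Tmp(); // inherit without invoking parent's initialize
  }
  for (var name in methods) klass.prototype[name] = methods[name];
  return klass;
}

var Animal = createClass(null, {
  initialize: function (name) { this.name = name; },
  speak: function () { return this.name + ' makes a sound'; }
});
var Dog = createClass(Animal, {
  speak: function () { return this.name + ' barks'; } // override
});
```

In jQuery you would have to wire all of this prototype plumbing yourself; prototype.js gives it to you (plus $super and mixins) for free.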

I would also add that I hate the fact that the template engine in jQuery is a separate module; it should be included by default, as in prototype.js. Likewise, the effects module has no event listeners for when an animation starts or finishes, nor the possibility to run several effects in parallel like you can with script.aculo.us.

I would conclude that if jQuery is popular, it might be because of its syntactic sugar (.click(), .animate()) and, I have to admit, its efficiency on simple projects. But if you move toward more advanced projects, you will end up writing most of the code in pure JavaScript, increasing the chance of cross-browser issues.

Wednesday, May 26, 2010

a real hiragana to romaji function

Recently I had to find a solution for transliterating words written in hiragana into Latin characters. I have to admit I was disappointed with what I found. Basically, those functions/programs were just doing a key/value lookup without taking into account some basic rules, such as:
- "っ", like in "せってい" (setting), which should be transliterated as "settei".
- Long vowels: there are actually different rules, such as adding an "h" (American style) or a circumflex accent on the vowel (French style). However, since people generally write "とうきょう" as "tokyo" and so on, I decided to simply drop the long vowel.

I know I am a picky person, but when you work with named entities, keeping them as correct as possible is the bare minimum.
The actual code is in PHP but can easily be ported to other languages, or modified to convert katakana (were you expecting me to do your work ? ;)). Enjoy !



<?php 
// should be the same as the script encoding
mb_regex_encoding("UTF-8");

function charSpliter($str) {
    $token = array();
    while (1) {
        // match one kanji, hiragana, katakana or alphanumeric character
        $found = mb_ereg("[一-龠]|[ぁ-ん]|[ァ-ヴー]|[a-zA-Z0-9]", $str, $match);
        if ($found === false) {
            break;
        }
        $match = $match[0];
        $token[] = $match;
        // consume everything up to and including the matched character
        $pos = strpos($str, $match);
        $str = substr($str, $pos + strlen($match));
    }
    return $token;
}
function kana_2_romanji($kana_string) {
    $kana_list = array('a_h'=>'あ', 'i_h'=>'い', 'u_h'=>'う', 'e_h'=>'え', 'o_h'=>'お', 'ka_h'=>'か', 'ki_h'=>'き', 'ku_h'=>'く', 'ke_h'=>'け', 'ko_h'=>'こ', 'ga_h'=>'が', 'gi_h'=>'ぎ', 'gu_h'=>'ぐ', 'ge_h'=>'げ', 'go_h'=>'ご', 'sa_h'=>'さ', 'shi_h'=>'し', 'su_h'=>'す', 'se_h'=>'せ', 'so_h'=>'そ', 'za_h'=>'ざ', 'ji_h'=>'じ', 'zu_h'=>'ず', 'ze_h'=>'ぜ', 'zo_h'=>'ぞ', 'ma_h'=>'ま', 'mi_h'=>'み', 'mu_h'=>'む', 'me_h'=>'め', 'mo_h'=>'も', 'ta_h'=>'た', 'chi_h'=>'ち', 'tsu_h'=>'つ', 'te_h'=>'て', 'to_h'=>'と', 'da_h'=>'だ', 'di_h'=>'ぢ', 'du_h'=>'づ', 'de_h'=>'で', 'do_h'=>'ど', 'na_h'=>'な', 'ni_h'=>'に', 'nu_h'=>'ぬ', 'ne_h'=>'ね', 'no_h'=>'の', 'ha_h'=>'は', 'hi_h'=>'ひ', 'fu_h'=>'ふ', 'he_h'=>'へ', 'ho_h'=>'ほ', 'ba_h'=>'ば', 'bi_h'=>'び', 'bu_h'=>'ぶ', 'be_h'=>'べ', 'bo_h'=>'ぼ', 'pa_h'=>'ぱ', 'pi_h'=>'ぴ', 'pu_h'=>'ぷ', 'pe_h'=>'ぺ', 'po_h'=>'ぽ', 'ra_h'=>'ら', 'ri_h'=>'り', 'ru_h'=>'る', 're_h'=>'れ', 'ro_h'=>'ろ', 'wa_h'=>'わ', 'wo_h'=>'を', 'ya_h'=>'や', 'yu_h'=>'ゆ', 'yo_h'=>'よ', 'n_h'=>'ん', 'xya_h'=>'ゃ', 'xyu_h'=>'ゅ', 'xyo_h'=>'ょ', 'xa_h'=>'ぁ', 'xi_h'=>'ぃ', 'xu_h'=>'ぅ', 'xe_h'=>'ぇ', 'xo_h'=>'ぉ', 'xtsu_h'=>'っ');
    $tokens = charSpliter($kana_string);
    $result = '';
    $word_length = count($tokens);
    for ($i = 0; $i < $word_length; $i++) {
        $char_key = array_search($tokens[$i], $kana_list);
        if ($char_key !== FALSE) {
            $translation = substr($char_key, 0, strpos($char_key, '_'));
            $buffer = '';
            if (strpos($translation, 'x') !== 0) {
                $buffer = $translation;
            } elseif ($translation == 'xtsu' && $i + 1 < $word_length) {
                // geminate consonant (っ): double the first letter of the next syllable
                $next_token = kana_2_romanji($tokens[$i + 1]);
                $buffer .= substr($next_token, 0, 1);
            } elseif ($translation != 'xtsu' && $i > 0) {
                // small kana (ゃ, ゅ, ょ, ぁ ...): merge it with the previous syllable
                $prev_token = kana_2_romanji($tokens[$i - 1]);
                $radical = substr($prev_token, 0, strlen($prev_token) - 1);
                $terminaison = substr($translation, 1); // "xya" => "ya", "xe" => "e"
                if (strpos($terminaison, 'y') === 0 && in_array($prev_token, array('shi', 'chi', 'ji'))) {
                    // drop the "y" after shi, chi and ji: しゃ => "sha", not "shya"
                    $terminaison = substr($terminaison, 1);
                }
                // remove the previous syllable from the end of the result
                $result = substr($result, 0, strlen($result) - strlen($prev_token));
                $buffer .= $radical.$terminaison;
            }

            // in case of long vowel とう, しゅう etc.: drop the trailing う
            if ($i + 1 < $word_length) {
                $next_token = kana_2_romanji($tokens[$i + 1]);
                $buffer_last_voyel = substr($buffer, strlen($buffer) - 1);
                if ($next_token === 'u' && ($buffer_last_voyel === 'u' || $buffer_last_voyel === 'o')) {
                    // here is where to plug your own long vowel rule
                    $i++;
                }
            }

            $result .= $buffer;
        }
    }
    return $result;
}
var_dump(kana_2_romanji('にほんご'));
var_dump(kana_2_romanji('せってい'));
var_dump(kana_2_romanji('しゅうせい'));
var_dump(kana_2_romanji('ぎょうざ'));
?>

Monday, April 5, 2010

Has HTTP become a query language ?

We all know that HTTP stands for HyperText Transfer Protocol, and the story behind it. However, with the growing decentralization of web architectures and a web that is more and more data oriented, WSOA (Web Service Oriented Architecture) and services are taking a bigger part in the way we conceive online products.
Who nowadays would develop their own map service instead of using Google Maps or Bing Maps, or neglect to set up an API to increase their sources of traffic and business opportunities ? (If your web team never talks to you about it, be very afraid.)

In order to standardize this phenomenon, the W3C released in 2003 a technology called SOAP (Simple Object Access Protocol). Despite implementations in every programming language I know, developers didn't adopt this technology at all. A lot of criticism has been voiced, and I myself have a lot to say (why is a WSDL 100 lines long when only the last 10 are really useful ? why did they forget an easy error-messaging system ?). But far more controversial technologies have been well received by developers. No, the real murderer is the foundation of the internet itself: HTTP.

In my opinion, HTTP (in a broader sense, the REST approach) was/is preferable for the following reasons:
- Natural
- Easy to consume and implement
- Architecture-wise, as beautiful as L'Albatros by Baudelaire.
Natural
What is more natural on the internet than accessing a URL to get data ? Certainly not an object whose constructor takes the URL of a WSDL/XML file as an argument, on which you then call methods.
Easy to consume and implement
As said previously, SOAP-based services are not easy to use; implementing one is an even more tedious task, and debugging one is the best way to get rid of people. The HTTP REST approach is easy to use by anybody who knows what a browser is, it's even easier to implement than an HTML web application, and you get a proven error-messaging system for free (when you see a 404 or a 501, you know what the problem is).
Architecture-wise, as beautiful as L'Albatros by Baudelaire
I refer to this poem because it is by far the most elaborate French poem I have ever read (and I wrote an essay about it). Basically, there is more to this poem than a form and a meaning; it has one form and several meanings. Let's make the analogy: each sentence is a URL, each word is a parameter. Just as it is genius to use an increasing tempo when the poet refers to the bird, emphasizing the wind and thus evoking his freedom and higher mind, isn't it just as poetic to call:
- http://example.com/item/123456.html to retrieve the PC HTML description page of an article
- http://example.com/m/item/123456.html to retrieve the mobile HTML page
- http://example.com/item/123456.json to retrieve the json version
- http://example.com/item/123456.xml to retrieve the xml version
- http://example.com/item/123456.jpg the product image
- http://example.com/order/123456.xml to put the item in the cart and retrieve the result in XML format

When you look at the previous examples, you realize that what has been written is similar to:
"SELECT * FROM item WHERE id=123456 AND format='xml'". This is why I believe that HTTP has become a query language. Beyond the claim, and eventually the philosophical value of this statement, questions are raised:
- Have we reached the limits of HTTP ? Would the protocol support the next moves ?
- What if the web is already semantic, and the future of RDF, SPARQL, etc. is the same as SOAP's ?

I honestly have no idea.
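Still, the analogy can be made concrete. Here is a toy sketch (urlToQuery is my own illustrative helper, not a real library) that rewrites a REST resource URL of the form above into the equivalent SQL-like statement:

```javascript
// Illustrative only: map http://host/resource/id.format to a SQL-like query.
function urlToQuery(url) {
  var match = url.match(/^https?:\/\/[^\/]+\/(\w+)\/(\d+)\.(\w+)$/);
  if (!match) return null; // not a resource URL of the expected shape
  return "SELECT * FROM " + match[1] +
         " WHERE id=" + match[2] + " AND format='" + match[3] + "'";
}
```

The path segment plays the role of the table, the id the role of the WHERE clause, and the extension the role of the output format.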

Monday, February 8, 2010

let's talk Data Life cycle

I would like to take some time to discuss a subject that is way underestimated: the Data Life Cycle (referred to as DLC in this article). We focus a lot (and for a lot of good reasons) on the software life cycle, the production cycle, etc., but in the information society we live in, data has become more valuable than the product itself. For example, what really makes the value of an SNS platform ? The product itself ? The engineering team ? The answer is unfortunately no; it's the data that the platform generates, aggregates and stores.

First, let's define what we call "data". In the present article (and in every article I write), a data is a valuable bond between two pieces of information. A piece of information is the smallest atomic unit of meaning. For example, an IP address is a piece of information, a DNS name is a piece of information, but the fact that we know that this DNS name points to this address, that is a data.
One of the most interesting properties of data is that data has an expiry date. Data emerges, exists, disappears, and is extremely fragile. Take my DNS example: imagine that I move to a brand new server with a brand new IP address. One data expired, a new one emerged, and my set of information hasn't changed at all. You are probably asking yourself right now: so what ? Well, as you all know, it takes approximately 48 hours for DNS-related data to propagate around the world, meaning that for 48 hours your application, your business, won't work properly, which can have drastic consequences (corrupted data, service inaccessible for part of your customers, instability, etc.).

Of course, nowadays we have strategies for DNS migration because people have worked on the problem, but we don't have anything similar for other kinds of cross-service data propagation.


We are moving toward more and more decentralized system designs. Who will create their own map service nowadays when you can use a proven service such as Google Maps, Yahoo Maps or Bing Maps ? Nobody.
Are those APIs trustworthy ? Yes and no.
Yes they are, because they are great products produced by great companies. And no they aren't, because they haven't been designed to be used as a part of a product, but to be a product on their own. What makes me claim that is the fact that the communication is one-way: the API sends you information, but you can't send it information.
The consequence on your system is that you end up with partial, expired or nonexistent data. Because in that case, the best strategy to handle geolocation DLC would have been to be able to submit your set of information to them, in order to keep your data valid.

Let's have a look at one of my projects from last year: Merial Japan's vet locator, a service that Merial Japan offers. The name speaks for itself: it's an application that references pet clinics in Japan and helps pet owners easily locate and contact the most suitable clinic according to their location (holiday location, residence, etc.). We are really happy with the result, the feedback is great; in a nutshell, a success. But still, we have a hard time keeping our data relevant. Google (we are using Google Maps) is a really knowledgeable company, but it isn't omniscient. So sometimes Google is not able to locate one of the clinics, or the position is not precise enough, etc. In these cases we need to update or create that information manually in order to keep our data up to date. Here we clearly see that the best strategy would be to be able to post our updated/created information back to Google, so that their geolocation database would be continuously up to date. Due to the number of Google Maps users, we tend to believe this would reduce the number of manual updates, because information fixes would be shared among all users instead of everyone administrating their own set of "information patches".

I hope this article raises awareness of the DLC and its consequences for tomorrow's systems.