I would like to take some time to discuss a subject that is widely underestimated: the data life cycle (referred to as DLC in this article). We focus a lot (and for good reasons) on the software life cycle, the production cycle and so on, but in the information society we live in, data has become more valuable than the product itself. For example, what really makes the value of a social networking platform? The product itself? The engineering team? Unfortunately, the answer is no: it's the data that the platform generates, aggregates and stores.
First, let's define what we call "data". In this article (and in every article I write), a piece of data is a valuable binding between two pieces of information. A piece of information is the smallest atomic unit of meaning. For example, an IP address is a piece of information and a domain name is a piece of information, but the fact that we know this domain name points to this address is a piece of data.
One of the most interesting properties of data is that it has an expiration date. Data emerges, exists and disappears, and it is extremely fragile. Let me go back to my DNS example: imagine that I move to a brand new server with a brand new IP address. One piece of data has expired and a new one has emerged, yet my set of information hasn't changed at all. You are probably asking yourself right now: so what? Well, depending on the TTL of the records and on resolver caches, it can take up to roughly 48 hours for DNS-related data to propagate around the world. That means that for up to 48 hours your application, and your business, may not work properly, which can have drastic consequences (corrupted data, a service that is inaccessible for part of your customers, instability, etc.).
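To make the idea of expiring data concrete, here is a minimal sketch in Python. The `Binding` class, its fields and the 48-hour TTL are my own illustrative assumptions (they are not part of any DNS library), and the addresses come from the documentation range 192.0.2.0/24.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    """A piece of data: a binding between two pieces of information.

    Hypothetical model for illustration only.
    """
    key: str           # first piece of information, e.g. a domain name
    value: str         # second piece of information, e.g. an IP address
    created_at: float  # creation time, in seconds
    ttl: float         # lifetime, in seconds

    def is_expired(self, now: float) -> bool:
        # The binding is considered stale once its lifetime has elapsed.
        return now >= self.created_at + self.ttl


FORTY_EIGHT_HOURS = 48 * 3600

# The old binding, cached somewhere in the world at time 0.
old = Binding("example.com", "192.0.2.10", created_at=0.0, ttl=FORTY_EIGHT_HOURS)
# The new binding, published one hour later after the server move.
new = Binding("example.com", "192.0.2.20", created_at=3600.0, ttl=FORTY_EIGHT_HOURS)

# Two hours in, a cache holding `old` still treats it as valid,
# even though the "true" binding is already `new`.
print(old.is_expired(now=2 * 3600))          # False
print(old.is_expired(now=FORTY_EIGHT_HOURS)) # True
```

The two pieces of information (the domain name and the addresses) are ordinary values; only the binding between them carries a lifetime, and that is where the fragility lies.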
Of course, nowadays we have strategies for DNS migration because people have worked on the problem, but we don't have anything similar for other kinds of cross-service data propagation.
We are moving toward more and more decentralized system designs. Who would build their own map service nowadays when they can use a proven service such as Google Maps, Yahoo Maps or Bing Maps? Nobody.
Are those APIs trustworthy? Yes and no.
Yes, because they are great products made by great companies. And no, because they were not designed to be used as part of a product but to be products in their own right. What makes me claim this is that the communication is one-way: the API sends you information, but you can't send information back.
The consequence for your system is that you end up with partial data, expired data or nonexistent data. In that case, the best strategy to handle the geolocation DLC would be to be able to submit your own set of information to the provider in order to keep your data valid.
Let's have a look at one of my projects from last year: Merial Japan's vet locator. This is a service that Merial Japan offers. The name speaks for itself: it's an application that lists pet clinics in Japan and helps pet owners easily locate and contact the most suitable clinic according to their location (holiday location, residence, etc.). We are really happy with the result and the feedback is great; in a nutshell, it's a success. But still, we have a hard time keeping our data relevant. Google (we are using Google Maps) is a really knowledgeable company, but it isn't omniscient. So sometimes Google is not able to locate one of the clinics, or the position is not precise enough. In these cases we need to create or update that information manually in order to keep our data up to date. Here we clearly see that the best strategy to handle data would be to be able to post our created or updated information to Google, so that their geolocation database would be continuously up to date. Given the number of Google Maps users, we tend to believe that this would reduce the number of manual updates: the corrections would be shared among all users, and each of them would no longer have to administer their own set of "information patches".
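The "information patch" workaround described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code of the vet locator: `locate`, `fake_geocode`, the clinic identifiers and the coordinates are all hypothetical, and the geocoder is a stand-in for a call to an external map API.

```python
from typing import Callable, Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def locate(
    clinic_id: str,
    address: str,
    geocode: Callable[[str], Optional[Coordinates]],
    patches: Dict[str, Coordinates],
) -> Optional[Coordinates]:
    """Return a clinic position, preferring our manual patch if one exists."""
    # A manual correction always wins over the external provider.
    if clinic_id in patches:
        return patches[clinic_id]
    # Otherwise fall back to whatever the one-way API gives us (maybe nothing).
    return geocode(address)


def fake_geocode(address: str) -> Optional[Coordinates]:
    # Stand-in for an external geocoding call; knows only one address.
    known = {"1-1 Example Street, Tokyo": (35.68, 139.76)}
    return known.get(address)


# Our locally administered "information patches" (hypothetical values).
patches = {"clinic-42": (34.70, 135.50)}

print(locate("clinic-1", "1-1 Example Street, Tokyo", fake_geocode, patches))
print(locate("clinic-42", "Unknown Lane, Osaka", fake_geocode, patches))
print(locate("clinic-7", "Nowhere Road", fake_geocode, patches))
```

The weak point is visible in the sketch: `patches` lives only in our own system, so every other consumer of the same map service has to rediscover and maintain the same corrections.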
I hope this article has raised awareness of the DLC and its consequences for tomorrow's systems.