Wednesday, 4 November 2009

Who owns user data?

I have been involved several times in discussions about user data ownership. With the advent of cloud computing, however, some of the issues identified may take a different shape, and service/application providers may have a chance to fully leverage the capabilities exposed by cloud computing.

A bit of context first. If I subscribe to a service on the web, whatever its nature (Facebook, Twitter, you name it!), I generate data about myself that the service stores, on my behalf, on its premises. Examples are username, contact details, preferences, tags, ...
If for whatever reason I want to access such data, I can typically only do so via the user interface that the service provides. In any case I never (it seems to me) have access to the raw data, let alone the ability to use some other access interface.

But imagine if such data were stored in the Cloud, in an Amazon S3 bucket (or equivalent), which offers an easy-to-access interface over HTTP, user ownership, Access Control Policies and the possibility of managing such data via the plethora of applications that are part of the Amazon ecosystem. In such a case I, the user, have access to the "master copy" of my data and can do whatever I want with it (such as delete it, modify it, or make it more or less public).
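To make the ownership model concrete, here is a minimal, purely illustrative sketch of the idea: the user owns the bucket and the access policy, and a service provider only gets the actions the user grants it. The class `UserDataBucket` and its methods are hypothetical and in-memory; they are not the Amazon S3 API, which exposes its own ACL and bucket-policy mechanisms.

```python
from dataclasses import dataclass, field


@dataclass
class UserDataBucket:
    """Hypothetical in-memory stand-in for a user-owned bucket (NOT the S3 API)."""

    owner: str
    objects: dict = field(default_factory=dict)
    # Maps a principal (e.g. a service provider) to the set of actions it may perform.
    grants: dict = field(default_factory=dict)

    def _require_owner(self, caller: str) -> None:
        # Only the owner may change the access control policy.
        if caller != self.owner:
            raise PermissionError(f"{caller} cannot change the access policy")

    def grant(self, caller: str, principal: str, *actions: str) -> None:
        self._require_owner(caller)
        self.grants.setdefault(principal, set()).update(actions)

    def revoke(self, caller: str, principal: str) -> None:
        self._require_owner(caller)
        self.grants.pop(principal, None)

    def _check(self, caller: str, action: str) -> None:
        # The owner can always act on the "master copy"; others need an explicit grant.
        if caller != self.owner and action not in self.grants.get(caller, set()):
            raise PermissionError(f"{caller} may not {action}")

    def put(self, caller: str, key: str, value) -> None:
        self._check(caller, "write")
        self.objects[key] = value

    def get(self, caller: str, key: str):
        self._check(caller, "read")
        return self.objects[key]

    def delete(self, caller: str, key: str) -> None:
        self._check(caller, "write")
        self.objects.pop(key, None)
```

In use, the user grants a (hypothetical) `"social-site"` read/write access, the site stores profile data on the user's behalf, and the user can later revoke the grant while keeping the data.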

Clearly there are drawbacks:

  1. it requires the service provider to implement and manage the mapping of its internal user representation to that of Amazon (or whatever other cloud storage provider)

  2. the service provider loses the ability to control the structural consistency of the data, and this can cause the application to malfunction.

  3. the service provider loses the possibility of hiding such data from other service providers/applications and consequently loses the power to forcibly keep the user using its service (well, at least it no longer facilitates it)


Point 1 is in reality a non-issue from an implementation perspective, and it can easily be solved via agreements between the two interested parties. Such agreements, moreover, are facilitated by the adoption of standard formats as and when appropriate.
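As a rough illustration of why point 1 is mostly plumbing, the mapping can be as small as a pair of conversion functions between the provider's internal record and whatever shared format the parties agree on. Everything here is an assumption: `InternalUser`, its fields, and the JSON field names merely stand in for a real agreed standard.

```python
import json
from dataclasses import dataclass


@dataclass
class InternalUser:
    # Hypothetical internal representation used by the service provider.
    uid: int      # internal key, never exported
    login: str
    mail: str
    prefs: dict


def to_portable(user: InternalUser) -> str:
    """Serialise to a hypothetical agreed format stored in the user's bucket."""
    return json.dumps(
        {
            "username": user.login,
            "contact": {"email": user.mail},
            "preferences": user.prefs,
        },
        sort_keys=True,
    )


def from_portable(doc: str, uid: int) -> InternalUser:
    """Rebuild the internal record; the provider supplies its own internal uid."""
    data = json.loads(doc)
    return InternalUser(
        uid=uid,
        login=data["username"],
        mail=data["contact"]["email"],
        prefs=data.get("preferences", {}),
    )
```

Keeping the internal `uid` out of the portable document is a deliberate choice in this sketch: the user's copy describes the user, not the provider's database.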

Point 2 is the most interesting, as it involves different thinking when architecting applications (I refer here to the seven points illustrated here by Simone Brunozzi, which I discussed here): architect for failure (in this case, failure of the data structures), loosely couple application logic and data, and so on.
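One way to "architect for failure" of data structures is a tolerant reader: since the user can edit the master copy, the application treats every field as untrusted, ignores what it does not recognise and falls back to defaults for anything missing or malformed. This is a minimal sketch; `DEFAULTS` and `load_preferences` are hypothetical names, and the exact-type check is one possible validation policy among many.

```python
import json

# Hypothetical defaults the application falls back to.
DEFAULTS = {"language": "en", "items_per_page": 20, "public_profile": False}


def load_preferences(raw) -> dict:
    """Tolerant reader: never assume user-controlled data has the expected shape."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Unparseable (or missing) document: run with defaults instead of crashing.
        return dict(DEFAULTS)
    if not isinstance(data, dict):
        return dict(DEFAULTS)

    prefs = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        value = data.get(key)
        # Accept a value only if its type exactly matches the default's type
        # (so True is not mistaken for an int); unknown keys are ignored.
        if type(value) is type(default):
            prefs[key] = value
    return prefs
```

The application keeps working whatever the user (or another application) did to the document; at worst it loses a preference.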

Point 3 is probably the hardest to implement, though certainly not because of technical issues. But this seems to me a false problem anyway: if the service offered degrades, users will go away no matter what.

I want to see people making use of what the cloud brings to the table on top of "glorified web hosting" or grid computing. So maybe this is a valid "use case" that will stimulate developers and architects to think more outside the box.

Things learned at AWS Cloud for the Enterprise: London

The event was organised and funded by Amazon to market AWS to the enterprise and took place on Tuesday 3/11/09 in London. Below is a list of points summarising what I learned, consolidated from a bunch of Tweets fired off in the last hour or so.

  1. Paul Nasrat, Lead System Integrator at the Guardian, mentioned some of the tools and techniques used to develop and deploy a website distributing content related to MPs' expenses. Technologies also included InnoDB and Puppet. Puppet definitely seems a tool I should be looking at to develop an idea I have been having for some time.

  2. Bob Harris, CTO at Channel 4, gave the best presentation of the day. The most interesting things he mentioned are:

    • Amazon's roadmap isn't public. He hopes one day it will be, to help enterprises plan ahead

    • Amazon believes that Cloud computing is at the "peak of inflated expectations" http://is.gd/4Mo1F

    • Cloud is no silver bullet: in fact, he only deploys websites for content distribution on it (one of which is ScienceOfScams)

    • The way C4 managed legal issues around Data Protection was by making an enterprise agreement with Amazon covering all those issues

    • Myths around cloud (lack of security, low throughput, low reliability, low availability) are just myths, in his experience anyway.

    • You need a techie and a credit card to get going on the Cloud. That said, although CapEx is limited, he had to make initial investments to guarantee that the AMIs used for his deployments were up to his required standards (bottom line: it's not as simple as it seems)

    • Hybrid clouds are next in his pipeline, with the intent of overcoming issues around deployment of enterprise applications in the public cloud.

  3. There were, in total, 4 Amazon customer presentations/case studies. Yet again, all of them fell into either the "number crunching" or the "glorified web hosting" category. (That is, I have yet to see a proper enterprise app making the best use of the whole set of services and API access that Amazon provides.)

  4. Amazon took three years to push in earnest the deployment of its retail stack (the book store and the reseller site) onto the Cloud. This, to me, is rather obvious, as only recently has AWS reached a level of maturity and a set of functionalities that guarantee a certain success. They have currently migrated about 7% of their applications.

  5. Simone Brunozzi (@simon on Twitter), Technical Evangelist at AWS, presented an interesting summary of the things to take into account when architecting apps to be deployed on the Cloud (see here). In a nutshell:

    • Design for failure: apps should be written with failure in mind. Plan for when the app fails (not *if*)

    • Loose coupling sets you free: loose coupling improves resiliency (incidentally, any new app developed at Amazon is exposed as a service)

    • Design for dynamism: leverage the cloud's elasticity by making no assumptions, when the app is developed, about the location, health and configuration of the app and its dependencies (note to self: should start making use of user data)

    • Security is everywhere: think of security constraints at every level of the stack: OS, network, application (some of these are facilitated by the Amazon infrastructure)

    • Don't fear constraints: think laterally about how to overcome constraints. Does your app need a huge amount of RAM? Consider distribution across instances...

    • Many storage options: one size doesn't fit all for storage, so choose the right one for the type of application (see also One Size Doesn't Fit All, also via @simon)

    • AWS ecosystem and community: re-use the expertise available in the community and ecosystem of applications both in the open-source and paid-for worlds.
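"Plan for when the app fails, not if" often translates, at the code level, into treating every remote call as something that may fail transiently. A common pattern for this is retrying with capped exponential backoff and random jitter; the sketch below is a generic illustration of that pattern, not something presented at the event. The function name, its parameters and the choice of `ConnectionError`/`TimeoutError` as "retryable" are all assumptions.

```python
import random
import time


def call_with_retries(operation, attempts=5, base_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call `operation()`, retrying transient failures with capped exponential
    backoff and full jitter. Non-retryable exceptions propagate immediately."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: re-raise so the caller can degrade gracefully.
                raise
            # Delay grows as base_delay * 2^attempt, capped at max_delay;
            # sleeping a random fraction of it spreads out competing clients.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Passing `sleep` as a parameter is only there so the behaviour can be exercised without real waiting; in production the default `time.sleep` would be used.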

PS: within the context of this post, "the Cloud" actually means "the Amazon Cloud"