Wednesday, 4 November 2009

Who owns user data?

I have been involved several times in discussions about user data ownership. With the advent of cloud computing, though, some of the issues take another shape, and service/application providers have a chance to fully leverage the capabilities that cloud computing exposes.

A bit of context first. If I subscribe to a service on the web, whatever its nature (Facebook, Twitter, you name it!), I generate data about me that the service stores, on my behalf, on its premises. Examples are username, contact details, preferences, tags, ...
If for whatever reason I want to access such data, I can typically only do it via the user interface that the service provides. In any case I never - it seems to me - have access to the raw data, let alone being able to use some other access interface.

But imagine that such data were stored in the Cloud, in an Amazon S3 bucket (or equivalent), which offers an easy-to-access interface over HTTP, user ownership, Access Control Policies and the possibility of managing the data via the plethora of applications in the Amazon ecosystem. In that case I - the user - have access to the "master copy" of my data and can do whatever I want with it (such as delete it, modify it, or make it more or less public).
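
As a minimal sketch of what that direct access could look like - assuming a publicly readable object (or a pre-signed URL) and a made-up bucket and key - a plain HTTP GET is all that's needed:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: fetching the "master copy" of my profile data straight from S3 over HTTP.
// Bucket and key are made up; a private object would need a signed request or a
// pre-signed URL instead of a plain GET.
public class FetchMyData {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://my-service-users.s3.amazonaws.com/users/fabrizio/profile.xml");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // the raw data, usable outside the service's own UI
        }
        in.close();
    }
}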

Clearly there are drawbacks:

  1. it requires the service provider to implement and manage the mapping of its internal user representation onto that of Amazon (or whatever other cloud storage provider)

  2. the service provider loses the ability to control the structural consistency of the data, which can cause the application to malfunction

  3. the service provider loses the possibility of hiding such data from other service providers/applications and consequently loses the power to forcibly keep the user on its service (well, at least it doesn't facilitate it)


Point 1 is in reality a non-issue from an implementation perspective, and it can easily be solved via agreements between the two interested parties. Such agreements, moreover, are facilitated by the adoption of standard formats as and when appropriate.

Point 2 is the most interesting, as it requires a different way of thinking about architecting applications (I refer to the seven points illustrated here by Simone Brunozzi, which I discussed here): architect for failure (in this case, of data structures), loosely couple application logic and data, and so on.

Point 3 is probably the hardest to address, though certainly not because of technical issues. It seems to me a false problem anyway: if the service offered degrades, users will go away no matter what.

I want to see people making use of what the cloud brings to the table beyond "glorified web hosting" or grid computing. So maybe this is a valid "use case" that will stimulate developers and architects to think more outside the box.

Things learned at AWS Cloud for the Enterprise: London

The event was organised and funded by Amazon to market AWS to the enterprise and took place on Tuesday 3/11/09 in London. This list of points summarises what I learned; it was created by consolidating a bunch of tweets fired off in the last hour or so.

  1. Paul Nasrat, Lead System Integrator at the Guardian, mentioned some of the tools and techniques used to develop and deploy a website distributing content related to MPs' expenses. Technologies included InnoDB and Puppet. Puppet seems definitely a tool I should be looking at for developing an idea I have had for some time.

  2. Bob Harris, CTO at Channel 4, did the best presentation of the day. The most interesting things he mentioned are:

    • Amz roadmap isn't public. He hopes one day it will be, to help enterprises plan ahead

    • Amz believes that Cloud computing is at the "peak of inflated expectations" http://is.gd/4Mo1F

    • Cloud is no silver bullet: in fact he only deploys web sites for content distribution (one of which is ScienceOfScams)

    • C4 managed legal issues around Data Protection by making an enterprise agreement with Amz that covers all of them

    • Myths around the cloud (lack of security, low throughput, low reliability, low availability) are just myths - in his experience, anyway.

    • You need a techie and a credit card to get going on the Cloud. In fact, although CapEx is limited, he had to make initial investments to guarantee that the AMIs used for his deployments were up to his required standards (bottom line: it's not as simple as it seems)

    • Hybrid clouds are next in his pipeline, with the intent of overcoming issues around deployment of enterprise applications in the public cloud.

  3. There were, in total, four presentations/case studies from Amazon customers. Yet again, all fell into either the "number crunching" or the "glorified web hosting" category. (That is, I have yet to see a proper enterprise app making best use of the whole set of services and API access that Amazon offers.)

  4. It took Amazon three years to push in anger the deployment of its retail stack (the book store and the reseller site) onto the Cloud. This, to me, is rather obvious, as only recently has AWS reached a level of maturity and a set of functionalities that guarantees a certain success. They have currently migrated about 7% of their applications.

  5. Simone Brunozzi (@simon on Twitter) - Technical Evangelist at AWS - presented an interesting summary of the things to take into account when architecting apps to be deployed on the Cloud (see here). In a nutshell:

    • Design for failure: apps should be written with failure in mind; plan for *when* the app fails, not *if* (see the small retry sketch after this list)

    • Loose coupling sets you free: loose coupling improves resiliency (incidentally, any new app developed at Amazon is exposed as a service)

    • Design for dynamism: leverage the cloud's elasticity by making no assumptions about the location, health or configuration of the app and its dependencies when the app is developed (note to self: I should start making use of user data)

    • Security is everywhere: think of security constraints at every level of the stack, OS, network, application (some of these are facilitated by the Amazon infrastructure)

    • Don't fear constraints: think laterally about how to overcome constraints - does your app need a huge amount of RAM? Consider distribution across instances...

    • Many storage options: one size doesn't fit all for storage; choose the right one for the type of application at hand (see also One Size Doesn't Fit All, also via @simon)

    • AWS ecosystem and community: re-use the expertise available in the community and ecosystem of applications both in the open-source and paid-for worlds.
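
To make the "design for failure" point a bit more concrete, here is a minimal sketch - not from the talk, and with illustrative names - of the kind of retry-with-backoff wrapper such an app might put around any remote call:

import java.util.concurrent.Callable;

// Tiny illustration of "design for failure": assume any remote call can fail
// and retry with an exponential backoff instead of trusting a single attempt.
public final class Retry {
    public static <T> T withRetries(Callable<T> call, int attempts, long initialWaitMs) throws Exception {
        long wait = initialWaitMs;
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;            // the call failed: plan for *when*, not *if*
                Thread.sleep(wait);
                wait *= 2;           // back off before the next attempt
            }
        }
        throw last;                  // all attempts exhausted
    }
}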

PS: within the context of this post "the Cloud" is actually "the Amazon Cloud"

Sunday, 26 July 2009

The real RestFixture and the copy

I noticed, some time ago, that the RestFixture has been implemented on GreenPepper, a commercial clone of FitNesse.
The concept has been taken on board and seamlessly re-implemented, to the extent that the developer documentation of the fixture is pretty much identical.
I feel sooo proud!

Monday, 6 April 2009

JMeter to test robustness of system exposing REST APIs

UPDATE: code moved here: http://github.com/smartrics/JMeterRestSampler

I have been involved in continuously testing systems for robustness. One of my toy projects is the JMeterRestSampler, a plugin for JMeter that eases the creation of such tests and automates their execution on a continuous integration server.

Robustness testing
In this context, a robust system is one that, when subjected to a constant load for a given amount of time, responds correctly and maintains a constant throughput.

The aim of a robustness test is to identify design or development issues, such as memory leaks or random/infrequent bugs, that over time may degrade or crash the system. It's worth pointing out that robustness tests differ from load or stress tests, which aim more at identifying bottlenecks or actual system limits (responses to peaks or bursts).

Continuously testing for robustness is hence a must to improve the quality of the system under construction.
Robbie, Raghav and I presented a session at Agile '08 and XPDay 08 on this very topic.

Specifically for robustness, a test is defined by running scenarios concurrently. A scenario is a set of operations on the system that emulates typical usage. Our examples at both presentations were based on the robustness tests we built for Aloha, an application server for Voice over IP applications. In this context a scenario - a typical usage of the system - is, for example, "Connect a voice call between two participants", which translates into a sequence of simpler operations performed via the application server's public interface.

Robustness with JMeter
When preparing the XPDay presentation, I started experimenting with JMeter to see if it offered simple means to build robustness tests. The bare essentials I was looking for were:

  • the possibility to build scenarios in relation to the system under test (given the above definition of scenario)
  • the possibility to automate the tests and run them from a continuous integration server, and
  • the possibility to report on the outcome of the tests.


As I had been building REST APIs for several months, I chose to experiment on the assumption that the system under test exposed a REST interface.

Scenarios

In JMeter, robustness scenarios can be modelled as ThreadGroups. A ThreadGroup groups a set of sampling operations on the system under test. Each sampling operation is executed serially within the thread running the ThreadGroup, and each ThreadGroup can be run in a loop and in parallel with other ThreadGroups for as long as necessary, terminating either after a fixed number of runs or after a given time.

All the goodies shipped with JMeter can be used: pre/post-processing instructions, definitions of user variables, throttling, etc. (see the JMeter manual for more).

The following picture shows a simple test plan for a system under test, with a robustness test made of two scenarios: "Register customer" and "Place an order". For the "Register customer" scenario, the ThreadGroup is run in 10 parallel threads and each thread loops 100 times (so, in total, the system will run 1000 scenarios).
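
In outline, the plan described above looks roughly like this (the samplers, paths and expected response codes are just illustrative):

Test Plan "Robustness"
  ThreadGroup "Register customer"   (10 threads, loop count 100)
    REST Sampler: POST /customers
    Response Assertion: expect 201
  ThreadGroup "Place an order"      (threads and loops as needed)
    REST Sampler: POST /orders
    Response Assertion: expect 201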



Sampling REST resources

JMeter offers a wide selection of samplers for sampling systems and collecting results. It does not offer, though, a proper sampler for defining REST requests. In the spirit of readable tests as documentation, I wanted each sampler (effectively a REST request) to look like a REST request, with a URI, a method, a list of headers and a body (much in the spirit of the RestFixture). So I discarded the built-in HTTP sampler, which does not present requests in that form.

So the first stab was to build a new sampler: the REST Sampler. It allows a REST request to be defined by setting all the relevant parameters: the URI, the headers, the body and the verb. It also processes JMeter assertions and post-processors (attached as children of the sampler node) to perform basic assertions on the response elements and, for example, to extract data from the last response and store it for future reference.
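
For the curious, this is roughly how a custom sampler plugs into JMeter. It is not the actual RestSampler source, just a sketch of the shape, with illustrative property keys and the HTTP execution elided:

import org.apache.jmeter.samplers.AbstractSampler;
import org.apache.jmeter.samplers.Entry;
import org.apache.jmeter.samplers.SampleResult;

// Skeleton of a custom JMeter sampler: read the configured request,
// execute it, fill in and return a SampleResult.
public class RestSamplerSketch extends AbstractSampler {
    public static final String URI = "RestSampler.uri";        // illustrative property keys,
    public static final String METHOD = "RestSampler.method";  // set via the sampler's GUI

    @Override
    public SampleResult sample(Entry entry) {
        SampleResult result = new SampleResult();
        result.setSampleLabel(getName());
        result.sampleStart();                       // start timing the request
        try {
            String uri = getPropertyAsString(URI);
            String method = getPropertyAsString(METHOD);
            // ... here the sampler would build and execute the HTTP request described
            // by uri, method, headers and body, then record what came back ...
            result.setResponseCode("200");
            result.setResponseData("...response body...".getBytes());
            result.setSuccessful(true);
        } catch (Exception e) {
            result.setResponseMessage(e.getMessage());
            result.setSuccessful(false);
        } finally {
            result.sampleEnd();                     // stop timing
        }
        return result;
    }
}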

The REST sampler


Asserting the response code of the last response


Extracting data from the response for future reference, via JMeter variables




Monitoring the remote system: JMX Sampler

As said, part of a robustness build is observing the system under test for key indicators of its health. If the system under test is built in Java, one key element is the amount of heap memory allocated to the system at any point in time: the trend of the average heap memory value may reveal memory leaks. To grab this data in the context of a robustness test via JMeter, I added a JMX Sampler, which basically connects to the given JMX URL and retrieves heap memory information.
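
Conceptually, what the JMX sampler does boils down to something like this sketch (the service URL is an example):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Connect to a remote JVM over JMX and read its current heap usage.
public class HeapProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi"); // example URL
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            long usedHeap = memory.getHeapMemoryUsage().getUsed();
            System.out.println("Used heap: " + usedHeap + " bytes"); // the value charted over time
        } finally {
            connector.close();
        }
    }
}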


Next, I wanted to present the information collected. If JMeter is run via the GUI, a JMX listener can be added to gather and chart the memory data.



It's possible to add more than one JMX sampler, each pointing to a different JMX server, in cases where the REST application is distributed. The charts of each heap memory usage are displayed in the same listener window. The charts can also be saved to one or more files if needed.

Test Automation

The REST and JMX samplers, plus the JMX listener, can be used to build the robustness tests needed to exercise systems exposing REST interfaces. The next step is automation.
Automation is achieved with Ant and the JMeter task, which allows the same test plan to be run from within an Ant task (hence invokable via any continuous integration server). Documentation on how to use the JMeter Ant task is available on its website.

I, though, extended the task to generate charts from the data available in the JMeter result output file. Two kinds of charts are generated: one of the time taken by each of the REST requests (which gives an indication of whether the performance of the system is degrading over time) and one of the JMX sampling as detailed above.

<taskdef name="jmeter"
classname="smartrics.jmeter.ant.JMeterTaskExt"
classpathref="project.classpath" />

<target name="run" depends="clean, init, copy">
<jmeter jmeterhome="${jmeter.home}"
resultlogdir="${artefacts}"
failure="false"
failureproperty="jmeter.failed"
chartsoutputdir="${artefacts}"
succecssThresholdPerc="98"
jmeterproperties="${jmeter.home}/bin/user.properties">
<testplans dir="${build}" includes="Robustness.jmx" />
</jmeter>
</target>

<target name="report">
<xslt in="${artefacts}/Robustness.jtl"
out="${artefacts}/Robustness.html"
style="${etc}/jmeter-results-detail-report_21.xsl" />
<copy todir="${artefacts}">
<fileset dir="${etc}" includes="*.jpg" />
</copy>
</target>


The location where the charts are generated is defined by the chartsoutputdir="..." attribute; if it is omitted, no charts are generated.

Correctness and the failure threshold

In certain cases a build may be allowed to pass even if some of the assertions did not pass. Such behaviour is admissible in systems that implement, for example, optimistic locking or other mechanisms that trade correctness for performance. Each REST sampler flags its sample as a success or a failure depending on the result of each associated assertion. By counting such flags it's easy to determine the percentage of successes.

If the tester decides that some failure is acceptable, she can specify the succecssThresholdPerc="..." attribute to define the minimum percentage of successes that need to occur for the build to pass.
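
In essence the check boils down to a simple percentage comparison, along these lines (a sketch, with illustrative names and numbers):

// Count successful samples, work out the percentage and compare it
// with the configured minimum for the build to pass.
public class SuccessThreshold {
    public static boolean buildPasses(int successes, int totalSamples, double thresholdPerc) {
        double successPerc = (successes * 100.0) / totalSamples;
        return successPerc >= thresholdPerc;   // e.g. 985 of 1000 samples = 98.5% >= 98 -> pass
    }
}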

Build and installation instructions

Build
The JMeterRestSampler is available here:
http://github.com/smartrics/JMeterRestSampler
(it was originally hosted in the same SVN trunk as the RestFixture).
After checkout, add a properties/${user.name}.properties file and define the jmeter.home property, then run ant.
${user.name} is the username of the logged-in user.
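
For example, assuming the logged-in user is "john" and JMeter lives under /opt/jakarta-jmeter (both obviously system dependent), the file would contain:

# properties/john.properties
jmeter.home=/opt/jakarta-jmeter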

If the build passes, the file build/JMeterRestSampler.jar is generated.

Install
Edit ${jmeter.home}/bin/user.properties and set the search_paths property to the location of the JMeterRestSampler.jar file: search_paths=/path/to/JMeterRestSampler.jar. Alternatively, copy JMeterRestSampler.jar into the directory ${jmeter.home}/lib/ext.

If the installation is successful, when starting JMeter the REST Sampler and the JMX Sampler should be available in a ThreadGroup's Add/Sampler context popup menu.

Eclipse project
The .classpath and .project Eclipse metadata files are included. To compile the project within Eclipse, once it has been imported, you need to define two workspace variables, JMETER_HOME and ANT_HOME, pointing respectively to your JMeter home and Ant home directories.

License
The software is licensed using a BSD license.

Disclaimer
I should point out that the JMeterRestSampler is no more than a toy to prove the concept of robustness testing via JMeter. It started as a spike back in the days of XPDay 08 and evolved - slowly - to its current form. I do not intend to improve it in the foreseeable future.

Tuesday, 24 March 2009

Building Spring 3.0 M2 from behind a proxy

It took me a while to find out where to put the proxy settings that allow the Ant/Ivy build scripts to fetch the dependencies.


$> vi /spring-framework-3.0.0.CI-167/projects/spring-build/lib/ivy/jets3t.properties


then


#...
httpclient.proxy-autodetect=false
httpclient.proxy-host=<your-proxy-host>
httpclient.proxy-port=<your-proxy-port>
#...


This should proxy out any request (to S3). Note that proxy-autodetect defaults to true.

You can now build the framework via ant:

/spring-framework-3.0.0.CI-167/projects/build-spring-framework$ ant

Tuesday, 27 January 2009

Words art: wordles

I have found a cool website to create pictures from words - wordles, that is. This is what I came up with using this blog's posts as the source.

Tuesday, 13 January 2009

New kids on the Blues block

I just bought the debut CDs of Oli Brown and of Dani Wilde. I'll keep an eye on these two new kids on the block.

A taste of Blues from Oli Brown is here.

Dani Wilde instead can be heard here on a John Lee Hooker classic

Saturday, 10 January 2009

Rotten apple meets the amazon

I just found that I can't log in to my iTunes account any more. "This Apple ID has not yet been used with the iTunes Store", I am told. Clearly that's not the case, as I have already spent a few bucks in the shop. I have a few pounds left - I am hoping to spend as little as possible troubleshooting the issue (first clue: I just updated to iTunes 8).

Anyway, I will not care for long: I have just realised - a bit late, I know - that Amazon is selling DRM-free MP3s cheaper than the iTunes Store.

And, by the way, iTunes' interface still sucks big time and hasn't improved at all.

Monday, 5 January 2009

Quote of the year

It may seem obvious, but I believe this quote is pure truth:
"Pleasure in the job puts perfection in the work." (Aristotle)