Tuesday, April 30, 2013

"Can Anyone Explain SOA In Simple Terms?"


A few days ago, David Diamond posed a deceptively simple question on one of the LinkedIn Group sites (SOA SIG) - "Can Anyone Explain SOA In Simple Terms?"

The barrage of widely varying responses that followed was, in a way, an eloquent answer to that question!

I've had my own take on SOA for quite a while now, so this gave me the opportunity to validate my model against what other practitioners had to say. And I must say this: I'm more convinced than ever that the industry is horribly confused about SOA. There are those whose understanding of SOA is at a purely technology level (even some of those who profess to understand that SOA is not (just) about technology). And there are others who may understand SOA for all I know, but whose explanations tend to be couched in so much jargon that they're really hard to understand.

In hindsight, David Diamond could not have asked a more insightful question.

Well, this is my blog, so just as history is written by the victors, the one correct answer to David's question is to be found here :-).

Here's what I wrote (put together from more than one post that I made to that topic):

My initial one-paragraph answer: "SOA is the science of analysing and managing dependencies between systems. That means the elimination of unnecessary dependencies and the formalisation of legitimate dependencies into readily understood contracts. The more dependencies there are between systems, the less agile an organisation is (because of the number of related systems that have to change when one of them has to change), the higher its operating costs (because of all the unnecessary extra work to coupled systems) and the higher its operational risk (because of the number of things that could break when something changes). Dependencies exist at all of the BAIT layers - Business, Applications, Information (Data) and Technology. That's why a technology-only view of SOA does not solve an organisation's real problems. SOA should have been called DOT instead ("Dependency-Oriented Thinking")."


After a few days of reading other responses and feeling dissatisfied, I posted again:


"Many of the comments here emphasise reuse as part of the *definition* of SOA. Is reuse a core feature of SOA or just a side-benefit? If the latter, what are SOA's defining features (which is what the original question was about)? Also, while we use the word "services" a lot, how do we define the term?

Let me try and address these two points.

SOA is an organising principle for the enterprise, and the fundamental skill that an architect requires to apply this organising principle is the ability to see dependencies between systems: to be able to eliminate the ones that shouldn't exist and formalise the legitimate ones into "contracts" maintained in a single place and covering all the dependencies between two systems. This approach greatly reduces the cost of change, improves the speed with which changes are made (agility) and reduces the risk of making changes, all because the number of dependencies (aspects of an interaction affected by a change) is now smaller, one can tell at a glance what they are, and there are no surprises because there are no dependencies outside what is documented by the contract. This is not limited to technology interactions. One can apply this thinking to the design of business processes just as naturally.

When we look through a dependency lens at an organisation, our tasks are quite distinct at its four layers (Business, Applications, Information (Data) and Technology).

At the Business layer, it is more of a BPR (Business Process Re-engineering) exercise, because we end up rationalising processes when we weed out unnecessary dependencies. When we finish, we have a traceability matrix linking the following:

Vision (Our idea of Utopia)
Mission (Our strategy to bring about that Utopia)
Functions (The main groups of activities we need to be doing as part of that strategy)
Processes (The detailed and related sequences of steps comprising each function)
Process Steps (The basic building blocks of these processes)

[At the business layer, we will come across some *potential* reuse when we look at the definition of some of the Process Steps (operations) we arrive at. Only further analysis at the Information layer will tell us if reuse is actually possible or these are independent operations.]

The Application layer is all about grouping "related" operations, and the dependency principle used is that of "cohesion and coupling". In other words, we need to determine which process steps belong together and which do not. This cannot be done independently but must involve the Information (data) layer as well. [That's why architectural frameworks like TOGAF combine the two into a single step (Phase C).]

The Information layer looks at data dependencies (shared models) and classifies data into two groups - "data on the outside" and "data on the inside". "Data on the inside" is the set of internal domain models for operations that other operations do not need to see. "Data on the outside" is what goes "over the wire" between operations.

When we apply the dependency principle of cohesion and coupling to the combined Application and Information layers, we have two ways of grouping operations together. Operations that share a domain model ("data on the inside") coalesce into Applications that are called Products. Operations that share an interface data model ("data on the outside") coalesce into Applications that are called Services. So this is where Services fit into SOA - as a bundle of related operations sharing an interface data model.

The Technology layer deals with "implementation". As others have pointed out as well, implementation need not have anything to do with SOAP, ESBs, etc. We need distinct components to host implementations of exposed operations (Service Containers), to mediate interactions (Brokers) and to coordinate operations (Process Coordinators). Other components merely support these (Rules engines, registries, monitoring tools).

This is SOA :-)."
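The grouping rule from the post above (operations sharing "data on the inside" coalesce into Products, operations sharing "data on the outside" coalesce into Services) can be sketched in a few lines of Python. This is only an illustration: the operation names and model names below are made up, and a real analysis would of course not reduce each operation to two labels.

```python
from collections import defaultdict
from typing import NamedTuple


class Operation(NamedTuple):
    name: str
    domain_model: str      # "data on the inside" (hypothetical label)
    interface_model: str   # "data on the outside" (hypothetical label)


def group_by(operations, attribute):
    """Group operation names by the value of one of their attributes."""
    groups = defaultdict(list)
    for op in operations:
        groups[getattr(op, attribute)].append(op.name)
    return dict(groups)


# Hypothetical operations, purely for illustration.
operations = [
    Operation("open_account", "AccountModel", "AccountMessages"),
    Operation("close_account", "AccountModel", "AccountMessages"),
    Operation("post_payment", "LedgerModel", "PaymentMessages"),
    Operation("refund_payment", "LedgerModel", "PaymentMessages"),
    Operation("quote_fx_rate", "PricingModel", "PaymentMessages"),
]

# Shared domain model -> Product; shared interface data model -> Service.
products = group_by(operations, "domain_model")
services = group_by(operations, "interface_model")

print(products)
print(services)
```

Note that the two groupings need not coincide: in this made-up data, "quote_fx_rate" sits in its own Product but belongs to the same Service as the two payment operations.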


I would have posted more, but I exceeded the word count for the site, so I had to post my thoughts about the Technology layer separately:


"I must add that when viewed through a dependency lens, the Technology layer often introduces artificial dependencies of its own. There is a reason why many people prefer REST to SOAP. It's because WSDL is a dependency hotspot. Think about it. If a WSDL file describes just one service, and that service comprises 5 operations, each with an input document and an output document, then the version of the WSDL file has a dependency on the version of 10 document schemas. If any one of them changes, the WSDL will have to change! That's why we have so much version churn in organisations.

In addition, because we don't build explicit interface data models with type hierarchies, our operation interfaces are too rigid and low-level, requiring a fresh *version* whenever a new *variant* is to be supported.

A second major dependency introduced by the technology layer is through the ESB, or more correctly, through incorrect use of the ESB. The dependency principle at the Technology layer is to use the right tool for the job and to use it the right way. If we use the ESB to host business logic, we are making it perform the role of a Service Container. If we use the ESB to orchestrate a process, we are making it perform the role of a Process Coordinator. Both of these mistakes create dependencies that reduce performance and increase the cost of change. 

The other ESB-related mistake is its deployment in a hub-and-spokes architecture. Then the ESB becomes both a performance bottleneck and a single point of failure - both symptoms of a needless topological dependency that was created at the Technology layer. IT organisations often ask for funds to buy an ESB because they want to "do SOA", then implement it in a topology that creates dependencies and thereby violates SOA principles. What an irony! 

So one of the reasons why SOA has acquired a bad name is that its practice often introduces dependencies at the Technology layer even as it tries to reduce dependencies at the Business, Application and Information layers. Worse, because organisations are often too technology-focused, they don't do enough of the dependency-reduction at these higher layers and their net effect is to introduce new, technology-related dependencies to an existing set of business processes and data structures. The net effect of SOA on such organisations is then entirely negative.

I'm in the process of writing a white paper on "Dependency-Oriented Thinking" based on my experiences with SOA in large organisations. Stay tuned :-)."
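The WSDL arithmetic in the post above (one service, five operations, an input and an output document each) is easy to check with a small sketch. The operation and schema file names are hypothetical; the only point is to count how many document schemas a single service description ends up depending on.

```python
from typing import NamedTuple


class Operation(NamedTuple):
    name: str
    input_schema: str    # schema of the input document
    output_schema: str   # schema of the output document


def schema_dependencies(operations):
    """Return the set of document schemas a service description depends on."""
    deps = set()
    for op in operations:
        deps.add(op.input_schema)
        deps.add(op.output_schema)
    return deps


# One service with 5 operations (hypothetical names).
service_operations = [
    Operation(f"op{i}", f"op{i}Request.xsd", f"op{i}Response.xsd")
    for i in range(1, 6)
]

deps = schema_dependencies(service_operations)
print(len(deps))  # 10
```

A change to any one of those ten schemas forces a new version of the service description, which is the version churn described above.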


Well, this represents my current thinking about SOA in a nutshell (a fairly large nutshell, I'll grant). The coming white paper on Dependency-Oriented Thinking will elaborate on these points. The workshops on "Dead Simple SOA" that I've been conducting through my company (Eigner Pty Ltd) along with my colleague Rahul Singh address these very topics.

Monday, April 29, 2013

JEM (JSON with Embedded Metadata) - A Simpler Alternative to JSON Schema?


I've long been a supporter of the JSON Schema initiative, and I was also happy to see developments like Orderly, which offered a simpler and less verbose format than JSON Schema. But Orderly introduces its own format, which necessitates conversion to JSON Schema before it can be used. Both approaches are unsatisfactory in their own way. One is too verbose and the other needs translation.

All of this made me wonder if we aren't approaching the problem the wrong way. JSON Schema is a conscious effort to replicate in the JSON world the descriptive capability that XML Schema brings to XML. But is this the best way to go about it?

I would like descriptive metadata about documents to be capable of being embedded inside the document itself, rather like annotations in Java programs. Indeed, this metadata should be capable of forming a "scaffold" around the data that then allows the data itself to be stripped out, leaving behind a template or schema for other data instances.

So I'm proposing something that I think is a whole lot simpler. It does require one fundamental naming convention to be followed, and that is this:

Any attribute name that begins with an underscore is metadata. Everything else is data.
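To show how little machinery this convention needs, here is a minimal Python sketch that splits the attributes of a JSON object into data and metadata. The attribute names in the sample ("_type", "_version", "name", "age") are invented for illustration and are not part of any proposal.

```python
from typing import Tuple


def is_metadata(attribute_name: str) -> bool:
    """The one convention: a leading underscore marks metadata."""
    return attribute_name.startswith("_")


def split_data_and_metadata(element: dict) -> Tuple[dict, dict]:
    """Partition one JSON object's attributes into (data, metadata)."""
    data = {k: v for k, v in element.items() if not is_metadata(k)}
    metadata = {k: v for k, v in element.items() if is_metadata(k)}
    return data, metadata


# Hypothetical element, for illustration only.
element = {"_type": "object", "_version": "1.0", "name": "Alice", "age": 30}

data, metadata = split_data_and_metadata(element)
print(data)      # {'name': 'Alice', 'age': 30}
print(metadata)  # {'_type': 'object', '_version': '1.0'}
```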

Let's take this simple JSON document:


We can embed metadata about this document in two different ways.


I'm calling the first style "Metadata Markup", where the data elements of the JSON document retain their primacy, and the metadata around them is secondary and serves to add more detail to these data elements. One can readily see that "_value" is now just one of the possible attributes of an element, and many more such attributes can therefore be added at will.

I call the second style "Metadata Description", where the primary elements are metadata, and any data elements (whether keys or values) are modelled as the values of metadata elements. Note that describing a document as an array (a nested array in the general case) rather than as a dictionary (or nested dictionary) of elements allows the default order of the elements to be retained. This is quite useful when this format is used to publish data for human consumption.

The first style, Metadata Markup, is more suitable for document instances, because a lot of detailed meta-information can accompany a document and can be hidden or stripped out at will. It is easy for a recipient to distinguish data from metadata because of the leading underscore naming convention. There is no need to pre-negotiate a dictionary of metadata elements.
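As a rough sketch of "stripped out at will", the Python function below recovers the plain data from a Metadata Markup document. Only "_value" comes from the description above; the other metadata names ("_schema", "_type", "_unit") and the sample document are assumptions made up for this example.

```python
def strip_metadata(node):
    """Return the plain data contained in a Metadata Markup document."""
    if isinstance(node, list):
        return [strip_metadata(item) for item in node]
    if isinstance(node, dict):
        if "_value" in node:
            # An annotated element: its data is whatever "_value" holds.
            return strip_metadata(node["_value"])
        # Otherwise keep only data attributes (no leading underscore).
        return {
            key: strip_metadata(value)
            for key, value in node.items()
            if not key.startswith("_")
        }
    return node  # scalars pass through unchanged


# Hypothetical marked-up document, for illustration only.
marked_up = {
    "_schema": "person-v1",
    "name": {"_value": "Alice", "_type": "string"},
    "age": {"_value": 30, "_type": "integer", "_unit": "years"},
    "address": {
        "_type": "object",
        "city": {"_value": "Sydney", "_type": "string"},
    },
}

plain = strip_metadata(marked_up)
print(plain)  # {'name': 'Alice', 'age': 30, 'address': {'city': 'Sydney'}}
```

The recipient needs no agreed dictionary of metadata elements to do this; the leading underscore is enough.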



The second style, Metadata Description, is more suitable for schemas, because in this format, all elements pertaining to instance data (both keys and values) are just values. If only the values representing keys are retained, we get a "scaffold" structure describing the document, and more metadata elements representing constraints can be added, turning it into a schema definition.
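Here is a minimal sketch of that scaffold idea in Python. It assumes a Metadata Description is an ordered list of entries, that each entry names its key under "_key" and its instance value under "_value", and that nested structures appear as a nested list under "_value". The name "_key", the "_type" constraint and the sample data are all assumptions for illustration, not a fixed vocabulary.

```python
def to_scaffold(description):
    """Drop instance values from a Metadata Description, keeping its keys."""
    scaffold = []
    for entry in description:
        kept = {}
        for name, value in entry.items():
            if name != "_value":
                kept[name] = value  # keys and other metadata survive
            elif isinstance(value, list) and all(
                isinstance(item, dict) for item in value
            ):
                # A nested description: keep its structure, strip its values.
                kept[name] = to_scaffold(value)
            # Scalar instance values are simply left out.
        scaffold.append(kept)
    return scaffold


# Hypothetical description, for illustration only.
description = [
    {"_key": "name", "_value": "Alice"},
    {"_key": "age", "_value": 30},
    {"_key": "address", "_value": [
        {"_key": "city", "_value": "Sydney"},
    ]},
]

scaffold = to_scaffold(description)
print(scaffold)

# Adding a constraint turns the scaffold into the beginnings of a schema.
scaffold[1]["_type"] = "integer"
```

Because the description is a list rather than a dictionary, the scaffold keeps the elements in their original order.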


Obviously, this system will not work for everyone. I'm sure there are JSON documents out there that use attribute names with a leading underscore for regular data (HAL?), so adoption of this convention won't be feasible in such domains. But if a significant subset of the JSON-using crowd finds value in this approach, they're more than welcome to adopt it.