16 Dec 2014

RIP Cougar?

Open source Cougar is dead. Or at least appears to be.

I think this is a great shame. For me. For Betfair.

I haven’t spoken about Cougar yet. I had planned to, in those 12 posts I was going to write this year. Instead I spent most of the year writing features for Cougar, trying to round off its functionality before promoting it more widely. This might have been a mistake. Maybe it wouldn’t have mattered whatever I had done.

So let me tell you about Cougar. Someone once asked me for an elevator pitch, and I didn’t even have to think about it: “It makes writing services easier”. Simple. A little more? It abstracts away the concerns of transports, monitoring, configuration and security, and frees up the service developer to worry about just their logic. Call it a service container if you will (or more accurately a service interface container). It provides consistency across your estate, ensuring that all services can talk to each other, and all they have to do is define their interface(s).

Cougar is about 5 years old, maybe a little more. It was born at the beginning of a massive internal re-architecture at Betfair, moving from a single monolithic J2EE application to a service oriented architecture. We rewrote the core around 4 years ago to be entirely async (previously it had an async transport but a synchronous core), expanded its capabilities beyond RPC to support blind event subscriptions and replicated objects, improved the security model, added a binary transport and a whole bunch of other stuff. Read more on its incomplete site.

Then I took voluntary redundancy from Betfair, but during my notice period I got permission from a forward-thinking CTO to open source Cougar. It was a pretty herculean effort (even if I say so myself) to clean up the code, split the bits remaining internal from those going out, get a full security review and pen testing done, migrate the various projects to GitHub, move CI to Travis, etc. But I did it. Cougar left the building before I did. I left a team managing Cougar, but still maintained commit rights. I was a happy bunny.

So what went wrong?

In-sourcing. Cougar was handed over to a team in a satellite development office. I don’t know which team, I don’t even have their names, but I do know which office. I found out from some ex-colleagues that it had moved. And then the silence started. I haven’t received a shred of communication from this team, either publicly or privately. I continued working on Cougar off and on, and a few interactions with teams in other locations led me to believe that Cougar was alive and well. Until I received a comment on one of the issues, which alluded to Cougar development continuing from the point before we started open-sourcing. Why would anyone do this? At least take advantage of the bug fixes (including security), maintainability improvements and desirable features from the open source version.

To cut a long story short, this unilateral decision is to remain in place. I don’t know why. I’m not party to that information, it’s private to Betfair. I don’t know the reasons for ignoring the open source version. That also is private. I only have one piece of information, a name. Yes, I know I said I didn’t have any names, I didn’t at first, and I’m not sure this guy is in the team, when I left the company he was a Programme Manager (read Senior Project Manager if you’ve never heard of that) - surely someone in that role wouldn’t be taking these kinds of decisions, they’re facilitators right? Well, I have a name, so I’ve sent him an email. Yes, probably not the wisest thing, but anyone who knows me knows I’m a bit passive aggressive (ok, maybe a lot), and hey, it can’t do any harm right? (Unless I get myself sued, so I’ll try not to do that). People who do know me will be impressed at the restraint. Here it is:

Hi __,

I just wanted to write to say thank you for taking the time to keep me in the loop regarding your plans for Cougar. Obviously I never expected to have a say in the decision, but it was good to be able to verify the assumptions being made and to see you trying your best to ensure that Betfair’s investment in open source Cougar wasn’t squandered.

Most of all though, I’d like to thank you for the professional courtesy in ensuring I didn’t waste a year of my time and the accompanying emotional investment contributing to a dead branch.

Kind regards, Simon

I haven’t had a response yet. I don’t know if I will. Even if I do, I probably won’t publish it. Seems like a surefire way to get sued.

So what now?

Well, I’m mid-way through some reasonably large changes for 3.2.0, so at least to satisfy my own desire to see things tied up nicely I intend to complete this release. Albeit it might get cut down in scope now. I might even send the release notification to Mr _____ at Betfair ;)

As for the future, I haven’t really decided. If I continue with Cougar I’ll have to change the name. I quite like Disco, which incidentally was Cougar’s original name (hence the DSC codes for exceptions) (ok, it was actually Data Services Container first, but it didn’t take long for someone to insert the vowels) - which would be quite cool as I can change the core error message “Panic in the Cougar” to “Fire in the Disco” :D

I’ve been meaning to implement Communicating Extended Finite State Machines for some time. It seems reasonable to use Cougar as a base, I might integrate a Raft library to provide consensus (I once integrated a Totem implementation for a similar purpose, but it never made it to the mainline codebase). But we’ll see. I suspect I’ll never do all that doco I had planned to rewrite now, and I’ll probably never publicise Cougar like I planned, but let’s see if we can at least get some use out of all that work.

At least I can finally ditch SOAP!

04 Dec 2014

Unilateral decisions

Unilateral decisions are always a bad idea.

I’m pretty certain that’s true. At least, it always has been in my experience. Yes, even when I made the decision.

The problem is this: Pretty much every decision you make affects other people in some way.

In the case of small decisions, these effects are often quite small. If I decide to read in bed 5 minutes longer, then there’s a chance my wife will be unimpressed if the light wakes her up, but by morning she will probably have forgotten, so no great shakes.

Inevitably, decisions we make in our professional life tend to be rather further reaching. Even just writing functionally correct code in different ways can affect other people for some time to come - be it in terms of maintainability, performance, extensibility, … the list is near endless.

When it comes to big decisions, ones that clearly are going to affect a lot of people, for example choosing the direction to take a widely used library, mandating a particular toolset or technology, even mandating dress codes, deciding who’s going to be invited to a party - these are decisions where not involving others is likely going to hurt, sooner or later.

The key is knowledge; after all, knowledge is power, right? Getting input from others gives you more knowledge, be that the knowledge that they’re in agreement with you, that maybe you could make a better decision, or just the realisation that they’re all morons and you should just do it anyway.

Just one thing - don’t ask a group of ‘yes-men’ (or women). Then you may as well not have asked anyone.

22 Jan 2014

New look blog

Historical note: This post written under the previous blog name ‘Diary of a Distributed Virgin’

New job, new year, new year’s resolution: I’m going to blog more.

I really didn’t like the look of my blog hosted on Blogspot, and with most of my online time spent on GitHub, I’ve decided to move the blog to hosting with GitHub Pages. It lets me choose the styling, lets me write in Markdown, which I’ve really come to like, and consolidates where I spend my time.

So, new look, new hosting, new… name?

Yes, well, I don’t really consider myself much of a “Distributed Virgin” anymore and my interests are much wider ranging now (more on that in a separate post), so I suspect I’ll end up changing the name. The old posts will need some tweaking in order to make the context in which they were written clear, but that’s easy now.

My target for this year: 12 posts. That’s just 1 a month. Surely that’s achievable?

12 Jun 2013

I should have known...

Historical note: This post written under the previous blog name ‘Diary of a Distributed Virgin’

So, it’s been a while since I posted anything on my blog, but I feel like having another go. I found a couple of old posts I’d been working on before I apparently gave up, so I thought I’d pop them up since I put the effort into writing them in the first place…

I bought myself a book recently, Pattern-Oriented Software Architecture Volume 4: A Pattern Language for Distributed Computing, and I should have known better than to think I could find a single book covering DC patterns.

Part of my problem is the definition of distributed computing. I know you can implement a distributed system by hiding the details that a service is remote from the calling code, or by using some middleware solution, but that’s just taking the old J2EE approach - hide the details, use a framework, take the hassle away from the developers.

But I’ve done that, and whilst it basically works for building basic enterprise systems, you can never entirely hide the details, you just try to ignore them without worrying about the consequences. So if you can’t ignore the details, then surely you have to embrace them. So for me, distributed systems is learning about the hard and big stuff, things like leader election, gossip protocols, CAP, BASE, byzantine failure. But not only what these concepts mean, I also want to know how to implement them, what the tradeoffs are, what’s best to use in what scenario, what I have to consider.

And so, back to my default mode of operation, I want a book that will describe it. And whilst I know in my heart that I’m probably not going to find one, I’m still going to buy books until I find it. Hence the purchase of the aforementioned book. Lo and behold - it’s not what I’m after.

To its credit, it contains many patterns that will appeal to the kind of developer I used to be. It talks about MOM and pub-sub, about RPC and hiding the fact that RPC is actually happening. There was one chapter, entitled “Component Partitioning”, that held patterns that looked promising to me, covering situations where objects need to be distributed. But then I read the text:

Divide the objects into multiple ‘half objects,’ one for each address space in which they is used. Each half object implements the functionality and data required by the clients that reside in ‘its’ address space. A protocol between the half objects helps to coordinate their activities and keep their state consistent.

Now aside from the grammatical errors (theirs not mine), there’s a worrying statement around consistency. From reading about CAP, I know that you sometimes need to trade off consistency, but this pattern assumes that your state is always consistent. This is reinforced later:

The greater the need for distributed computation, and the more data that needs to be exchanged via the protocol, the less beneficial a Half-Object plus Protocol design becomes. As a general rule of thumb, duplication of internal state should be reduced to minimize the need for data exchange and synchronization via the protocol.

But:

The concrete design of the protocol between the half objects depends on what particular coordination they need. Simple data exchange protocols can be based on an Observer design to avoid unnecessary updates and coordination activities. Actions that the half objects in the arrangement can invoke on one another can be encapsulated into Commands or Messages, to keep the protocol independent of a specific action set.

Reading this with my new awareness makes me think the book is hinting at the ability to trade consistency for other benefits, but it’s not explicit. Which is enough to make me think this book is no good for me, not that I can’t learn some useful patterns, but that it doesn’t spell out the tradeoffs they’re offering me. It also makes me wonder whether this book is good for the old me - should developers be allowed to continue living in the Matrix, shouldn’t they be forced to swallow the red pill?
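To make the consistency question concrete, here is a minimal sketch of Half-Object plus Protocol. It is not from the book: the `HalfObject` class, its `outbox` queue and the `flush` step are all hypothetical names, and the two "address spaces" are simulated inside a single process. The point is only to show the window in which the two halves disagree.

```python
class HalfObject:
    """One half of a logical object, living in its own (simulated) address space."""

    def __init__(self, name):
        self.name = name
        self.state = {}    # this half's local copy of the shared state
        self.peers = []    # the other halves it must coordinate with
        self.outbox = []   # protocol messages queued but not yet delivered

    def set(self, key, value):
        # Update locally straight away; the peers only learn about it on flush().
        self.state[key] = value
        self.outbox.append((key, value))

    def flush(self):
        # The "protocol": push every queued update to every peer.
        for key, value in self.outbox:
            for peer in self.peers:
                peer.state[key] = value
        self.outbox.clear()


client_half = HalfObject("client")
server_half = HalfObject("server")
client_half.peers.append(server_half)

client_half.set("balance", 10)
print(server_half.state.get("balance"))  # None: the halves currently disagree
client_half.flush()
print(server_half.state.get("balance"))  # 10: consistent again after the protocol runs
```

How long that window stays open (and what happens if the flush never arrives) is exactly the tradeoff I wanted the book to spell out.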

One other example before I leave it for today… The next pattern is called “Replicated Component Group”. It is described starting with the problem:

Some components in a system must meet high availability and fault tolerance requirements, in particular if they execute or coordinate central activities, such as a directory service in a telecommunications system. Brute force solutions to this problem, however, such as complete hot or cold stand-by system replication, are often too expensive for many applications due to their high total cost of ownership.

Ok, good start, we continue with the solution:

Provide a group of component implementations instead of a single implementation, and replicate these implementations across different network nodes. Forward client requests on the component interface to all implementation instances, and wait until one of the instances returns a result.

Wait, this sounds like a paper I glanced through recently. Unfortunately that’s where the book stops. The paper goes into somewhat more detail, unearthing an important piece of information with regard to this pattern. The pattern is assuming fail-stop failures - it’s invalid if the instances can suffer Byzantine failures. A quick trawl of the index finds no reference to Byzantine failures, which is worrying, given that they are real.
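A rough sketch of the pattern as quoted shows where the fail-stop assumption bites. Everything here is an assumption for illustration: the replicas are plain functions, `ReplicaCrashed` and `first_response` are made-up names, and threads stand in for network nodes. It is not the book's or the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class ReplicaCrashed(Exception):
    """Fail-stop failure: the replica halts and produces no answer."""


def first_response(replicas, request):
    """Forward the request to every replica and return the first result to arrive.

    Crashed replicas are skipped. Whatever answer arrives first is trusted
    blindly, which is only safe if replicas either answer correctly or stop.
    """
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, request) for replica in replicas]
        for future in as_completed(futures):
            try:
                return future.result()
            except ReplicaCrashed:
                continue  # a stopped replica is simply ignored
    raise ReplicaCrashed("all replicas failed")


def healthy(request):
    return request.upper()


def crashed(request):
    raise ReplicaCrashed("node down")


def byzantine(request):
    return "garbage"  # answers promptly, but wrongly


# Fail-stop failures are masked: the one healthy replica's answer gets through.
print(first_response([crashed, healthy, crashed], "lookup"))  # LOOKUP

# A Byzantine replica is not masked: its wrong answer is accepted as-is.
print(first_response([crashed, byzantine], "lookup"))  # garbage
```

With only crashes to worry about, taking the first answer is fine. As soon as a replica can return a wrong answer, the caller has no way to tell, which is the piece of information the pattern description leaves out.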

Finally, perhaps the book covers the kinds of things I’m interested in, let’s try the index… Alas, no entries for consistency, availability or network partitioning. I’ll be sending the book back to Amazon tomorrow…
