Welcome Apache Otava (incubating project)

Over the past 9 months we’ve worked to get the open source project that Nyrkiö is based on accepted into the Apache Software Foundation’s incubator. As I’m writing this, the project, now known as Apache Otava (incubating), has been submitted to the customary 72-hour voting process that decides whether our first release candidate will be accepted as an official Apache Software Foundation release.

It is a proud moment for me personally to have started an open source project that (hopefully) becomes accepted into the largest and one of the oldest non-profit project governance organizations.¹ In practical terms Apache Otava is also now the “upstream” focus point where the loose community of contributors can merge their improvements.

Also happening this week is the International Conference on Performance Engineering, where the use of Change Point Detection for Continuous Performance Engineering was first published in 2020. We are hosting an Otava/Hunter/ChangePointDetection reunion meetup on Friday 8 AM Toronto time, which is 14:00 Central Europe Time. Otava/Hunter contributors and users are welcome to join remotely or of course also in Toronto.

(Google meet: meet.google.com/emh-xfrq-ust
Google Calendar invite.)

I’m also doing a presentation at 12:20 ET, loosely similar to the white paper below. (Conference Program.)

8 Years of Optimizing Apache Otava: How disconnected open source developers took an algorithm from n^3 to constant time

The first Otava release already includes all the improvements Matt and I developed for Nyrkiö during the last year. This includes a rather significant performance optimization I did last October that reduces the computational complexity to constant time, O(1), in the common case of appending new results to an existing history. We are today publishing a paper that chronicles both this and other significant performance improvements made by 5 different contributors over the past 8 years. Converting that theoretical achievement to the field of practical benchmarks and milliseconds, we find that Otava today is between 18,000 and 300,000 times faster than the first version was in 2017. (The result varies depending on input data.)

8 Years of Optimizing Apache Otava: How disconnected open source developers took an algorithm from n^3 to constant time Download

The people that each played a part

In addition to bringing together different forks of the code base, the different people who contributed over the years also literally came together in a Slack channel when we drafted the proposal for the Apache incubator. It had the feeling of a class reunion.

The thing is, I never thought of this project as some remarkable innovation that one day would be my full time job. It was a piece of automation we needed in a much larger puzzle. It’s a sequence of subtractions and multiplications, freshman year university calculus. We had an intern do it in a couple of days…

It wasn’t until I was assigned to write the “Related work” section of our first ICPE paper that I realized that nobody had done this before. Google had a patent on a similar system based on moving averages. I read it, stared at it with my mouth open and realized that a) it would only trigger with a delay, b) even then it would not find the exact commit, and c) it probably couldn’t handle two regressions close to each other.

The MongoDB Performance team

The code, originally just unpoetically known as “signal processing algorithms”, wasn’t even on the backlog for our interns in 2017. It was one of them – William Brown – who volunteered to implement the e-divisive algorithm in his last two weeks at MongoDB. For symbolic extra credit. He had found the ticket somewhere down in our backlog. If he hadn’t done it, I’m sure we would have gotten to it eventually.

William also implemented the first optimization that improved the computational complexity from O(n³) to O(n²). (Described in our paper above.) This introduced a funny off-by-one bug that would confuse the team for the rest of the year. The running sum would accumulate one value too many, too much weight if you will, towards the right hand side. This caused the selected change points to sometimes be 1 or even 2 points off to the left, and caused much discussion in the team about what exactly a change point detection algorithm is supposed to find: the first commit after the regression, the last commit before the regression, or is the change actually in the gap between two commits? In the end this is the reason I came to know the algorithm so well. To show everyone there was a bug I finally had to read the Matteson & James (2013) paper, and compute the algorithm with pencil and paper. (With the bug fixed, the answer to the philosophical question is the commit after the regression.)
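To make the O(n³) to O(n²) step concrete, here is a minimal Python sketch of the idea. It is not the code William wrote, nor what Otava ships: it assumes the e-divisive Q statistic from Matteson & James with α = 1, it only considers a single split of the whole series, and all function names are made up for illustration. The naive version recomputes every pairwise distance for each candidate split, while the incremental version keeps running sums and only adjusts them for the one point that moves from the right side to the left.

```python
from math import comb, isclose


def q_stat(n, m, cross, within_left, within_right):
    """Combine the three distance sums into the Q statistic (alpha = 1)."""
    e = 2.0 * cross / (n * m) - within_left / comb(n, 2) - within_right / comb(m, 2)
    return n * m / (n + m) * e


def pair_sum(points):
    """Sum of |a - b| over all unordered pairs in points."""
    return sum(abs(a - b) for i, a in enumerate(points) for b in points[i + 1:])


def q_naive(series):
    """O(n^3) overall: every candidate split recomputes all distances."""
    result = {}
    # Both sides need at least two points, so tau runs from 2 to len - 2.
    for tau in range(2, len(series) - 1):
        left, right = series[:tau], series[tau:]
        cross = sum(abs(x - y) for x in left for y in right)
        result[tau] = q_stat(len(left), len(right), cross,
                             pair_sum(left), pair_sum(right))
    return result


def q_incremental(series):
    """O(n^2) overall: running sums are updated as one point changes sides."""
    total = len(series)
    tau = 2
    cross = sum(abs(x - y) for x in series[:tau] for y in series[tau:])
    within_left = pair_sum(series[:tau])
    within_right = pair_sum(series[tau:])  # quadratic, but done only once
    result = {}
    while tau <= total - 2:
        result[tau] = q_stat(tau, total - tau, cross, within_left, within_right)
        moving = series[tau]  # first point of the right side moves to the left
        to_left = sum(abs(moving - x) for x in series[:tau])
        to_right = sum(abs(moving - y) for y in series[tau + 1:])
        cross += to_right - to_left
        within_left += to_left
        within_right -= to_right
        tau += 1
    return result


# Toy series (made up): a clean step from 10 to 20 at index 6.
series = [10] * 6 + [20] * 6
naive, fast = q_naive(series), q_incremental(series)
assert all(isclose(naive[t], fast[t]) for t in naive)
best = max(fast, key=fast.get)
```

On this toy series both versions agree and `best` is 6, the index of the first point after the step. The slice boundaries in the update (`series[:tau]` versus `series[tau + 1:]`) are exactly the kind of place where one value too many can sneak into a running sum.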

William went back to the University of Pennsylvania to finish a Master’s degree, then did another Master’s degree and finally a PhD last year. Today he works at Morgan Stanley as an AI researcher.

When I joined the MongoDB performance team in 2015, I was the first European allowed on any of the server teams. I assume there literally has been a meeting where they reluctantly must have decided that as much as they didn’t believe in remote work, they’d rather keep me around than fire me. “We can put him on the perf team… what do we have to lose?” …they said. Soon after I was in Dublin to onboard a second European, Jim O’Leary, to the team. Jim was to join a program where support engineers could spend 1 day a week with an R&D team. By lunch time I went back to his boss Ger Hartnett to say I was confused. The program was for 1 day a week, but I got the impression Jim wanted to join R&D full time. “Yes, and please don’t tell him the program is only 1 day per week. Jim is a talented C++ developer who has been in the Dublin support team for 5 years. If he can’t move to R&D he will either resign or kill himself. But those Americans won’t let Europeans do R&D, so this 1 day program is my only chance.”

“Say no more and leave it to me.” I winked. After several months the program manager realized what we had done and moved Jim permanently to the R&D head count. (Today there’s a huge optimizer team in Dublin and Europe, led by Bernard Gorman, who joined MongoDB engineering via the 1 day program together with Jim. Ger himself eventually moved to R&D too.)

It took 4 hours to walk Jim through the stack of 5+ different scripts and tools we used to run benchmarks at the time. The amount of joyful Irish swearing I heard during those 4 hours is a dear memory to me. At one point you needed to download and build an in-house Go tool, because one team member had wanted to learn Go at that point of development.

Jim took ownership of the Python code developed by William. He implemented it in NumPy and native C code and some alternative Python compiler. The resulting improvements are all in the paper.

And yes, he was a good C++ and Python developer. 5 years after leaving MongoDB, I still use some of the performance tools open sourced by the team back then. I can still recognize when code was written by Jim. If I’m not busy I take a moment just to silently read it and admire its elegance.

David Daly was the one person who predated me on the MongoDB performance team, and stayed there many years after I had left. He was the only person with a broad enough skillset to have been one of the first hires on the team of system specialists that were going to do manual benchmarking and performance analyses, and yet to survive the transformation to a team of programmers writing tools and automation for something that we today call Continuous Performance Engineering or Continuous Benchmarking. We had a lot in common. At one time the performance team was just the two of us, and for a couple of years David was also my direct manager. David had a PhD from his time at IBM Research, where he had invented something related to CPU caching. Any time a code review descended into the customary bike shedding, he would step back, saying in a somewhat fatherly old man voice: “Well, I’m really good at cache invalidation, so I’ll let you (kids) figure out how to name these things.” I immediately latched onto the joke, asserting that I’m good at naming things, and would happily be in charge of that. With this division of labor settled, we went on to work many years together, knowing that there wasn’t a computer science problem in the world that would be too hard for the two of us to solve.

With my university education having essentially been sponsored with Nokia tax euros, and David doing transistors at IBM for a living, one thing we had in common was how we looked at the performance dashboard in the MongoDB CI, blinking red and green (and often also purple), with 99% of those red squares being false alerts… as if it were a Christmas tree misplaced in a family celebrating Hanukkah… We both agreed that this was a problem that needed a signal processing solution. The difference between us is that David is the person who would then go “yippee, literature review!” and enjoy reading all the recent academic articles relevant to our topic. (If you remember, I would only read the one chosen paper much later, to fix the off-by-one error. And I didn’t enjoy it…) I recently asked him how he actually chose the Matteson and James paper as his first choice. After all, it was the perfect choice and we all owe it to David. Apparently – this is his answer – once you understand the difference between anomaly detection and change detection, and then rake out all the papers that are focused on dealing with trends and seasonality, which we didn’t have, then the choice was obvious.

David also, of course, was the one that got us, and later all of MongoDB engineering, to write papers and submit them to academic conferences. We both believed and often said that “writing is powerful”. Followed by “whenever you do write something, then including at least one graph is mandatory”.

Our first paper was presented, remotely, because of Covid, at the International Conference on Performance Engineering. One person that we also should mention at this point is Alexander Podelko. It is an open secret that the most enthusiastic anonymous reviewer is always him, and in his role chairing this or that workshop at these conferences we would get more airtime than just the main publications in the main track. It was Alexander who first saw and said that this is amazing, the world needs to hear about this. And they did. It was this paper that would be read by Netflix and Datastax engineers among others, where the story continues.

There were many others on the MongoDB performance team that worked on other tools and therefore aren’t mentioned here. A highlight was to work a few months together with my long time friend Mark Callaghan. With his kind permission I added “Mentored Mark Callaghan on how to benchmark a database” to my resume when our ways parted. Mark has been one of the first advocates of change point detection ever since he saw the system in use inside MongoDB.

VP of Engineering Dan Passette would periodically complain that our benchmarks (and salaries, I guess) cost millions, but what does he get out of it? But he never gave up on us, his expensive performance team, maybe because Eliot (Horowitz, Founder and CTO) wouldn’t have let him.

People at Datastax

When I interviewed for Datastax, I at first didn’t understand why everyone was so excited to have me. It was like a rock star visiting a small village… It turns out a reading group had just read the paper we had published and they couldn’t believe their good luck that one of the authors was considering joining Datastax and could help with the project. For context, whereas MongoDB was all about user experience and performance was a reluctant obligation, Datastax and Cassandra were the exact opposite. Everyone cared about performance. (And possibly nobody about user experience…)

So one of the teams I ended up managing was the performance engineering team… One young engineer on my team had a data science background, Alexander Sorokoumov. In our first one-on-one meeting he was eager to share a paper he had recently read, about using change point detection to catch performance regressions. “Ah, I know this paper too”, I said smiling. Missing the hint, he went on to enthusiastically describe the idea to me. As much as I enjoyed watching someone so excited about a statistical algorithm, in an effort to not waste time I interrupted him: “Yes, I’m a co-author of the paper”.

“Yes but I want to explain how the algorithm works”, Alex continued.

“Look, I wrote that very section of the paper”, I insisted.

“Oh…”, there was finally a moment of silence. “Well, can we do it?”, Alex summarized his pitch in one sentence. “Yes, we’re gonna do it”.

Ultimately Alex was not the one assigned to work on the tool that came to be known as Hunter. He later got a job at Confluent where he went on to build a Continuous Performance Engineering pipeline for Kafka. In the past few weeks it is Alex who has worked hard to publish the first release of this code under its new name Apache Otava. The energy is still there and carries the project forward.

The same day I started at Datastax, a certain Piotr Kolaczkowski submitted his resignation. He was one of the earliest engineers hired to Datastax and, especially for the European team, something of a father figure. So together with EMEA HR we put in some effort to see if we could change his mind. In the end, once certain other hopes and dreams were addressed, his grievances really could be summarized as “after 20 years, Piotr needs to do something that doesn’t involve Java”. Which of course is a totally reasonable position, if you ask me. Piotr quickly found a new energy and would spend the next year extending the Python library open sourced by MongoDB to a full-featured command line tool that could read CSV, Prometheus and other data sources.

Another hidden gem on my team was Shaunak Das. A humble servant of the corporation, he had found a test engineer position on the performance team. One morning I woke up to a Slack thread where Shaunak had responded to some simple question with a 5 page PDF that very well could have been a conference paper. It was only then that I took a moment to look at his LinkedIn profile. Turns out this junior test engineer actually had a PhD in math. I made some immediate changes to his job description and compensation. Shaunak became Piotr’s math advisor for the next year, and the two PhDs had a lot of fun together. Their biggest improvement to the change point detection code was to replace the Monte Carlo simulation with a regular Student’s t-test, which was both much faster and qualitatively more accurate, not least because it was deterministic.
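The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only, not Hunter’s or Otava’s code: it assumes the Monte Carlo step was a permutation-style test on the difference of means, and the function names, sample data and parameters are made up. The permutation p-value depends on which random shuffles happen to be drawn, while the pooled two-sample Student’s t statistic is a closed-form formula that gives the same number every time.

```python
import random
from math import sqrt
from statistics import mean, variance


def permutation_p_value(left, right, rounds=1000, seed=None):
    """Monte Carlo estimate: how often does a random relabeling of the points
    produce a mean difference at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(mean(left) - mean(right))
    pooled = list(left) + list(right)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(left)]) - mean(pooled[len(left):]))
        if diff >= observed:
            hits += 1
    return hits / rounds


def t_statistic(left, right):
    """Pooled two-sample Student's t statistic (equal variances assumed)."""
    n, m = len(left), len(right)
    pooled_var = ((n - 1) * variance(left) + (m - 1) * variance(right)) / (n + m - 2)
    return (mean(left) - mean(right)) / sqrt(pooled_var * (1 / n + 1 / m))


# Hypothetical benchmark results before and after a suspected change point.
before = [10, 11, 9, 10]
after = [14, 15, 13, 14]

t = t_statistic(before, after)  # same value on every run
p = permutation_p_value(before, after, rounds=200, seed=1)  # depends on the seed
```

The cost difference is also visible here: the permutation test does `rounds` passes over the data, the t statistic a single one. Turning `t` into a p-value needs the t distribution with n + m − 2 degrees of freedom, which a library routine such as `scipy.stats.ttest_ind` provides.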

Ishita Kumar was a summer intern at Datastax. Like William before her, she also had a math and statistics focus and would go back to Massachusetts Amherst to pursue a Master’s degree. Under the tutelage of Sean McCarthy and Pushkala Pattabirham her project was to evaluate Hunter against other algorithms found in the literature.

When I was editing what was to become the Datastax team’s ICPE publication on change point detection, I initially didn’t include Ishita’s section at all! The reason was that the math was way above my head! I couldn’t understand any of it and in any case it seemed substantive enough that it would have deserved its own paper with her as the primary author. (While it was an excuse, the latter is actually true!) Faced with complaints from Sean and the wrath of Pushkala, I spent another day trying to understand Ishita’s notes. Turns out she had done something none of us had done before, nor have I ever seen it done in any of the “Related work” I reviewed over the years. Ishita was the first one to benchmark the accuracy of e-divisive and other algorithms in an objective setting. Until then, we would simply look at the change points generated and more or less assess whether we liked them or not. (Even if you call it a “labeled” test data set, it is still a subjective judgement by a human doing the labeling.) Ishita got a permission slip from Massachusetts Amherst to join Piotr in presenting our work at the Portugal incarnation of ICPE.

The bulk of that article was written by Matt Fleming, who is listed as the primary author, while Piotr is the one who “contributed equally to the work” – for some reason this academic footnote continues to irritate me. The ACM guidelines didn’t permit “Fleming & Kolaczkowski” as the short form attribution of an article.

To repeat the refrain that writing is powerful, Matt is yet another friend I came to know because he read the original paper we wrote with the MongoDB team. He emailed us with some questions about the paper, and as David, the corresponding author, was on vacation, I answered the email. Soon after, Matt sent another email, saying he was interested in switching from kernel development to databases, and wondered whether there were any open positions on the MongoDB performance team. I told him there might very well be, because I had just resigned to go to Datastax. Matt eventually joined me at Datastax and we took turns managing the Cassandra developers there.

One day, in reaction to me explaining something about Datastax strategy and investors and something, Matt asked whether I had ever thought I could be an excellent CEO for an open source startup. My first reply was that I had always thought so, and dreamt of it, but after being diagnosed with Parkinson’s disease, I had given up on that dream ever happening. The conversation ended there, but luckily a few weeks later I went back to him to ask why on earth he would ask such a random question. Turns out Matt’s dream was to be the CTO of an open source startup. At the beginning of 2024 both of our dreams became reality as we co-founded Nyrkiö.

The pull request titled “Upstreaming Nyrkiö 2024 patches” contains improvements from Matt too. You might even say “both authors contributed equally to this work”.

¹⁾ For the uninitiated, the apache.org website is in the top 100 of websites in the world.