Organizing around the Value Stream
This post was originally published on CIO.com on February 25th, 2020.
Flow. That feeling you get when working on something or doing an activity when time seems to fly by, and the work is happening. Everyone wants to achieve it. We achieve productivity not because it’s effortless, but because all our systems are working in concert with one another. But flow is not just for individuals. It’s for teams, and organizations, and is one of the core goals in the first way of DevOps, borrowing the idea from its roots in Lean.
In any system of work, we endeavor to optimize flow. This concept was popularized by many, including Dr. Eliyahu Goldratt in The Goal, where he describes the Theory of Constraints. According to the theory, in any system, there will be a bottleneck that causes work to stack up before it and starves the downstream work stations after it. The purpose of analyzing our workflow is to find and elevate the constraints so that work can flow more quickly through the system. In our businesses we often refer to this flow of work as the value stream.
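To make the bottleneck idea concrete, here is a minimal sketch. It assumes a purely serial value stream where each step has a fixed daily capacity; the step names and numbers are hypothetical, and real systems have queues and variability that this ignores.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    capacity_per_day: float  # items this step can finish per day (hypothetical figure)


def find_constraint(stations):
    """Return the station with the lowest capacity: the system's bottleneck."""
    return min(stations, key=lambda s: s.capacity_per_day)


def system_throughput(stations):
    """A serial line can deliver no faster than its slowest station."""
    return find_constraint(stations).capacity_per_day


# Hypothetical software delivery value stream.
pipeline = [
    Station("develop", 10),
    Station("code review", 4),
    Station("test", 8),
    Station("deploy", 12),
]

constraint = find_constraint(pipeline)
print(constraint.name, system_throughput(pipeline))
```

In this made-up pipeline, code review is the constraint, so adding capacity to any other step changes nothing. If code review were raised to 9 items per day, the constraint would move to test, which is why the analysis has to be repeated.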
One of the best ways to visualize the value stream is called Value Stream Mapping, which is used heavily in manufacturing but can also be used in other processes like software delivery. Another way is a much more constant presence in software delivery: the Agile board. Whether we’re using Scrum or Kanban, the board allows us to see work as it flows from left to right on the board, almost in real time. It also allows us to see the bottlenecks. Why has that user story been sitting in the same “Waiting” column for 3 weeks? Why does that story say it’s currently assigned to someone who is on holiday in Colombia?
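The "why has that story been sitting there for 3 weeks?" question can also be asked by a script. The sketch below assumes we can export each card's current column and the date it entered that column; the `Card` fields, the column names, and the 14-day threshold are all illustrative assumptions, not the schema of any particular tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    title: str
    column: str
    entered_column: date  # when the card last changed columns


def stalled_cards(cards, today, max_age=timedelta(days=14)):
    """Return unfinished cards that have sat in one column longer than max_age, oldest first."""
    stuck = [
        c for c in cards
        if c.column != "Done" and today - c.entered_column > max_age
    ]
    return sorted(stuck, key=lambda c: c.entered_column)


today = date(2020, 2, 25)
board = [
    Card("Add login page", "In Progress", date(2020, 2, 20)),
    Card("Rotate API keys", "Waiting", date(2020, 2, 4)),
    Card("Fix report export", "Done", date(2020, 1, 10)),
]

for card in stalled_cards(board, today):
    age = (today - card.entered_column).days
    print(f"{card.title}: {age} days in {card.column}")
```

Here only the card that has waited 21 days is reported. The threshold is a judgment call for each team; the point is that the board's data, not someone's memory, surfaces the stuck work.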
Agile boards allow us to visualize the bottlenecks, but if they are not implemented as part of a well-thought-out strategy for moving work through the system, the visualization of work can become the bottleneck itself!
Many of us have worked with large IT organizations before. Often if you want to get something done, the answer is to “file a ticket”. Want a new phone? File a ticket. Want a new email group? File a ticket. Want access to production? File a ticket. What usually results is a rather lengthy cycle of filing tickets, being asked for clarification, checking the status of the ticket, often more waiting, and finally resolution.
Tickets are essentially handoffs as the work is passed from human to human. In Lean thinking, handoffs, and thereby waiting, are considered waste. If we design our flow of work around handoffs in the ticket system, we are designing waste in, which will slow our processes down, and thus reduce the flow of work in the value stream.
Tickets, by design, interrupt our process. In software development, we have a certain class of errors that are called exceptions. An exception is when the software doesn’t know what to do, and it breaks deliberately because it needs to call in a human to resolve whatever the problem is. This could be a missing file, or wrong permissions, or something more ghastly. Our ticketing system is designed to create exceptions! Literally, exceptional behavior, or something that is not a normal part of the flow of the program (in software) or process (in the value stream). Once a ticket is created, the system stops and waits for a human to resolve whatever the problem happens to be.
Often the type of work we see in a ticketing system is repetitive and simple. In Site Reliability Engineering (SRE) we have a name for this type of work: Toil. In the book Site Reliability Engineering, Vivek Rau defines toil as, “the kind of work… that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.” If the ticketing system is for an IT department, it may be that someone needs a new mouse, or phone. If it’s for a production operations department, it may be copying data from one place to another, or running a report against the data warehouse.
However, the common thread for all of these activities is that they are toil. This toil becomes a bottleneck in our systems and impedes the flow of the value stream. It is our job as leaders to reduce the amount of toil and free people up to do work that is not “devoid of enduring value”. At Facebook or Salesforce, if you need a new mouse or more memory, you use your badge to go get one from a vending machine. This principle of automating production operations work is core to the concept of the Amazon Web Services (AWS) API. In the old days, if you wanted a new server, you would file a ticket and wait. With AWS, you make an API call from your software or by clicking in the web console and a few minutes later, you are rewarded with a server!
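As a rough illustration of what replaces the ticket, here is a minimal sketch of that API call. It assumes the caller passes in an EC2 client such as the one returned by `boto3.client("ec2")`; the helper name `request_server`, the default instance type, and the placeholder image ID are assumptions made for the example.

```python
def request_server(ec2_client, image_id, instance_type="t3.micro"):
    """Launch one EC2 instance and return its instance ID.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    response = ec2_client.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,  # launch exactly one instance
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   instance_id = request_server(boto3.client("ec2"), image_id="ami-EXAMPLE")  # placeholder AMI ID
```

Taking the client as a parameter keeps the sketch testable without an AWS account. What matters for flow is that no human sits between the request and the server.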
Trying to minimize the amount of toil we have in our ticketing system does not mean that we don’t need to track our work. Any company that has adopted Agile methodologies will manage its project work in a work tracking system like Jira. Unfortunately, many engineers dislike project tracking software even more than they dislike meetings!
To get the most use out of the tracking system, it’s best to keep it as simple as possible (but no simpler). This is fundamentally a User Experience problem. If people do not have a good experience when interacting with the system, they will not use it as much. I’ve seen many overzealous project managers design elaborate workflows inside their system with many required fields that don’t mean anything to the engineers. If the workflows are overwrought and lack flexibility, the system will not get used. If there are too many fields and required checkboxes, the system will not get used.
Often in the service of trying to gather lots of good data for reporting to upper management, we end up with a system that collects far less quality data than it would if it were simpler. The goal is to make the work visible so that we can look for bottlenecks in our flow and to ensure the system is working as intended. If we have good data, we can figure out how to make good reports. Just like in machine learning, if we have bad data, or no data, our desired results will be of poor quality.
We need to organize the system around the Agile pigs, not the chickens (that is, around the people committed to doing the work rather than those who are merely involved), in order to maximize the utility of the tool. We should not have engineers spending lots of extra time jumping through hoops for the sake of upper management, when they should be delivering value to the business (in their stream) by writing code.
Organize by Product not Project
Once we’ve done the work to continually eliminate toil from our ticketing system, and we’ve made sure that our work tracking system is not overly burdensome, how do we organize the work?
Dr. Mik Kersten, in Project to Product, discusses the differences between work being organized around products vs. projects. In the project model, people are assigned to work, and then when the work is complete, they are assigned to something else. In the product model, the work is brought to the people and there is an understanding that we will never be “finished” with a deliverable because that deliverable is a product itself.
For example, if we were to have a project around patching all the servers, we would assemble a team, they would divide up the work and go around patching all the servers, most likely with lots of associated toil. However, what happens the next time the servers need to be patched? Do we re-assemble the team and have them go around redoing everything just like they did the time before? And the next time? And the next time?
Suppose instead that we were to make (or buy) a product around patching servers. We assign a team that, as part of its responsibilities, owns patching servers. The first one or two times, they might do the job the old project way. But by the 3rd or 4th or 10th time, they will probably have invested a lot of engineering time and effort into making the job of patching servers as quick and painless as possible. They would want to eliminate toil and make their jobs easier, not perpetuate it.
This is a great example of how even Operations teams can deliver a product. In the case of the patching (or monitoring, or deployment) product, the customers happen to be internal, unlike the external customers to whom we usually sell our products.
For those products, it’s important that we organize the work so that everything it takes to deliver that product is tracked in the same space. The work should not be spread across multiple spaces in the work tracking system. This is a lesson that we’ve learned from cross-functional teams: that the work flows fastest through the value stream when all the capabilities to deliver the product are on hand and we don’t have to wait for handoffs through a ticketing system.
Everyone involved in the delivery of a product should have access to the data about what is required to deliver the product. Whenever I work with clients where the Ops team says something like, “We did our part, that’s the Dev team’s responsibility,” I explain that there is only one product! There is not a Dev product and an Ops product, there is only one product. Our customers don’t care whether the Dev team fulfilled their responsibility. Engineering leadership doesn’t care whether the Ops team got something done. Both groups care that the product is delivered to the customer with high quality and on time. It is the responsibility of both teams to sit down and work together to make sure that those business objectives are achieved, which is, of course, DevOps.
For this reason, all the work needs to be tracked in the same space related to the product. This includes remediation work. I’ve seen many teams do post-mortems or learning reviews after an incident and then come up with a list of remediation items that should be completed to prevent the next outage. But those items are tracked separately, not as part of the regular work tracking system because they are “special”. The problem is, they are not special. Dr. Kersten teaches us that there are 4 types of work: features, defects, risks, and debt.
Dealing with risk is just one of what he called the “Flow Items”. To put it more simply, remediation work is just work! That means that it needs to be prioritized accordingly. If there is technical debt that needs to be paid down for the business, or a bug (defect) that is more important than the remediation (risk) task, then we should not claim that the risk work is special and do it first.
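One way to picture "remediation work is just work" is a single backlog where every item, whatever its type, competes on the same priority scale. The sketch below is only an illustration: the item types follow the Flow Items (features, defects, risks, debt), while the `business_priority` score, the helper names, and the example items are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class FlowItem(Enum):
    FEATURE = "feature"
    DEFECT = "defect"
    RISK = "risk"
    DEBT = "debt"


@dataclass
class WorkItem:
    title: str
    kind: FlowItem
    business_priority: int  # 1 is most urgent; how this is scored is an assumption


def prioritize(backlog):
    """Order one shared backlog by business priority, ignoring item type."""
    return sorted(backlog, key=lambda item: item.business_priority)


# Hypothetical backlog for a single product.
backlog = [
    WorkItem("Add alert for disk usage (post-incident)", FlowItem.RISK, 2),
    WorkItem("Checkout fails for saved cards", FlowItem.DEFECT, 1),
    WorkItem("Upgrade unsupported database version", FlowItem.DEBT, 3),
    WorkItem("Gift card support", FlowItem.FEATURE, 4),
]

for item in prioritize(backlog):
    print(item.business_priority, item.kind.value, item.title)
```

In this example the defect outranks the post-incident risk item, so it is worked first. The risk item is not lost or hidden in a separate list; it simply takes its place in the same queue as everything else.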
Traditional Project Management
Some may read this and wonder “Where does traditional project management fit in? They were always running all the projects.” The best thing is, project managers have a much more important role to play in this new model.
In this model, engineers are empowered to manage their own work. They do not need a project manager following them around asking for status reports, or typing up lists of what was accomplished, because that is all being done as part of the system. The engineers can type just fine. In this new way of working, the project managers can take the skills they are best at, which are often working with people and tracking progress, and use them to make sure that the work flows smoothly through the system.
In any sufficiently large system, one large enough to need project managers, there will always be dependencies between teams. This is where the project managers can shine. Instead of managing relationships within a team or teams, they get involved between teams.
Dominica DeGrandis has taught our industry about the time thieves: those things that are often not seen, but which drag down the velocity of our value streams. The unknown dependencies thief is a big problem. Ms. DeGrandis explains, “Dependencies (whether they be on architecture, on expertise, or on activities), increase the need for coordination.”
This is where our project management organization is very skilled. They are master coordinators and can surface those dependencies and allow the teams, products, and organizations to optimize for the fast flow of work through the value stream.
Project management has changed a lot with the advent of Agile and cloud. Dr. Kersten explains that the companies that are better at delivering software will thrive, while those that struggle will be left behind. One only needs to look at the tech giants’ ability to enter new markets and displace established players to see the evidence. By learning to optimize our value stream, both by eliminating toil and by removing barriers to flow, we can enable our companies to compete from a position of strength in the marketplace, and continue to grow and thrive with our customers.