Can The Real Codeowners Please Stand Up? Code Provenance at Scale

September 08, 2021

Code ownership at scale

Figuring out code ownership at a large company can be challenging. Identifying code owners during code-related incidents is harder still, with an element of stress to boot. The Product Security team at Twilio set out to solve our code ownership challenges in a way that we think can help you as well.

Today, we’re proud to release two things that go a long way towards solving the problem:

  • about.yaml -  a new code ownership file specification that has all the information you need to trace any code’s current owner across your company
  • Gordon - a GitHub app that monitors repositories to keep about.yaml files up to date.

Why do we need this?

More times than we’d like to admit, we have found a bug or vulnerability in a piece of code, run git blame to see who last touched it, and discovered that the person no longer works at Twilio or is on PTO. Then the adventure starts: pick the next name in the git blame timeline and go down the rabbit hole to find the right owner to work on a fix.

That’s a lot of time wasted in a state of emergency, isn’t it? Every Security team out there has had to go through this situation at some point.

Now imagine a galaxy far, far away (spoiler alert: not far at all, actually) where the code ownership information and all your required metadata (e.g., owning team, Jira project, PagerDuty information) for every piece of code lives within that codebase and is machine parsable.

The Product Security team at Twilio set out to see if they could make that a reality. And thus – the about.yaml and Gordon initiative.

about.yaml: The YAML file that knows it all

“about.yaml” is the file specification we came up with to solve our difficulties finding code owners. It’s designed to be included in all repositories company-wide and have all the information we need to track ownership.  

But why that name? The file is essentially talking “about” the codebase and its ownership, hence our choice of “about.yaml” (yes – our humor is quirky!). The YAML specification is extendable and can be modified to fit the needs of your company. We at Twilio use multiple specifications to scale this file to our codebases while leaving it adaptable for when new teams and companies join us.

One about.yaml specification we think would serve as a great example of the power of the paradigm – and also can be widely adopted – is below:

version: 1
organization: twilio
jira_id: <jira project id>
pagerduty_id: <pagerduty schedule id>

This specification has the following fields:

  • version - versions the file format, so that you can change the YAML specification later without breaking automation tied to a specific format.
  • organization - identifies which organization the file comes from during automation, which is useful when a team joins your company, perhaps through an acquisition. For us, this might allow Twilio and any child organizations to maintain slightly different formats when needed.
  • jira_id - the Jira project ID of the team owning the codebase, if you use Jira.
  • pagerduty_id - the PagerDuty schedule ID of the team owning the codebase, which can be used to page the team when there’s an incident on that codebase.

If you introduce an about.yaml file like the above, you have a specification that gives you ownership information on repositories. 
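To make "machine parsable" concrete, here is a minimal Python sketch of how automation might load a file in this format and reject one with missing fields. It is not Gordon's code: the `AboutFile` class, `parse_flat_yaml`, and `load_about` are hypothetical names, and the tiny parser only understands flat `key: value` lines so the example runs without dependencies. A real deployment would more likely use a full YAML library. The `EXAMPLE` and `PEXAMPLE` values are placeholders.

```python
from dataclasses import dataclass
from typing import Dict

# The four fields from the example specification above.
REQUIRED_FIELDS = ("version", "organization", "jira_id", "pagerduty_id")


@dataclass(frozen=True)
class AboutFile:
    """Hypothetical in-memory form of an about.yaml file."""
    version: int
    organization: str
    jira_id: str
    pagerduty_id: str


def parse_flat_yaml(text: str) -> Dict[str, str]:
    """Parse flat `key: value` lines only; this is not a general YAML parser."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {raw!r}")
        fields[key.strip()] = value.strip()
    return fields


def load_about(text: str) -> AboutFile:
    """Build an AboutFile, raising ValueError if a required field is absent or empty."""
    fields = parse_flat_yaml(text)
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"about.yaml is missing: {', '.join(missing)}")
    return AboutFile(
        version=int(fields["version"]),
        organization=fields["organization"],
        jira_id=fields["jira_id"],
        pagerduty_id=fields["pagerduty_id"],
    )


example = """\
version: 1
organization: twilio
jira_id: EXAMPLE
pagerduty_id: PEXAMPLE
"""
about = load_about(example)
print(about.organization, about.jira_id)
```

A check like this only confirms that the fields are present. It says nothing about whether the Jira project or PagerDuty schedule actually exists.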

But – there are still a few important questions. How do you get everyone to add this to their repositories? And even if you can convince people to do so – how do you know the data in these files is actually valid, and not gibberish?

As a Security team of developers favoring an automated solution as a way out of our problems, we introduced Gordon* – an automated service to validate the contents of about.yaml files – company-wide.

* Yes, we are superhero fans, and name our tools after them… that's the closest we come to having super powers ourselves. Although James Gordon is the hero no one recognizes, Batman cannot do cool superhero things without Gordon. So... Gordon is pretty cool.

Gordon: an automated service to help determine code ownership

Gordon is a GitHub app that you can install on your GitHub organization. It runs as a status check on every commit of a pull request: it fetches the about.yaml file from both the default branch and the commit's reference branch, then validates each piece of information in it. If Gordon can validate all of the data in the about.yaml file, the status check passes; otherwise it fails, and users see a cross mark on their pull requests.
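The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Gordon's actual implementation: `run_ownership_check`, the `Fetcher` and `Validator` types, and the in-memory stand-ins are all hypothetical. In particular, the sketch assumes that every copy of about.yaml that is found gets validated, and that the check fails only when a copy is invalid or no copy exists on either ref. How Gordon itself treats a missing file is not specified here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: returns the about.yaml text for (repo, ref), or None if absent.
Fetcher = Callable[[str, str], Optional[str]]
# Hypothetical: returns a list of problems found in the text (empty if valid).
Validator = Callable[[str], List[str]]


def run_ownership_check(repo: str, default_branch: str, commit_ref: str,
                        fetch: Fetcher, validate: Validator) -> Tuple[str, List[str]]:
    """Return ("success" or "failure", problems) for one commit."""
    problems: List[str] = []
    found = 0
    for ref in (default_branch, commit_ref):
        text = fetch(repo, ref)
        if text is None:
            continue  # assumption: a missing copy on one ref is tolerated
        found += 1
        problems.extend(f"{ref}: {msg}" for msg in validate(text))
    if found == 0:
        problems.append("about.yaml not found on either ref")
    return ("failure" if problems else "success"), problems


# In-memory stand-ins so the sketch runs without calling any real API.
FILES = {
    ("example-org/api", "main"): "jira_id: EXAMPLE",
    ("example-org/api", "feature"): "jira_id:",
}


def fake_fetch(repo: str, ref: str) -> Optional[str]:
    return FILES.get((repo, ref))


def fake_validate(text: str) -> List[str]:
    value = text.partition(":")[2].strip()
    return [] if value else ["jira_id is empty"]


state, problems = run_ownership_check(
    "example-org/api", "main", "feature", fake_fetch, fake_validate
)
print(state, problems)
```

Passing the fetcher and validator in as functions keeps the pass/fail decision separate from whatever systems a real validator would query, which also makes the logic easy to test.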

Here’s a pull request with a successful Gordon app status check due to passing validation of the contents of the about.yaml file:

PR passing Gordon ownership checks

And here’s a pull request with a failed Gordon app status check due to failed validation of the contents of the about.yaml file:

PR failing Gordon ownership checks

Why we needed Gordon

In 2017, GitHub released CODEOWNERS to help solve this problem. One limitation of CODEOWNERS is that it is tied to users, who might be out of office or no longer with the company. We also did not find a place in it for metadata, such as a Jira project ID, that we need to automate our code security vulnerability management.

We wanted a reliable, deploy-and-forget solution that constantly monitored the validity of the contents of the about.yaml file. It had to be a service we never needed to touch except when we decided to add a new version of the YAML specification for the organization, or to propagate this file to a new acquisition.

Dealing with shared ownership

One of the main challenges we faced while enforcing the Gordon status check across all of Twilio’s code bases was “shared code”.

We had (and have!) repositories containing code developed in collaboration by many people and teams across the company. While collaboration is good, it is an issue when you need to quickly determine ownership in an emergency. We have been working to find a single owning team for every piece of code in the company, but as a workaround, we use shared_repos.json files where we list all of the “shared repos” on which Gordon checks are ignored.

The benefit of this approach is that you can roll out Gordon without having figured out how to deal with your own shared repositories – and it also gives you an inventory of shared repos you need to eventually handle.
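As a rough sketch of the workaround, the snippet below loads a list of shared repositories and decides whether a repository should be checked. The layout of shared_repos.json is an assumption here (a flat JSON array of repository names), as are the function names `load_shared_repos` and `should_check` and the example repository names. The real file may be structured differently.

```python
import json
from typing import FrozenSet


def load_shared_repos(json_text: str) -> FrozenSet[str]:
    """Parse an assumed format: a JSON array of repository names."""
    repos = json.loads(json_text)
    if not isinstance(repos, list) or not all(isinstance(r, str) for r in repos):
        raise ValueError("expected a JSON array of repository names")
    return frozenset(repos)


def should_check(repo: str, shared: FrozenSet[str]) -> bool:
    """Ownership checks are skipped for repositories listed as shared."""
    return repo not in shared


shared = load_shared_repos('["example-org/monorepo", "example-org/shared-libs"]')
for repo in ("example-org/api", "example-org/monorepo"):
    print(repo, "check" if should_check(repo, shared) else "skip (shared)")
```

Because the exemptions live in one file, the same list doubles as the inventory of repositories that still need a single owner.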

Try using Gordon and about.yaml

For the past two years, Gordon has proven to be very helpful in promoting the adoption of about.yaml files on repositories and has helped us determine code ownership across the organization. We’re excited to release it to the open source community – and can’t wait to hear about how you use Gordon in your organization.

To learn more about how to deploy Gordon and start using about.yaml files, see:

Laxman Eppalagudem is a Senior Product Security Engineer at Twilio focused on securing Twilio’s products before they go out to customers. He can be reached at seppalagudem [at]