Building CLI features often boils down to:
- Release new API endpoint(s).
- Build new CLI actions that need the API changes.
- Realize mistakes were made in the API you just released.
- Try to fix the API for this feature without creating more problems in the future.
- Try to remember what you were trying to achieve in the CLI and then actually achieve it.
- GOTO: realize mistakes were made.
Each step requires the one before, and oops, we've reinvented waterfall project management. You get worn down by the pain of trying to gracefully pave over mistakes until you limp to functional, but bow out well before exceptional. And don’t get me started on maintaining the resulting cluster of ad-hoc “fixes” and warts.
Been there, done that. We knew we needed to move past the waterfall-ish approach.
Here is the story of how we got there and some of the tools that helped along the way.
Off to a Sketchy Start
We wanted cheap, fast iteration until we understood the feature, and only then to commit to expensive implementation and long-term support. On a small team, I was often doing this process end to end and wanted to focus on each part in turn. We wanted to fake the implementation until we felt confident enough to make it.
Getting back to the process: it starts with proposing features. We wanted to get out of the abstract, but not if it meant half-baked implementations. We faked it with "sketches", inspired by the Google Docs CLI sketch approach that GitHub describes here.
Unfortunately, static sketches didn't quite give us the feedback we wanted. Our CLI changes its output over time, more like an animation than a drawing. To achieve higher fidelity, I wrote little Ruby programs that took basic inputs and responded by printing appropriate canned responses.
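Those programs could stay almost embarrassingly small. A sketch in that spirit, with an invented login command and canned output:

```ruby
#!/usr/bin/env ruby
# A throwaway sketch: accept a command, print canned frames.
# The "login" command and its output are made up for illustration.
case ARGV.first
when "login"
  puts "Opening your browser to approve this device..."
  sleep 1 # pretend we're waiting on the API
  puts "✓ Logged in as dev@example.com"
else
  puts "usage: sketch.rb login"
end
```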
Since then we've found an even better way to capture animated CLI output, but to explain that requires a little detour.
Do You Even Test?
As we started to flesh out our CLI, we also wanted to test edge cases and detect regressions. I surveyed public cobra/bubbletea-based CLIs to look for ideas and found frustratingly few tests. Then we stumbled upon Charm's teatest, which gave us a starting point.
Teatest focuses on golden tests: capturing a known-good output and then asserting that future outputs continue to match it. Which brought us back, once again, to high-fidelity capture of animated CLI outputs. Teatest gave us the great idea of a frame-based solution, like a flipbook, which we have built upon:
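Invented for illustration, but it shows the shape:

```
───────────────────────── model: login
⠋ Waiting for browser approval...
───────────────────────── model: login
⠙ Waiting for browser approval...
───────────────────────── model: login
✓ Logged in as dev@example.com
```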
This simplified example shows what a golden output might look like for a basic authorization command. The horizontal lines delineate frames, with labels indicating the active model. Taken together, we get a high-fidelity capture of the output even as lines are added, removed, or replaced.
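A plain teatest golden test, before any frame-capture additions, looks roughly like this (newLoginModel is a hypothetical constructor for the command's bubbletea model):

```go
package cli

import (
	"io"
	"testing"
	"time"

	"github.com/charmbracelet/x/exp/teatest"
)

func TestLoginGolden(t *testing.T) {
	// newLoginModel is a hypothetical constructor for the command's tea.Model.
	tm := teatest.NewTestModel(t, newLoginModel(), teatest.WithInitialTermSize(80, 24))

	// Let the program run to completion, then read everything it rendered.
	tm.WaitFinished(t, teatest.WithFinalTimeout(3*time.Second))
	out, err := io.ReadAll(tm.FinalOutput(t))
	if err != nil {
		t.Fatal(err)
	}

	// Compares against testdata/TestLoginGolden.golden, rewriting it
	// when the test suite is run with the -update flag.
	teatest.RequireEqualOutput(t, out)
}
```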
We use a flag in our test suite to update the golden files; otherwise, the tests fail if the output doesn't match them. This keeps us aware of output changes and facilitates PR reviews by letting us see what the output should look like and whether it has changed. We like it so much that we plan to replace our sketch programs with golden-style output in GitHub-style Google Docs, so we can capture both animation and style ideas.
With our once and future sketches in hand, let's return to getting started with new API endpoints.
Designing API and CLI Together
We work on the API and CLI simultaneously, because the best designs for these grow out of tight integration. We are able to do this, while still avoiding the perils of waterfall, by iterating on designs in cheaper contexts and waiting to implement until the requirements solidify. For our APIs, this means sketching with OpenAPI:
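A minimal, invented sketch of a hypothetical authorization endpoint:

```yaml
openapi: 3.1.0
info:
  title: Example API # names and paths invented for illustration
  version: 0.1.0
paths:
  /authorizations:
    post:
      operationId: createAuthorization
      summary: Authorize this CLI session
      responses:
        "201":
          description: Authorization created
          content:
            application/json:
              schema:
                type: object
                required: [id, token]
                properties:
                  id:
                    type: string
                  token:
                    type: string
```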
This simplified example shows what a schema might look like for a basic authorization command. We use the spectral linter to simplify working on these files.
With a sketch in hand, we then use prism as a mock API server while we implement the CLI. When we inevitably realize mistakes were made, we can just tweak the spec and get back to our CLI iteration. Working at this high level allows us to evolve the API and CLI together and defer costly implementation until we have better knowledge.
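Prism needs nothing more than the spec. Assuming the sketch lives in openapi.yaml and the real API runs locally, the two modes we lean on look like this:

```sh
# Serve canned responses generated from the spec (no backend required).
prism mock openapi.yaml

# Or validate real traffic by proxying to a running backend.
prism proxy openapi.yaml http://localhost:3000
```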
Implementing APIs
We also lean on our OpenAPI spec to keep us honest during implementation, using committee. In tests, assert_schema_conform verifies that the implementation matches the spec, and the middleware notifies us of any live discrepancies. Together they allow for red-green implementation while protecting us from regressions.
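The middleware half is ordinary Rack middleware pointed at the same spec. A sketch, assuming a Rack app and a spec at openapi.yaml:

```ruby
# config.ru (sketch) -- validate live requests and responses against the spec.
require "committee"

use Committee::Middleware::RequestValidation,  schema_path: "openapi.yaml"
use Committee::Middleware::ResponseValidation, schema_path: "openapi.yaml"

run ExampleAPI # hypothetical Rack app
```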
Testing with Mocks and Proxies
To round things out, our test suite uses flags to run prism in either mock or proxy mode. The flags let us focus on writing just one kind of test, though it does mean we skip some tests in one mode or the other. We use mock tests for their speed, and on Windows and macOS, where our full stack doesn't run in CI. Our proxy tests run against our entire stack just by adding a flag, making it easy to do end-to-end testing whenever we deem it necessary.
Pulling it All Together
Sketches and specs help us iterate past the abstract without getting bogged down in implementation. Then mocks and proxies help us ensure the implementations match the sketches. As we continue to iterate on our process, each feature causes less pain, which we have deeply appreciated while building the teams experience we will deliver later this month.
We will keep iterating on our process. I hope you learned something from it, and I would love to learn from you: what have you tried, where are you proud, and what remains frustrating?