"The user clicked something and it broke"
Customer support filed bug reports that read like this: "The user did something and it stopped working." Sometimes there was a screenshot. Sometimes just a sentence. The developer receiving that ticket had almost nothing to work with.
The problem wasn't that CS was lazy. They described what they saw. But what they saw was the end state -- an error message, a blank screen, a spinner that never stopped. They couldn't describe the path that led there because they weren't the ones walking it. The user was.
Without the user's actual journey through the app -- what they clicked, what pages they visited, what state changed along the way -- debugging was guesswork. You'd open the page, click around, try to recreate conditions you didn't fully understand, and hope you stumbled into the same failure. Most of the time you didn't.
I built a user journey tracker on top of Sentry. It records the user's navigation path, their clicks, and key state transitions as they move through the app. When something breaks, that journey is already captured. CS doesn't need to reconstruct it from memory. They attach the journey trace to the bug report. The developer opens it and sees exactly what happened -- page by page, action by action, right up to the moment it failed.
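To give a feel for how little plumbing this needs, here is a minimal sketch of a tracker built on Sentry's breadcrumb API in a browser SPA. The `journey.*` category names and the `trackStateTransition` helper are illustrative placeholders, not the actual implementation.

```typescript
import * as Sentry from "@sentry/browser";

// Record every navigation as a breadcrumb so it ships with the next error event.
export function trackNavigation(from: string, to: string): void {
  Sentry.addBreadcrumb({
    category: "journey.navigation",
    message: `${from} -> ${to}`,
    level: "info",
  });
}

// Record meaningful clicks (buttons, links) with a stable identifier.
export function trackClick(target: HTMLElement): void {
  Sentry.addBreadcrumb({
    category: "journey.click",
    message: target.dataset.trackId ?? target.tagName.toLowerCase(),
    level: "info",
  });
}

// Record key state transitions, e.g. "order: draft -> submitted".
export function trackStateTransition(entity: string, from: string, to: string): void {
  Sentry.addBreadcrumb({
    category: "journey.state",
    message: `${entity}: ${from} -> ${to}`,
    level: "info",
  });
}

// Wire the click tracker up once at app start.
document.addEventListener("click", (event) => {
  const el = (event.target as HTMLElement)?.closest("button, a, [data-track-id]");
  if (el instanceof HTMLElement) trackClick(el);
});
```

Because Sentry attaches breadcrumbs to whatever error it captures next, the journey rides along with the event automatically -- no extra wiring on the reporting side.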
Bug reports went from "it broke" to "here's exactly how it broke." Developers stopped guessing. CS stopped getting asked follow-up questions they couldn't answer.
"QA can't work because the cache is lying"
QAs on lower environments would run a test, see stale data, and fail the test. Then they'd realize the data was actually correct in the database -- the cache was just serving old values. The fix was simple: clear the cache. The problem was who could do it.
Only developers had SSH access to the servers. So the QA would message a developer, explain the situation, wait for them to connect, run a command, and confirm it was done. Sometimes the developer was in a meeting. Sometimes they were heads-down on something else. Either way, a five-second operation turned into a thirty-minute interruption for two people.
This is a broken workflow. Not because anyone was doing anything wrong, but because a mechanical task -- flushing a cache key -- was gated behind access that only developers had. A QA should never need to wait for a developer to do something that requires zero judgment.
I built a cache management dashboard. It shows what's cached, lets you inspect specific keys, clear individual entries, or flush everything. QAs can see what the cache is holding and decide for themselves whether it's the problem. No SSH. No waiting. No interrupting someone else's work.
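The post doesn't depend on any particular cache backend, but a sketch makes the shape of the dashboard's API concrete. Assuming Redis via ioredis behind an Express service, with route names and the key pattern as placeholders:

```typescript
import express from "express";
import Redis from "ioredis";

const app = express();
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// List cached keys matching a pattern (SCAN avoids blocking Redis on large keyspaces).
app.get("/cache/keys", async (req, res) => {
  const pattern = (req.query.pattern as string) ?? "*";
  const keys: string[] = [];
  let cursor = "0";
  do {
    const [next, batch] = await redis.scan(cursor, "MATCH", pattern, "COUNT", 100);
    cursor = next;
    keys.push(...batch);
  } while (cursor !== "0");
  res.json(keys);
});

// Inspect a single entry so QA can see what the cache is actually holding.
app.get("/cache/keys/:key", async (req, res) => {
  const value = await redis.get(req.params.key);
  const ttl = await redis.ttl(req.params.key);
  res.json({ key: req.params.key, value, ttl });
});

// Clear one entry.
app.delete("/cache/keys/:key", async (req, res) => {
  await redis.del(req.params.key);
  res.sendStatus(204);
});

// Flush everything -- lower environments only.
app.delete("/cache", async (_req, res) => {
  await redis.flushdb();
  res.sendStatus(204);
});

app.listen(3001);
```

The dashboard itself is just a thin UI over these four operations; the point is that QA can run them without touching a server.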
The principle behind it is simple: if someone regularly needs to ask a developer to do something mechanical, that's a missing tool. Not a workflow.
"The translations updated but the app didn't notice"
Translation files at TitanFX are generated during the CI pipeline. On production, this works fine -- you update a translation in Loco (the translation management service), the pipeline runs, the new strings get baked into the build, and the app shows the updated text.
On lower environments, it doesn't work like that. QAs would update a translation in Loco to test how a new string looks in Japanese or Spanish. Then they'd refresh the app. The old text was still there. They'd clear their browser cache. Still there. They'd wait. Nothing changed.
The translation files only update on the next deploy. On lower environments, deploys don't happen on a translation change -- they happen when code changes. So the QA is stuck. The translation is correct in Loco. The app is showing the old one. And there's nothing they can do about it except ask for a deploy or wait.
QAs reasonably expect a direct relationship: I changed the translation, the app shows the new text. The architecture just didn't support that outside of production's pipeline.
I built a tool that pulls the latest translations from Loco and replaces them in the running application. It handles both server-rendered and client-rendered pages. It only runs on lower environments -- production still goes through the pipeline, untouched. This was specifically for unblocking QA so they could verify translations without waiting for a deploy that had nothing to do with their work.
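A rough sketch of the pull step, for flavor. The Loco export URL, the locale list, and the output directory are assumptions here -- they depend on your Loco project settings and on where the running app reads its translation files from.

```typescript
import { writeFile } from "node:fs/promises";
import path from "node:path";

const LOCO_API_KEY = process.env.LOCO_API_KEY ?? "";
const LOCALES = ["en", "ja", "es"]; // placeholder list
const OUTPUT_DIR = path.resolve("public/locales"); // placeholder path

async function pullLocale(locale: string): Promise<void> {
  // Assumed Loco export endpoint -- check the Loco API docs for your project.
  const url = `https://localise.biz/api/export/locale/${locale}.json`;
  const response = await fetch(url, {
    headers: { Authorization: `Loco ${LOCO_API_KEY}` },
  });
  if (!response.ok) {
    throw new Error(`Loco export failed for ${locale}: ${response.status}`);
  }
  const json = await response.text();
  // Overwrite the file the running app serves, so the next request picks it up.
  await writeFile(path.join(OUTPUT_DIR, `${locale}.json`), json, "utf8");
}

async function main(): Promise<void> {
  await Promise.all(LOCALES.map(pullLocale));
  console.log(`Refreshed ${LOCALES.length} locales from Loco`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

On production the pipeline still owns this step; the tool only exists so a QA on a lower environment can refresh strings on demand.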
The common thread
These aren't a suite. They weren't planned together or built on a shared platform. They're three separate responses to three separate moments where someone on the team was stuck and the existing tools didn't help them.
The common thread is a simple belief: when someone is blocked by a missing tool, build the tool.
Related Reading
- What Nobody Tells You About Migrating a Live Trading Platform — The dashboard migration happening alongside these tools at TitanFX