I recently spent a few months redesigning Terminal X's custom reports feature. I want to go over the whole process, from what broke to how we fixed it, and what I learned about building production agentic systems in one of the world's most demanding industries. For a bit of background: Custom Reports was a feature we had originally built just after our product launch in 2025, marketed as able to generate institutional-quality investment memos, DDQs, and research briefs from an uploaded template and a prompt. In practice, the content was often too bloated or too sparse, and the output was markdown that rarely resembled the user-uploaded template document. As a result, most clients tried it once and went back to writing reports themselves.
The system had been built before Terminal X moved its main deep research agent to the orchestrator architecture that's now standard. The rest of the platform had evolved; Custom Reports was still running on an outdated workflow that looped indefinitely through a hardcoded RAG retrieval pipeline, passing everything to a single LLM that wrote the report in markdown. In other words, it was highly inefficient.
By November 2025, the gap between what Custom Reports was and what it needed to be had become too large to ignore. Clients were also demonstrating tacit demand for the feature through their deep research queries. Every day we would receive queries asking for outputs mirroring a specific report structure, queries asking for long multi-turn threads to be reformatted into a “downloadable report”, or detailed, multipart queries that would be better answered in report format. At the same time, the competitive landscape was heating up: other services were pushing their own report features, underscoring that a strong report feature had become table stakes. So we gave ourselves a six-week window to rebuild the entire feature before the end of the year.
The previous version of Custom Reports had a number of compounding issues that needed to be addressed. For one, our looping architecture ran several parallel data retrieval deployments with a fixed search pattern that struggled to adapt to what the user requested. For example, if someone requested an investment memo focused on management quality and governance, the system would run the same SEC filing, news, and broker research queries it would run for an earnings analysis.
This was of course a symptom of the entire Custom Reports feature running on a now-outdated, purely RAG-based system. The multi-agent system we had moved our Deep Research agent to in the months prior, which has since been widely adopted as standard practice, had not yet been integrated into Custom Reports. As a result, we lacked an intelligent orchestrator agent to conduct a thorough, token-efficient search across the expansive field of Terminal X sources. Instead, we brute-forced our search across all of Terminal X's sources: broker research, SEC filings, our host of premium news sources, call transcripts, and more. This also prevented us from auditing the research process as it happened, so the system couldn't adapt its research questions when certain lines of inquiry came up empty-handed. While Terminal X had the data quality to write detailed, client-ready reports, our system struggled to tell which data mattered and which was irrelevant.
Critically, this looping system also accumulated everything into a single context window. For a 25-page report pulling from 100+ sources, the system would blow past token limits, take 20+ minutes, and cost a fortune to run. Another main motivator for the revamp was the final output being plain markdown. There were multiple instances of users uploading beautifully formatted reports as their template, only to get back a wall of text hosted within our Custom Reports tab, outfitted with basic H2 and H3 headers. The final result was a bloated report, without any data visualizations, in markdown, and often missing the key ask from the user's prompt. If that wasn't enough to discourage adoption, generating a report involved significant friction: 5+ fields of data for the user to fill out, including a report title, thesis, and tickers to include, just to make the decomposition process easier for our backend to process. Together, these flaws created a system that was unattractive for users and difficult for our team to maintain.
Gabe, our product lead, gave me the green light to start on the redesign just as Terminal X began moving to our new Madison Avenue office at the end of November. He also gave me some homework that proved invaluable: read Anthropic's research on multi-agent systems.
In their article "Building Effective Agents", Anthropic lays out the performance improvements to be had by using orchestrator agent architectures for complex research tasks instead of traditional pre-defined code paths. The architecture was clear in theory: a lead agent plans and coordinates, specialized workers execute in parallel, and a synthesis agent assembles the final output. The challenge was translating that into a production system within Terminal X's existing infrastructure, running on our in-house LLMOps platform GAIA, with a small team of myself and two engineers handling the pipeline design, prompt engineering, and code. The final architecture settled into five stages.
The first step in the chain is simple: an initial stage extracts a set of structured variables and drafts preliminary research queries to give the model context. The main purpose of this stage was to move the burden of information input from the user to the LLM. As a result, we were able to consolidate the generation of Custom Reports into a single natural language query, moving the generation process from a complicated multi-field UI flow into our main search bar, activated simply by typing “/report”.
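To make the shape of this intake stage concrete, here is a minimal sketch. The field names and the validation helper are illustrative assumptions, not Terminal X's actual schema; in production an LLM fills the structure from the user's single "/report" query, whereas here an example result is hard-coded to show the shape.

```javascript
// Illustrative field set the intake stage extracts (names are assumptions).
const intakeFields = ["title", "tickers", "thesis", "reportType", "preliminaryQueries"];

// Guard the rest of the pipeline against an incomplete LLM extraction.
function validateIntake(result) {
  const missing = intakeFields.filter((k) => !(k in result));
  if (missing.length > 0) {
    throw new Error(`intake missing fields: ${missing.join(", ")}`);
  }
  return result;
}

// Example of what the LLM might extract from a "/report ..." query.
const intakeExample = validateIntake({
  title: "Samsung Electronics Q3 Earnings Review",
  tickers: ["005930.KS"],
  thesis: "Memory pricing recovery drives upside",
  reportType: "earnings_review",
  preliminaryQueries: [
    "Samsung Electronics Q3 2025 earnings call transcript",
    "DRAM contract pricing trend 2025",
  ],
});
```

The point of the design is that every field the old UI forced users to type becomes something the model infers, with a cheap structural check before the pipeline proceeds.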
Following intake, an outline is created from the user's uploaded information and the additional context gathered in the first stage. Its job is to analyze the template's structure, understand which sections should exist, be modified, or be removed based on the user's request, fit everything to the new subject's context, and produce a tailored outline. Each section description specifies what data the section needs and how complex that section should be. These directions directly inform how detailed the research in the next stage should be, allocating bandwidth up front so the orchestrator agent can call on subagents efficiently.
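A rough sketch of that up-front allocation, under the assumption that each outline section carries a complexity weight (the shape and the proportional-budget rule are illustrative, not the production logic):

```javascript
// Illustrative outline: each section declares what data it needs and a
// complexity weight the orchestrator can budget research against.
const outline = [
  { heading: "Executive Summary", dataNeeds: ["all"], complexity: 1 },
  { heading: "Management & Governance", dataNeeds: ["proxy filings", "news"], complexity: 3 },
  { heading: "Valuation", dataNeeds: ["broker research", "SEC filings"], complexity: 2 },
];

// Split a total tool-call budget across sections in proportion to complexity,
// guaranteeing each section at least one call.
function allocateBudget(sections, totalCalls) {
  const totalWeight = sections.reduce((sum, sec) => sum + sec.complexity, 0);
  return sections.map((sec) => ({
    heading: sec.heading,
    calls: Math.max(1, Math.round((sec.complexity / totalWeight) * totalCalls)),
  }));
}
```

Budgeting this way means a governance-heavy memo spends its tool calls on governance sources rather than repeating the same fixed query set for every report type.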
Following these two steps, the complete research plan is sent to our main research orchestrator agent. The orchestrator receives the outline alongside access to Terminal X's tool suite: SEC filing search, earnings transcript search, broker research reports, the private data room, our full premium research corpus, and web search. It executes tool calls until it judges it has gathered sufficient information to inform the final report. After the research phase, we eliminate redundant sources to fight context rot and ensure the writing agent isn't bloated with duplicate source material.
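The tool-calling loop itself is LLM-driven, but the redundancy pass after it is plain code. A minimal sketch, assuming sources carry a `url` (or fall back to an `id`) that identifies the underlying document; the normalization rule here is illustrative:

```javascript
// Collapse sources that point at the same underlying document before they
// reach the writing agent. Keys are case-insensitive, trailing-slash-blind.
function dedupeSources(sources) {
  const seen = new Set();
  return sources.filter((src) => {
    const key = (src.url || src.id || "").toLowerCase().replace(/\/+$/, "");
    if (key === "" || seen.has(key)) return false; // drop unkeyed or repeated
    seen.add(key);
    return true;
  });
}
```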
Critically, the writing agent writes the report section by section with awareness of what's already been written and what's coming after it. This ensures the output is coherent, follows a narrative flow, and reads like an actual human analyst wrote the content. Any report worth reading shouldn’t be a disjointed collection of independent sections; it should include a valuable executive summary, foreshadow and reference specific metrics across sections, and synthesize meaningfully across the entire document.
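One way to give the writer that awareness is to assemble, for each section, short recaps of what has been drafted plus the headings still to come. This is a hedged sketch; the function name, field names, and the 200-character recap cutoff are assumptions for illustration:

```javascript
const memoOutline = [
  { heading: "Executive Summary" },
  { heading: "Business Overview" },
  { heading: "Valuation" },
];

// Build the context for writing section `index`: brief recaps of earlier
// drafts (not full text, to save tokens) and the headings yet to be written.
function buildSectionContext(outline, drafts, index) {
  return {
    current: outline[index],
    written: drafts.slice(0, index).map((draft, i) => ({
      heading: outline[i].heading,
      recap: draft.slice(0, 200),
    })),
    upcoming: outline.slice(index + 1).map((s) => s.heading),
  };
}
```

With this structure the writer can foreshadow a metric it knows the Valuation section will cover, instead of treating each section as an independent essay.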
Getting the content right was really only half of the problem. The more difficult half was making it look like something an institutional investor would actually want to read. As I mentioned, the original system outputted markdown, which was unpresentable. Therefore, I had to add an auxiliary step, where an LLM would write JavaScript code that programmatically generates styled Word documents using the docx-js library.
The complete markdown report, along with images of the user's original template showing the visual style to replicate, gets passed to this .docx generation prompt. The prompt's job is to output executable JavaScript that creates a .docx file matching the template's look, including colors, fonts, spacing, headers, footers, table styling, page layout, and everything else that can possibly be replicated. This step allowed the Custom Reports feature to fully realize the “Custom” part of its name, as we could now mirror each client's template as it was uploaded.
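For a feel of what this stage emits, here is an illustrative (much-simplified) example of a generated script, held as a string; the docx-js identifiers inside it (`Document`, `Packer`, `Paragraph`, `HeadingLevel`) are from the real library, while the parse-check helper wrapped around it is an assumption about one cheap guard you might run before executing LLM-written code:

```javascript
// Example of the kind of script the .docx generation prompt produces.
const generatedScript = `
const { Document, Packer, Paragraph, HeadingLevel } = require("docx");
const doc = new Document({
  sections: [{
    children: [
      new Paragraph({ text: "Investment Memo", heading: HeadingLevel.HEADING_1 }),
      new Paragraph({ text: "Thesis: memory pricing recovery drives upside." }),
    ],
  }],
});
module.exports = Packer.toBuffer(doc);
`;

// Before running generated code, at minimum confirm it parses as JavaScript.
// (The Function constructor compiles the source without executing it.)
function parsesAsJs(code) {
  try {
    new Function(code);
    return true;
  } catch {
    return false;
  }
}
```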
Of course, this additional step introduced its own problems. Tables with borders that should have been invisible rendered with visible lines, which made the output look unprofessional. Content got truncated under output-length pressure, with the LLM summarizing paragraphs instead of preserving them verbatim as it approached token constraints. Citations had the same problem, getting dropped or misformatted. Each of these required specific prompt engineering fixes.
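Alongside prompt fixes, failures like dropped citations are cheap to detect mechanically. A sketch of one such check, assuming citations appear as bracketed numeric markers like `[1]` (the marker format and function name are assumptions):

```javascript
// Flag the output if any citation marker present in the source markdown is
// missing from the generated .docx script.
function citationsPreserved(markdown, generatedCode) {
  const markers = (text) => new Set(text.match(/\[\d+\]/g) || []);
  const before = markers(markdown);
  const after = markers(generatedCode);
  return [...before].every((c) => after.has(c));
}
```

A failed check can trigger a retry with an explicit instruction naming the dropped markers, which is far more reliable than hoping a general "preserve citations" prompt line holds under token pressure.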
Additionally, the system evolved into two tracks to handle template and no-template reports. Template reports (where the user uploads an example document and the system replicates its style) run through the full JavaScript generation pipeline. No-template reports (where the user just provides a prompt) use a more standardized formatting approach with the client's company branding, Terminal X co-branding, custom colors, and a dedicated title page. The final output for both report types is a downloadable Word document. On top of all of this, Gabe built out a new charts feature that intelligently renders data visualizations at this step, adding charts and graphs to break up a previously text-only output.
The most common use case among buy-side research teams is weekly reporting. We tend to see analysts running the same prompt every Monday morning to gather a market recap, sector summary, and earnings round-up into one report, which they then download to edit and polish. This is a critical time-saving workflow: instead of manually combing through disparate sources, analysts gather all of the sources instantly alongside a rough draft that they can clean up for final review. Our team has found that power users take advantage of the /report feature the most, typically starting or ending their day with a report summarizing the session while using our deep research feature during the work day.
For portfolio managers and analysts evaluating a new position, the workflow is slightly different. They upload their internal investment memo template, provide a thesis (bearish on a Korean pharma name on clinical trial risk, for example), and get back a formatted memo synthesizing broker research, SEC filings, and earnings transcripts, all structured to match how they actually present to an investment committee. Not only does this help them evaluate the stock name at hand, but it also allows for quick logic checks, where users can run both bullish and bearish reports for the same stock and see which logic holds up under IC scrutiny.
For some teams the use case is even more straightforward: they run the same report, for a different company, every week. This can be a fund comparison or a DDQ on a fixed cadence, where the template and format don't change and only the underlying name does. In the past, this meant having an analyst spend an hour or more reformatting the same document for a different name, hunting down the same sources they pulled last week and seeing what (if anything) changed. Now they can run a single prompt, get the document with all of the sources cited, check it over, and move on to the next report. One of the highest-ROI uses of /report is saving time on these repetitive coverage write-ups, which previously had no reliable way to be automated due to LLM context window cut-offs and a lack of access to rigorous sources.
Anthropic's architecture that kickstarted this conversion was much trickier to implement in practice. Token management was the persistent constraint. Due to the breadth of our sources, the research agent could gather enormous amounts of data across many rounds of tool calls. Extremely long reports required the writing agent to have enough context to maintain coherence across 10+ sections. And the markdown-to-JavaScript converter needed the full report content along with the template images, which together could run hundreds of pages. At every stage we had to manage the tension between giving agents enough context to do their jobs well and staying within token limits that wouldn't bankrupt us.
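The crudest version of that stage-by-stage budgeting looks like the sketch below. The 4-characters-per-token heuristic and the priority-ordered cutoff are assumptions for illustration; the real pipeline's accounting is more involved:

```javascript
// Keep sources, in priority order, until an approximate token budget is
// spent. Roughly 4 characters per token is a crude but common heuristic.
function fitToBudget(sources, maxTokens) {
  const kept = [];
  let used = 0;
  for (const src of sources) {
    const estimate = Math.ceil(src.text.length / 4);
    if (used + estimate > maxTokens) break;
    kept.push(src);
    used += estimate;
  }
  return kept;
}
```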
Prompt engineering at this scale is also fundamentally different from prompt engineering for a single LLM. When you’re prompt chaining five agents in sequence, some of which have control over subagents, a subtle change to one agent's instructions can cascade into completely different research patterns, which produce different source tool calls, which change the way the writing agent composes its output, which breaks the markdown to .docx document generation. Like any good experiment, I learned to make changes to one element at a time and evaluate that change end-to-end, resisting the temptation to tune multiple agents simultaneously.
As always, working primarily in Korean added another layer of complexity for me as a native English speaker. Reports needed to read as natural Korean, true to how an analyst would write a real memo, proper shorthand and industry jargon included. As the only truly language-agnostic AI investment platform, we had experience prompt engineering for both languages simultaneously, which helped, but LLMs were still my primary tool for translating and verifying outputs once the final report came off the press.
I’m pleased with the reception of the feature two months post-launch. Custom Reports usage grew 3.6x month over month. The usage is also quite sticky: users who try it once return to generate the reports they have to produce daily or weekly. While the content is superior to its predecessor's, I believe the largest driver of this growth is the format improvement. When the output is a Word document formatted like something you would want to put your name on, you're more likely to use the feature again. Once you verify the content and make slight tweaks, as you would with any AI output, you'll find you've saved hours of research, writing, formatting, editing, and back-and-forth with your team.
If you’re interested in demoing the feature, please reach out to me at [email protected]