
Every institutional AI conversation eventually arrives at the same question. It's usually not "does it work" or "how do we know this is accurate," but the one the CTO asks in the second meeting: how does our data get in, and where does it sit once it's there?
Internal research, investment memos, proprietary models, deal notes, and meeting transcripts are the most valuable parts of a firm's information stack. They are also, typically, the most sensitive. Any AI platform that wants to be useful to a buy-side research team has to be able to ingest this data. Any AI platform that wants to be deployable at a regulated institution has to do it without creating a compliance problem.
This is the part of institutional AI adoption that gets the least coverage. Vendor pitch decks focus on data accuracy and workflow demos, while analyst articles focus on use cases. Meanwhile, the aspect that determines whether a platform can actually be adopted by a firm after a security review is the data architecture.
Here’s how asset managers are handling it.
There are two viable patterns for connecting a firm's private data to an institutional AI platform. Which one a firm chooses depends primarily on where their data already lives and how much infrastructure they want to build to support it.
1. Direct Upload
The first path is the simplest. A firm uploads folders or documents directly into the AI platform, either through drag-and-drop or a folder sync. Files are stored in a private workspace that only authenticated users from that firm can access, governed by permission rules the firm controls.
It is the fastest path to production, requiring minimal setup. Many of our institutional clients use this as their primary integration method and never move beyond it, because the workflow fits the way their analysts already operate.
The trade-off is that data moves into the AI platform's environment. For firms with strict data residency requirements, or for firms that have already invested time and money in a unified internal data warehouse, the second path is usually preferable.
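The permission model described above can be sketched in a few lines. This is a toy illustration of firm-scoped workspace access, not Terminal X's actual API; the names (`User`, `Document`, `can_read`) and the role field are hypothetical.

```python
from dataclasses import dataclass, field

# Illustrative model of a firm-private workspace: documents are visible
# only to authenticated users from the owning firm, subject to roles the
# firm controls. All names here are hypothetical, not a vendor API.

@dataclass
class User:
    user_id: str
    firm_id: str
    roles: set = field(default_factory=set)

@dataclass
class Document:
    doc_id: str
    firm_id: str                    # workspace owner
    required_role: str = "analyst"  # firm-controlled permission rule

def can_read(user: User, doc: Document) -> bool:
    """Firm boundary first, then the firm's own role check."""
    if user.firm_id != doc.firm_id:
        return False                # other firms never see the workspace
    return doc.required_role in user.roles

alice = User("alice", "firm-a", {"analyst"})
bob = User("bob", "firm-b", {"analyst"})
memo = Document("deal-memo-001", "firm-a")

print(can_read(alice, memo))  # True: same firm, has the required role
print(can_read(bob, memo))    # False: different firm, boundary holds
```

The design point is that the firm boundary is checked before any role logic runs, so a misconfigured role can never leak a document across firms.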
2. Cloud Integration (Snowflake, AWS S3, Google Drive, OneDrive, SharePoint)
The second path is for firms that already have a data layer and want the AI platform to operate on top of it rather than import from it. This covers integrations with AWS S3, Google Drive, OneDrive, SharePoint, and Snowflake. The Snowflake deployment is the most sophisticated of these, and the one we see most frequently at large asset managers with mature data infrastructure.
The firm's documents already land in Snowflake, typically piped in from AWS S3 or similar storage through Snowpipe. Nothing about that pipeline changes. The AI platform deploys as a Snowflake Native App inside the firm's own Snowflake environment, meaning the agent systems run next to the firm's data rather than pulling the data out of it.
When an analyst runs a query in Terminal X, the agents read documents on demand from the firm's Snowflake database, do the retrieval and synthesis work within the Snowflake environment, and write the output to a designated output table within the firm's environment. The firm's sensitive content never leaves the firm's Snowflake account.
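The in-environment flow above can be sketched as a toy simulation. In-memory structures stand in for the firm's Snowflake document table and designated output table; the retrieval logic and table names are illustrative, not the actual native-app implementation.

```python
# Toy simulation of the in-environment query flow: the agent reads
# documents from a table inside the client boundary, does retrieval and
# synthesis next to the data, and writes the result to an output table
# in the same boundary. Nothing returns document content to the caller.

client_documents = {        # stands in for the firm's Snowflake table
    "doc-1": "Q3 deal notes: margin expanded on pricing actions.",
    "doc-2": "IC memo: thesis hinges on unit economics.",
}
client_output_table = []    # stands in for the designated output table

def run_query(query: str) -> None:
    # Naive keyword retrieval as a stand-in for the real agent pipeline.
    hits = [doc_id for doc_id, text in client_documents.items()
            if any(word in text.lower() for word in query.lower().split())]
    answer = f"{len(hits)} document(s) matched: {', '.join(hits)}"
    # The output lands in the firm's environment, not the vendor's.
    client_output_table.append({"query": query, "answer": answer, "sources": hits})

run_query("deal margin")
print(client_output_table[-1]["answer"])  # 1 document(s) matched: doc-1
```

The structural property worth noticing is that `run_query` returns nothing: results only ever land in a table the firm owns.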
What does move back to the AI vendor is narrow and explicit: operational metadata about what was processed, and the logs required for monitoring and service operations. No document content, no extracted text, no embeddings of client data. The data stays in the customer's boundary. In this architecture, Terminal X gets what it needs to operate the service and nothing else.
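One way to make that kind of boundary enforceable in code is an explicit allow-list on whatever crosses back to the vendor. A minimal sketch, assuming hypothetical field names; this is not the actual share definition.

```python
# Hypothetical allow-list filter for what crosses back to the vendor:
# only operational metadata and logs, never content, text, or embeddings.

VENDOR_ALLOWED_FIELDS = {"doc_id", "processed_at", "status", "page_count", "error_log"}

def to_vendor_payload(record: dict) -> dict:
    """Drop everything except explicitly allow-listed operational fields."""
    return {k: v for k, v in record.items() if k in VENDOR_ALLOWED_FIELDS}

record = {
    "doc_id": "deal-memo-001",
    "processed_at": "2025-01-15T09:30:00Z",
    "status": "ok",
    "raw_text": "CONFIDENTIAL: acquisition terms...",  # never leaves
    "embedding": [0.12, -0.54, 0.33],                   # never leaves
}

payload = to_vendor_payload(record)
print(sorted(payload))  # ['doc_id', 'processed_at', 'status']
```

An allow-list is the right default here: a new field added to the record stays inside the boundary unless someone deliberately adds it to the list, whereas a deny-list would leak it silently.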
For compliance teams at regulated firms, this is typically the architecture that clears security reviews. It preserves the existing data governance model the firm has already built around Snowflake, applies the firm's existing access controls, and leaves the audit trail in a system the firm already owns.
The value of running inside the client's Snowflake environment is not theoretical. It addresses the specific concerns that come up in every institutional procurement process.
The architecture is straightforward once you map it out.

Terminal X’s own data (external sources like SEC filings, earnings transcripts, Bloomberg, FactSet data, broker research, and real-time market data) sits in a separate Snowflake account on the vendor side. The two sides exchange what is strictly necessary through a Snowflake share: metadata, logs, and the operational signals the vendor needs to run the service. The client's document content stays on the client's side.
Not every vendor that claims to support "enterprise data integration" actually does this. A few questions surface whether a platform can genuinely meet institutional requirements or whether the claim is marketing.
Does the platform support deployment inside our cloud environment, or does it require us to move data to yours?
The answer should be specific: Snowflake Native App, AWS PrivateLink, Azure Marketplace, or similar. If the answer is along the lines of "we have strong encryption," that is a red flag.
What data leaves our environment during normal operation?
A credible vendor can name exactly what flows back to them. Metadata, logs, and service telemetry are acceptable. Document content, extracted text, or embeddings of client data should raise concerns.
Is client data used to train the vendor's models?
The answer should be no, without qualification. That commitment should be documented in the data processing agreement and confirmed against the SOC 2 report.
The firms pushing hardest on AI research are also the ones with the most mature data infrastructure. An asset manager with data already centralized in Snowflake has a much faster path to production AI than one that is still running research off shared drives and email attachments.
For firms in the first category, the conversation with an AI vendor should start with the deployment architecture. A vendor that can connect directly to your existing cloud infrastructure, whether that is Snowflake, AWS S3, Google Drive, OneDrive, or SharePoint, lets you bring AI to your data rather than the other way around. For firms with data centralized in Snowflake, the Native App pattern is the strongest version of this: it is the shortest path through security review, and the one that is defensible when a regulator asks how sensitive information is handled.
For firms without an existing data layer to integrate against, direct upload remains a legitimate starting point. Many of the largest users of institutional AI platforms began with direct upload and migrated to native app deployments later as their data infrastructure matured. We see this pattern play out at Terminal X once clients recognize the efficiency gains from having their documents surfaceable with a simple query, as opposed to wrangling and uploading every document themselves.
The worst option is the one some firms still end up with: an AI tool that requires data to be duplicated into a third-party environment, with unclear controls, an audit trail the firm cannot see, and a data processing agreement that glosses over how training data is handled. That architecture does not survive a serious compliance review.
While the deployment architecture is the part of institutional AI that rarely makes it into the product demo, it’s the part that determines whether the product ever gets used. For asset managers operating under SEC, FINRA, FSA, or FSC oversight, the standard should be straightforward. Your data stays in your environment, your access controls still apply, your audit trail stays yours, and the vendor gets only what it needs to run the service.
Terminal X supports both direct upload for firms that want the simplest path to production, and native app deployment inside the firm's Snowflake environment for firms with mature data infrastructure and strict governance requirements. In either case, client data is never used to train models, access is governed by the firm's existing permissions, and every output is traceable to the source document it was drawn from.
If your firm is evaluating AI research platforms for finance and your security team is asking the architecture question, that is the right question. For more information on how to combine your entire firm's data with millions of public and private financial data sources, get a demo here.