
Architecting a Secure & Scalable Cloud Data Lake: Lessons from TriState Capital Bank


HIKE2

Building a modern data lake is a strategic foundation for scaling analytics, AI, and business decision-making. At Innovation Summit 2025, TriState Capital Bank’s data leaders shared how they successfully architected a secure, scalable cloud data platform built for both operational needs and future AI capabilities. Their firsthand lessons highlight how to design for flexibility, governance, and trust from the ground up—essentials for any organization navigating today’s data-driven future. Watch the full session below, and start with these key insights:

Key Takeaways:

  1. A Secure, Scalable Data Lake Enables True Business Transformation
    By consolidating all structured and unstructured data into a centralized platform, TriState created a “single source of truth” that improves reporting, analytics, and operational efficiency—critical steps for supporting AI initiatives and federal regulatory requirements.
  2. Tokenization Is Essential for Safe AI Integration
    Before layering AI over their data, TriState implemented tokenization in-memory to protect sensitive information. This approach allowed them to experiment with AI models safely, ensuring that no private or regulated data could inadvertently leak into external systems.
  3. Governance and Data Normalization Must Be Prioritized Early
    Cross-functional collaboration helped the team create a normalized data model and shared business definitions, reducing confusion and improving decision-making across departments. Strong governance frameworks were key to managing schema drift, mitigating AI bias, and ensuring long-term data quality.
  4. AI Readiness Requires More Than Technology; It Requires New Processes
    TriState’s AI proof-of-concept emphasized the need for internal AI review boards, model monitoring, and retrieval-augmented generation (RAG) techniques to ensure safe, effective use of AI on enterprise data. AI governance isn’t optional—it’s foundational to responsible innovation.
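The session does not describe TriState's tokenization internals, but the idea of replacing sensitive values with safe, deterministic tokens before any data reaches an AI model can be sketched as follows. This is a minimal illustration using Python's standard library; the key name and token format are hypothetical, and a production system would keep the secret in a key vault and follow the bank's approved tokenization scheme.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a key vault, never source code.
SECRET_KEY = b"demo-only-key"

def tokenize(value: str) -> str:
    """Replace a sensitive value with a deterministic, irreversible token.

    Deterministic tokens mean joins and lookups still work downstream,
    while the raw value never leaves the process's memory.
    """
    digest = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"TOK_{digest[:16]}"

record = {"name": "Jane Doe", "ssn": "123-45-6789"}
safe_record = {**record, "ssn": tokenize(record["ssn"])}
```

Because the same input always yields the same token, analysts can still group or join on the tokenized column without ever seeing the underlying value.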

Why a Data Lake?
The bank needed a centralized platform to meet increasing regulatory demands, especially in preparation for the heightened requirements that apply once a bank crosses the $100B asset threshold, and to empower business users with real-time analytics and AI capabilities. A cloud data lake, unlike traditional databases or warehouses, offers schema-on-read flexibility, supports both structured and unstructured data, and scales with organizational growth.
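Schema-on-read is the key contrast with a traditional warehouse: data lands in the lake as-is, and each consumer applies structure at query time. A minimal sketch, with made-up event fields:

```python
import json

# Raw events land in the lake exactly as produced; no schema is enforced at write time.
# Note the second record has an extra field the first does not.
raw_lines = [
    '{"customer_id": 1, "balance": 1000.0}',
    '{"customer_id": 2, "balance": 2500.5, "segment": "private"}',
]

def read_balances(lines):
    """Schema-on-read: this consumer imposes only the structure it needs."""
    for line in lines:
        event = json.loads(line)
        yield event["customer_id"], float(event["balance"])

balances = dict(read_balances(raw_lines))
```

A different consumer could read the same raw files and project out `segment` instead, without either reader blocking the other or requiring an upfront schema migration.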


The Common Data Platform
Built on Microsoft Azure, TriState’s “Common Data Platform” integrates several key technologies:

  • Azure Data Lake Storage Gen2: A hierarchical, folder-based file system that supports all data formats.
  • Azure Function Apps: Enables change data capture for critical updates like customer changes.
  • Tokenization: Protects sensitive data like SSNs before exposing it to AI models.
  • Azure Data Factory: Automates data flows with minimal setup time—onboarding a new source now takes less than a day.
  • Azure Synapse: Allows business users to query data directly, increasing speed to insight.
  • Medallion Architecture: A three-layer system (bronze, silver, gold) to incrementally refine data for broader business use.
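The medallion pattern refines data in stages: bronze holds raw ingested records, silver holds cleaned and conformed rows, and gold holds business-ready aggregates. A minimal sketch of that progression, with hypothetical field names (the talk does not detail TriState's actual transformations):

```python
# Bronze: raw records as ingested, including a duplicate and a bad row.
bronze = [
    {"cust": " 001", "amt": "150.25"},
    {"cust": "001 ", "amt": "150.25"},       # duplicate after trimming
    {"cust": "002", "amt": "not-a-number"},  # unparseable amount
]

def to_silver(rows):
    """Clean and conform: trim keys, cast types, drop bad and duplicate rows."""
    seen, out = set(), []
    for r in rows:
        try:
            key = (r["cust"].strip(), float(r["amt"]))
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine these for review
        if key not in seen:
            seen.add(key)
            out.append({"cust": key[0], "amt": key[1]})
    return out

def to_gold(rows):
    """Aggregate silver rows into a business-ready summary."""
    totals = {}
    for r in rows:
        totals[r["cust"]] = totals.get(r["cust"], 0.0) + r["amt"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping bronze immutable means any silver or gold rule can be corrected and replayed from the raw layer, which is what makes the incremental refinement safe.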


Best Practices for Implementation
Tommy Moran, Solutions Architect, and Benji Opoku Agyemang, Data Architect, emphasized these principles:

  • Single Source of Truth: Consolidating data ensures consistent, reliable reporting.
  • Defined Business Logic: Business terms are clearly mapped and standardized across tools.
  • Schema Drift Handling: Built-in flexibility allows seamless adaptation to changing data structures.
  • Data Governance: Quality, lineage, and ownership are rigorously maintained to comply with regulatory standards.
  • Serverless Technology: Reduces maintenance overhead and improves system reliability.
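Schema drift handling deserves a concrete illustration. The idea is that a source adding or dropping a column should not break the pipeline: new columns are absorbed, and missing ones default to null. A minimal sketch, under the assumption (not stated in the talk) that the column set is tracked explicitly:

```python
def conform(record: dict, known_columns: set):
    """Absorb drifted columns and null-fill missing ones, so consumers never break."""
    drifted = set(record) - known_columns
    columns = known_columns | drifted                 # absorb any new fields
    row = {col: record.get(col) for col in columns}  # missing fields default to None
    return row, columns

columns = {"customer_id", "balance"}
row1, columns = conform({"customer_id": 1, "balance": 100.0}, columns)
# Drift: the source starts sending "branch" and stops sending "balance".
row2, columns = conform({"customer_id": 2, "branch": "PIT"}, columns)
```

Tools like Azure Data Factory mapping data flows offer drift tolerance along these lines as a built-in option, which is presumably what "built-in flexibility" refers to here.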


AI: The Next Frontier
With the data lake in place, the team explored integrating AI. They launched a proof of concept using Azure AI Studio and vector-based retrieval-augmented generation (RAG). Critical to this was in-memory tokenization, which anonymized sensitive data before it was fed to AI models, ensuring security and compliance. They focused on retrieval, not modification, using AI to answer internal data questions without exposing sensitive information externally. This isolation prevented data leakage and enabled safe experimentation with tools like semantic search and embeddings.
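The retrieval-only shape of that proof of concept can be sketched in miniature: documents are tokenized before indexing, embedded into vectors, and the most similar document is returned for a query. This toy version uses bag-of-words counts in place of a real embedding model, and the document texts and token values are invented for illustration:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real RAG system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Documents are tokenized BEFORE indexing, so raw identifiers never reach the index.
documents = [
    "customer TOK_9f2a opened a private banking account",
    "loan TOK_51cc was approved for commercial real estate",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str) -> str:
    """Return the most similar document. Retrieval only: nothing is modified."""
    q = embed(query)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

answer = retrieve("which loan was approved")
```

Because the index holds only tokenized text, even a query routed through an external model can never surface a raw SSN or account number: the sensitive value simply is not there to leak.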


Challenges and Learnings

  • Normalization: Cross-functional collaboration helped unify definitions across departments.
  • Data Formats: Adapting formats (e.g., JSON for AI ingestion) was necessary but manageable.
  • AI Governance: Recognizing risks like bias and data drift, the team proposed internal review boards for oversight.
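The format-adaptation point is mechanical but worth one concrete line: tabular extracts were reshaped into JSON documents that AI tooling could ingest. A minimal sketch with hypothetical column names:

```python
import json

# Hypothetical tabular extract (already tokenized) reshaped into JSON for AI ingestion.
columns = ["customer_token", "segment", "balance"]
rows = [
    ("TOK_9f2a", "private banking", 150.25),
    ("TOK_51cc", "commercial", 980.00),
]

json_docs = [json.dumps(dict(zip(columns, row))) for row in rows]
```

Each row becomes a self-describing document, which is the shape embedding and retrieval pipelines generally expect.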

TriState Capital Bank’s journey is a case study in building future-ready data infrastructure. Their experience shows that with the right architecture, governance, and foresight, organizations can harness the power of their data, and even responsibly explore AI, without compromising security or agility.

This story isn’t just about technology; it’s about vision, collaboration, and preparing for a data-first future in one of the most regulated industries in the world.