Data Architecture for AI: 8 Tips to Help You Succeed

August 6, 2024
Morgan Llewellyn, Mike Lampa

As organizations increasingly integrate Generative AI into their operations, they face many challenges establishing the data architecture for AI. Getting beyond these challenges requires more than theoretical solutions.

Good AI, whether you’re buying a tool or building your own, will have some key components. What do you need to think through? What will be required regardless of the approach you take? What data architecture is needed for AI? What engineering and architecture principles do you need to keep in mind?

In this article, we’ll discuss the challenges we’re seeing in organizations adopting Generative AI, as well as the tried and true practices to address each of these challenges, step by step. Our approach isn’t theoretical. It’s very much tied to what people are actually doing. Gain an understanding from a practitioner’s point of view with the following 8 tips.

1. Position Gen AI correctly

Broadly, and for analytics in particular, it’s important to understand that Generative AI is a necessary extension of your analytics ecosystem. Why? Think about the data engineering practices that we have in place for doing traditional data and analytics, like:

- Building out the data pipelines
- Doing data quality and data observability checks
- Deploying, automating and monitoring data pipelines to take messy data, turn it into a meaningful asset, and use it, whether that’s to train a large language model or machine learning engine or to enable analytics that people can glean insights from and take action on
- Following DataOps principles and CI/CD engineering disciplines

For example, many machine learning models generate predictions or customer segmentations that feed into building target campaign audiences. But organizations aren’t consistently saving these outputs so they can evaluate results over time.
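Saving model outputs does not require heavy tooling to get started. The sketch below is one minimal way to approach it, assuming a plain in-memory list stands in for a real table. The function and field names (`log_prediction`, `record_outcome`, `accuracy_by_month`, `scored_on`, `record_id`) are hypothetical, and the data is illustrative only.

```python
from collections import defaultdict
from datetime import date


def log_prediction(log, scored_on, record_id, predicted):
    """Append one model output to the log; the actual outcome is filled in later."""
    log.append({"scored_on": scored_on, "record_id": record_id,
                "predicted": predicted, "actual": None})


def record_outcome(log, record_id, actual):
    """Attach the observed outcome to every logged prediction for this record."""
    for row in log:
        if row["record_id"] == record_id:
            row["actual"] = actual


def accuracy_by_month(log):
    """Share of correct predictions per (year, month), skipping rows with no outcome yet."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in log:
        if row["actual"] is None:
            continue
        key = (row["scored_on"].year, row["scored_on"].month)
        totals[key] += 1
        hits[key] += int(row["predicted"] == row["actual"])
    return {key: hits[key] / totals[key] for key in sorted(totals)}


# Illustrative data only: four predictions, three of which have known outcomes.
prediction_log = []
log_prediction(prediction_log, date(2024, 1, 15), "c1", "churn")
log_prediction(prediction_log, date(2024, 2, 10), "c2", "stay")
log_prediction(prediction_log, date(2024, 2, 12), "c3", "stay")
log_prediction(prediction_log, date(2024, 2, 20), "c4", "churn")
record_outcome(prediction_log, "c1", "churn")
record_outcome(prediction_log, "c2", "stay")
record_outcome(prediction_log, "c3", "churn")

monthly = accuracy_by_month(prediction_log)
print(monthly)
```

Tracked this way, prediction quality becomes a time series you can review like any other business metric, and a falling monthly number is an early hint that a model may be drifting.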
As a best practice, companies and firms should save, assess and analyze that prediction data, just as they track sales or customer service complaints over time. This may seem obvious, but in Mike’s experience the discipline isn’t as ingrained as it should be. Even when it comes to monitoring model performance, many organizations still deploy a model and never go back to check it. In the meantime, it’s drifting and eroding. This is where MLOps principles come into play, and it is a main reason to think of Gen AI as an extension of your analytics.

2. Anticipate and prepare for tipping points

Companies, especially ones that are scaling quickly, can reach tipping point moments where it’s clear that if they don’t do something quickly, their processes will break. When this happens, they understandably gravitate towards point solutions that solve particular pain points, without a clear view of how each point solution fits into their broader business strategy, tech strategy, and so on.

That’s why it’s critical to get into the strategic planning mindset as soon as possible when it comes to your analytics. Anything that moves you closer to where you want your organization to be puts you on the right track. Don’t find yourself in the position of feeling beholden to your vendors when they come to you with opportunities and ideas. Have your own North Star in place first.

We’re seeing this when we partner in conversations with our clients and Salesforce, for example. We can let Salesforce know when clients need someone to help in a particular area because their current approach isn’t cutting it.

One example is a company that delivers healthy meals to seniors and is growing astronomically. They’re realizing their processes are breaking because they’ve been manually oriented for so long. They have some very gifted and talented people who work really hard to keep things afloat, but that’s not sustainable in the long run.
They need to lean into enabling technologies to get over that hump and reach the next level of scale. They’re balancing new technology adoption against the time sensitivity that’s so common.

As much as you might feel you really need to move, don’t just run out and grab a point solution. Temper that impulse. Step back for a moment and make sure you understand how a new tool will fit into the bigger picture.

3. Build user trust in current models

One of the biggest hurdles we’ve seen for users who want to use some of these chat features is, “How can I trust that it’s right?” and “How do we know what goes into this Gen AI model?”

Some common responses are using a RAG model, grounding, or putting guardrails around data. Those are good, appropriate solutions. But they don’t address the fundamental issue: how can your users trust the tool to give them reliable, consistent results?

The problem is not that users don’t understand a RAG model. Instead, it’s a fundamental issue with user adoption, which requires having solid metrics and a testing plan. Users need something they can understand. They need to know that if it works for a specific test case, it will work for their use case.

The best practice that we’ve seen in developing ML models and AI over the past 20, 30, even 40 years is this: if you want to convince someone that your model is accurate, you need to be prescriptive in developing questions that the AI can objectively answer correctly or incorrectly.

Create that list of questions, but don’t only go through it once. Test it multiple times so you get both a vertical and a horizontal dimension: many questions, and many runs of each. Having that breadth and depth in testing will let you tell your users, “We used 100 questions and ran them five times. We answered 98 of them correctly every time. For the other two, we either didn’t answer correctly every time or we had some variance.”

That’s how you ground users in. Know the scope of what the model’s answering and what you’ve tested it on.
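The breadth-and-depth testing described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive harness: `make_stub_model` is a hypothetical stand-in for a real Gen AI call, the two questions are invented, and answers are scored by exact string match, which a real test plan would likely replace with a more forgiving scoring rule.

```python
import itertools


def make_stub_model():
    """Hypothetical stand-in for a real Gen AI call: stable on one question, flaky on another."""
    calls = itertools.count()

    def ask(question):
        n = next(calls)
        if question == "What is our return window?":
            return "30 days"
        # Flaky on purpose: alternates between a right and a wrong answer.
        return "Net 45" if n % 2 == 0 else "Net 60"

    return ask


def run_test_plan(ask, questions, runs=5):
    """Ask every question `runs` times; return how many runs matched the expected answer."""
    results = {}
    for question, expected in questions.items():
        results[question] = sum(1 for _ in range(runs) if ask(question) == expected)
    return results


def summarize(results, runs=5):
    """Split questions into those answered correctly on every run and those needing review."""
    return {
        "questions": len(results),
        "runs": runs,
        "always_correct": sum(1 for c in results.values() if c == runs),
        "needs_review": [q for q, c in results.items() if c < runs],
    }


# Invented questions with objectively right answers (assumption: exact-match scoring).
questions = {
    "What is our return window?": "30 days",
    "What are standard payment terms?": "Net 45",
}

results = run_test_plan(make_stub_model(), questions, runs=5)
summary = summarize(results, runs=5)
print(summary)
```

The `needs_review` list is the part worth showing users: it names exactly which questions were inconsistent across runs, so the tested scope is explicit.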
Validate your results.

4. Implement data governance

Data leaks are another major concern. Will you be exposing sensitive information that puts your organization at risk?

The solution comes down to tried and true architecture best practices, including having a well-defined data architecture for AI and having a policy in place. In addition, you need to find ways to detect violations of those policies and to execute and enforce them. From a data security standpoint, that means you need to understand what security and regulatory compliance issues you have to account for and build these into the design of the data architecture and the data product.

One example might be companies doing customer-facing reporting because they’re sitting on an industry’s worth of data. They have to make sure that they don’t let Mike see Morgan’s company data. So you’ll need good access control rules, row-level security, and row-based architecture built into the data product, and those rules and security policies have to be tight and enforced.

Also, consider using Gen AI to continuously digest new regulations as they come up and create structured output that tells you where to apply them in your governance and ethics characteristics. These should be embedded in the data product as well.

5. Evaluate new tools with a clear strategy

Another challenge that’s very particular to Generative AI is that the space is moving so quickly. There are so many tools coming at us left and right and leapfrogging the incumbents. It’s a real challenge to first get your feet underneath you, and then evaluate new tools, especially when everybody has their own Gen AI. How do you choose among all the new tools? When should you choose? How do you decide if you even should switch?

Our advice is to build a backlog, an inventory of use cases for AI Automation and Gen AI that will benefit your organization. Because in the end, it’s not about the technology. It’s about the use case.
What do you need to move your business forward on its strategic objectives and achieve its key results? Identify the capability, prioritize the backlog, and then decide which use cases map to what technologies. Are there dependencies?

Going through this process will help you understand whether everything has to live within one platform or whether you’ll need different tools. You’ll be able to see whether the tools have to integrate, and if so, how the data will flow through. Then create a holistic roadmap and journey of what you want your business to look like, not today, not tomorrow, but always looking two to three years out.

Finally, given where the space is going, as well as what’s current and what’s feasible, how do the opportunities in front of you map to the technology? Then make your tool selections accordingly.

Without this roadmap or strategic plan in place, you’ll have vendors approaching you with a hundred and one different pieces of technology and no concept or context of whether that technology is right for your organization. It might sound right. It might scratch an itch. But you might not realize that that itch is actually connected to some other need. You might buy a point solution that doesn’t allow you to solve the bigger picture problem.

That’s why having a strategic roadmap and a vision for where you need to go is absolutely paramount. Otherwise, you’ll be subject to the whims of every vendor with an idea.

Finally, you must continually revisit your strategic roadmap. Choose a meaningful cadence. You can’t put together a three-year roadmap and then not review it a year or even six months from now. Things will shift. Be prepared to be agile with your strategic roadmap as well.

6. Validate new models

Once you’ve decided you’re ready to upgrade, how will you know that your model still works? That’s where your test base comes in. Know the questions you tested before and the accuracy you achieved.
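One way to keep that test base useful across upgrades is to score the old and new models on the same questions and look at which answers changed. The sketch below is a minimal illustration, assuming answers have already been collected into dictionaries and are scored by exact match. The names `accuracy` and `compare_to_baseline` and the sample questions `q1` to `q4` are hypothetical.

```python
def accuracy(answers, expected):
    """Fraction of benchmark questions answered correctly (exact match)."""
    correct = sum(1 for q, want in expected.items() if answers.get(q) == want)
    return correct / len(expected)


def compare_to_baseline(baseline_answers, candidate_answers, expected):
    """Score both models on the same questions and list what the candidate newly gets wrong."""
    regressions = [
        q for q, want in expected.items()
        if baseline_answers.get(q) == want and candidate_answers.get(q) != want
    ]
    return {
        "baseline": accuracy(baseline_answers, expected),
        "candidate": accuracy(candidate_answers, expected),
        "regressions": regressions,
    }


# Illustrative benchmark: the same four questions for both models.
expected = {"q1": "A", "q2": "B", "q3": "C", "q4": "D"}
baseline_answers = {"q1": "A", "q2": "B", "q3": "C", "q4": "X"}
candidate_answers = {"q1": "A", "q2": "B", "q3": "X", "q4": "D"}

report = compare_to_baseline(baseline_answers, candidate_answers, expected)
print(report)
```

In this made-up data both models score the same overall, yet the candidate fails a question the baseline answered correctly. Listing regressions alongside the headline accuracy makes that kind of trade-off visible before you switch.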
As an example, let’s say you achieved 98% accuracy with your previous model. Now you can accurately measure how your new model performs against that same baseline. Does it achieve the same 98%? Is it better with 99%? Or is it only hitting 80%? Have a consistent way to test the accuracy of different models.

Whatever your rationale for updating to a new model, be it cost, ease, or speed, you’ll be able to see how it performed against your benchmark questions.

These are tried and true practices that have been around for quite some time in other areas, and they’re still 100% applicable in the brave new world of Gen AI. Consider applying a similar approach to Gen AI model risk management as well.

7. Build a business justification

Another major challenge we see frequently is defining the value a solution will provide versus the cost. Again, the answer is knowing your use cases. Assign a value to each and prioritize them so you can clearly articulate the cost versus the value. Suddenly, it’s a whole lot easier to get projects approved. The dialog shifts from a cost mindset to a value enablement mindset.

A key practice is to find a valuation method that your stakeholders will believe in and that you consistently apply. It’s not always easy in analytics unless you’re generating revenue directly from selling the analytics. But wherever analytics is helping the business make better decisions or move the needle in its business area, that impact is represented by some metric, for example 10% growth over the next year. And you know the organization will make decisions informed by analytics to get that increased growth.

That’s the basis for a valuation framework. What’s the value the business is going to get from this enabling capability? Again, use it consistently so that everybody agrees and buys into that value statement and the ROI.

8. Future proof your organization

No one wants to make significant investments in technology only to find that it’s obsolete within 6 months or a year. One of the best ways to future proof your organization against this risk is to decouple your solution designs from the technology.

In other words, design with portability in mind. That includes your data or analytics products and the services around them, like access and integration methods. For data products, for example, you want to be able to port data content between enabling technologies.

There’s not an exact rubric for making that happen, but it’s a design concept that architects need to start thinking about: designing with interoperability in mind. That means building a product so you can put a wrapper around it and move it.

It all comes back to software engineering best practices, or purchasing tools that are built on best practices. Think about what tools this integrates with. It might not involve the tool that you’re looking to replace but another tool in that ecosystem.

This is an old example, but it’s representative. When Mike was at Dell, they were a Teradata shop. At one point, they brought in an upstart company that offered significant savings. Mike asked the data architects to design the model using the lowest common denominator so that if they had to move back to Teradata, they could do it with very little recoding. The architects laid out all the data types and made sure they avoided the non-ANSI standard data types. Something as simple as that made it easier when Dell eventually did have to move back to Teradata.

Building a Data-Driven AI Future

There’s so much happening in the world of Gen AI at the moment, and so much uncertainty about how to get started. It’s hard to cut through the noise and know what makes sense for your organization. You might need to take a step back and get a bigger picture before you jump into the rabbit hole of a point solution, going too narrow, too deep, too fast.
That’s where the experts at HIKE2 can help. Does your organization need help generating use cases and mapping them back to your tech stack? Talk to us today.