Unleashing Generative AI - Considerations, Pitfalls, and Applications
Saurabh: Hello folks, thank you for taking the time to join us for this session on the considerations, pitfalls, and applications of Generative AI.

Saurabh: Before getting started with the boring stuff, a little about us! I'm Saurabh Suman, a Generative AI Expert at Wednesday Solutions. With a passion for all things AI, I spend my days delving deep into the latest advancements and cultivating a keen understanding of how we can harness this technology to solve complex problems. On weekends, you'll find me looking for similar advancements in potions and food. I've had the pleasure of working with generative models on various projects, which lets me navigate the potholes along this journey with ease and precision. So whether you're an AI novice, an expert, or simply curious, I've got you covered. Pay close attention through the webinar, and get your questions answered at the end of it. On this journey with me I've got Mac. Mac, could you please introduce yourself?

Mac: Hey folks, I'm the CTO at Wednesday Solutions, and I spend my days taking credit for other people's work. On a more serious note, I've written Android and web applications that have handled millions of daily users and billions of API calls. I've created and maintained infrastructure that's scaled to handle 4x peak load at less than linear cost. I now spend my time on the lookout for advances in technology that increase people-productivity and system-reliability. The most recent entry, and perhaps the most logical next step given how Data Engineering heavy we are, is generative AI. And we haven't just watched from the sidelines; we've been actively involved in pushing the boundary of what's possible with GenAI. To know more about the industries we've impacted, please keep an eye on our Engineering Case Studies page. The link is in the comments.

Mac: Alright, now that the intros are out of the way, let's dive in.
Here's what we'll be covering today:
- What's needed to kickstart your Gen AI journey
- Finding and working within the boundaries of the model
- What to validate in your Gen AI model's results, and how
- Collecting and using analytics and performance metrics
- Calculating the RoI of your Gen AI initiative

Saurabh, it'd be great if you could take us through the ABCs of kickstarting a Gen AI journey.

Saurabh: Sure, but before we do that, let me first talk about our first Gen AI product. I'll be using it as an example later on to explain use cases, so pay attention, folks. The goal was to reduce the manual effort spent on sales prospecting. We wanted to create an intelligent system that could identify SQLs (sales qualified leads) by creating offers relevant to them. It had to understand their need and context. Once a potential customer responded, a real human would step in and take things forward. We called it the Autonomous Sales Agent, and fed it data about Wednesday: case studies, blogs, service offerings, skills, etc. The bot needed to identify leads based on BANT (Budget, Authority, Need, Timeline) and craft a custom email highlighting a relevant case study or blog based on the target company's news, hiring posts, products, and customers.

Mac: Let me give you an example of how it worked for one prospect. Let's call them "Company A". Disclaimer: we used AutoGPT with GPT-4, and built a plugin on top of it for some more advanced use cases. Company A has really big customers, is in the payment reconciliation business, and is hiring for Java/MySQL roles. Given the traffic we can assume from its customers, they should be facing a bunch of issues due to MySQL scaling constraints. The Autonomous Sales Agent actually crafted a message that touched on this pain point, and coupled it with the appropriate content from our internal resource pool on ways to solve it.
And as bizarre as it may sound, we actually got on a call with a leader from the company on the same day the message was sent. We just focused on the inputs, the different variables involved in the sales process, and hardened success metrics, and we got multi-tiered thinking abilities from the bot. I think that's enough gloating from me; back to you Saurabh, for the ABCs of kickstarting a Gen AI journey.

Saurabh: Prerequisites: you want to move from point A to point B. You need to define the characteristics and attributes of both points, i.e.:
- What information, data, tools, and subject matter expertise do you need to start your journey?
- What metrics, performance indicators, and outcomes would mean success?

In the Autonomous Sales Agent example, it's impossible to just say "potential lead" without defining what qualifies as a successful lead. And without identifying what a successful lead is, it's impossible to finalize data sources. Yes, data! Any GenAI endeavor can only be as good as the data being used. Quoting Mac here: "Your model is as good as the data you feed it." Identifying what counts as good, reliable data, how you're handling inconsistencies, and how rapidly your model adapts to data trends are all critical metrics that act as an early warning system for whether you're moving in the right direction.

Mac: And by the way, identifying data sources, and what makes data reliable, is not an engineering skill. It's a Subject Matter Expert's call: an SME in the field we're disrupting. Any GenAI endeavor without domain expertise will fail, 100%. SMEs are essential at the beginning, while building, and definitely once we're out in production.
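As an illustration, the kind of reliability rules an SME might specify for lead data can be encoded as simple checks long before any model is involved. This is a minimal sketch; the field names (`company`, `revenue_usd`, `last_updated`) and the 90-day freshness threshold are hypothetical, not from the actual Autonomous Sales Agent.

```python
# Hypothetical SME-specified reliability checks for a lead record.
from datetime import date, timedelta

def validate_lead(lead: dict, max_age_days: int = 90) -> list[str]:
    """Return a list of reliability problems; an empty list means the record is usable."""
    problems = []
    if not lead.get("company"):
        problems.append("missing company name")
    revenue = lead.get("revenue_usd")
    if revenue is not None and revenue < 0:
        problems.append("negative revenue")
    updated = lead.get("last_updated")
    if updated is None or (date.today() - updated) > timedelta(days=max_age_days):
        problems.append("stale or missing timestamp")
    return problems

fresh = {"company": "Company A", "revenue_usd": 5_000_000, "last_updated": date.today()}
stale = {"company": "", "revenue_usd": -1, "last_updated": None}
print(validate_lead(fresh))  # []
print(validate_lead(stale))  # three problems flagged
```

The point is that the rules themselves come from the domain expert; engineering only makes them executable and wires them into the pipeline.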
It's only once you've identified how to morph the input data into the shape, form, or state that makes it valuable to the endeavor that engineering comes in: validate, clean, and sanitize inputs; build pipelines to process and transform the data; and use the right models to train, test, validate, and deploy. And this is neither easy nor cheap. It's important to understand the pricing behind the entire endeavor: upfront costs, ongoing cloud/infra costs, maintenance, demos, and demo instances. And definitely take into account factors like security, compliance, and integration with existing and new systems.

So, the checklist before you kickstart your GenAI journey (don't bring out your pens, we'll send over links to these):
- Solve exactly one, very well-defined problem.
- Understand the shape, reliability, age, and continuous availability of your data.
- Set success indicators and well-defined success metrics.
- Money in the bank. AI endeavors are expensive. Don't wing this.

Once you've ticked all the boxes and are ready to start, where do you begin?

Saurabh: This one's easy: you need to choose the right tool for the job. No matter how amazing your data is, if you choose the wrong model you'll never get the desired results. If you're running an architecture firm, have a hundred thousand designs, and want to create the perfect layout based on your past designs, the GPT models by OpenAI, which are text-based, will never give you satisfactory results. But for the Autonomous Sales Agent, which was text-based, they gave amazing results. You also need to know when to use open-source models like Falcon and Llama 2, and the delta between state-of-the-art and open-source models and its impact on your end results. Data privacy, governance, understanding the models at your disposal and their limitations, and applying the right model to your problem are all a must.

Mac: And it doesn't stop there.
Once you've zeroed in on the right model, you need to understand its limitations and create avenues for human intervention at the right steps, with proper feedback cycles and analytics. This will increase the chances of meeting your productivity and revenue goals. It's important to understand your model's limitations or misbehavior and be able to attribute it. Misbehavior or unintended consequences are a result of input data, constraints, and training, and we'll come to how you can triage what's causing the skew in a subsequent slide. But you can save a lot of headaches just by thinking through all the different ways your model could misbehave. The model will behave the way you ask it to, not the way you intend it to. Clearly stating what you don't want it to do is as important as stating what you want from it. For example, what went wrong initially with the Autonomous Sales Agent was that it concentrated on leads of a particular type, due to biases in the input constraints, and wasn't able to get us leads in the demographics we were targeting.

Saurabh: So the checklist for defining boundaries is:
- Evaluate the available models, and choose the one that fits your use case best.
- Understand the trade-off with state-of-the-art AI, and consider data privacy and governance before making a choice.
- Create avenues for human intervention, and a process for feedback and analytics.
- Along with listing what you want from the model, make what's not expected from the model very clear.

Mac: Just gonna pause here for a few seconds while you take in this data. --- Just because your use case is textual doesn't automatically make GPT the ideal fit; each of these models has its strengths. To build a defensible product you'll probably need different models to solve different aspects. Otherwise it's just like any other wrapper on top of GPT.
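The "state what you don't want" advice can be made concrete in two places: negative constraints in the prompt itself, and a post-hoc check that rejects outputs that violate them. A hypothetical sketch; the constraint list, the `{demographics}` placeholder, and the banned-phrase check are illustrative, not the actual Autonomous Sales Agent guardrails.

```python
# Hypothetical guardrails: explicit "do NOT" constraints plus an output check.
SYSTEM_PROMPT = """You are a sales-prospecting assistant.
Do:
- Reference one relevant case study per email.
Do NOT:
- Invent customer names or metrics.
- Contact leads outside the target demographics: {demographics}.
- Promise delivery timelines."""

# Crude post-hoc check: reject drafts containing phrases the prompt forbade.
BANNED_PHRASES = ["guarantee", "we promise", "by next week"]

def violates_guardrails(draft_email: str) -> bool:
    lowered = draft_email.lower()
    return any(phrase in lowered for phrase in BANNED_PHRASES)

print(violates_guardrails("We promise a full migration by next week!"))  # True
print(violates_guardrails("Here is a case study relevant to payment reconciliation."))  # False
```

A drafted email that trips the check would be routed to a human rather than sent, which is one concrete form of the "avenues for human intervention" mentioned above.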
And just expecting that since you're using XYZ, which is known for these capabilities, things will just work, is risky. So how do you validate the direction and output of your model, and ensure there is intervention before it's too late?

Saurabh: When we're dealing with Generative AI, we need a system in place to make sure it's doing its job well. We call this "validation". The type of validation we use varies depending on what we're asking the AI to do. For example, if we're using AI to translate text from one language to another, we use BLEU (Bilingual Evaluation Understudy) scores, a method of evaluating the quality of text that has been machine-translated from one language to another. On the other hand, if we're asking AI to create images, we could use measures like the Inception Score or Fréchet Inception Distance to gauge the quality and variety of the images it creates.

Mac: That's a great point, Saurabh. It's important to analyze the output, but once you have the results you need to be able to use them to make the model better. Which brings us to "model transparency". Foundational models are super complex and can seem like a black box: we put something in and get something out, but it's not always clear what's happening inside. That's where methods like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) come in. These are techniques to make the workings of the model more transparent. They can help us understand why the model is making a decision, and which factors or features are responsible for the output and with what weight. For example, LIME uses a simpler model that approximates the decisions made by the model for a particular instance, and outputs it in a way that is interpretable by humans.

Saurabh: Absolutely, Mac. And the last big thing we need to think about is making sure our AI model continues to perform well over time.
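As a quick aside on the BLEU scores mentioned earlier: the core idea is n-gram overlap between a candidate translation and a reference. This toy version computes only modified unigram precision, one ingredient of BLEU; real evaluations should use an established library such as sacrebleu or NLTK's `nltk.translate.bleu_score`.

```python
# Toy illustration of the n-gram overlap idea behind BLEU (unigram-only).
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate words that also appear in the reference,
    with each reference word creditable at most as many times as it occurs."""
    cand_tokens = candidate.split()
    ref_counts = Counter(reference.split())
    if not cand_tokens:
        return 0.0
    matches = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand_tokens).items())
    return matches / len(cand_tokens)

score = unigram_precision("the cat sat on the mat", "the cat is on the mat")
print(score)  # 5 of 6 candidate words are credited -> ~0.833
```

Full BLEU also combines higher-order n-grams and a brevity penalty, which is why a library implementation is the right tool in practice.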
AI models learn and change as they're exposed to more and more data. So something that worked well yesterday might not work so well tomorrow. That's why we need to keep checking our AI model's performance. One way we do this is by holding back some of our data when training the model, then using this held-back data to check the model's performance. This can help us spot if the model is starting to "overfit", a term we use when the model is so focused on the data it was trained on that it struggles with new, unseen data. But we can't just rely on this held-back data. We also need to track how well the model is doing in the real world. For example, if we're using AI to come up with email subject lines to get more people to open our emails, we should keep track of the actual open rates to see if the AI is really helping. So, again, going with the checklist: working with Generative AI involves a lot of checking and re-checking to make sure it's doing a good job. Ensure that you know why your model behaves the way it does. Keep track of the accuracy of the model's output and its relevance to the real world.

Saurabh: So you've now validated results, you're confident that you're solving a real problem, and you unleash your solution to the world. You see crazy uptake, and you don't want customer/user complaints to be the only indicator of how well your model is doing. You must put performance analytics in place to measure how well the model is doing against self-made benchmarks or industry standards. Break it down into which problems your model is solving really well and which parts it's not doing so well in, and create correlations between those. For example, in the first go, the Autonomous Sales Agent did a really bad job with sourcing but wrote really beautiful and impactful emails. Once we fixed the issues with lead sourcing and generation, we immediately started seeing much better overall results.
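The held-back-data check described above often reduces to comparing training performance against held-out performance and treating a large gap as an overfitting warning. A minimal sketch; the 10-point gap threshold and the example scores are illustrative assumptions, not a universal rule.

```python
# Sketch of an overfitting warning: flag a large gap between training
# performance and performance on held-back (holdout) data.
def overfitting_warning(train_score: float, holdout_score: float,
                        max_gap: float = 0.10) -> bool:
    """True when held-out performance trails training performance by more than max_gap."""
    return (train_score - holdout_score) > max_gap

# A model that memorized its training data vs one that generalizes:
print(overfitting_warning(train_score=0.98, holdout_score=0.71))  # True
print(overfitting_warning(train_score=0.90, holdout_score=0.88))  # False
```

The same comparison then extends to real-world metrics: the "train score" becomes your offline benchmark and the "holdout score" becomes the live number, such as actual email open rates.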
It's extremely important to analyze these metrics over time, correlate them with changes, and identify trends to understand fallacies and create a data-driven plan of action. GenAI may seem like magic and the one-size-fits-all solution we've been waiting for our whole lives, and it may very well be, but it's not a mind reader. It needs to be constantly guided, nudged, and morphed into the perfect solution.

Mac: And introspection, or diagnostics, isn't just about figuring out what isn't working well and fixing that. It includes understanding why something is working really well, and figuring out ways to fold that back into the building process itself. And while we're on the subject, it's possible that your model is amazing but the market conditions have changed. Exactly like the current market conditions: what you could do in the pre-GPT era vs what is possible now is completely different. The world has changed, and your system needs to change with it. This could even mean creating an altogether different success metric. That may sound daunting, but it's not nearly as bad as a useless model and success metrics that no longer make sense. Setting up a cadence of regular check-ins to validate or measure the output, product performance across time and across the industry, and success metrics will enable preventive action and course correction before it's too late.

The checklist for analytics and performance is:
- Routine performance measurement, i.e., the outcomes of your endeavor. What is the reduction in cost/man-hours? What growth numbers are you hitting? What is the change in market conditions and its impact on success metrics? How fast are market conditions changing relative to the product?
- Replicate success indicators, ensure you're identifying failure indicators, and apply learnings across endeavors.
- Create a plan for how and when to upgrade your model to newer variants, and validate and check the availability of new data sources and how to incorporate them.
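A regular check-in cadence like the one above can be automated as a simple degradation flag on any tracked success metric. A minimal sketch, assuming a periodic history of an email open-rate metric; the window size and the 10% drop threshold are illustrative assumptions.

```python
# Sketch of a routine check-in: compare the most recent window of a success
# metric against the preceding window and flag a sustained drop.
def metric_degraded(history: list[float], window: int = 4, drop: float = 0.10) -> bool:
    """True when the recent-window average falls more than `drop` below baseline."""
    if len(history) < 2 * window:
        return False  # not enough data to compare yet
    baseline = sum(history[-2 * window:-window]) / window
    recent = sum(history[-window:]) / window
    return recent < baseline * (1 - drop)

# Weekly open rates: steady around 0.31, then sliding to the low 0.20s.
open_rates = [0.32, 0.31, 0.33, 0.30, 0.24, 0.22, 0.23, 0.21]
print(metric_degraded(open_rates))  # True -> time for a check-in
```

A flag like this is an early warning to trigger a human review, not a verdict; the diagnosis (data drift, market change, model issue) still needs the triage described above.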
Alright, this is probably the most awaited section. All of this is for nothing if we can't justify the return on investment. A GenAI endeavor, however amazing and cool it may sound, needs to be backed by solid justification of the expected return. We need indicators to determine if:
- we're moving in the right direction,
- we need to pivot,
- or we need to cut our losses and stop right here.

The first step is calculating the economic value generated. Sometimes this is straightforward because it's an absolute number or saved man-hours; in other cases it's a bit more skewed, when you're relating it to eventual growth and expansion. It's easy to get caught up and be overly optimistic. My advice here: be as realistic as possible and set up actual short-term indicators to see if this growth/expansion is happening, or even valid under long-term market conditions. The world changes faster than we expect.

Next is the quantification of costs incurred: upfront investment in infra, ongoing costs for maintenance, upgrades, PoCs, R&D, SME time, etc. No matter how insignificant, these costs pile up, and you need them to understand which endeavors to double down on and which to cancel.

You're then going to have to define concrete efficiency metrics. Efficiency in the context of an AI model can be viewed in different ways:
- One approach might be to measure the efficiency of data usage, quantified by the outcome.
- It could be the time saved by automation, comparing the time taken traditionally vs with the new system.

However, the RoI here isn't just how much time or money you've saved; it's also the value of the opportunity you've created for subsequent endeavors. GenAI endeavors aren't sprints, they're marathons. Treat them as such. And with that, we've come full circle. Key takeaways!

Saurabh: Alright, so quick key takeaways and then we'll get to the Q&A!
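Before the takeaways, the RoI arithmetic walked through above reduces to a few lines: value generated minus total cost, over total cost. A back-of-the-envelope sketch; every figure here (hourly rate, hours saved, cost numbers) is a hypothetical placeholder.

```python
# Back-of-the-envelope RoI: (value generated - total cost) / total cost.
def simple_roi(value_generated: float, upfront_cost: float,
               monthly_running_cost: float, months: int) -> float:
    """RoI over the given horizon; positive means the endeavor paid for itself."""
    total_cost = upfront_cost + monthly_running_cost * months
    return (value_generated - total_cost) / total_cost

# Hypothetical: 500 saved man-hours/month at $60/hr over a 12-month horizon,
# against $120k upfront and $8k/month running costs.
value = 500 * 60 * 12
print(round(simple_roi(value, upfront_cost=120_000,
                       monthly_running_cost=8_000, months=12), 2))  # 0.67
```

Note this captures only the directly measurable value; as mentioned, the opportunity value for subsequent endeavors sits on top of this number and is much harder to quantify.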
- Before you start, make sure you have:
  - well-defined success metrics for one well-defined problem
  - reliable, good, and available data
  - skills and money
- Before you start building, make sure:
  - you choose the right model; understand the limitations of open-source vs state-of-the-art and their implications
  - there is a process for analytics, feedback, and check-ins
  - you have proper guardrails and domain expertise to get the best out of the model
- When you're evaluating, make sure of:
  - Model validation: regular check-ins to ensure the right outputs
  - Transparency: understand why our AI models make the decisions they do
  - Continuous checks: keep an eye on performance
- Once you've built, make sure:
  - you set up routine check-ins to measure performance metrics, success indicators, and metrics
  - you replicate success factors, identify failure indicators, and incorporate a feedback loop
  - you plan for maintenance, upgrades, and new data
- And calculate the RoI by:
  - calculating economic value generated
  - calculating costs incurred
  - measuring efficiency

The change from analog to digital wasn't just about scale, efficiency, and precision; it was about reimagining what was possible. Get books delivered at the tap of a finger; see and speak to family across the world. This is another internet moment. The world as we know it is changing. The advances in AI have made it part of our everyday lives, whether it's the algorithm that decides what you see on your timeline or the smartwatch that records how well you've slept. Consumer-facing AI is a reality, and it's now a necessity for businesses to accept it and adapt to it. I hope this session makes the transition to being an AI-powered business easier for you. For an RoI worksheet, you can reach out to firstname.lastname@example.org with your business use case and we'll send you your own custom RoI worksheet! The email address is in the comments below.
Saurabh Suman, Mohammed Ali Chherawalla (Mac)