Model Monitoring Part 3: Detecting Drift in Model Structure, Model Coefficients, and Real-World Data

Previous discussions of model monitoring (Part 1 and Part 2) have focused more on user experience and application development. But the first and, in some ways, foremost consumers of model monitoring tools are the data scientists themselves. Data-driven models are highly dependent on the quality of the data given to them for training, as well as on the consistency of the data given to them in deployment. If the training data or the real-world test data changes in some material way, data scientists need tools that detect those changes as soon as they begin to impact the performance of their applications, or ideally before.

Machine learning models for artificial intelligence applications require constant retraining. The real-world data these models are deployed on is a moving target. People change the way they interact with each other, with computer systems, and with the world at large. Language use shifts subtly as new words are introduced and word frequencies and usage patterns change. The user base of an application will gradually change over time. For all of these reasons, models should not remain static, but should be retrained from time to time with new training data.

As models are retrained on new data, data scientists need to track whether the new models are similar enough to the previously deployed ones to be confident that they will continue to perform well in the applications where they have been working to that point. The process for doing that analysis depends on the kind of models the data scientists are using.

If they are using statistical regression models, tracking the change in the coefficients and performing well-understood cross-validation tests should detect significant changes in those models. If the new models pass those tests, then all is good. But what should they do if the models fail those tests? Data scientists need tools to dive into the new models: to understand what has changed in the regression coefficients, which features of the data are likely to have caused those changes, and whether those changes should concern the users of the models or are unlikely to hurt the performance of the applications that use them.
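As a minimal sketch of what such a coefficient check might look like, the function below assumes the old and retrained models are scikit-learn-style linear models fitted on the same feature set; the function name and the relative-change threshold are illustrative, not a prescribed method:

```python
import numpy as np

def coefficient_drift_report(old_model, new_model, feature_names, rel_threshold=0.25):
    """Compare coefficients of two fitted linear models and flag large relative changes.

    rel_threshold is an illustrative cutoff: coefficients whose relative change
    exceeds it are flagged for manual review.
    """
    old_coefs = np.asarray(old_model.coef_).ravel()
    new_coefs = np.asarray(new_model.coef_).ravel()
    report = []
    for name, old_c, new_c in zip(feature_names, old_coefs, new_coefs):
        denom = max(abs(old_c), 1e-12)              # guard against division by zero
        rel_change = abs(new_c - old_c) / denom
        report.append({
            "feature": name,
            "old_coef": old_c,
            "new_coef": new_c,
            "rel_change": rel_change,
            "flagged": rel_change > rel_threshold,
        })
    # List the largest shifts first so they are easy to review.
    return sorted(report, key=lambda row: row["rel_change"], reverse=True)
```

A report like this does not decide anything on its own; it simply points the data scientist at the coefficients most worth investigating.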

In the case where data scientists are using deep learning techniques, or other non-linear learning methods that produce models that are harder for humans to explain, the problem of tracking model drift is even more acute. One can use cross-validation methodologies to detect changes in behavior between the new and old models, by applying both the new and old data sets to both the new and old models and looking for noticeable differences in key performance indicators such as task completion, accuracy, and recall. But if differences are detected, it is less clear how to identify the root cause of the change, much less determine whether the changes are a cause for concern.

This is where good, focused model monitoring software is critical. The model monitoring software must be built with the learning techniques used in the models in mind, so that data scientists can use it to understand which parts of the models have changed, which features of the data are likely to have caused those changes, and whether the new models are likely to behave substantially differently in the real-world applications in which they are deployed. The less explainable the models are, the more difficult this task will be. Nimble, easy-to-use visualization tools are critical in these cases, allowing data scientists to dive into the innards of the models they have built, understand their important features, and diagnose whether the changes are idiosyncratic and unimportant or significantly impactful on performance.
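For illustration, here is a rough sketch of that old-versus-new comparison, assuming classification models with scikit-learn-style predict methods and held-out validation sets from both the old and new data; the metric choices and the alert threshold are assumptions for the example, not a standard:

```python
from sklearn.metrics import accuracy_score, recall_score

def kpi_drift_matrix(old_model, new_model, datasets, alert_threshold=0.02):
    """Evaluate the old and new models on both the old and new validation sets.

    datasets: dict such as {"old_data": (X_old, y_old), "new_data": (X_new, y_new)}.
    Returns per-model, per-dataset KPIs and a list of alerts where the new model's
    metric falls below the old model's by more than alert_threshold (illustrative).
    """
    results = {"old_model": {}, "new_model": {}}
    for model_name, model in (("old_model", old_model), ("new_model", new_model)):
        for data_name, (X, y) in datasets.items():
            preds = model.predict(X)
            results[model_name][data_name] = {
                "accuracy": accuracy_score(y, preds),
                "recall": recall_score(y, preds, average="macro"),
            }
    alerts = []
    for data_name in datasets:
        for metric in ("accuracy", "recall"):
            drop = (results["old_model"][data_name][metric]
                    - results["new_model"][data_name][metric])
            if drop > alert_threshold:
                alerts.append({"dataset": data_name, "metric": metric, "drop": drop})
    return results, alerts
```

A comparison like this can tell you that something changed, but, as noted above, it cannot by itself tell you why.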

One way to get ahead of model changes before they impact production performance is to have monitoring tools that act directly on the data itself. Data monitoring tools are critical for detecting changes in the real-world data an application processes, both to anticipate problems before new models are retrained and to avoid having applications begin to fail because the world is changing faster than the model's retraining algorithms can keep up.

Monitoring data involves more than just tracking superficial statistics. Data monitoring software needs access to information about which features of the data are important to the performance of the models built from it. Without this input, data monitoring is likely to produce a flood of false alarms as innocuous features of the data change over time, drowning out the changes that are of genuine concern.
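As a rough sketch of this idea, the function below assumes the training and live data are available as pandas DataFrames with matching numeric columns, and that a per-feature importance weight (for example, taken from the trained model) is supplied; the statistical test and thresholds are illustrative choices:

```python
from scipy.stats import ks_2samp

def data_drift_alerts(train_df, live_df, feature_importance, p_threshold=0.01):
    """Flag numeric features whose live distribution has drifted from training.

    Runs a two-sample Kolmogorov-Smirnov test per feature and weights each alert
    by the feature's importance to the model, so drift in features the model
    barely uses does not drown out drift in the features it relies on.

    train_df, live_df: pandas DataFrames with the same numeric feature columns.
    feature_importance: dict mapping feature name -> importance weight
    (for example, taken from the trained model). Thresholds are illustrative.
    """
    alerts = []
    for feature, importance in feature_importance.items():
        stat, p_value = ks_2samp(train_df[feature].dropna(), live_df[feature].dropna())
        if p_value < p_threshold:
            alerts.append({
                "feature": feature,
                "ks_statistic": stat,
                "p_value": p_value,
                "importance": importance,
                # Rank alerts by drift magnitude weighted by how much the model cares.
                "priority": stat * importance,
            })
    return sorted(alerts, key=lambda a: a["priority"], reverse=True)
```

Weighting the alerts by importance is the point of the sketch: it keeps harmless distributional wobble in unimportant features from burying the drift that actually matters.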

For example, stock trading systems depend on many different features of price movement patterns, and a great deal of variability in trading patterns is simply part of the real world of financial markets. However, some trading systems can be sensitive to the way price movements are distributed over the course of the trading day. Those patterns can be affected by when derivative markets open and close, and by how trading on those derivative markets changes. For instance, US stock markets can be impacted by trading on stock index futures and stock options markets. In the past, those markets have changed their operating hours, their liquidity profiles, and the kinds of derivative products offered on their exchanges. Changes in those derivative markets can have a significant knock-on impact on trading in the underlying stocks, causing dramatic shifts that retraining algorithms may not be able to adapt to quickly enough to avoid hurting application performance. Trading system data scientists need data monitoring tools that are sensitive to these kinds of systematic changes in real-world data, so that they are alerted when they need to intervene in the automated retraining of the models. How to react to these changes in the real world can be more art than science, but science is needed to detect the need to react.
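As a hypothetical illustration of one such check, the sketch below compares how trading volume is distributed across the trading day in a reference period versus a recent period; the DataFrame schema, bucket size, and choice of test are all assumptions made for the example:

```python
from scipy.stats import chisquare

def intraday_distribution_shift(reference_trades, recent_trades, freq="30min"):
    """Test whether the intraday distribution of trading volume has shifted.

    reference_trades, recent_trades: pandas DataFrames with a DatetimeIndex and
    a 'volume' column (a hypothetical schema). Volume is bucketed by time of
    day, normalized, and the recent profile is compared to the reference
    profile with a chi-square test. A small p-value suggests the intraday
    pattern has shifted, for example after a derivative market changes its
    trading hours.
    """
    def time_of_day_profile(trades):
        buckets = trades.groupby(trades.index.floor(freq).time)["volume"].sum()
        return buckets / buckets.sum()              # normalize to a distribution

    ref, rec = time_of_day_profile(reference_trades).align(
        time_of_day_profile(recent_trades), fill_value=0.0
    )
    # chisquare expects counts; rescale the normalized profiles and avoid zeros.
    scale = 1_000
    stat, p_value = chisquare(rec * scale + 1e-9, ref * scale + 1e-9)
    return stat, p_value
```

A flag from a check like this is only a prompt for a human to look closer; deciding how to respond to a structural market change remains, as noted above, more art than science.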

Data science is fundamentally about data. Data scientists depend on a deep understanding of the data that goes into their models in order to build models that are useful for solving real-world problems. Tracking the behavior of the models is useful and important, but waiting for data changes to filter into those models is irresponsible if the changes materially degrade application performance in ways that harm the customers or users of those applications. Data monitoring solutions can help data scientists protect the users of their models by alerting them to changes in the real-world data that might impact the models before that impact is painfully felt.

