Originally posted on Semiconductor Engineering.
By: Greg Prewitt & Marc Jacobs
Advanced test has become one of the semiconductor industry’s most promising frontiers: adaptive binning, feed-forward models, and real-time analytics pulling signals from mountains of measurement data. But there is a problem hiding underneath all that ambition, and it is neither compute nor algorithm; it is data. More specifically, it is the unglamorous, foundational question of whether the data flowing across the fab-to-test chain is clean, complete, and correctly associated in the first place.
At the recent PDF Solutions Users Conference, we presented some of the solutions we offer to ensure the availability of accurate data across the test supply chain.
In this blog post, we identify where the real bottlenecks lie, what good data infrastructure looks like, and why the industry’s aspirations for machine learning are running ahead of the data plumbing required to support them.
1. How often does poor data correlation undermine adaptive test and binning?
The short answer: more often than engineers realize, and with consequences that are much harder to detect than a simple measurement error.
When a human analyst is doing exploratory analytics, they can navigate around imperfect metadata through intuition. They notice when data is disjointed, recognize the anomaly, and course correct. That tolerance disappears the moment automation is introduced. When test results from one operation feed into a model, or even when voltage thresholds and current indicators from a prior step are passed to a subsequent test operation, a computer cannot intuit its way around a broken association. If the metadata is not aligned, the downstream operation simply never receives the context it was designed to use.
A computer can only intuit so much; if the metadata is not aligned, the flow breaks down. You’ve got a test operation that’s expecting some previous measurement or prediction, and it’s not there, because the association wasn’t present in the data.
This is not a niche edge case. Both feed-forward of extracted parameters and model-driven feature engineering are increasingly mainstream in the testing of complex advanced packages. The customers who are taking data quality most seriously have made it a formal key performance indicator (KPI), monitoring data standardization metrics and data health scores on a weekly basis. That level of vigilance is telling: it suggests that data correlation failures are common enough to warrant continuous, structured attention rather than reactive debugging.
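The failure mode described in this section, a downstream operation looking for an upstream result that never arrives, can be sketched as a keyed lookup. This is only an illustration under stated assumptions: the key layout (lot, wafer, die x, die y), the `vmin_mv` parameter, and the `feed_forward` helper are hypothetical and do not describe any particular product or data format.

```python
# Hypothetical upstream (e.g. wafer sort) results keyed by (lot, wafer, die_x, die_y).
upstream = {
    ("LOT123", "W01", 3, 4): {"vmin_mv": 712.0},
    ("LOT123", "W01", 3, 5): {"vmin_mv": 698.5},
    ("lot123 ", "W01", 3, 6): {"vmin_mv": 705.0},  # inconsistently labeled lot
}

# Devices arriving at the downstream test operation, labeled consistently.
downstream_keys = [
    ("LOT123", "W01", 3, 4),
    ("LOT123", "W01", 3, 5),
    ("LOT123", "W01", 3, 6),
]


def feed_forward(upstream_results, keys):
    """Split downstream devices into those with and without upstream context."""
    matched, missing = {}, []
    for key in keys:
        record = upstream_results.get(key)
        if record is None:
            missing.append(key)  # the association is broken; nothing to pass on
        else:
            matched[key] = record
    return matched, missing


matched, missing = feed_forward(upstream, downstream_keys)
association_rate = len(matched) / len(downstream_keys)
print(f"association rate: {association_rate:.0%}")
print("no upstream context for:", missing)
```

The measurement for the third die exists, but because its lot label does not match, the downstream step never sees it. A ratio like `association_rate` is one simple candidate for the kind of data health metric that can be tracked over time.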
2. What does good data infrastructure look like?
Where do manufacturers fall short? Most commonly, in their metadata. Getting all the correct identity information still seems to be something people struggle with. When test data flows back into design or process decisions, a missing or mismatched identifier is not merely an inconvenience; it severs the linkage that makes feed-forward and adaptive test possible.
Good data infrastructure has two foundational requirements that most manufacturers either underinvest in or implement incompletely.
The first is collecting data at the tool. Relying on an Outsourced Semiconductor Assembly and Test (OSAT) provider to bundle and deliver data after the fact introduces opportunities for omission that compromise the completeness of what the semiconductor company receives. By contrast, direct collection at the point of measurement gives the highest data quality and the most complete picture.
The second requirement is comparison against a system of record, typically a manufacturing execution system (MES) or enterprise resource planning (ERP) system that knows which lots were released for build and packaging, which devices should be grouped together, and what the expected lot structure looks like. Cross-referencing incoming data against that ground truth serves two purposes: it enables data augmentation and correction when something is missing or wrong, and it provides a benchmark against which you can continuously assess data quality. This is better than relying on manual entry or OSAT-supplied fields that may be inconsistent or incomplete.
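As a rough sketch of what cross-referencing against a system of record can look like, the example below checks incoming records against a lookup table standing in for an MES extract. Everything here is an assumption made for illustration: the field names (`lot_id`, `product`), the status labels, and the `reconcile` function are hypothetical, and a real integration would involve far more attributes.

```python
# Hypothetical system-of-record extract: lot ID -> expected product.
mes_lots = {
    "LOT123": {"product": "PRODA"},
    "LOT124": {"product": "PRODB"},
}

# Hypothetical incoming test records, as delivered.
incoming = [
    {"lot_id": "LOT123", "product": "PRODA"},
    {"lot_id": "LOT124", "product": None},     # field missing on delivery
    {"lot_id": "LOT999", "product": "PRODA"},  # lot unknown to the MES
    {"lot_id": "LOT123", "product": "PRODX"},  # conflicts with the MES
]


def reconcile(record, mes):
    """Return (possibly repaired record, status) using the MES as ground truth."""
    expected = mes.get(record["lot_id"])
    if expected is None:
        return record, "unknown_lot"
    if record["product"] is None:
        return {**record, "product": expected["product"]}, "augmented"
    if record["product"] != expected["product"]:
        return {**record, "product": expected["product"]}, "corrected"
    return record, "ok"


results = [reconcile(record, mes_lots) for record in incoming]
statuses = [status for _, status in results]

# Share of records that arrived needing no repair: a simple quality benchmark.
health_score = statuses.count("ok") / len(statuses)
print(statuses, health_score)
```

The same comparison serves both purposes named above: the `augmented` and `corrected` branches repair the data, while the share of `ok` records gives a number that can be monitored continuously.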
3. Distinguishing process escapes from measurement and interconnect variability
This is the domain we call “test process control,” and it involves two-stage logic.
The first question to answer is whether any excursion in the measurements is a measurement problem: bad contacts, a worn probe card, leakage in the test setup, or some other instrumentation artifact that would produce a false alarm or invalidate certain results entirely. These setup-driven anomalies must be ruled out before any process conclusion can be drawn.
Once it is established that the setup is sound, the second step is to monitor a small set of key parameters with the greatest diagnostic impact. Not all of them, since that is impractical, but the ones that are most diagnostic. One of PDF Solutions’ customers, for example, places particular weight on die temperature under measurement, treating it as a validity check before interpreting any electrical results. Automated rules can watch these sentinel parameters and trigger alerts when an excursion occurs.
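The two-stage logic can be sketched as a pair of limit checks applied in order. This is a minimal illustration, not a description of any shipping rule engine: the parameter names, the limit values, and the use of contact resistance as a proxy for contact quality are all hypothetical placeholders.

```python
# All limits are hypothetical placeholders, not recommended values.
SETUP_LIMITS = {
    "die_temp_c": (20.0, 30.0),      # validity window for die temperature
    "contact_res_ohm": (0.0, 5.0),   # assumed proxy for contact quality
}
SENTINEL_LIMITS = {
    "vth_mv": (380.0, 420.0),
    "idsat_ua": (450.0, 550.0),
}


def out_of_limits(measurement, limits):
    """Return names of parameters that are missing or outside their limits."""
    violations = []
    for name, (low, high) in limits.items():
        value = measurement.get(name)
        if value is None or not (low <= value <= high):
            violations.append(name)
    return violations


def evaluate(measurement):
    """Stage 1: question the setup. Stage 2: only then check sentinels."""
    setup_issues = out_of_limits(measurement, SETUP_LIMITS)
    if setup_issues:
        return "setup_suspect", setup_issues
    excursions = out_of_limits(measurement, SENTINEL_LIMITS)
    if excursions:
        return "excursion", excursions
    return "ok", []


hot_die = {"die_temp_c": 41.0, "contact_res_ohm": 1.2, "vth_mv": 350.0, "idsat_ua": 500.0}
shifted = {"die_temp_c": 25.0, "contact_res_ohm": 1.2, "vth_mv": 350.0, "idsat_ua": 500.0}
print(evaluate(hot_die))  # setup is questioned first; vth_mv is not interpreted
print(evaluate(shifted))  # setup is sound, so the vth_mv shift is reported
```

Both records carry the same out-of-range `vth_mv`, but only the second is reported as an excursion, because the first fails the temperature validity check and its electrical results are not interpreted.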
Can analytics automatically determine whether an escape originated in the process or in the test chain? The system can tell that something is wrong, and it can surface indicators in aggregate (trends by day, by week, by lot), but narrowing down root cause automatically is constrained by the data available. In distributed manufacturing, certain data types are rarely accessible across the full supply chain, which limits how far automated attribution can go. The system surfaces candidates, but the engineer still has to close the loop.
In essence, the value of analytics in this context is not to produce a definitive verdict, but to generate a ranked list of plausible root causes. A trained engineer can then quickly dismiss the physically nonsensical suggestions and focus investigation on what remains. If the machine comes up with five possibilities of which three bear further investigation and two are complete nonsense, that engineer ditches the two without spending a lot of time getting there.
4. Where is machine learning being applied effectively, and where is it still aspirational?
After years of broad enthusiasm and uncertain application, the industry has arrived at what is described as a “de facto acceptance” around one specific use case: using models to generate data for feed-forward between test steps.
Today, the mechanism is still mostly asynchronous: a model runs between steps, engineers features, makes predictions, and feeds those results forward to inform the next operation. The payoff is twofold: improved test efficiency and reduced quality risk. This is real, it is in production at multiple customers, and it is the clearest demonstrated value for machine learning (ML) in the test data chain today.
What is not yet ready for broad deployment is synchronous, real-time model inference running in-line with the test operation itself. Some vendors can support this technically, but most manufacturers cannot yet justify the investment or sustain the confidence level required to act on model outputs in real time without human review.
The confidence question matters enormously. A large language model that advertises 90% accuracy is not operating at the precision level that semiconductor manufacturing demands. A 10% error rate, which might be acceptable in consumer applications, is potentially catastrophic in a high-stakes yield and quality context. The result is that human oversight remains essential, not because the models are bad, but because the cost of acting on a bad model output is too high to accept without a check.
Looking further out, we see a logical endpoint: models that watch models. When a tier-one model’s predictions begin to drift from actual test results, a monitoring model can detect the divergence and flag it before it propagates into downstream decisions. This is not yet commonly done at scale, but the logic is sound because the volume of data generated is already beyond what human reviewers can practically inspect.
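To make the idea concrete, here is a deliberately simple stand-in for a monitoring model: a rolling mean absolute error between predictions and actual results, flagged when it exceeds a threshold. This is a basic statistical check rather than a learned model, and the `DriftMonitor` class, the window size, and the threshold are all hypothetical choices for illustration.

```python
from collections import deque


class DriftMonitor:
    """Flag drift when the rolling mean absolute error exceeds a threshold."""

    def __init__(self, window=3, max_mean_abs_error=2.0):
        self.errors = deque(maxlen=window)  # keeps only the most recent errors
        self.max_mean_abs_error = max_mean_abs_error

    def update(self, predicted, actual):
        """Record one prediction/actual pair; return True if drift is flagged."""
        self.errors.append(abs(predicted - actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough history to judge yet
        mean_error = sum(self.errors) / len(self.errors)
        return mean_error > self.max_mean_abs_error


# Hypothetical (predicted, actual) pairs; the last three diverge.
pairs = [
    (100.0, 100.5), (101.0, 100.0), (99.0, 99.5),
    (100.0, 104.0), (100.0, 105.0), (100.0, 106.0),
]

monitor = DriftMonitor(window=3, max_mean_abs_error=2.0)
flags = [monitor.update(predicted, actual) for predicted, actual in pairs]
print(flags)
```

With these inputs the monitor stays quiet through the fourth pair and raises a flag on the fifth and sixth, once the window is dominated by large errors. A production monitor would likely need per-product thresholds and a way to distinguish model drift from a genuine process shift.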
PDF Solutions has also developed a product line aimed at bridging the gap: using models to make preliminary judgments across all collected data, surfacing the subset that warrants human attention. The goal is to make the analyst’s review tractable without requiring them to sift through everything manually.
5. What is the single biggest data quality problem manufacturers could actually fix?
We think there are two answers to this question.
The first is metadata consistency: specifically, the consistent labeling of tests and measurements across products and facilities. This problem becomes acute when companies try to train models that generalize across product families. Even within a single organization, tests that measure the same underlying electrical characteristic may carry different names across product lines, making it impossible for a model to recognize the correspondence. A group of experienced engineers around a table can usually work it out; a machine cannot, at least not yet. The implication is that investment in naming standards and test data models is not a clerical task; it is a prerequisite for scalable ML.
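One way to picture the naming problem is as a mapping from product-specific test labels to a shared canonical name. The sketch below assumes such a mapping has already been agreed by engineers; the alias table, the canonical names, and the sample test labels are all invented for illustration.

```python
# Hypothetical alias table: normalized raw label -> canonical test name.
ALIASES = {
    "IDDQ": "iddq_ua",
    "IDD_Q": "iddq_ua",
    "QUIESCENT_CURRENT": "iddq_ua",
    "VMIN": "vmin_mv",
    "V_MIN_CORE": "vmin_mv",
}


def canonical_name(raw_label):
    """Normalize spacing, case, and separators, then look up the canonical name."""
    key = raw_label.strip().upper().replace(" ", "_").replace("-", "_")
    return ALIASES.get(key)  # None when no mapping has been agreed


def harmonize(results):
    """Rename known tests; collect labels that still need an engineer's decision."""
    mapped, unmapped = {}, []
    for raw_label, value in results.items():
        name = canonical_name(raw_label)
        if name is None:
            unmapped.append(raw_label)
        else:
            mapped[name] = value
    return mapped, unmapped


product_a = {"IDDQ": 12.1, "VMIN": 702.0}
product_b = {"quiescent current": 11.8, "v-min-core": 695.0, "TST_0042": 1.0}

print(harmonize(product_a))
print(harmonize(product_b))
```

After harmonization both products expose `iddq_ua` and `vmin_mv`, so a model can treat them as the same features. The `unmapped` list is the part that still requires the engineers around the table.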
The second, and perhaps more structural, source of trouble is mergers and acquisitions. When two companies combine, they almost inevitably bring incompatible data standards, different MES systems, different lot naming conventions, and different test labeling schemes into a shared organization. Rationalizing those differences is technically demanding, politically complicated, and rarely treated as a high priority during integration. Yet it sits squarely in the critical path for any intelligent test initiative that depends on cross-facility data.
The problem also has a temporal dimension, with two halves: a future problem (standardizing going forward) and a past problem (what to do with legacy products that were built under the old system and may remain in production for another decade). Both halves must be addressed to unlock the value of unified analytics.
Chiplets add another layer of complexity. When integrated devices from multiple companies or multiple internal teams are assembled into a single package, the traceability challenge becomes acute. If a package fails, tracing the failure back through multiple chiplets, each potentially with its own ID scheme, its own test history, and its own data format, requires infrastructure that the industry is still building. Traceability within a single company’s chiplet ecosystem is hard enough; across company boundaries, it may require a neutral data intermediary that can analyze combined data without exposing proprietary process information to competitors. Early efforts in this direction exist, but the problem is not yet solved.
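The traceability question can be illustrated with a small genealogy lookup: given a failing package, find each chiplet and whatever test history is available for it. The data structures, supplier names, ID formats, and the `trace_package` function below are hypothetical, and the sketch ignores the data-sharing and confidentiality issues that make the real problem hard.

```python
# Hypothetical package genealogy: package ID -> (supplier, supplier-specific die ID).
genealogy = {
    "PKG-0001": [
        ("supplier_a", "LOT7/W03/X12Y08"),
        ("supplier_b", "B55-0192"),
    ],
}

# Hypothetical test history, keyed by each supplier's own ID scheme.
# No history for the supplier_b die has been received.
test_history = {
    ("supplier_a", "LOT7/W03/X12Y08"): ["wafer_sort:pass"],
}


def trace_package(package_id):
    """Return the test history found per chiplet, plus chiplets with no history."""
    found, gaps = {}, []
    for chiplet in genealogy.get(package_id, []):
        history = test_history.get(chiplet)
        if history is None:
            gaps.append(chiplet)  # traceability breaks at this chiplet
        else:
            found[chiplet] = history
    return found, gaps


found, gaps = trace_package("PKG-0001")
print("history found for:", list(found))
print("traceability gaps:", gaps)
```

Even in this toy case the trace is only as complete as the least well-documented chiplet, which is the gap a neutral data intermediary would be meant to close.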
The underlying theme
Across all these topics, a consistent theme emerges: the value of intelligent test is capped by the quality and structure of the data feeding it. Models, analytics, and adaptive algorithms are only as reliable as the measurements and metadata they consume.
The practical implication for manufacturers is that the highest-leverage investments right now are not in more sophisticated algorithms; they are in the infrastructure underneath them: direct data collection at the tool, comparison against systems of record, consistent metadata standards, and the traceability linkages that allow test history to follow a device through every step of its manufacturing journey.
From our point of view, the greatest value an analytics provider can offer is the ability to collect, align, and normalize data, and then deploy models wherever they are needed. The model itself matters, but the data engineering platform underneath it matters more.
The industry has gotten past the compute bottleneck. The next hill to climb is the data quality bottleneck, and unlike compute, it cannot be solved by buying better hardware.