Analog NPU

Mythic M1076 Analog Matrix Processor — 19×15.5mm BGA package, front and back

Mythic M1076 at a Glance

26 TOPS AI performance (INT8)
3–4W power consumption
19 × 15.5 mm BGA package (~3 cm²)
76 AMP tiles, up to 80M weight parameters
No external DRAM — weights stored on-chip in flash
PCIe Gen2 x4 interface

The Problem: AI Needs a GPU, But GPUs Don't Fit

Deep learning requires massive parallel computation — the kind of work that GPUs excel at. But conventional GPUs are designed for data centers or PCs:

Too expensive to embed in a banknote counting machine
Too power-hungry — a typical AI-capable GPU draws 75–300 W
Too large — requires active cooling and substantial board space

Digital AI accelerators for edge devices do exist, but they face a fundamental bottleneck: memory bandwidth. Neural network inference requires billions of multiply-accumulate operations, and in a traditional digital architecture, model weights must be constantly shuttled between external memory (DRAM) and the compute unit. This data movement consumes most of the power and limits throughput.

The Solution: Analog Compute-in-Memory

Mythic's M1076 Analog Matrix Processor takes a fundamentally different approach: it stores neural network weights directly in flash memory cells and computes using analog signals. Instead of moving data to the processor, computation happens where the data already resides.

Digital computing architecture — processor and memory separated by a bus, causing energy and latency bottleneck

Traditional: data moves between processor and memory

Mythic analog computing — compute-in-memory architecture, 18,000x faster, 1,500x more energy-efficient

Mythic: computation happens where data resides

How it works:

Model weights are stored as analog values in flash memory cells (76 AMP tiles, up to 80M parameters)
Input signals are converted to analog voltages
Matrix multiplication happens through the physics of the circuit itself — Ohm's law (V=IR) and Kirchhoff's current law perform the math
Results are converted back to digital for the next layer

This eliminates the memory bottleneck entirely.

Analog Trade-off: Noise

The inherent challenge of analog computing is noise — analog signals are subject to thermal noise, process variation, and other imprecisions that digital circuits avoid by definition. This is why the industry moved away from analog computing decades ago.

However, neural networks are inherently noise-tolerant. Unlike traditional algorithms where a single bit error causes failure, neural networks are designed to generalize from noisy data. Mythic exploits this property: the small inaccuracies introduced by analog computation fall within the tolerance of the neural network, so the final classification result remains accurate.

Performance Comparison

To understand where the M1076 fits, consider a comparison with the NVIDIA Jetson Xavier NX — a purpose-built edge AI module designed for embedded applications:

	NVIDIA Jetson Xavier NX	Mythic M1076
AI Performance (INT8)	21 TOPS	26 TOPS
Power consumption	10–15W	3–4W
TOPS / Watt	1.4–2.1	~7
Memory	8 GB LPDDR4x (shared, external)	On-chip (no external DRAM)
Price	~$400 (module only)	Comparable to digital NPU
Module size	69.6 × 45 mm (SO-DIMM)	19 × 15.5 mm (BGA)
Board area	~31 cm² (+ carrier board required)	~3 cm² (mounts directly on PCB)
Cooling	Heatsink + fan recommended	No fan needed
Additional HW needed	Carrier board, DRAM, storage, PSU	None (single chip)
Suitable for banknote machine	Difficult	Yes

Mythic MM1076 M.2 M key card — 22×80mm form factor

The M1076 is also available as an M.2 card (22 × 80 mm), making it easy to integrate into existing systems via standard PCIe slots — the same interface used by SSDs in laptops.

Both deliver comparable raw AI performance (21 vs. 26 TOPS). But the decisive factors are price and size — the two things that determine whether AI can actually fit inside a banknote counting machine:

Price — The Xavier NX module alone costs ~$400, and still requires a carrier board, external storage, and a dedicated power supply. The total system cost would exceed the selling price of many banknote counting machines. The M1076 costs a fraction of that.
Size — The Xavier NX module measures 69.6 × 45 mm and needs a separate carrier board (~100 × 80 mm). The M1076 is just 19 × 15.5 mm and mounts directly on the existing PCB — no additional board required.
System simplicity — The Xavier NX is essentially a small computer with its own ecosystem (OS, DRAM, storage). The M1076 is a single chip that connects via PCIe to the existing Zynq 1EG — it integrates seamlessly into the machine's existing architecture.

In short, the Xavier NX is a powerful AI module, but it is too expensive and too bulky for a compact banknote counting machine. The M1076 delivers more AI performance (26 vs. 21 TOPS) at a fraction of the price and size — making embedded AI practical for the first time in this industry.

Why This Matters for Banknote Processing

Before the M1076, there was no practical way to run a deep neural network inside a banknote counting machine. High-end edge AI modules like NVIDIA's Jetson cost more than the machine itself and are far too large to fit inside the enclosure. The M1076 changes this equation entirely.

Removing the Mechanical Tape Detector

The M1076's affordable price and tiny footprint enabled a breakthrough in machine design: we replaced the bulky, expensive mechanical tape detector with the new Advanced CIS + NPU + Zynq 1EG platform. The mechanical tape detector — a spring-based sensor assembly — occupied significant space inside the machine and added considerable cost. By removing it, the freed space and budget were redirected to the AI platform.

The result: a machine that is not only better at tape detection (targeting near 100% vs. 64.58% with the mechanical sensor), but also gains full fitness sorting, advanced counterfeit detection, and OCR — all from the same AI platform, in a single forward pass.

Full Fitness in a Value Counter Form Factor

This is perhaps the most significant implication: the compact size and reasonable cost of the M1076 make it possible to embed full fitness sorting — previously available only in large, expensive sorters — into a value counter-sized machine. A machine that was once a simple counter can now classify every banknote by fitness, denomination, authenticity, and tape — without growing in size or moving to a higher price category.

Want to understand why image-based banknote analysis is so challenging — and how deep learning overcomes the limitations of the old approach? Read more in Deep Learning →

Reference Videos

For a deeper understanding of Mythic's analog compute-in-memory technology:

AI Platform

Overview Hardware Analog NPU ✦ Advanced CIS Deep Learning Performance