VentureBeat

On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.

On-device AI models have long been limited by DRAM capacity, which caps their size and capability. Apple's new AFM 3 foundation models address this by storing model weights in NAND flash memory instead of DRAM. The AFM 3 family includes both on-device and server-based models, developed in collaboration with Google and operating within Apple's Private Cloud Compute.

The on-device AFM 3 Core Advanced is a 20-billion-parameter model whose architecture is designed to work around slow NAND-to-DRAM bandwidth. Instead of making routing decisions for every token, it makes them once per prompt, then loads only the selected "experts" from flash into DRAM for that task. The number of active parameters scales from 1 billion to 4 billion depending on the complexity of the request.

While Apple's technical report details the memory design, it omits crucial information on energy use, thermal constraints, and transparent offloading to the cloud. This gap poses compliance challenges for regulated enterprises that need to document where inference runs.

AFM 3 Core Advanced gives enterprises a significantly more capable on-device AI option. However, whether it can be deployed at scale hinges on details anticipated in a forthcoming technical report. The choice between on-device and cloud-based inference becomes a more nuanced architectural decision for businesses.