Weka’s augmented-memory approach claims to extend DRAM-class memory to GPUs over the compute network, creating a network-accessible memory pool larger than the DRAM on a local motherboard.
Repeatedly running prefill to rebuild the KV cache is a major source of wasted compute and added latency in inference; the ideal is a single prefill followed by indefinite decoding.
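The waste can be illustrated by counting prefill work in a hypothetical multi-turn session. This is a minimal sketch that assumes each turn adds a fixed number of tokens to the context and that, without cache reuse, every turn re-prefills the entire history. The function name and turn sizes are illustrative only. Real systems also vary in attention cost per token, which this count ignores.

```python
def prefill_tokens(turn_lengths, reuse_cache):
    """Count tokens pushed through prefill across a session (hypothetical model).

    turn_lengths: tokens added to the context on each turn (assumption: includes
                  the new prompt plus any prior output that must be in context).
    reuse_cache:  if True, the KV cache for earlier turns is kept, so only the
                  new tokens are prefilled; if False, the full history is rebuilt.
    """
    total = 0
    history = 0  # tokens already in the context before this turn
    for n in turn_lengths:
        if reuse_cache:
            total += n            # prefill only the newly added tokens
        else:
            total += history + n  # rebuild KV cache for the whole context
        history += n
    return total


turns = [1000, 200, 200, 200]
print(prefill_tokens(turns, reuse_cache=False))  # 1000 + 1200 + 1400 + 1600 = 5200
print(prefill_tokens(turns, reuse_cache=True))   # 1000 + 200 + 200 + 200 = 1600
```

Without reuse, prefill work grows roughly quadratically with the number of turns. With reuse it grows linearly, which is why keeping the KV cache resident matters.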
Disaggregated prefill/decode inference, which runs the prefill and decode phases on separate resources, became a production phenomenon mostly in 2025, even though research papers had described it earlier.