Rosa Del Mar

Issue 1 2025-07-07

Daily Brief

  • Weka’s augmented-memory approach claims to extend DRAM-class memory to GPUs via the compute network, creating a larger network-accessible DRAM pool than local motherboard DRAM.
  • Repeated prefill to rebuild the KV cache is a major source of wasted compute and latency in inference; the ideal is a single prefill followed by indefinite decode.
  • Disaggregated prefill-and-decode inference is mostly a 2025 phenomenon in production, even though research papers described it earlier.
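
To make the repeated-prefill point concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from Weka or from any inference engine: the function name, the turn sizes, and the simplification that only prompt tokens count (generated tokens are ignored) are all assumptions made for illustration. It compares how many tokens pass through prefill when the KV cache is rebuilt on every turn versus kept alive across turns.

```python
def tokens_prefilled(turn_lengths, keep_cache):
    """Count prompt tokens run through prefill over a multi-turn session.

    turn_lengths: new prompt tokens added at each turn (hypothetical numbers).
    keep_cache:   True  -> KV cache survives between turns, so only the new
                           tokens of each turn are prefilled.
                  False -> KV cache is lost, so the whole context is
                           prefilled again on every turn.
    """
    total = 0
    context = 0  # running context length in tokens
    for new_tokens in turn_lengths:
        context += new_tokens
        total += new_tokens if keep_cache else context
    return total


# Hypothetical session: a 1000-token prompt, then three 200-token follow-ups.
turns = [1000, 200, 200, 200]
rebuilt = tokens_prefilled(turns, keep_cache=False)
retained = tokens_prefilled(turns, keep_cache=True)
print(f"cache rebuilt each turn: {rebuilt} tokens prefilled")
print(f"cache retained:          {retained} tokens prefilled")
```

Under these assumed numbers, rebuilding the cache prefills 5200 tokens against 1600 when the cache is retained, and the gap widens with every additional turn because the rebuilt case grows with the full context length.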