As enterprises race to cut AI spend, the sharpest savings are coming from architecture choices: shorter prompts, smarter routing, caching, and selective local inference.