Model theft is the unauthorized extraction or replication of an AI model’s behavior, weights, or capabilities. An attacker may query a model repeatedly and use the responses to infer how it behaves, train a substitute model on its outputs, or steal the underlying parameters directly after gaining access to the model files or deployment environment. Targets include proprietary machine-learning systems such as classifiers, recommendation engines, and large language models.
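To make the query-based attack concrete, here is a minimal sketch of an equation-solving extraction against a hypothetical victim: a logistic-regression model behind an API that returns raw confidence scores. The helper names `make_victim` and `extract_logistic`, and the weights used, are illustrative assumptions rather than any real system. Because the score is sigmoid(w·x + b), applying the logit to each response yields a linear equation, so n_features + 1 well-chosen queries recover the parameters exactly.

```python
import math
from typing import Callable, List, Tuple


def make_victim(weights: List[float], bias: float) -> Callable[[List[float]], float]:
    """Hypothetical proprietary model: logistic regression returning a confidence score."""
    def predict_proba(x: List[float]) -> float:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba


def logit(p: float) -> float:
    """Inverse of the sigmoid: recovers w.x + b from a returned probability."""
    return math.log(p / (1.0 - p))


def extract_logistic(query: Callable[[List[float]], float],
                     n_features: int) -> Tuple[List[float], float]:
    """Recover weights and bias with n_features + 1 black-box queries.

    query(zero vector) = sigmoid(b), so logit gives b.
    query(unit vector e_i) = sigmoid(w_i + b), so logit minus b gives w_i.
    """
    bias = logit(query([0.0] * n_features))
    weights = []
    for i in range(n_features):
        e_i = [0.0] * n_features
        e_i[i] = 1.0
        weights.append(logit(query(e_i)) - bias)
    return weights, bias


# Usage: the attacker only calls `victim`, never reads its parameters.
victim = make_victim([0.8, -1.5, 2.0], 0.3)
stolen_w, stolen_b = extract_logistic(victim, n_features=3)
clone = make_victim(stolen_w, stolen_b)  # behaves like the original
```

Real models are rarely this easy to invert. Against nonlinear models, attackers generally fall back on training a substitute from many query-response pairs, which is one reason returning only labels or coarsely rounded scores can raise the cost of extraction.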
Model theft matters because a stolen model exposes intellectual property, erodes a company’s competitive advantage, and lets the attacker run a cloned service that behaves like the original. In cybersecurity, defenders reduce this risk with access controls, rate limiting, monitoring for suspicious query patterns, protection of secrets such as API keys and storage credentials, watermarking, and strict governance over model storage and APIs. Model theft is often discussed alongside prompt injection and data poisoning because all three undermine the trust and value of AI systems, but model theft targets the model itself rather than manipulating its inputs or training data.
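Rate limiting and query monitoring are the defenses most directly aimed at query-based extraction. The sketch below assumes a per-client sliding-window limit and treats repeated rejections as a crude "suspicious pattern" signal; the class name `QueryGuard`, its thresholds, and the flagging rule are hypothetical choices for illustration, not a standard API. A clock function is injected so the behavior can be tested without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set


class QueryGuard:
    """Hypothetical per-client sliding-window rate limiter that flags heavy users."""

    def __init__(self, max_requests: int, window_seconds: float,
                 flag_after_rejections: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.flag_after_rejections = flag_after_rejections
        self.clock = clock
        self._history: Dict[str, Deque[float]] = defaultdict(deque)
        self._rejections: Dict[str, int] = defaultdict(int)
        self.flagged: Set[str] = set()

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        history = self._history[client_id]
        # Discard timestamps that have fallen out of the sliding window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) < self.max_requests:
            history.append(now)
            return True
        # Over the limit: reject, and flag clients that keep hitting the cap.
        self._rejections[client_id] += 1
        if self._rejections[client_id] >= self.flag_after_rejections:
            self.flagged.add(client_id)
        return False


# Usage with a fake clock: 3 queries per 60 s, flag after 2 rejections.
fake_now = [0.0]
guard = QueryGuard(3, 60.0, 2, clock=lambda: fake_now[0])
results = [guard.allow("client-a") for _ in range(5)]
flagged_early = "client-a" in guard.flagged
fake_now[0] = 61.0
after_window = guard.allow("client-a")  # window has rolled over
```

Request volume is only a rough proxy for extraction; a patient attacker can stay under any fixed limit, so monitoring in practice usually also looks at how queries are distributed, not just how many arrive.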



