- MLflow identified as most vulnerable open source machine learning platform
- Directory traversal flaws allow unauthorized access to files in Weave
- ZenML Cloud access control issues allow privilege escalation risks
A recent analysis of the security landscape of machine learning (ML) frameworks has revealed that ML software is subject to more security vulnerabilities than more mature categories such as DevOps tools or web servers.
The growing adoption of machine learning across industries highlights the critical need to protect machine learning systems, as vulnerabilities can lead to unauthorized access, data breaches, and compromised operations.
The JFrog report states that machine learning projects like MLflow have seen an increase in critical vulnerabilities. In recent months, JFrog has discovered 22 vulnerabilities in 15 open source machine learning projects. Among these vulnerabilities, two categories stand out: threats targeting server-side components and privilege escalation risks within ML frameworks.
Critical Vulnerabilities in ML Frameworks
The vulnerabilities identified by JFrog affect key components commonly used in ML workflows. By exploiting tools that ML practitioners rely on for their flexibility, attackers could gain unauthorized access to confidential files or elevate privileges within ML environments.
One of the highlighted vulnerabilities involves Weave, a popular Weights & Biases (W&B) toolkit, which helps track and visualize ML model metrics. The WANDB Weave Directory Traversal vulnerability (CVE-2024-7340) allows low-privileged users to access arbitrary files throughout the file system.
This flaw arises due to improper input validation when handling file paths, potentially allowing attackers to view sensitive files that could include administrator API keys or other privileged information. Such a breach could lead to privilege escalation, giving attackers unauthorized access to resources and compromising the security of the entire machine learning process.
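The report does not reproduce Weave's affected code, but the class of bug is well understood. The sketch below is a hypothetical illustration (the FILES_ROOT directory and function names are invented, not Weave's API) of how joining an unvalidated, user-supplied path onto a base directory lets a request escape that directory, and what a stricter check looks like.

```python
import os

FILES_ROOT = "/srv/app/files"  # hypothetical directory the endpoint is meant to serve

def read_user_file_vulnerable(relative_path: str) -> bytes:
    # Vulnerable pattern: the user-supplied path is joined without validation,
    # so "../../etc/passwd" (or an absolute path) escapes FILES_ROOT entirely.
    with open(os.path.join(FILES_ROOT, relative_path), "rb") as f:
        return f.read()

def read_user_file_safer(relative_path: str) -> bytes:
    # Safer pattern: resolve the final path and verify it still lies inside FILES_ROOT.
    root = os.path.realpath(FILES_ROOT)
    full_path = os.path.realpath(os.path.join(root, relative_path))
    if os.path.commonpath([full_path, root]) != root:
        raise PermissionError("path escapes the allowed directory")
    with open(full_path, "rb") as f:
        return f.read()
```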
ZenML, an MLOps pipeline management tool, is also affected by a critical vulnerability that compromises its access control system. The flaw allows attackers with minimal access privileges to elevate their permissions within ZenML Cloud, the managed implementation of ZenML, and thereby access restricted information, including sensitive secrets and model files.
The access control issue in ZenML exposes the system to significant risks, as escalated privileges could allow an attacker to manipulate machine learning pipelines, alter model data, or access sensitive operational data, potentially impacting production environments that depend on those pipelines.
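JFrog does not detail the exact ZenML Cloud endpoint involved, but broken access control of this kind typically comes down to a missing server-side authorization check. The minimal sketch below is hypothetical (the secret store, roles, and function names are invented) and simply contrasts a handler that releases secrets without checking the caller's role against one that does.

```python
SECRETS = {"prod-db-password": "s3cr3t"}        # illustrative stored secrets
ROLES = {"alice": "admin", "bob": "viewer"}     # illustrative role assignments

def get_secret_vulnerable(username: str, secret_name: str) -> str:
    # Vulnerable pattern: no authorization check at all, so any authenticated
    # low-privilege user can read any secret.
    return SECRETS[secret_name]

def get_secret_safer(username: str, secret_name: str) -> str:
    # Safer pattern: verify the caller's role on the server before releasing the secret.
    if ROLES.get(username) != "admin":
        raise PermissionError("insufficient privileges")
    return SECRETS[secret_name]
```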
Another serious vulnerability, known as Deep Lake Command Injection (CVE-2024-6507), was found in the Deep Lake database, a data storage solution optimized for AI applications. This vulnerability allows attackers to execute arbitrary commands by leveraging the way Deep Lake handles imports of external data sets.
Due to improper command sanitization, an attacker could achieve remote code execution, compromising the security of both the database and connected applications.
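The specific Deep Lake code path is not reproduced in the report. As a generic illustration, the snippet below (the `downloader` command and function names are invented) shows how interpolating an untrusted dataset reference into a shell command permits injection, and how passing arguments as a list avoids invoking the shell at all.

```python
import subprocess

def import_dataset_vulnerable(dataset_url: str) -> None:
    # Vulnerable pattern: the untrusted value is interpolated into a shell string,
    # so input like "http://example.com/data; rm -rf /" runs an extra command.
    subprocess.run(f"downloader --fetch {dataset_url}", shell=True, check=True)

def import_dataset_safer(dataset_url: str) -> None:
    # Safer pattern: pass arguments as a list so no shell parses the input,
    # ideally after validating that the value is a well-formed URL.
    subprocess.run(["downloader", "--fetch", dataset_url], check=True)
```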
A notable vulnerability was also found in Vanna AI, a tool that generates SQL queries from natural-language prompts and visualizes their results. The Vanna.AI prompt injection vulnerability (CVE-2024-5565) allows attackers to inject malicious instructions into prompts that the tool subsequently processes. This flaw, which could lead to remote code execution, lets malicious actors target Vanna AI's SQL-to-chart visualization functionality to manipulate visualizations, execute SQL injections, or exfiltrate data.
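Vanna AI's internals aside, the underlying risk is any workflow that executes code generated by a language model from user-influenced prompts. The hypothetical sketch below (function names and the JSON-spec alternative are assumptions, not Vanna AI's API) contrasts executing model output directly with treating it as structured data.

```python
import json

def render_chart_vulnerable(llm_output: str) -> None:
    # Vulnerable pattern: executing model output gives it the privileges of the
    # application process, so a prompt that steers the model into emitting
    # os.system(...) becomes remote code execution.
    exec(llm_output)

def render_chart_safer(llm_output: str) -> None:
    # Safer pattern (one option): never execute free-form model output; ask the
    # model for a structured chart specification and validate it as plain data.
    spec = json.loads(llm_output)  # raises if the output is not plain JSON
    print("would build chart from fields:", sorted(spec))
```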
Mage.AI, an MLOps tool for managing data pipelines, has been found to have multiple vulnerabilities, including unauthorized shell access, arbitrary file leaks, and weak path traversal checks.
These issues allow attackers to gain control over data pipelines, expose sensitive configurations, or even execute malicious commands. Together, these vulnerabilities present a high risk of privilege escalation and data integrity violations, compromising the security and stability of ML pipelines.
By gaining administrator access to databases or machine learning logs, attackers can embed malicious code into models, creating backdoors that activate when the model is loaded. This can compromise downstream processes, since models are consumed by multiple teams and CI/CD pipelines. Attackers can also leak sensitive data or carry out model poisoning attacks to degrade model performance or manipulate results.
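The report does not tie this scenario to one serialization format, but Python's pickle format, which many ML tools use for model artifacts, is a common illustration: an object can specify code to run at deserialization time, so merely loading a poisoned model executes the attacker's payload.

```python
import pickle

class BackdooredModel:
    # Illustrative only: pickle lets an object define what runs when it is
    # deserialized. An attacker with write access to a model store could save an
    # object like this; any pipeline that later pickle.load()s it runs the payload.
    def __reduce__(self):
        import os
        return (os.system, ("echo attacker code runs at model load time",))

# Attacker side: write the poisoned artifact.
with open("model.pkl", "wb") as f:
    pickle.dump(BackdooredModel(), f)

# Victim side: simply loading the "model" triggers the embedded command.
with open("model.pkl", "rb") as f:
    pickle.load(f)
```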
JFrog’s findings highlight an operational gap in MLOps security. Many organizations lack strong integration of AI/ML security practices with broader cybersecurity strategies, creating potential blind spots. As machine learning and artificial intelligence continue to drive major advances in the industry, safeguarding the frameworks, data sets and models that drive these innovations becomes paramount.