A vulnerability in the Python programming language that has been overlooked for 15 years is back in the spotlight, as it likely affects more than 350,000 open-source repositories and can lead to code execution.
Disclosed in 2007 and tracked as CVE-2007-4559, the security issue never received a patch. The only mitigation provided was a documentation update warning developers about the risk.
Unpatched Since 2007
The vulnerability is in the Python tarfile package, in code that uses the un-sanitized tarfile.extract() function or the built-in defaults of tarfile.extractall(). It is a path traversal bug that enables an attacker to overwrite arbitrary files.
Technical details for CVE-2007-4559 have been available since the initial report in August 2007. Although there are no reports of the bug being used in attacks, it represents a risk in the software supply chain.
Earlier this year, while investigating another security issue, a researcher at Trellix rediscovered CVE-2007-4559. Trellix is a new business providing extended detection and response (XDR) solutions, formed by the merger of McAfee Enterprise and FireEye.
The flaw stems from the fact that the code in the extract function of Python's tarfile module explicitly trusts the information in the TarInfo object: it joins the path that is passed to the extract function with the name in the TarInfo object, without checking where the result points.
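The effect of joining a destination path with an unchecked member name can be illustrated with a short sketch. This is not the tarfile module's own code. The helper names `naive_target` and `escapes_destination` and the example paths are hypothetical, and `posixpath` is used so the behavior is the same on every platform.

```python
import posixpath


def naive_target(dest_dir: str, member_name: str) -> str:
    """Join destination and member name with no validation, then
    normalize the result to show where the file would actually land."""
    return posixpath.normpath(posixpath.join(dest_dir, member_name))


def escapes_destination(dest_dir: str, member_name: str) -> bool:
    """Return True if the joined path falls outside dest_dir.
    Assumes dest_dir is an absolute POSIX path."""
    root = posixpath.normpath(dest_dir)
    target = naive_target(root, member_name)
    return posixpath.commonpath([root, target]) != root


# A member name containing "../" climbs out of the destination directory.
print(naive_target("/srv/app/uploads", "../../etc/passwd"))       # /srv/etc/passwd
print(escapes_destination("/srv/app/uploads", "docs/readme.txt"))  # False
print(escapes_destination("/srv/app/uploads", "/etc/passwd"))      # True
```

An absolute member name is just as problematic as a relative one with `../` components, because joining discards the destination directory entirely when the second argument is absolute.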
Less than a week after the disclosure, a message on the Python bug tracker announced that the issue had been closed. The fix was a documentation update with a warning that “extracting archives from untrusted sources can be dangerous.”
Estimated 350,000 Projects Affected
Analyzing the impact, the Trellix researchers found that the vulnerability was present in thousands of software projects, both open and closed source.
The researchers scraped a set of 257 repositories most likely to contain vulnerable code and manually checked 175 of them to see if they had been affected. This showed that 61% of them were vulnerable.
Running an automated check on the remaining repositories raised the share of affected projects to 65%, indicating a widespread problem.
However, the small sample set only serves as a baseline to come up with an estimate of all the affected repositories available on GitHub.
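An automated check of this kind could, in its simplest form, look for calls to the two extraction methods in a project's source code. The sketch below is only an illustration and is not the tool the researchers used. The function name `find_extract_calls` is hypothetical, and the check is deliberately crude: it flags every `.extract()` or `.extractall()` call, including ones on non-tarfile objects and ones that are already sanitized, so each hit would still need manual review.

```python
import ast
from typing import List, Tuple

# Method names whose call sites deserve a closer look.
SUSPECT_METHODS = {"extract", "extractall"}


def find_extract_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line number, method name) for every attribute call
    named extract or extractall found in the given Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SUSPECT_METHODS
        ):
            hits.append((node.lineno, node.func.attr))
    return sorted(hits)
```

Applied to every `.py` file in a repository, the resulting list gives a starting point for review rather than a verdict on whether the project is vulnerable.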
Using the manually verified 61% vulnerability rate, Trellix estimates there are over 350,000 vulnerable repositories, many of them used by machine learning tools (such as GitHub Copilot) that help developers complete a project faster.
Such automated tools rely on code from hundreds of thousands of repositories to provide an “auto-complete” option. If they provide unsafe code, the problem spreads to other projects without the developer knowing.
Looking further into the problem, Trellix found that open-source code vulnerable to CVE-2007-4559 spans a large number of industries.
As expected, the most affected sector is development, followed by web and machine learning technology.
In a technical blog post today, Trellix vulnerability researcher Kasimir Schulz, who rediscovered the bug, described the simple steps to exploit CVE-2007-4559 in the Windows version of Spyder IDE, an open-source cross-platform integrated development environment for scientific programming.
The researchers showed that the vulnerability could be exploited on Linux as well. They managed to achieve file write and code execution in a test on the Polemarch IT Infrastructure Management Service.
In addition to drawing attention to the vulnerability and the risk it poses, Trellix also created patches for more than 11,000 projects. The fixes will be available in forks of the affected repositories. Later, they will be submitted to the main projects via pull requests.
Due to the large number of affected repositories, the researchers expect more than 70,000 projects to receive a fix in the next few weeks. Reaching 100% is a tough challenge, however, as the merge requests also need to be accepted by the maintainers.
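A fix for code that calls tarfile.extractall() on untrusted archives generally comes down to validating every member before anything is written to disk. The following is a minimal sketch of that idea and is not the patch Trellix submitted. The wrapper name `safe_extractall` is hypothetical, and rejecting all symbolic and hard links is a conservative assumption made here for simplicity.

```python
import os
import tarfile


def safe_extractall(tar: tarfile.TarFile, dest: str) -> None:
    """Extract the archive only if every member resolves inside dest.
    Raises ValueError on the first member that would escape."""
    root = os.path.realpath(dest)
    for member in tar.getmembers():
        # Links can point outside the destination, so refuse them outright.
        if member.issym() or member.islnk():
            raise ValueError(f"blocked link member: {member.name!r}")
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise ValueError(f"blocked path traversal: {member.name!r}")
    # All members were checked, so the default extraction is now acceptable.
    tar.extractall(root)
```

A caller would open the archive with `tarfile.open(...)` and pass the resulting object to the wrapper instead of calling `extractall()` directly. Python 3.12 and later also accept a `filter="data"` argument to `extractall()` that performs comparable checks inside the standard library.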
BleepingComputer has contacted the Python Software Foundation for a comment regarding CVE-2007-4559, but has not received a reply at publication time.