Fix a bug that a large XML can't be parsed (#154)
GitHub: fix GH-150

If a parsed XML is larger than `2 ** 31 - 1` bytes, we can't parse it because
`StringScanner`'s position is stored as an `int`. We can avoid the
restriction by dropping already-parsed content once it grows large.

Co-authored-by: Sutou Kouhei <kou@clear-code.com>
naitoh and kou committed Jun 22, 2024
1 parent f704011 commit 4c28808
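
The idea in the commit message can be illustrated with `StringScanner` alone. This is a minimal sketch, not code from the commit: the short input string stands in for a document that would really need to exceed `2 ** 31 - 1` bytes, and the variable names are illustrative. It relies on the documented behavior that `StringScanner#string=` resets the scan position.

```ruby
require "strscan"

# Stand-in for a huge document; the real problem needs more than 2**31 - 1 bytes.
scanner = StringScanner.new("<root><child>text</child></root>")

scanner.scan(/<root>/)      # consume the first tag
pos_before = scanner.pos    # offset grows with everything consumed so far

# Replace the scanned string with only the unread remainder.
# StringScanner#string= also resets the scan position to 0.
scanner.string = scanner.rest
pos_after = scanner.pos
remaining = scanner.rest

puts "before=#{pos_before} after=#{pos_after} rest=#{remaining}"
```

Because the position restarts from zero after each drop, it only has to count the bytes consumed since the last drop rather than the size of the whole document.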
Showing 3 changed files with 36 additions and 0 deletions.
2 changes: 2 additions & 0 deletions lib/rexml/parsers/baseparser.rb
@@ -204,6 +204,8 @@ def peek depth=0

      # Returns the next event. This is a +PullEvent+ object.
      def pull
        @source.drop_parsed_content

        pull_event.tap do |event|
          @listeners.each do |listener|
            listener.receive event
7 changes: 7 additions & 0 deletions lib/rexml/source.rb
@@ -55,6 +55,7 @@ class Source
    attr_reader :encoding

    module Private
      SCANNER_RESET_SIZE = 100000
      PRE_DEFINED_TERM_PATTERNS = {}
      pre_defined_terms = ["'", '"', "<"]
      pre_defined_terms.each do |term|
@@ -84,6 +85,12 @@ def buffer
      @scanner.rest
    end

    def drop_parsed_content
      if @scanner.pos > Private::SCANNER_RESET_SIZE
        @scanner.string = @scanner.rest
      end
    end

    def buffer_encoding=(encoding)
      @scanner.string.force_encoding(encoding)
    end
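
To see how the threshold check in `drop_parsed_content` keeps the scanner position bounded, here is a rough standalone sketch. `TrimmingSource`, `RESET_SIZE`, `next_token`, and `max_pos` are hypothetical names invented for this illustration and are not part of REXML; the threshold is deliberately tiny (16 instead of 100000) so the effect shows up on a short input.

```ruby
require "strscan"

# Hypothetical stand-in for REXML's Source; only the trimming logic mirrors the diff.
class TrimmingSource
  RESET_SIZE = 16 # assumption: tiny threshold so the effect is visible

  attr_reader :max_pos

  def initialize(text)
    @scanner = StringScanner.new(text)
    @max_pos = 0
  end

  # Same shape as the method added in the diff: trim only past the threshold.
  def drop_parsed_content
    @scanner.string = @scanner.rest if @scanner.pos > RESET_SIZE
  end

  # Consume one tag or one run of text; returns nil at end of input.
  def next_token
    token = @scanner.scan(/<[^>]*>|[^<]+/)
    @max_pos = [@max_pos, @scanner.pos].max
    token
  end
end

source = TrimmingSource.new("<a>" * 50) # 150 bytes of input
tokens = []
loop do
  source.drop_parsed_content # called before each pull, as in BaseParser#pull
  token = source.next_token
  break if token.nil?
  tokens << token
end

puts "tokens=#{tokens.size} max_pos=#{source.max_pos}"
```

The largest position ever observed stays close to the threshold even though the input is much longer, which is the property that lets the position fit in an `int` for very large documents.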
27 changes: 27 additions & 0 deletions test/parser/test_base_parser.rb
@@ -0,0 +1,27 @@
# frozen_string_literal: false

require 'rexml/parsers/baseparser'

module REXMLTests
  class BaseParserTester < Test::Unit::TestCase
    def test_large_xml
      large_text = "a" * 100_000
      xml = <<-XML
        <?xml version="1.0"?>
        <root>
          <child>#{large_text}</child>
          <child>#{large_text}</child>
        </root>
      XML

      parser = REXML::Parsers::BaseParser.new(xml)
      while parser.has_next?
        parser.pull
      end

      assert do
        parser.position < xml.bytesize
      end
    end
  end
end
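
The pull loop used by the test can also be run outside the test framework. The following is a small sketch assuming the `rexml` gem is installed; the input document and the variable names are made up for illustration. It relies on `BaseParser#pull` returning an array whose first element is the event type.

```ruby
require "rexml/parsers/baseparser"

text = "a" * 1_000
xml = "<root><child>#{text}</child><child>#{text}</child></root>"

parser = REXML::Parsers::BaseParser.new(xml)
event_types = []
while parser.has_next?
  event = parser.pull      # each pull may first drop already-parsed content
  event_types << event[0]  # e.g. :start_element, :text, :end_element
end

start_count = event_types.count(:start_element)
end_count = event_types.count(:end_element)
puts "start=#{start_count} end=#{end_count}"
```

From the caller's side the events are the same with or without the trimming; the change only affects how much consumed input the scanner keeps around.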
