Skip to content

Conversation

@stephentoub
Copy link
Member

@stephentoub stephentoub commented Sep 22, 2021

Three bugs being fixed here:

  1. The SlideCursorToStartOfBuffer method is incorrectly assuming that the buffer is filled to its end (that m_Buffer.Length = m_CharsRead). As a result, two things happen. First, we copy more data than is needed to the temporary buffer; that's just unnecessary but not harmful. But second, when it issues a read to fill the remainder of the buffer, it does so at the wrong position, both leaving zeros in the buffer that end up getting parsed as real data and losing real data from the end.
  2. IncreaseBufferSize assumes m_CharsRead = m_Buffer.Length, which may not be the case. As with (1), this can result in it reading into the wrong part of the array and leaving garbage that gets processed.
  3. IncreaseBufferSize is always increasing the buffer size, even if only a small portion of the buffer is used. This can then result in the whole operation failing when it repeatedly increases the buffer size and ends up failing due to ticking over the max buffer threshold.

(1) doesn't exist in .NET Framework 4.8 but (2) and (3) do. (1) is easily triggered by a stream that sporadically doesn't produce all the data requested; (2) and (3) end up needing a stream that frequently produces much less than requested. This PR fixes all three.

Without the fixes included, here is how the tests being added here fail on main:

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(fieldsInQuotes: False) [FAIL]
        Assert.Equal() Failure
        Expected: String[] ["sssssss", "s", "s", "s"]
        Actual:   String[] ["\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(383,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(Boolean fieldsInQuotes)

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(fieldsInQuotes: True) [FAIL]
        Microsoft.VisualBasic.FileIO.MalformedLineException : Line 535 cannot be parsed using the current Delimiters.
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\src\Microsoft\VisualBasic\FileIO\TextFieldParser.vb(917,0): at Microsoft.VisualBasic.FileIO.TextFieldParser.ParseDelimitedLine()
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\src\Microsoft\VisualBasic\FileIO\TextFieldParser.vb(362,0): at Microsoft.VisualBasic.FileIO.TextFieldParser.ReadFields()
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(430,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(383,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(Boolean fieldsInQuotes)

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: False) [FAIL]
        Assert.Equal() Failure
        Expected: String[] ["s", "ss", "sssssss", "sssss"]
        Actual:   String[] ["\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)
        Assert.Equal() Failure

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: True) [FAIL]
        Expected: String[] ["ssssss", "s", "s", "sssss"]
        Actual:   String[] ["ssssss\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., "s", "s", "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)

and on .NET Framework 4.8:

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: False) [FAIL]
        System.InvalidOperationException : TextFieldParser is unable to complete the read operation because maximum buffer size has been exceeded.
        Stack Trace:
             at Microsoft.VisualBasic.FileIO.TextFieldParser.IncreaseBufferSize()
             at Microsoft.VisualBasic.FileIO.TextFieldParser.ReadNextLine(Int32& Cursor, ChangeBufferFunction ChangeBuffer)
             at Microsoft.VisualBasic.FileIO.TextFieldParser.PeekNextDataLine()
             at Microsoft.VisualBasic.FileIO.TextFieldParser.get_EndOfData()
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(428,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: True) [FAIL]
        Assert.Equal() Failure
        Expected: String[] ["ssssss", "s", "s", "sssss"]
        Actual:   String[] ["ssssss\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., "s", "s", "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)

Fixes #58857
Fixes #59371

Three bugs being fixed here:
1. The SlideCursorToStartOfBuffer method is incorrectly assuming that the buffer is filled to its end (that m_Buffer.Length = m_CharsRead).  As a result, two things happen.  First, we copy more data than is needed to the temporary buffer; that's just unnecessary but not harmful.  But second, when it issues a read to fill the remainder of the buffer, it does so at the wrong position, both leaving zeros in the buffer that end up getting parsed as real data and losing real data from the end.
2. IncreaseBufferSize assumes m_CharsRead = m_Buffer.Length, which may not be the case.  As with (1), this can result in it reading into the wrong part of the array and leaving garbage that gets processed.
3. IncreaseBufferSize is always increasing the buffer size, even if only a small portion of the buffer is used.  This can then result in the whole operation failing when it repeatedly increases the buffer size and ends up failing due to ticking over the max buffer threshold.

(1) doesn't exist in .NET Framework 4.8 but (2) and (3) do.  (1) is easily triggered by a stream that sporadically doesn't produce all the data requested; (2) and (3) end up needing a stream that frequently produces much less than requested.  This PR fixes all three.
@ghost
Copy link

ghost commented Sep 22, 2021

Tagging subscribers to this area: @cston
See info in area-owners.md if you want to be subscribed.

Issue Details

Three bugs being fixed here:

  1. The SlideCursorToStartOfBuffer method is incorrectly assuming that the buffer is filled to its end (that m_Buffer.Length = m_CharsRead). As a result, two things happen. First, we copy more data than is needed to the temporary buffer; that's just unnecessary but not harmful. But second, when it issues a read to fill the remainder of the buffer, it does so at the wrong position, both leaving zeros in the buffer that end up getting parsed as real data and losing real data from the end.
  2. IncreaseBufferSize assumes m_CharsRead = m_Buffer.Length, which may not be the case. As with (1), this can result in it reading into the wrong part of the array and leaving garbage that gets processed.
  3. IncreaseBufferSize is always increasing the buffer size, even if only a small portion of the buffer is used. This can then result in the whole operation failing when it repeatedly increases the buffer size and ends up failing due to ticking over the max buffer threshold.

(1) doesn't exist in .NET Framework 4.8 but (2) and (3) do. (1) is easily triggered by a stream that sporadically doesn't produce all the data requested; (2) and (3) end up needing a stream that frequently produces much less than requested. This PR fixes all three.

Without the fixes included, here is how the tests being added here fail on main:

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(fieldsInQuotes: False) [FAIL]
        Assert.Equal() Failure
        Expected: String[] ["sssssss", "s", "s", "s"]
        Actual:   String[] ["\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(383,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(Boolean fieldsInQuotes)

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(fieldsInQuotes: True) [FAIL]
        Microsoft.VisualBasic.FileIO.MalformedLineException : Line 535 cannot be parsed using the current Delimiters.
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\src\Microsoft\VisualBasic\FileIO\TextFieldParser.vb(917,0): at Microsoft.VisualBasic.FileIO.TextFieldParser.ParseDelimitedLine()
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\src\Microsoft\VisualBasic\FileIO\TextFieldParser.vb(362,0): at Microsoft.VisualBasic.FileIO.TextFieldParser.ReadFields()
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(430,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(383,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Large(Boolean fieldsInQuotes)

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: False) [FAIL]
        Assert.Equal() Failure
        Expected: String[] ["s", "ss", "sssssss", "sssss"]
        Actual:   String[] ["\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)
        Assert.Equal() Failure

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: True) [FAIL]
        Expected: String[] ["ssssss", "s", "s", "sssss"]
        Actual:   String[] ["ssssss\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., "s", "s", "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)

and on .NET Framework 4.8:

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: False) [FAIL]
        System.InvalidOperationException : TextFieldParser is unable to complete the read operation because maximum buffer size has been exceeded.
        Stack Trace:
             at Microsoft.VisualBasic.FileIO.TextFieldParser.IncreaseBufferSize()
             at Microsoft.VisualBasic.FileIO.TextFieldParser.ReadNextLine(Int32& Cursor, ChangeBufferFunction ChangeBuffer)
             at Microsoft.VisualBasic.FileIO.TextFieldParser.PeekNextDataLine()
             at Microsoft.VisualBasic.FileIO.TextFieldParser.get_EndOfData()
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(428,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)

      Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(fieldsInQuotes: True) [FAIL]
        Assert.Equal() Failure
        Expected: String[] ["ssssss", "s", "s", "sssss"]
        Actual:   String[] ["ssssss\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., "s", "s", "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...]
        Stack Trace:
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(431,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream(Boolean fieldsInQuotes, Int32 maxCount)
          D:\repos\runtime\src\libraries\Microsoft.VisualBasic.Core\tests\Microsoft\VisualBasic\FileIO\TextFieldParserTests.cs(389,0): at Microsoft.VisualBasic.FileIO.Tests.TextFieldParserTests.ReadFields_PartialReadsFromStream_Small(Boolean fieldsInQuotes)
Author: stephentoub
Assignees: -
Labels:

area-Microsoft.VisualBasic

Milestone: 7.0.0

@stephentoub stephentoub merged commit a65f836 into dotnet:main Sep 22, 2021
@stephentoub stephentoub deleted the fixtextparser branch September 22, 2021 23:57
@stephentoub
Copy link
Member Author

/backport to release/6.0-rc2

@github-actions
Copy link
Contributor

Started backporting to release/6.0-rc2: https://github.com/dotnet/runtime/actions/runs/1263921843

@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

2 participants