How to: Perform a Streaming Transform of Text to XML

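One approach to processing a text file is to write an extension method that uses the yield return construct to stream the file one line at a time. You can then write a LINQ query that processes the text file lazily, with deferred execution. If you then use XStreamingElement to stream the output, you get a text-to-XML transform that uses a minimal amount of memory, regardless of the size of the source text file.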

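Streaming transforms come with some caveats. They work best when the entire file can be processed in a single pass and the lines can be processed in the order in which they occur in the source document. If you have to process the file more than once, or if you have to sort the lines before processing them, you lose many of the benefits of the streaming technique.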
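The sorting caveat can be made visible with a small trace. The following is a minimal sketch, not part of the original sample: the `StreamingCaveat` class, its `TracedLines` and `Trace` methods, and the two input lines are hypothetical names and data introduced only for illustration. It records when each line is read and when it is consumed, first for a plain streaming query and then for one that sorts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingCaveat
{
    // Hypothetical line source that records the moment each line is actually read.
    public static IEnumerable<string> TracedLines(IEnumerable<string> lines, List<string> log)
    {
        foreach (string line in lines)
        {
            log.Add("read " + line);
            yield return line;
        }
    }

    // Consumes the query one item at a time and returns the order of events.
    public static List<string> Trace(bool sort)
    {
        var log = new List<string>();
        string[] input = { "2,Nikolay", "1,Tai" };

        IEnumerable<string> query = TracedLines(input, log);
        if (sort)
        {
            // OrderBy has to see every line before it can yield the first one.
            query = query.OrderBy(l => l, StringComparer.Ordinal);
        }

        foreach (string line in query)
        {
            log.Add("use " + line);
        }
        return log;
    }
}
```

With `Trace(false)` the log alternates between "read" and "use" entries, so only one line is held at a time. With `Trace(true)` all "read" entries come first, which means the whole input is buffered before any line is consumed.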

Example

The following text file, People.txt, is the source for this example.

#This is a comment
1,Tai,Yee,Writer
2,Nikolay,Grachev,Programmer
3,David,Wright,Inventor

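The following code contains an extension method that streams the lines of the text file in a deferred fashion.

Note

The following example uses the yield return construct of C#. The equivalent Visual Basic code uses classes that implement the IEnumerable(Of String) and IEnumerator(Of String) interfaces. For an example of implementing IEnumerable(Of T) in Visual Basic, see Walkthrough: Implementing IEnumerable(Of T) in Visual Basic.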

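using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamReaderSequence
{
    // Validates the argument immediately, then returns a lazy sequence of lines.
    public static IEnumerable<string> Lines(this StreamReader source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        return LinesIterator(source);
    }

    // Reads one line per iteration; nothing is read until the sequence is enumerated.
    private static IEnumerable<string> LinesIterator(StreamReader source)
    {
        string line;
        while ((line = source.ReadLine()) != null)
        {
            yield return line;
        }
    }
}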

class Program
{
    static void Main(string[] args)
    {
        // The using block closes the reader even if the query throws.
        using (StreamReader sr = new StreamReader("People.txt"))
        {
            XStreamingElement xmlTree = new XStreamingElement("Root",
                from line in sr.Lines()
                where !line.StartsWith("#")   // skip comment lines before splitting
                let items = line.Split(',')
                select new XElement("Person",
                           new XAttribute("ID", items[0]),
                           new XElement("First", items[1]),
                           new XElement("Last", items[2]),
                           new XElement("Occupation", items[3])
                       )
            );
            // The query runs here, while the reader is still open.
            Console.WriteLine(xmlTree);
        }
    }
}

Module Module1
    Sub Main()
        Dim sr = New IO.StreamReader("..\..\People.txt")
        Dim xmlTree = New XStreamingElement("Root",
            From line In sr.Lines()
            Let items = Split(line, ",")
            Where Not line.StartsWith("#")
            Select <Person ID=<%= items(0) %>>
                       <First><%= items(1) %></First>
                       <Last><%= items(2) %></Last>
                       <Occupation><%= items(3) %></Occupation>
                   </Person>
                   )

        Console.WriteLine(xmlTree)
        sr.Close()
    End Sub
End Module

Module StreamReaderSequence
    <System.Runtime.CompilerServices.Extension()>
    Public Function Lines(ByVal source As IO.StreamReader) As IEnumerable(Of String)
        If source Is Nothing Then Throw New ArgumentNullException("source")
        Return New StreamReaderEnumerable(source)
    End Function
End Module


Public Class StreamReaderEnumerable
    Implements IEnumerable(Of String)

    Private _source As IO.StreamReader

    Public Sub New(ByVal source As IO.StreamReader)
        _source = source
    End Sub

    Public Function GetEnumerator() As Generic.IEnumerator(Of String) Implements IEnumerable(Of String).GetEnumerator
        Return New StreamReaderEnumerator(_source)
    End Function

    Public Function GetEnumerator1() As IEnumerator Implements IEnumerable.GetEnumerator
        Return Me.GetEnumerator()
    End Function
End Class

Public Class StreamReaderEnumerator
    Implements IEnumerator(Of String)

    Private _current As String
    Private _source As IO.StreamReader

    Public Sub New(ByVal source As IO.StreamReader)
        _source = source
    End Sub


    Public ReadOnly Property Current As String Implements Generic.IEnumerator(Of String).Current
        Get
            Return _current
        End Get
    End Property

    Public ReadOnly Property Current1 As Object Implements IEnumerator.Current
        Get
            Return Me.Current
        End Get
    End Property

    Public Function MoveNext() As Boolean Implements IEnumerator.MoveNext
        _current = _source.ReadLine()
        Return _current IsNot Nothing
    End Function

    Public Sub Reset() Implements IEnumerator.Reset
        _current = Nothing
        _source.DiscardBufferedData()
        _source.BaseStream.Seek(0, IO.SeekOrigin.Begin)
    End Sub


    Public Sub Dispose() Implements IDisposable.Dispose

    End Sub

End Class

This example produces the following output:

<Root>
  <Person ID="1">
    <First>Tai</First>
    <Last>Yee</Last>
    <Occupation>Writer</Occupation>
  </Person>
  <Person ID="2">
    <First>Nikolay</First>
    <Last>Grachev</Last>
    <Occupation>Programmer</Occupation>
  </Person>
  <Person ID="3">
    <First>David</First>
    <Last>Wright</Last>
    <Occupation>Inventor</Occupation>
  </Person>
</Root>
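
The example above prints the result with Console.WriteLine, which first converts the whole tree to a single string. For a large source file, writing directly to a file is likely closer to the memory goal described at the start of this topic. The following is a minimal sketch under that assumption; the `StreamingSave` class, the `ReadLinesLazily` and `Transform` methods, and the two path parameters are hypothetical names that do not appear in the original sample. It relies on the `Save` method of XStreamingElement.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamingSave
{
    // Streams lines lazily from any TextReader (same idea as the Lines extension method).
    private static IEnumerable<string> ReadLinesLazily(TextReader source)
    {
        string line;
        while ((line = source.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Hypothetical helper: converts a People.txt-style file to an XML file on disk.
    public static void Transform(string inputPath, string outputPath)
    {
        using (var reader = new StreamReader(inputPath))
        {
            var xmlTree = new XStreamingElement("Root",
                from line in ReadLinesLazily(reader)
                where !line.StartsWith("#")   // skip comment lines
                let items = line.Split(',')
                select new XElement("Person",
                    new XAttribute("ID", items[0]),
                    new XElement("First", items[1]),
                    new XElement("Last", items[2]),
                    new XElement("Occupation", items[3])));

            // Save enumerates the query and writes each Person element as it is produced.
            xmlTree.Save(outputPath);
        }
    }
}
```

A call such as `StreamingSave.Transform("People.txt", "People.xml")` would write the same elements shown in the output above to People.xml. The reader stays open until Save returns because the query is only evaluated during the save.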

See Also

Reference

XStreamingElement

Concepts

Advanced Query Techniques (LINQ to XML)