How to: Use LINQ to query strings

05/02/2024

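Strings are stored as a sequence of characters, so you can query them with LINQ. This article contains several example queries that search strings for particular characters or words, filter strings, and combine queries with regular expressions.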

How to query for characters in a string

The following example queries a string to determine how many numeric digits it contains.

string aString = "ABCDE99F-J74-12-89A";

// Select only those characters that are numbers
var stringQuery = from ch in aString
                  where Char.IsDigit(ch)
                  select ch;

// Execute the query
foreach (char c in stringQuery)
    Console.Write(c + " ");

// Call the Count method on the existing query.
int count = stringQuery.Count();
Console.WriteLine($"Count = {count}");

// Select all characters before the first '-'
var stringQuery2 = aString.TakeWhile(c => c != '-');

// Execute the second query
foreach (char c in stringQuery2)
    Console.Write(c);
/* Output:
  9 9 7 4 1 2 8 9 Count = 8
  ABCDE99F
*/

The preceding queries show how you can treat a string as a sequence of characters.
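
The same idea can also be written in method syntax. The following is a minimal sketch, not part of the original sample: it counts the digits with a predicate passed to Count and, as an added illustration, groups the letters to show how often each one occurs. The variable names digitCount and letterCounts are introduced here for illustration only.

```csharp
using System;
using System.Linq;

string aString = "ABCDE99F-J74-12-89A";

// Count the digits directly by passing a predicate to Count.
int digitCount = aString.Count(ch => char.IsDigit(ch));

// Group the letters and report how often each one occurs,
// most frequent first, then alphabetically.
var letterCounts = aString
    .Where(ch => char.IsLetter(ch))
    .GroupBy(ch => ch)
    .OrderByDescending(g => g.Count())
    .ThenBy(g => g.Key)
    .Select(g => $"{g.Key}={g.Count()}");

Console.WriteLine($"Digits: {digitCount}");
Console.WriteLine(string.Join(", ", letterCounts));
/* Output:
   Digits: 8
   A=2, B=1, C=1, D=1, E=1, F=1, J=1
*/
```

Because a string implements IEnumerable<char>, every standard query operator is available on it without any conversion.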

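How to count occurrences of a word in a string

The following example shows how to use a LINQ query to count the occurrences of a specified word in a string. To perform the count, the example first calls the Split method to create an array of words. Calling Split has a performance cost because it allocates a new string for every word. If counting words is the only operation you perform on the string, consider using the Matches or IndexOf methods instead.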

string text = """
    Historically, the world of data and the world of objects 
    have not been well integrated. Programmers work in C# or Visual Basic 
    and also in SQL or XQuery. On the one side are concepts such as classes, 
    objects, fields, inheritance, and .NET APIs. On the other side 
    are tables, columns, rows, nodes, and separate languages for dealing with 
    them. Data types often require translation between the two worlds; there are 
    different standard functions. Because the object world has no notion of query, a 
    query can only be represented as a string without compile-time type checking or 
    IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to 
    objects in memory is often tedious and error-prone. 
    """;

string searchTerm = "data";

// Convert the string into an array of words
char[] separators = ['.', '?', '!', ' ', ';', ':', ','];
string[] source = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

// Create the query.  Use the InvariantCultureIgnoreCase comparison to match "data" and "Data"
var matchQuery = from word in source
                 where word.Equals(searchTerm, StringComparison.InvariantCultureIgnoreCase)
                 select word;

// Count the matches, which executes the query.
int wordCount = matchQuery.Count();
Console.WriteLine($"""{wordCount} occurrence(s) of the search term "{searchTerm}" were found.""");
/* Output:
   3 occurrence(s) of the search term "data" were found.
*/

The preceding query shows how you can treat a string as a sequence of words after splitting it.
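
The Matches alternative mentioned earlier in this section might look like the following sketch. It is an illustration under stated assumptions, not the article's own code: the shortened text, the pattern variable, and the use of \b word boundaries with RegexOptions.IgnoreCase are choices made here.

```csharp
using System;
using System.Text.RegularExpressions;

// Shortened sample text, used here only for illustration.
string text = "Historically, the world of data and the world of objects have not been well integrated. "
            + "Data types often require translation between the two worlds.";
string searchTerm = "data";

// \b anchors the match at word boundaries, so a word such as "database" would not count.
// Regex.Escape guards against search terms that contain regex metacharacters.
string pattern = $@"\b{Regex.Escape(searchTerm)}\b";

// Count the matches without building an intermediate array of words.
int wordCount = Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;

Console.WriteLine($"{wordCount} occurrence(s) of \"{searchTerm}\" found.");
/* Output:
   2 occurrence(s) of "data" found.
*/
```

Whether this is actually faster than Split depends on the input, so measure before you choose one approach over the other.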

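How to sort or filter text data by any word or field

The following example shows how to sort lines of structured text, such as comma-separated values, by any field in the line. The field can be specified dynamically at run time. Assume that the fields in scores.csv represent a student's ID number, followed by the scores for four tests: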

111, 97, 92, 81, 60
112, 75, 84, 91, 39
113, 88, 94, 65, 91
114, 97, 89, 85, 82
115, 35, 72, 91, 70
116, 99, 86, 90, 94
117, 93, 92, 80, 87
118, 92, 90, 83, 78
119, 68, 79, 88, 92
120, 99, 82, 81, 79
121, 96, 85, 91, 60
122, 94, 92, 91, 91

The following query sorts the lines by the score of the first exam, which is stored in the second column:

// Create an IEnumerable data source
string[] scores = File.ReadAllLines("scores.csv");

// Change this to any value from 0 to 4.
int sortField = 1;

Console.WriteLine($"Sorted highest to lowest by field [{sortField}]:");

// Split the string and sort on field[num]
var scoreQuery = from line in scores
                 let fields = line.Split(',')
                 orderby fields[sortField] descending
                 select line;

foreach (string str in scoreQuery)
{
    Console.WriteLine(str);
}
/* Output (if sortField == 1):
   Sorted highest to lowest by field [1]:
    116, 99, 86, 90, 94
    120, 99, 82, 81, 79
    111, 97, 92, 81, 60
    114, 97, 89, 85, 82
    121, 96, 85, 91, 60
    122, 94, 92, 91, 91
    117, 93, 92, 80, 87
    118, 92, 90, 83, 78
    113, 88, 94, 65, 91
    112, 75, 84, 91, 39
    119, 68, 79, 88, 92
    115, 35, 72, 91, 70
 */

The preceding query shows how you can manipulate strings by splitting them into fields and querying the individual fields.
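
Note that the query above orders the fields as strings. That works for the sample data because every score has two digits, but a textual comparison would place 97 above 100. The following sketch, which uses hypothetical in-memory rows in place of scores.csv, parses the field as an integer before sorting. The name numericQuery and the extra rows are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical rows standing in for the contents of scores.csv.
string[] scores =
[
    "111, 97, 92, 81, 60",
    "115, 35, 72, 91, 70",
    "123, 100, 88, 79, 95",
    "124, 9, 61, 70, 58"
];

// Change this to any value from 0 to 4.
int sortField = 1;

// int.Parse accepts the leading space that Split(',') leaves on each field.
var numericQuery = from line in scores
                   let fields = line.Split(',')
                   orderby int.Parse(fields[sortField]) descending
                   select line;

foreach (string line in numericQuery)
{
    Console.WriteLine(line);
}
/* Output:
   123, 100, 88, 79, 95
   111, 97, 92, 81, 60
   115, 35, 72, 91, 70
   124, 9, 61, 70, 58
*/
```

If a file might contain malformed rows, int.TryParse is a safer choice than int.Parse, which throws on input it cannot convert.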

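How to query for sentences with specific words

The following example shows how to find the sentences in a block of text that contain a match for every word in a specified set. The array of search terms is hard-coded here, but it could also be populated dynamically at run time. The query returns the sentences that contain all of the words "Historically", "data", and "integrated".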

string text = """
Historically, the world of data and the world of objects 
have not been well integrated. Programmers work in C# or Visual Basic 
and also in SQL or XQuery. On the one side are concepts such as classes, 
objects, fields, inheritance, and .NET APIs. On the other side 
are tables, columns, rows, nodes, and separate languages for dealing with 
them. Data types often require translation between the two worlds; there are 
different standard functions. Because the object world has no notion of query, a 
query can only be represented as a string without compile-time type checking or 
IntelliSense support in the IDE. Transferring data from SQL tables or XML trees to 
objects in memory is often tedious and error-prone.
""";

// Split the text block into an array of sentences.
string[] sentences = text.Split(['.', '?', '!']);

// Define the search terms. This list could also be dynamically populated at run time.
string[] wordsToMatch = [ "Historically", "data", "integrated" ];

// Find sentences that contain all the terms in the wordsToMatch array.
// Note that the number of terms to match is not specified at compile time.
char[] separators = ['.', '?', '!', ' ', ';', ':', ','];
var sentenceQuery = from sentence in sentences
                    let w = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                    where w.Distinct().Intersect(wordsToMatch).Count() == wordsToMatch.Length
                    select sentence;

foreach (string str in sentenceQuery)
{
    Console.WriteLine(str);
}
/* Output:
Historically, the world of data and the world of objects have not been well integrated
*/

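The query first splits the text into sentences, and then splits each sentence into an array of strings that holds its words. For each array, the Distinct method removes all duplicate words, and the query then performs an Intersect operation on the word array and the wordsToMatch array. If the count of the intersection equals the count of the wordsToMatch array, every search term was found among the sentence's words, and the sentence is returned.

The call to Split uses punctuation marks as separators so that they are removed from the words. Without that step, a string such as "Historically," would not match "Historically" in the wordsToMatch array. Depending on the punctuation found in the source text, you might need additional separators.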
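
The comparison in the query above is case-sensitive, so "data" does not match "Data". As a sketch of one possible variation, the following code checks each search term with All and a case-insensitive Contains. The shortened text, the search terms, and the use of StringComparer.OrdinalIgnoreCase are assumptions made for illustration.

```csharp
using System;
using System.Linq;

// Shortened sample text, used here only for illustration.
string text = "Historically, the world of data and the world of objects have not been well integrated. "
            + "Data types often require translation between the two worlds.";

string[] wordsToMatch = ["data", "types"];
char[] sentenceEnds = ['.', '?', '!'];
char[] separators = ['.', '?', '!', ' ', ';', ':', ','];

// Keep a sentence only if every search term appears among its words, ignoring case.
var sentenceQuery = from sentence in text.Split(sentenceEnds, StringSplitOptions.RemoveEmptyEntries)
                    let words = sentence.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                    where wordsToMatch.All(term => words.Contains(term, StringComparer.OrdinalIgnoreCase))
                    select sentence.Trim();

foreach (string str in sentenceQuery)
{
    Console.WriteLine(str);
}
/* Output:
   Data types often require translation between the two worlds
*/
```

The first sentence is skipped because it contains "data" but not "types".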

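How to combine LINQ queries with regular expressions

The following example shows how to use the Regex class to create a regular expression for more complex matching in text strings. The LINQ query makes it easy to filter for exactly the files that you want to search with the regular expression, and to shape the results.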

string startFolder = """C:\Program Files\dotnet\sdk""";
// Or
// string startFolder = "/usr/local/share/dotnet/sdk";

// Take a snapshot of the file system.
var fileList = from file in Directory.GetFiles(startFolder, "*.*", SearchOption.AllDirectories)
                let fileInfo = new FileInfo(file)
                select fileInfo;

// Create the regular expression to find SDK and workload names.
System.Text.RegularExpressions.Regex searchTerm =
    new System.Text.RegularExpressions.Regex(@"microsoft.net.(sdk|workload)");

// Search the contents of each .txt file.
// Remove the first where clause to search files of every type.
// This query produces a list of files where a match
// was found, and a list of the matchedValues in that file.
// Note: The range variable in the nested query is explicitly typed as Match.
// The explicit type is needed wherever MatchCollection is only a
// non-generic IEnumerable collection.
var queryMatchingFiles =
    from file in fileList
    where file.Extension == ".txt"
    let fileText = File.ReadAllText(file.FullName)
    let matches = searchTerm.Matches(fileText)
    where matches.Count > 0
    select new
    {
        name = file.FullName,
        matchedValues = from System.Text.RegularExpressions.Match match in matches
                        select match.Value
    };

// Execute the query.
Console.WriteLine($"""The term "{searchTerm}" was found in:""");

foreach (var v in queryMatchingFiles)
{
    // Trim the path a bit, then write
    // the file name in which a match was found.
    string s = v.name.Substring(startFolder.Length - 1);
    Console.WriteLine(s);

    // For this file, write out all the matching strings
    foreach (var v2 in v.matchedValues)
    {
        Console.WriteLine($"  {v2}");
    }
}

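You can also query the MatchCollection object that a Regex search returns. In this example, only the value of each match is produced in the results. However, you can also use LINQ to filter, sort, and group that collection. On .NET Framework, MatchCollection is a non-generic IEnumerable collection, so you have to state the type of the range variable explicitly in the query. On later .NET versions, MatchCollection also implements IEnumerable<Match>, and the explicit type remains valid there.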
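
As a sketch of the grouping and sorting mentioned above, the following code groups the matches by value and orders the groups by frequency. The fileText string is a hypothetical stand-in for a file's contents, and the escaped, case-insensitive pattern is a variation chosen here, not the pattern from the example above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical text standing in for the contents of a file.
string fileText = "Microsoft.NET.Sdk Microsoft.NET.Workload microsoft.net.sdk Microsoft.NET.Sdk";

// The dots are escaped so that they match only literal periods.
Regex searchTerm = new(@"microsoft\.net\.(sdk|workload)", RegexOptions.IgnoreCase);

// Explicitly typing the range variable as Match works whether or not
// MatchCollection exposes a generic enumerator on the target runtime.
var matchCounts = from Match match in searchTerm.Matches(fileText)
                  group match by match.Value.ToLowerInvariant() into g
                  orderby g.Count() descending
                  select new { Value = g.Key, Count = g.Count() };

foreach (var m in matchCounts)
{
    Console.WriteLine($"{m.Value}: {m.Count}");
}
/* Output:
   microsoft.net.sdk: 3
   microsoft.net.workload: 1
*/
```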
