Extract multiple values from a string
0 votes
/ September 11, 2018

We have been using this approach to search for a single keyword:

Get-Content $SourceFile | Select-String -Pattern "search keyword value"

However, we now need to extract 4 values: the embedded amounts in pounds (£), which are variable currency amounts, and the literal substrings shown below:

# Sample input
$String =' in the case of a single acquisition the Total Purchase Price of which (less the amount
funded by Acceptable Funding Sources (Excluding Debt)) exceeds £5,000,000 (or its
equivalent) but is less than or equal to £10,000,000 or its equivalent, the Parent shall
supply to the Agent for the Lenders not later than the date a member of the Group
legally commits to make the relevant acquisition, a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;'

# Values to extract

$Value1 = ' in the case of a single acquisition the Total Purchase Price '

$Value2 = ' £5,000,000'

$Value3 = ' £10,000,000'

$Value4 = ' a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;'

1 Answer

0 votes
/ September 11, 2018
# Define the regex patterns to search for individually, as elements of an array.
$patterns = 
    # A string literal; escape it, to be safe.
    [regex]::Escape(' in the case of a single acquisition the Total Purchase Price '),     
    # A regex that matches a currency amount in pounds.
    # (Literal ' £', followed by at least one ('+') non-whitespace char. ('\S')
    # - this could be made more stringent by matching digits and commas only.)
    ' £\S+',     
    # A string literal that *needs* escaping due to use of '(' and ')'
    # Note the use of a literal here-string (@'<newline>...<newline>'@)
    [regex]::Escape(@'
a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;
'@)

# - Use Get-Content -Raw to read the file *as a whole*
# - Use Select-String -AllMatches to find *multiple* matches (per input string)
# - ($patterns -join '|') joins the individual regexes with an alternation (|)
#   so that matches of any one of them are returned.
Get-Content -Raw $SourceFile | Select-String -AllMatches -Pattern ($patterns -join '|') |
  ForEach-Object {
    # Loop over the matches, each of which exposes the matched substring
    # in its .Value property, and collect them in an *array*, $capturedSubstrings
    # Note: You could use `Set-Variable` to create individual variables $Value1, ...
    #       but it's usually easier to work with an array.
    $capturedSubstrings = foreach ($match in $_.Matches) { $match.Value }
    # Output the array elements in diagnostic form.
    $capturedSubstrings | ForEach-Object { "[$_]" }
  }
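
The comment on the currency pattern above notes that it could be made more stringent by matching digits and commas only. A minimal sketch of that idea follows; the sample text and the variable names `$sample`, `$loose`, and `$strict` are hypothetical, and the stricter pattern assumes amounts are always written with comma thousands separators.

```powershell
# Hypothetical sample text; the comma right after the second amount is deliberate.
$sample = 'exceeds £5,000,000 (or its equivalent) but is less than or equal to £10,000,000, or its equivalent'

# Loose pattern from the answer: any run of non-whitespace characters after ' £'.
$loose  = ' £\S+'
# Stricter sketch (an assumption, not part of the original answer):
# 1-3 digits, then zero or more groups of a comma plus 3 digits,
# then an optional decimal part.
$strict = ' £\d{1,3}(?:,\d{3})*(?:\.\d+)?'

# Member enumeration (.Value on the match collection) yields the matched strings.
$looseMatches  = @([regex]::Matches($sample, $loose).Value)
$strictMatches = @([regex]::Matches($sample, $strict).Value)

'Loose:';  $looseMatches  | ForEach-Object { "[$_]" }
'Strict:'; $strictMatches | ForEach-Object { "[$_]" }
```

With this sample, the loose pattern also captures the punctuation that directly follows the second amount, while the stricter one stops at the last digit group.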

Note that -Pattern normally accepts an array of values, so -Pattern $patterns should work (albeit with slightly different behavior); as of PowerShell Core 6.1.0, however, it does not, due to a bug.

Caveat: This assumes that your script uses the same newline style as $SourceFile (CRLF vs. LF-only). If they differ, more work is needed; the symptom is that the last (multi-line) pattern fails to match.
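
One possible way to handle differing newline styles, offered as a sketch rather than as part of the original answer: escape each line of the multi-line literal separately and rejoin the pieces with `\r?\n`, so the resulting regex accepts either CRLF or LF. The shortened literal and the names `$literal`, `$lfInput`, and `$crlfInput` are hypothetical.

```powershell
# Hypothetical two-line literal (LF-only); in a real script this would be
# the text of the here-string.
$literal = "hold harmless letter) and a copy of the acquisition agreement under which the`nAcquisition Target is to be acquired;"

# Split on either newline style, escape each line on its own,
# then rejoin with a regex fragment that matches CRLF or LF.
$pattern = ($literal -split '\r?\n' | ForEach-Object { [regex]::Escape($_) }) -join '\r?\n'

# Two test inputs that differ only in their newline style.
$lfInput   = "x $literal y"
$crlfInput = $lfInput -replace "`n", "`r`n"

$lfMatch   = [regex]::Match($lfInput, $pattern)
$crlfMatch = [regex]::Match($crlfInput, $pattern)

"LF input matched:   $($lfMatch.Success)"
"CRLF input matched: $($crlfMatch.Success)"
```

Escaping line by line avoids post-processing the escaped string, which could misfire if the literal itself contained a backslash followed by `n`.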

With a file containing the contents of $String above, this yields:

[ in the case of a single acquisition the Total Purchase Price ]
[ £5,000,000]
[ £10,000,000]
[a copy of any financial due diligence
reports obtained by the Group in relation to the Acquisition Target, on a non-reliance
basis (subject to the Agent and any other relevant Reliance Party signing any required
hold harmless letter) and a copy of the acquisition agreement under which the
Acquisition Target is to be acquired;]
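
If individual variables such as $Value1 through $Value4 are still wanted, PowerShell's multiple assignment can unpack the array without `Set-Variable`. The sketch below uses a shortened, hypothetical in-memory string (`$text`) in place of the file, and assumes exactly four matches are found in document order.

```powershell
# Hypothetical in-memory input standing in for the file contents.
$text = ' in the case of a single acquisition the Total Purchase Price exceeds £5,000,000 but is less than or equal to £10,000,000 or its equivalent, a copy of the acquisition agreement;'

$patterns =
    [regex]::Escape(' in the case of a single acquisition the Total Purchase Price '),
    ' £\S+',
    [regex]::Escape(' a copy of the acquisition agreement;')

# Collect all matched substrings in an array, as in the answer's pipeline.
$capturedSubstrings = @(
    $text | Select-String -AllMatches -Pattern ($patterns -join '|') |
        ForEach-Object { $_.Matches.Value }
)

# Multiple assignment: one array element per variable, in order of appearance.
# If there were more than four matches, $Value4 would receive all the rest.
$Value1, $Value2, $Value3, $Value4 = $capturedSubstrings

"[$Value1]", "[$Value2]", "[$Value3]", "[$Value4]"
```

This relies on the matches arriving in a known order, so checking `$capturedSubstrings.Count` before unpacking is a reasonable safeguard.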
...