Given the sample string:
"000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 AP5378 1,107,844.38 Ven000000000463 Vch:00029265"
Here is what I came up with:
import re

# `subject` is assumed to hold the input line, e.g. the sample string above
match = re.search(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)", subject)
if match:
    result = match.group("amount")  # result will be "1,107,844.38"
else:
    result = ""
This extracts the amount. It also handles small amounts such as 0.38, amounts without separating commas such as 123456789.38, and amounts that may or may not be preceded by a dollar sign $.
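The cases listed above can be checked with a small, self-contained sketch. The helper name `extract_amount` and every sample string other than the original one are assumptions made up for illustration; the pattern itself is unchanged.

```python
import re

# Same pattern as above, compiled once for reuse.
AMOUNT_RE = re.compile(r"(?P<amount>\$?(?:\d+,)*\d+\.\d+)")


def extract_amount(subject: str) -> str:
    """Return the first amount-like token in `subject`, or "" if there is none."""
    match = AMOUNT_RE.search(subject)
    return match.group("amount") if match else ""


# Hypothetical inputs, apart from the first one, which is the sample string.
samples = [
    "000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 "
    "AP5378 1,107,844.38 Ven000000000463 Vch:00029265",
    "fee 0.38 due",            # small amount
    "total 123456789.38",      # no separating commas
    "paid $1,234.50 today",    # leading dollar sign
    "no amount here",          # nothing to match
]

for s in samples:
    print(repr(extract_amount(s)))
```

Because `re.search` returns only the first match, a line that contains an earlier token of the form digits-dot-digits would yield that token instead of the amount.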
Regex details
(?P<amount>\$?(?:\d+,)*\d+\.\d+)  Match the regular expression below and capture its match into the named group “amount”
    \$?  Match the character “$” literally
        ?  Between zero and one times, as many times as possible, giving back as needed (greedy)
    (?:\d+,)*  Match the regular expression below
        *  Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
        \d+  Match a single digit 0..9
            +  Between one and unlimited times, as many times as possible, giving back as needed (greedy)
        ,  Match the character “,” literally
    \d+  Match a single digit 0..9
        +  Between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \.  Match the character “.” literally
    \d+  Match a single digit 0..9
        +  Between one and unlimited times, as many times as possible, giving back as needed (greedy)
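The same breakdown can be kept next to the pattern itself by using Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch of an equivalent spelling; the name `AMOUNT_VERBOSE_RE` is an assumption.

```python
import re

AMOUNT_VERBOSE_RE = re.compile(
    r"""
    (?P<amount>      # capture the whole amount into the named group
        \$?          # optional literal dollar sign
        (?:\d+,)*    # zero or more runs of digits, each followed by a comma
        \d+          # one or more digits before the decimal point
        \.           # literal decimal point
        \d+          # one or more digits after the decimal point
    )
    """,
    re.VERBOSE,
)

subject = (
    "000000000463 NYC DOF OPA CONCENTRATION ACCT. *00029265 07/01/2013 "
    "AP5378 1,107,844.38 Ven000000000463 Vch:00029265"
)
match = AMOUNT_VERBOSE_RE.search(subject)
result = match.group("amount") if match else ""
print(result)
```

Note that `(?:\d+,)*` does not require groups of exactly three digits, so a loosely grouped token such as 1,2,3.4 would also match.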