This expression would probably work here, even though a regular expression may not be the best way to approach this problem, but if we must:
data-keyword="\s*([^"]+?)\s*"
It also strips unwanted whitespace before and after the desired data.
TEST
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"data-keyword=\"\s*([^\"]+?)\s*\""
test_str = ("<div class=\"s-suggestion\" data-alias=\"aps\" data-crid=\"2AZHZA23OLYLF\" data-isfb=\"false\" data-issc=\"false\" data-keyword=\"aa batteries plus\" data-nid=\"\" data-reftag=\"nb_sb_ss_i_6_2\" data-store=\"\" data-type=\"a9\" id=\"issDiv5\"><span class=\"s-heavy\"></span>ab<span class=\"s-heavy\">reva cold sore treatment</span></div>\n"
    "<div class=\"s-suggestion\" data-alias=\"aps\" data-crid=\"2AZHZA23OLYLF\" data-isfb=\"false\" data-issc=\"false\" data-keyword=\"    aa batteries plus     \" data-nid=\"\" data-reftag=\"nb_sb_ss_i_6_2\" data-store=\"\" data-type=\"a9\" id=\"issDiv5\"><span class=\"s-heavy\"></span>ab<span class=\"s-heavy\">reva cold sore treatment</span></div>")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    # Report the full match and its position in the test string
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    # Capture groups are numbered from 1; group 1 holds the trimmed keyword
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Output
Match 1 was found at 105-137: data-keyword="aa batteries plus"
Group 1 found at 119-136: aa batteries plus
Match 2 was found at 417-458: data-keyword="    aa batteries plus     "
Group 1 found at 435-452: aa batteries plus
RegEx Circuit
jex.im visualizes regular expressions:
![enter image description here](https://i.stack.imgur.com/3dkzF.png)
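
Since a regular expression may not be the best tool for HTML, here is a minimal sketch of the same extraction with an HTML parser instead. It assumes Python 3 and uses only the standard library's `html.parser`; the names `KeywordCollector`, `extract_keywords` and `sample` are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class KeywordCollector(HTMLParser):
    """Collects the data-keyword attribute of every start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "data-keyword" and value is not None:
                # strip() plays the role of the \s* parts of the regex
                self.keywords.append(value.strip())


def extract_keywords(html):
    """Return all trimmed data-keyword values found in the given HTML string."""
    parser = KeywordCollector()
    parser.feed(html)
    parser.close()
    return parser.keywords


# Shortened sample element, assumed to resemble the markup in the test above
sample = '<div class="s-suggestion" data-keyword="    aa batteries plus     " data-nid=""></div>'
print(extract_keywords(sample))  # ['aa batteries plus']
```

Unlike the regex, this does not depend on the attribute being written with double quotes, and the parser decodes character references in the value.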