Using something beautiful, BeautifulSoup:
from bs4 import BeautifulSoup
a = "images src <img src=\"http://aa/6.png\" /> <img src=\"http://aa/7.png\" /> "
soup = BeautifulSoup(a, 'html.parser')
page_images = [image["src"] for image in soup.find_all("img")]
print(page_images)
So, using a dict to store the results:
from bs4 import BeautifulSoup
data = {}
a = "images src <img src=\"http://aa/6.png\" /> <img src=\"http://aa/7.png\" /> "
soup = BeautifulSoup(a, 'html.parser')
page_images = [image["src"] for image in soup.find_all("img")]
content = a.split("<")[0]
data['content'] = content
data['src'] = page_images
print(data)
OUTPUT:
{'content': 'images src ', 'src': ['http://aa/6.png', 'http://aa/7.png']}
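One caveat with image["src"]: it raises KeyError if an <img> tag has no src attribute. A minimal sketch of a more defensive variant, assuming you simply want to skip such tags (the function name extract_images and the sample string are made up for illustration):

```python
from bs4 import BeautifulSoup


def extract_images(html):
    """Return the leading text and the src of every <img> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only <img> tags that actually carry a src attribute
    sources = [img["src"] for img in soup.find_all("img", src=True)]
    # Same rule as above: everything before the first "<" is the content
    content = html.split("<")[0]
    return {"content": content, "src": sources}


# Hypothetical input: the middle tag has no src and is skipped
sample = 'images src <img src="http://aa/6.png" /> <img alt="no source" /> <img src="http://aa/7.png" />'
result = extract_images(sample)
print(result)
```

Wrapping the logic in a function also makes it easy to reuse for several strings.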
But if a regular expression is preferred:
import re
data = {}
a = "images src <img src=\"http://aa/6.png\" /> <img src=\"http://aa/7.png\" /> "
content = a.split("<")[0]
data['content'] = content
if re.search('src="([^"]+)"', a):
    data['src'] = re.findall('src="(.*?)"', a, re.DOTALL)
print(data)
OUTPUT:
{'content': 'images src ', 'src': ['http://aa/6.png', 'http://aa/7.png']}
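Note that the pattern above matches any src="..." in the string, not only those inside <img> tags, and it only handles double quotes. A rough sketch of a stricter pattern, assuming reasonably well-formed tags (the names IMG_SRC and extract_images_re and the sample string are made up for illustration; a regex is still not a full HTML parser):

```python
import re

# Match src inside an <img ...> tag only, with either quote style.
# \s before "src" avoids matching attributes such as data-src.
IMG_SRC = re.compile(r'<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)


def extract_images_re(html):
    """Return the leading text and the src of every <img> tag."""
    # Group 2 holds the URL; group 1 is just the opening quote character
    sources = [m.group(2) for m in IMG_SRC.finditer(html)]
    # Unlike the snippet above, 'src' is always present (possibly empty)
    return {"content": html.split("<")[0], "src": sources}


# Hypothetical input: the <script> src must not be picked up
sample = ('images src <script src="http://aa/x.js"></script> '
          '<img src="http://aa/6.png" /> <img class="c" src=\'http://aa/7.png\' />')
result = extract_images_re(sample)
print(result)
```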