Here is a rough parser for your CSS:
import pyparsing as pp
# punctuation is important during parsing, but just noise afterwards; suppress it
LBRACE, RBRACE = map(pp.Suppress, "{}")
# read a ':' and any following whitespace
COLON = (":" + pp.Empty()).suppress()
obj_ref = pp.Word(".", pp.alphanums+'_') | pp.Word(pp.alphas, pp.alphanums+'_')
attr_name = pp.Word(pp.alphas, pp.alphanums+'-_')
attr_spec = pp.Group(attr_name("name") + COLON + pp.restOfLine("value"))
# each of your format specifications is one or more comma-delimited lists of obj_refs,
# followed by zero or more attr_specs in {}'s
# using a pp.Dict will auto-define an associative array from the parsed keys and values
spec = pp.Group(pp.delimitedList(obj_ref)[1,...]('refs')
                + LBRACE
                + pp.Dict(attr_spec[...])("attrs")
                + RBRACE)
# the parser will parse 0 or more specs
parser = spec[...]
Parsing your CSS source:
result = parser.parseString(css_source)
print(result.dump())
Gives:
[['.container_12', '.container_16', [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]], ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5', [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]], ['.featured_container', '.container_12', '.grid_4', 'a', [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]]]
[0]:
  ['.container_12', '.container_16', [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]]
  - attrs: [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]
    - margin-left: 'auto;'
    - margin-right: 'auto;'
    - width: '960px'
  - refs: ['.container_12', '.container_16']
[1]:
  ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5', [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]]
  - attrs: [['display', 'inline;'], ['float', 'left;'], ['margin-left', '10px;'], ['margin-right', '10px']]
    - display: 'inline;'
    - float: 'left;'
    - margin-left: '10px;'
    - margin-right: '10px'
  - refs: ['.grid_1', '.grid_2', '.grid_3', '.grid_4', '.grid_5']
[2]:
  ['.featured_container', '.container_12', '.grid_4', 'a', [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]]
  - attrs: [['color', '#1d1d1d;'], ['float', 'right;'], ['width', '235px;'], ['height', '40px;'], ['text-align', 'center;'], ['line-height', '40px;'], ['border', '4px solid #141a20;']]
    - border: '4px solid #141a20;'
    - color: '#1d1d1d;'
    - float: 'right;'
    - height: '40px;'
    - line-height: '40px;'
    - text-align: 'center;'
    - width: '235px;'
  - refs: ['.featured_container', '.container_12', '.grid_4', 'a']
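If you would rather work with plain lists than named results, the first line of the dump shows the shape to expect: each spec is a flat run of refs followed by one nested list of [name, value] pairs. A minimal sketch of unpacking one such spec, assuming it has already been converted to a plain list (for example with asList()); the variable names here are illustrative only:

```python
# One spec copied from the dump output above, as a plain nested list.
spec = ['.container_12', '.container_16',
        [['margin-left', 'auto;'], ['margin-right', 'auto;'], ['width', '960px']]]

# Everything except the last element is a ref; the last element holds the attrs.
*refs, attr_pairs = spec

# A list of two-item lists converts directly into a dict.
attrs = dict(attr_pairs)

print(refs)
print(attrs)
```

The named results (res.refs and res.attrs) give the same information without relying on position.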
Use a defaultdict(dict) to accumulate the attributes for each referenced CSS object:
from collections import defaultdict
accum = defaultdict(dict)
for res in result:
    for name in res.refs:
        accum[name].update(res.attrs)
from pprint import pprint
pprint(accum['.container_12'])
Gives:
{'border': '4px solid #141a20;',
'color': '#1d1d1d;',
'float': 'right;',
'height': '40px;',
'line-height': '40px;',
'margin-left': 'auto;',
'margin-right': 'auto;',
'text-align': 'center;',
'width': '235px;'}
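Note that the values still carry their trailing ';', because restOfLine captures everything up to the end of the line. One way to tidy this up is a small post-processing step on the accumulated dict. This is only a sketch: strip_semicolons is a hypothetical helper, not part of the parser above, and it assumes every value is a plain string like those printed here.

```python
def strip_semicolons(attrs):
    """Return a copy of attrs with trailing ';' and surrounding whitespace removed."""
    return {name: value.strip().rstrip(";").rstrip()
            for name, value in attrs.items()}

# A few of the accumulated values shown above, plus one without a ';'.
sample = {
    "border": "4px solid #141a20;",
    "margin-left": "auto;",
    "width": "960px",
}

cleaned = strip_semicolons(sample)
print(cleaned)
```

Values that never had a ';' (such as the last declaration in a block) pass through unchanged.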