How can I read the JavaScript array below as key/value pairs using Python and XPath? The output in Python would be ['id', '359521', 'name', 'HO1 mini-briefs HO1', etc.]
Is it possible to do this in one go, so that I end up with a Python list whose items I can access?
Any help is appreciated.
JavaScript in the HTML:
<script type="text/javascript">
var wcIsGtm = false;
var productImpressions = [];
var promoImpressions = [];
var wcGuaTrackerName = '';
var wcGuaGlobalTrackerName = 'allstores.';
var wcGuaGlobalTrackerEnabled = '0';
var referralExclusionList = [];
if(document.referrer) {
for(excludedDomain in referralExclusionList) {
if(document.referrer.indexOf(excludedDomain) != -1) {
document.referrer = '';
}
}
}
(function(w,e,b,c,oo,ki,ng){w['GoogleAnalyticsObject']=oo;w[oo]=w[oo]||function(){
(w[oo].q=w[oo].q||[]).push(arguments)},w[oo].l=1*new Date();ki=e.createElement(b),
ng=e.getElementsByTagName(b)[0];ki.async=1;ki.src=c;ng.parentNode.insertBefore(ki,ng)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-19354276-10', 'auto');
ga('require', 'ec');
ga('set', '&cu', 'EUR');
var productDetail = {
'id': '359521',
'name': 'HO1 mini-briefs HO1',
'category': 'Collection HOM Basics/Slips',
'brand': '',
'price': '10.4',
'variant': ''
};
ga('ec:addProduct', productDetail);
ga('ec:setAction', 'detail');
ga('send', 'pageview');
</script>
<meta property="og:locale" content="en_US" />
<meta property="og:title" content="HO1 mini-briefs HO1" />
<meta property="og:type" content="product" />
Another example would be:
var AWACP_CONFIG = {
mageVersion: '1.9.3.1',
useProgress : 1,
popupForAllProducts : 0,
addProductConfirmationEnabled : 1,
removeProductConfirmationEnabled : 1,
dialogsVAlign: 'center',
cartAnimation: 'opacity',
addProductCounterBeginFrom : 0,
removeProductCounterBeginFrom : 0,
hasFileOption : false };
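This second example is harder to parse than the first: the keys are unquoted and `false` is a JavaScript boolean, so neither `json.loads` nor `ast.literal_eval` accepts the text as it stands. One possible approach is to rewrite the literal into strict JSON first. The sketch below assumes a flat object (no nesting) whose string values contain no quotes, commas followed by `name:`, or escapes; `js_object_to_dict` is a hypothetical helper name, not part of any library.

```python
import json
import re


def js_object_to_dict(js_literal):
    """Convert a flat JS object literal into a dict (hypothetical helper)."""
    # Quote bare identifiers used as keys:  {key:  or  ,key:  ->  "key":
    text = re.sub(r'([{,]\s*)([A-Za-z_$][\w$]*)\s*:', r'\1"\2":', js_literal)
    # Turn single-quoted strings into double-quoted ones (assumes no embedded quotes)
    text = re.sub(r"'([^'\\]*)'", r'"\1"', text)
    return json.loads(text)


sample = """{
mageVersion: '1.9.3.1',
useProgress : 1,
popupForAllProducts : 0,
addProductConfirmationEnabled : 1,
removeProductConfirmationEnabled : 1,
dialogsVAlign: 'center',
cartAnimation: 'opacity',
addProductCounterBeginFrom : 0,
removeProductCounterBeginFrom : 0,
hasFileOption : false }"""

config = js_object_to_dict(sample)
print(config["mageVersion"], config["hasFileOption"])
```

With this route the numbers and the boolean come back as Python `int` and `bool` values rather than strings, which may or may not be what is wanted.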
Possible code logic:
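# TODO: make this more robust when reading JS data
import ast
import re

var_to_find = 'productDetail'
# Capture the object literal assigned to the variable, braces included
pattern = re.compile(r"var %s\s*=\s*(\{.*?\})\s*;" % re.escape(var_to_find), re.DOTALL)
xpath_string = "//script[contains(text(), 'var %s')]/text()" % var_to_find
# `response` is assumed to be a Scrapy response; .re_first() returns the captured group
js_data = response.xpath(xpath_string).re_first(pattern)
# json.loads() rejects single-quoted strings, but this literal is also a valid Python dict
data = ast.literal_eval(js_data)
print(data)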
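The output asked for at the top is a flat list such as ['id', '359521', 'name', ...]. Once the object literal has been parsed into a dict, flattening it is short. The sketch below hard-codes the extracted text so it runs on its own, and assumes the literal is also a valid Python dict literal (quoted keys, string values), which holds for `productDetail` but not for every JS object.

```python
import ast
from itertools import chain

# Text of the object literal as it would come out of the <script> element
js_data = """{
'id': '359521',
'name': 'HO1 mini-briefs HO1',
'category': 'Collection HOM Basics/Slips',
'brand': '',
'price': '10.4',
'variant': ''
}"""

# Parse the literal safely (no code execution), keeping key order
product = ast.literal_eval(js_data)

# Interleave keys and values: [key1, value1, key2, value2, ...]
flat = list(chain.from_iterable(product.items()))
print(flat)
```

Keeping the dict around is usually more convenient than the flat list, since `product['price']` gives direct access to a value.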
The idea is to:
1. find the JS variable based on some input (we know the variable name)
2. find the data inside {.*}
3. strip all spaces, newlines and comments, keeping only "var1":"data","var2":"data","var3":"data",
4. then split on , to obtain the key/value pairs
5. then split on : to put the keys and values in a list, excluding " or '
Step 3 is the most complex because it needs to be robust enough to deal with any kind of formatting.
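The five steps above can be sketched with the standard library alone. Instead of splitting on `,` and `:` (which breaks as soon as a value contains either character), this version matches one key/value pair at a time with a regex. It is only a sketch under stated assumptions: the object is flat, strings contain no escaped quotes, and only `/* ... */` comments are stripped (`//` comments are left alone because they are hard to tell apart from URLs such as '//www.google-analytics.com/analytics.js'). `extract_pairs` and `PAIR_RE` are hypothetical names, and the comment inside the sample HTML was added purely for illustration.

```python
import re

# One pair: key is quoted or bare; value is a quoted string or a bare token
PAIR_RE = re.compile(r"""
    (?:'([^']*)'|"([^"]*)"|([A-Za-z_$][\w$]*))   # key
    \s*:\s*
    (?:'([^']*)'|"([^"]*)"|([^\s,}]+))           # value
""", re.VERBOSE)


def extract_pairs(html, var_name):
    """Return [key, value, key, value, ...] for `var <var_name> = {...};`."""
    # Steps 1-2: locate the variable and capture the text between the braces
    found = re.search(r"var\s+%s\s*=\s*\{(.*?)\}\s*;" % re.escape(var_name),
                      html, re.DOTALL)
    if found is None:
        return []
    # Step 3: drop block comments; whitespace is handled by the pair regex
    body = re.sub(r"/\*.*?\*/", "", found.group(1), flags=re.DOTALL)
    # Steps 4-5: collect keys and values without their quotes
    flat = []
    for pair in PAIR_RE.finditer(body):
        groups = pair.groups()
        key = next(g for g in groups[:3] if g is not None)
        value = next(g for g in groups[3:] if g is not None)
        flat.extend([key, value])
    return flat


html = """<script type="text/javascript">
ga('set', '&cu', 'EUR');
var productDetail = {
'id': '359521',
'name': 'HO1 mini-briefs HO1',
'brand': '', /* may be empty */
'price': '10.4'
};
</script>"""

pairs = extract_pairs(html, 'productDetail')
print(pairs)
```

Every value comes back as a string here, so a bare `1` or `false` in the source becomes '1' or 'false'. For nested objects or heavily formatted scripts, a real JavaScript parser is likely a safer choice than any regex.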