Scraping data shown when hovering over a graph
1 vote
/ 09 March 2020

I'm interested in automating the scraping of web pages such as https://www.hltv.org/team/7532/big. More precisely, I would like to extract the dates and #ranking from the boxes that appear when hovering the mouse over the graph (see the screenshot below).

I tried using Python together with Selenium, but I'm not sure how to proceed, even though I have gone through various tutorials. I have a feeling that I need to change the top and left values in the style attribute, but I don't know how to do that, or whether I should use XPath, a CSS selector, or something else. Here is a snippet of my code that (presumably) returns the WebElement I'm interested in, but I haven't managed to extract anything from it :(

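from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
executable_path = r'C:/Users/fabbe/Documents/Python Scripts/hltv/chromedriver/chromedriver.exe'
# Selenium 4 style: the driver path goes into a Service object,
# and `options=` replaces the older `chrome_options=` keyword
driver = webdriver.Chrome(service=Service(executable_path), options=options)

driver.get("https://www.hltv.org/team/7532/big")

# Look up the tooltip element that the chart fills in on hover
elements = driver.find_elements(By.XPATH, "//*[@id='fusioncharts-tooltip-element']")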

[screenshot]

1 Answer

1 vote
/ 10 March 2020

I would take a different approach to getting the graph data, so that you don't have to hover the mouse over every part of the graph.

You need to add the following imports.

import json
from lxml import html
from selenium.webdriver.common.by import By

Code:

url = "https://www.hltv.org/team/7532/BIG"
driver.get(url)
# The chart configuration is stored as JSON in the data-fusionchart-config attribute
graph_data = driver.find_element(By.CSS_SELECTOR, '.chart-container.core-chart-container .border-box .graph').get_attribute('data-fusionchart-config')
# Each data point carries its tooltip HTML in the 'tooltext' field
graph_text = json.loads(graph_data)['dataSource']['dataset'][0]['data']
for graph_item in graph_text:
    tree = html.fromstring(graph_item['tooltext'])
    print("Date:" + tree.xpath("//div[@class='subtitle']//text()")[0])
    print("Rank:" + tree.xpath("(//div[@class='ranking-development-top-info']//div[@class='title'])[2]//text()")[0])
# quit() ends the whole browser session, not just the current window
driver.quit()

This reads the chart configuration from the graph element's data-fusionchart-config attribute and parses it as JSON. The loop then goes through every data point and pulls only the date and the rank out of its tooltip HTML.
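The same two parsing steps can be tried without a browser. The sketch below is only an illustration: SAMPLE_CONFIG is a made-up, heavily trimmed stand-in for the attribute value, and its tooltip HTML layout is an assumption chosen to match the XPath expressions above, not a copy of the real page. DivTextCollector, parse_tooltext and extract_points are hypothetical helper names, and the standard-library html.parser is swapped in for lxml so that nothing needs to be installed.

```python
import json
from html.parser import HTMLParser

# Hypothetical stand-in for the data-fusionchart-config attribute value.
# Only the dataSource -> dataset[0] -> data[i] -> tooltext path comes from
# the answer; the HTML inside tooltext is an assumed layout.
SAMPLE_CONFIG = json.dumps({
    "dataSource": {
        "dataset": [{
            "data": [
                {"tooltext": (
                    "<div class='ranking-development-top-info'>"
                    "<div class='title'>BIG</div>"
                    "<div class='title'>#11</div>"
                    "</div>"
                    "<div class='subtitle'>24th December 2018</div>"
                )},
                {"tooltext": (
                    "<div class='ranking-development-top-info'>"
                    "<div class='title'>BIG</div>"
                    "<div class='title'>#11</div>"
                    "</div>"
                    "<div class='subtitle'>31st December 2018</div>"
                )},
            ]
        }]
    }
})


class DivTextCollector(HTMLParser):
    """Collect text chunks, grouped by the class of the innermost open <div>."""

    def __init__(self):
        super().__init__()
        self._stack = []   # class attribute of each currently open <div>
        self.texts = {}    # class name -> text chunks in document order

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._stack.append(dict(attrs).get("class") or "")

    def handle_endtag(self, tag):
        if tag == "div" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.texts.setdefault(self._stack[-1], []).append(text)


def parse_tooltext(tooltext):
    """Return (date_text, rank_text) for one tooltip HTML fragment."""
    parser = DivTextCollector()
    parser.feed(tooltext)
    parser.close()
    date_text = parser.texts["subtitle"][0]
    # Second 'title' div, mirroring the [2] index in the XPath above.
    # Simplification: the parent div is not checked here.
    rank_text = parser.texts["title"][1]
    return date_text, rank_text


def extract_points(config_json):
    """Walk the config down to the data points and parse each tooltip."""
    data = json.loads(config_json)["dataSource"]["dataset"][0]["data"]
    return [parse_tooltext(item["tooltext"]) for item in data]


points = extract_points(SAMPLE_CONFIG)
for date_text, rank_text in points:
    print("Date:" + date_text)
    print("Rank:" + rank_text)
```

With a live page, the string returned by get_attribute('data-fusionchart-config') would take the place of SAMPLE_CONFIG, provided the real tooltip markup matches the assumed layout.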

The output is shown below.

Date:24th December 2018
Rank:#11
Date:31st December 2018
Rank:#11
Date:7th January 2019
Rank:#11
Date:14th January 2019
Rank:#12
Date:21st January 2019
Rank:#13
Date:28th January 2019
Rank:#13
Date:4th February 2019
Rank:#15
Date:11th February 2019
Rank:#12
Date:18th February 2019
Rank:#14
Date:25th February 2019
Rank:#15
Date:4th March 2019
Rank:#18
Date:11th March 2019
Rank:#16
Date:18th March 2019
Rank:#18
Date:25th March 2019
Rank:#18
Date:1st April 2019
Rank:#18
Date:8th April 2019
Rank:#18
Date:15th April 2019
Rank:#18
Date:22nd April 2019
Rank:#19
Date:29th April 2019
Rank:#19
Date:6th May 2019
Rank:#18
Date:13th May 2019
Rank:#18
Date:20th May 2019
Rank:#20
Date:27th May 2019
Rank:#22
Date:3rd June 2019
Rank:#22
Date:10th June 2019
Rank:#22
Date:17th June 2019
Rank:#26
Date:24th June 2019
Rank:#30
Date:1st July 2019
Rank:#34
Date:8th July 2019
Rank:#23
Date:15th July 2019
Rank:#27
Date:22nd July 2019
Rank:#22
Date:29th July 2019
Rank:#23
Date:5th August 2019
Rank:#28
Date:12th August 2019
Rank:#25
Date:19th August 2019
Rank:#24
Date:26th August 2019
Rank:#26
Date:2nd September 2019
Rank:#28
Date:9th September 2019
Rank:#24
Date:16th September 2019
Rank:#22
Date:23rd September 2019
Rank:#22
Date:30th September 2019
Rank:#21
Date:7th October 2019
Rank:#27
Date:14th October 2019
Rank:#24
Date:21st October 2019
Rank:#26
Date:28th October 2019
Rank:#24
Date:4th November 2019
Rank:#24
Date:11th November 2019
Rank:#24
Date:18th November 2019
Rank:#28
Date:25th November 2019
Rank:#26
Date:2nd December 2019
Rank:#26
Date:9th December 2019
Rank:#29
Date:16th December 2019
Rank:#33
Date:23rd December 2019
Rank:#40
Date:30th December 2019
Rank:#39
Date:6th January 2020
Rank:#46
Date:13th January 2020
Rank:#46
Date:20th January 2020
Rank:#46
Date:27th January 2020
Rank:#22
Date:3rd February 2020
Rank:#22
Date:10th February 2020
Rank:#23
Date:17th February 2020
Rank:#25
Date:24th February 2020
Rank:#26
Date:2nd March 2020
Rank:#21
Date:9th March 2020
Rank:#20
...
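If the values are needed for sorting or plotting rather than just printing, the strings can be turned into typed values. The following is a minimal sketch under a few assumptions: the lines have exactly the "Date:..." / "Rank:..." shape printed above, month names are in English (strptime's %B depends on the active locale), and parse_date, parse_rank and parse_output are hypothetical helper names. The sample lines are three pairs copied from the output above.

```python
import re
from datetime import date, datetime

# Matches the ordinal suffix in "24th", "1st", "22nd", "3rd".
# The lookbehind requires a digit, so the "st" in "August" is left alone.
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def parse_date(text):
    """Convert a string such as '24th December 2018' to a datetime.date."""
    cleaned = ORDINAL_SUFFIX.sub("", text.strip())
    return datetime.strptime(cleaned, "%d %B %Y").date()


def parse_rank(text):
    """Convert a string such as '#11' to the integer 11."""
    return int(text.strip().lstrip("#"))


def parse_output(lines):
    """Pair each 'Date:' line with the 'Rank:' line that follows it."""
    records = []
    pending_date = None
    for line in lines:
        label, _, value = line.partition(":")
        if label == "Date":
            pending_date = parse_date(value)
        elif label == "Rank" and pending_date is not None:
            records.append((pending_date, parse_rank(value)))
            pending_date = None
    return records


sample = [
    "Date:24th December 2018", "Rank:#11",
    "Date:27th January 2020", "Rank:#22",
    "Date:9th March 2020", "Rank:#20",
]
records = parse_output(sample)

# Example use: the entry with the best (numerically lowest) rank.
best = min(records, key=lambda record: record[1])
print(best[0].isoformat(), best[1])
```

Keeping the date as a datetime.date and the rank as an int means the records sort chronologically and can be passed to a plotting or CSV library without further cleanup.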