I wrote code to extract LinkedIn profile details, but for some user profiles the full HTML sometimes doesn't load.
I've already tried the usual wait mechanisms, for example:
driver.implicitly_wait(10)
time.sleep(10)
element_present = EC.presence_of_element_located((By.CLASS_NAME, '.pv-profile-section__card-item-v2.pv-profile-section.pv-position-entity.ember-view'))
WebDriverWait(driver, 300).until(element_present)
but none of them seems to work.
A fragment of my code:
import time
import urllib.parse

from bs4 import BeautifulSoup

# URL-encode the search terms before putting them in the query string
firstName = urllib.parse.quote(userFirstName)
lastName = urllib.parse.quote(userLastName)
company = urllib.parse.quote(userCompany)
driver.get('https://www.linkedin.com/search/results/people/?company='+company+'&firstName='+firstName+'&lastName='+lastName+'&origin=FACETED_SEARCH')
results = len(driver.find_elements_by_css_selector('.name.actor-name'))
for i in range(1):  # only the first search result for now
    print(i)
    driver.find_elements_by_css_selector('.name.actor-name')[i].click()
    time.sleep(10)
    print(driver.current_url)
    # Grab the rendered HTML of the profile page
    content = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
    driver.implicitly_wait(2)
    soup = BeautifulSoup(content, "html.parser")
    #print(soup)
    companyList = soup.findAll('section',{'class':'pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view'})
    print("Company list length: "+str(len(companyList)))
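As an aside, the search URL in the snippet could also be built with `urllib.parse.urlencode` instead of manual string concatenation. This is only a sketch: `build_search_url` is a hypothetical helper name and the sample values are made up; the query parameter names are the ones already used in my snippet.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = 'https://www.linkedin.com/search/results/people/'


def build_search_url(first_name, last_name, company):
    """Build the people-search URL with percent-encoded query parameters."""
    params = {
        'company': company,
        'firstName': first_name,
        'lastName': last_name,
        'origin': 'FACETED_SEARCH',
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return SEARCH_URL + '?' + urlencode(params, quote_via=quote)


print(build_search_url('Jane', 'Doe', 'Example Corp'))
```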
The code returns the list of companies for many users, but in some cases it simply fails. I checked these profiles in my browser, and the elements the code looks for do exist.
I'd appreciate any help or past experience with this. I know that looking into this problem takes effort, so thanks in advance!
PS: Part of the HTML (the experience section, which is the part I need):
<ul class="pv-profile-section__section-info section-info pv-profile-section__section-info--has-no-more">
<li class="pv-entity__position-group-pager pv-profile-section__list-item ember-view" id="ember394"> <section class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view" id="ember396"> <div class="display-flex justify-space-between full-width">
<a class="full-width ember-view" data-control-name="background_details_company" href="/search/results/index/?keywords=Aditya%20Birla%20Direct" id="ember397"> <div class="pv-entity__company-details">
<div class="pv-entity__logo company-logo">
<img alt="Aditya Birla Direct" class="pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 lazy-image ember-view" id="ember399"/>
</div>
<div class="pv-entity__company-summary-info">
<h3 class="t-16 t-black t-bold">
<span class="visually-hidden">Company Name</span>
<span>Aditya Birla Direct</span>
</h3>
<h4 class="t-14 t-black t-normal">
<span class="visually-hidden">Total Duration</span>
<span>2 yrs 6 mos</span>
</h4>
</div>
</div>
</a>
<!-- --> </div>
<ul class="pv-entity__position-group mt2 ember-view" id="ember400"><li class="pv-entity__position-group-role-item sortable-item ember-view" id="ember402"> <div class="ember-view" id="ember403"><div class="pv-entity__role-details">
<span class="pv-entity__timeline-node"></span>
<div class="display-flex justify-space-between full-width">
<div class="pv-entity__role-container">
<div class="pv-entity__role-details-container pv-entity__role-details-container--timeline pv-entity__role-details-container--bottom-margin">
<div class="pv-entity__summary-info-v2 pv-entity__summary-info--background-section pv-entity__summary-info-margin-top">
<h3 class="t-14 t-black t-bold">
<span class="visually-hidden">Title</span>
<span>Product Designer</span>
</h3>
<!-- --> <div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>Jun 2018 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">1 yr 5 mos</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Location</span>
<span>Mumbai, Maharashtra, India</span>
</h4>
</div>
<!-- --> </div>
</div>
<!-- --> </div>
</div>
</div>
</li><li class="pv-entity__position-group-role-item sortable-item ember-view" id="ember405"> <div class="ember-view" id="ember406"><div class="pv-entity__role-details">
<span class="pv-entity__timeline-node"></span>
<div class="display-flex justify-space-between full-width">
<div class="pv-entity__role-container">
<div class="pv-entity__role-details-container">
<div class="pv-entity__summary-info-v2 pv-entity__summary-info--background-section pv-entity__summary-info-margin-top">
<h3 class="t-14 t-black t-bold">
<span class="visually-hidden">Title</span>
<span>UI/UX Designer</span>
</h3>
<!-- --> <div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>May 2017 – Present</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">2 yrs 6 mos</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Location</span>
<span>Mumbai, Maharashtra, India</span>
</h4>
</div>
<!-- --> </div>
</div>
<!-- --> </div>
</div>
</div>
</li>
</ul>
<!-- --></section>
</li><li class="pv-entity__position-group-pager pv-profile-section__list-item ember-view" id="ember408"> <section class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view" id="1192970710"> <div class="display-flex justify-space-between full-width">
<div class="display-flex flex-column full-width">
<a class="full-width ember-view" data-control-name="background_details_company" href="/search/results/index/?keywords=improove%20technology%20pvt%20ltd" id="ember411"> <div class="pv-entity__logo company-logo">
<img alt="improove technology pvt ltd" class="pv-entity__logo-img pv-entity__logo-img EntityPhoto-square-5 lazy-image ghost-company ember-view" id="ember413"/>
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section">
<h3 class="t-16 t-black t-bold">UI/UX Designer</h3>
<p class="visually-hidden">Company Name</p>
<p class="pv-entity__secondary-title t-14 t-black t-normal">improove technology pvt ltd</p>
<!-- -->
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates Employed</span>
<span>May 2015 – May 2017</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item-v2">2 yrs 1 mo</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Location</span>
<span>Delhi</span>
</h4>
</div>
</a>
<!-- --> </div>
<!-- --> </div>
</section>
</li>
I mainly need the company name, job title, and dates of employment.
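To make the goal concrete, here is a rough sketch of the extraction I have in mind, assuming the HTML has fully loaded and keeps the structure shown above. The helper names `last_span_text` and `extract_positions` are hypothetical, and the selectors are guesses based only on this one fragment, so they may not hold for every profile.

```python
from bs4 import BeautifulSoup


def last_span_text(tag):
    """Text of the last <span> inside tag (skips the visually-hidden label)."""
    spans = tag.find_all('span') if tag is not None else []
    return spans[-1].get_text(strip=True) if spans else None


def extract_positions(html):
    """Return a list of (company, title, dates) tuples from profile HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for section in soup.select('section.pv-position-entity'):
        roles = section.select('li.pv-entity__position-group-role-item')
        if roles:
            # Several roles at one company: the name sits in the summary block
            company = last_span_text(
                section.select_one('.pv-entity__company-summary-info h3'))
            for role in roles:
                title = last_span_text(role.select_one('h3'))
                dates = last_span_text(role.select_one('.pv-entity__date-range'))
                rows.append((company, title, dates))
        else:
            # Single role: the title is the <h3> text, the company is in a <p>
            title_tag = section.select_one('.pv-entity__summary-info h3')
            company_tag = section.select_one('.pv-entity__secondary-title')
            rows.append((
                company_tag.get_text(strip=True) if company_tag is not None else None,
                title_tag.get_text(strip=True) if title_tag is not None else None,
                last_span_text(section.select_one('.pv-entity__date-range')),
            ))
    return rows
```

Calling `extract_positions(content)` on a fully loaded page should give one tuple per role, with the company name repeated for people who held several roles at the same company.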