I'm trying to loop over a list of 5 lxml _Element objects.
Here is an excerpt of the part of the HTML I'm interested in:
<div style="" id="ember140" class="pv-deferred-area ember-view"> <div class="pv-deferred-area__content">
<!---->
</div>
</div>
<div id="oc-background-section" class="pv-oc ember-view"> <span class="background-details">
<div id="ember217" class="ember-view"><section id="ember218" class="pv-profile-section pv-profile-section--reorder-enabled background-section artdeco-container-card ember-view"><div id="ember219" class="pv-profile-section-pager ember-view"> <section id="experience-section" class="pv-profile-section experience-section ember-view"><header class="pv-profile-section__card-header">
<h2 class="pv-profile-section__card-heading">
Expérience
</h2>
<a data-control-name="add_position" href="/in/gregoire-de-kermel/edit/position/new/" id="ember221" class="pv-profile-section__header-add-action add-position artdeco-button artdeco-button--tertiary artdeco-button--circle ember-view"> <li-icon type="plus-icon" role="img" aria-label="Ajouter un nouveau poste"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M21 13h-8v8h-2v-8H3v-2h8V3h2v8h8v2z"></path>
</svg></li-icon>
</a></header>
<ul class="pv-profile-section__section-info section-info pv-profile-section__section-info--has-more">
<li id="ember223" class="pv-entity__position-group-pager pv-profile-section__list-item ember-view"> <section id="1571672557" class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view"> <div class="display-flex justify-space-between full-width">
<div class="display-flex flex-column full-width">
<a data-control-name="background_details_company" href="/company/reputation-squad/" id="ember226" class="full-width ember-view"> <div class="pv-entity__logo company-logo">
<img src="https://media-exp1.licdn.com/dms/image/C4D0BAQE__TgCl2fyUw/company-logo_100_100/0?e=1593648000&v=beta&t=VLSKEVUbJDcULtQwEdrHrH5Gxwq_j7tk2HczgAKn7YU" loading="lazy" alt="Reputation Squad" id="ember228" class="pv-entity__logo-img EntityPhoto-square-5 lazy-image loaded ember-view">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section ">
<h3 class="t-16 t-black t-bold">Data Scientist</h3>
<p class="visually-hidden">Nom de l’entreprise</p>
<p class="pv-entity__secondary-title t-14 t-black t-normal">
Reputation Squad
<span class="pv-entity__secondary-title separator">Contrat en alternance</span>
</p>
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates d’emploi</span>
<span>janv. 2020 – Aujourd’hui</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Durée d’emploi</span>
<span class="pv-entity__bullet-item-v2">4 mois</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Lieu</span>
<span>Région de Paris, France</span>
</h4>
<!---->
</div>
</a>
<!----> </div>
<div class="pv-entity__actions">
<a data-control-name="edit_position" href="/in/gregoire-de-kermel/edit/position/1571672557/" id="ember230" class="pv-profile-section__edit-action pv-profile-section__hoverable-action artdeco-button artdeco-button--tertiary artdeco-button--circle ember-view"> <li-icon type="pencil-icon" role="img" aria-label="Modifier le poste Data Scientist"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M21.71 5L19 2.29a1 1 0 00-.71-.29 1 1 0 00-.7.29L4 15.85 2 22l6.15-2L21.71 6.45a1 1 0 00.29-.74 1 1 0 00-.29-.71zM6.87 18.64l-1.5-1.5L15.92 6.57l1.5 1.5zM18.09 7.41l-1.5-1.5 1.67-1.67 1.5 1.5z"></path>
</svg></li-icon>
</a><!----> </div>
</div>
</section>
</li><li id="ember232" class="pv-entity__position-group-pager pv-profile-section__list-item ember-view"> <section id="1516596236" class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view"> <div class="display-flex justify-space-between full-width">
<div class="display-flex flex-column full-width">
<a data-control-name="background_details_company" href="/company/credit-agricole-de-la-touraine-et-du-poitou-crto-/" id="ember235" class="full-width ember-view"> <div class="pv-entity__logo company-logo">
<img src="https://media-exp1.licdn.com/dms/image/C560BAQHz0qZ2RutURA/company-logo_100_100/0?e=1593648000&v=beta&t=uzqwKV9Un5c_b7X3Xo7vqA2KXcQkmBRDWpMUO5Bu1Gc" loading="lazy" alt="Crédit Agricole de la Touraine et du Poitou" id="ember237" class="pv-entity__logo-img EntityPhoto-square-5 lazy-image loaded ember-view">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section mb2">
<h3 class="t-16 t-black t-bold">Data Scientist</h3>
<p class="visually-hidden">Nom de l’entreprise</p>
<p class="pv-entity__secondary-title t-14 t-black t-normal">
Crédit Agricole de la Touraine et du Poitou
<span class="pv-entity__secondary-title separator">Contrat en alternance</span>
</p>
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates d’emploi</span>
<span>sept. 2019 – janv. 2020</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Durée d’emploi</span>
<span class="pv-entity__bullet-item-v2">5 mois</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Lieu</span>
<span>Région de Poitiers, France</span>
</h4>
<!---->
</div>
</a>
<div id="ember239" class="pv-entity__extra-details t-14 t-black--light ember-view"><p style="line-height:2rem;max-height:8rem;" id="ember240" class="pv-entity__description t-14 t-black t-normal inline-show-more-text inline-show-more-text--is-collapsed ember-view">• Web scraping (Python)<br>• Etude de profilage client (SAS)<br>• Mise en place d'un projet de système de recommandation (Hadoop, SAS, Python)
<!----></p><!----></div>
</div>
<div class="pv-entity__actions">
<a data-control-name="edit_position" href="/in/gregoire-de-kermel/edit/position/1516596236/" id="ember241" class="pv-profile-section__edit-action pv-profile-section__hoverable-action artdeco-button artdeco-button--tertiary artdeco-button--circle ember-view"> <li-icon type="pencil-icon" role="img" aria-label="Modifier le poste Data Scientist"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M21.71 5L19 2.29a1 1 0 00-.71-.29 1 1 0 00-.7.29L4 15.85 2 22l6.15-2L21.71 6.45a1 1 0 00.29-.74 1 1 0 00-.29-.71zM6.87 18.64l-1.5-1.5L15.92 6.57l1.5 1.5zM18.09 7.41l-1.5-1.5 1.67-1.67 1.5 1.5z"></path>
</svg></li-icon>
</a><!----> </div>
</div>
</section>
</li><li id="ember243" class="pv-entity__position-group-pager pv-profile-section__list-item ember-view"> <section id="1427380111" class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view"> <div class="display-flex justify-space-between full-width">
<div class="display-flex flex-column full-width">
<a data-control-name="background_details_company" href="/company/weblagence/" id="ember246" class="full-width ember-view"> <div class="pv-entity__logo company-logo">
<img src="https://media-exp1.licdn.com/dms/image/C560BAQHOw0tfMPSiWA/company-logo_100_100/0?e=1593648000&v=beta&t=NqZ8eTVFqA2MK4B1ZFUSE7NgTL_ZPqBIMrexzcYnNok" loading="lazy" alt="WebL'Agence" id="ember248" class="pv-entity__logo-img EntityPhoto-square-5 lazy-image loaded ember-view">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section mb2">
<h3 class="t-16 t-black t-bold">Python & React Native developer junior</h3>
<p class="visually-hidden">Nom de l’entreprise</p>
<p class="pv-entity__secondary-title t-14 t-black t-normal">
WebL'Agence
<!----> </p>
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates d’emploi</span>
<span>janv. 2019 – août 2019</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Durée d’emploi</span>
<span class="pv-entity__bullet-item-v2">8 mois</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Lieu</span>
<span>Région de Paris, France</span>
</h4>
<!---->
</div>
</a>
<div id="ember250" class="pv-entity__extra-details t-14 t-black--light ember-view"><p style="line-height:2rem;max-height:8rem;" id="ember251" class="pv-entity__description t-14 t-black t-normal inline-show-more-text inline-show-more-text--is-collapsed ember-view">• Création d’applications mobiles (React-Native)<br>• Développement d’un modèle d’évaluation de startup « early-stage »<br>• Web scraping (Selenium Python)<br>• Gestionnaire d’un projet de Machine-Learing/OCR+ (externalisation auprès de prestataires externes et utilisation de AWS textract)
<span class="inline-show-more-text__link-container-collapsed">
<span>…</span>
<button class="inline-show-more-text__button link" aria-expanded="false" data-ember-action="" data-ember-action-341="341">
voir plus
</button>
</span>
<!----></p><!----></div>
</div>
<div class="pv-entity__actions">
<a data-control-name="edit_position" href="/in/gregoire-de-kermel/edit/position/1427380111/" id="ember252" class="pv-profile-section__edit-action pv-profile-section__hoverable-action artdeco-button artdeco-button--tertiary artdeco-button--circle ember-view"> <li-icon type="pencil-icon" role="img" aria-label="Modifier le poste Python &amp; React Native developer junior"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M21.71 5L19 2.29a1 1 0 00-.71-.29 1 1 0 00-.7.29L4 15.85 2 22l6.15-2L21.71 6.45a1 1 0 00.29-.74 1 1 0 00-.29-.71zM6.87 18.64l-1.5-1.5L15.92 6.57l1.5 1.5zM18.09 7.41l-1.5-1.5 1.67-1.67 1.5 1.5z"></path>
</svg></li-icon>
</a><!----> </div>
</div>
</section>
</li><li id="ember254" class="pv-entity__position-group-pager pv-profile-section__list-item ember-view"> <section id="708026390" class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view"> <div class="display-flex justify-space-between full-width">
<div class="display-flex flex-column full-width">
<a data-control-name="background_details_company" href="/search/results/all/?keywords=Gauthier%20Associ%C3%A9s" id="ember257" class="full-width ember-view"> <div class="pv-entity__logo company-logo">
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" loading="lazy" alt="Gauthier Associés" id="ember259" class="pv-entity__logo-img EntityPhoto-square-5 lazy-image ghost-company loaded ember-view">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section mb2">
<h3 class="t-16 t-black t-bold">Business Financial Analyst</h3>
<p class="visually-hidden">Nom de l’entreprise</p>
<p class="pv-entity__secondary-title t-14 t-black t-normal">
Gauthier Associés
<!----> </p>
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates d’emploi</span>
<span>juil. 2015 – juin 2019</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Durée d’emploi</span>
<span class="pv-entity__bullet-item-v2">4 ans</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Lieu</span>
<span>Smarves</span>
</h4>
<!---->
</div>
</a>
<div id="ember261" class="pv-entity__extra-details t-14 t-black--light ember-view"><p style="line-height:2rem;max-height:8rem;" id="ember262" class="pv-entity__description t-14 t-black t-normal inline-show-more-text inline-show-more-text--is-collapsed ember-view">It started as a 2 months internship in which my tasks were to:<br>• Analysed the company's profitability<br>• Created the official corporate document on profitability<br>• Designed and administered a corporate customer satisfaction survey<br><br>Ever since, I am doing yearly financial and business analysis under my own business. It has been now 4 years that I am working with this company, with more and more responsibilities over the time such reporting and analysing the company's investing holdings' profitability.
<span class="inline-show-more-text__link-container-collapsed">
<span>…</span>
<button class="inline-show-more-text__button link" aria-expanded="false" data-ember-action="" data-ember-action-342="342">
voir plus
</button>
</span>
<!----></p><!----></div>
</div>
<div class="pv-entity__actions">
<a data-control-name="edit_position" href="/in/gregoire-de-kermel/edit/position/708026390/" id="ember263" class="pv-profile-section__edit-action pv-profile-section__hoverable-action artdeco-button artdeco-button--tertiary artdeco-button--circle ember-view"> <li-icon type="pencil-icon" role="img" aria-label="Modifier le poste Business Financial Analyst"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M21.71 5L19 2.29a1 1 0 00-.71-.29 1 1 0 00-.7.29L4 15.85 2 22l6.15-2L21.71 6.45a1 1 0 00.29-.74 1 1 0 00-.29-.71zM6.87 18.64l-1.5-1.5L15.92 6.57l1.5 1.5zM18.09 7.41l-1.5-1.5 1.67-1.67 1.5 1.5z"></path>
</svg></li-icon>
</a><!----> </div>
</div>
</section>
</li><li id="ember265" class="pv-entity__position-group-pager pv-profile-section__list-item ember-view"> <section id="813743952" class="pv-profile-section__card-item-v2 pv-profile-section pv-position-entity ember-view"> <div class="display-flex justify-space-between full-width">
<div class="display-flex flex-column full-width">
<a data-control-name="background_details_company" href="/company/gsma/" id="ember268" class="full-width ember-view"> <div class="pv-entity__logo company-logo">
<img src="https://media-exp1.licdn.com/dms/image/C560BAQGmHE5IziHPfw/company-logo_100_100/0?e=1593648000&v=beta&t=uonj7aae0F9Qr9Z7uDJAjX358njW5zCqaCrhF-m5wJU" loading="lazy" alt="GSMA" id="ember270" class="pv-entity__logo-img EntityPhoto-square-5 lazy-image loaded ember-view">
</div>
<div class="pv-entity__summary-info pv-entity__summary-info--background-section mb2">
<h3 class="t-16 t-black t-bold">Intern Analyst - Network 2020</h3>
<p class="visually-hidden">Nom de l’entreprise</p>
<p class="pv-entity__secondary-title t-14 t-black t-normal">
GSMA
<!----> </p>
<div class="display-flex">
<h4 class="pv-entity__date-range t-14 t-black--light t-normal">
<span class="visually-hidden">Dates d’emploi</span>
<span>mai 2016 – juil. 2016</span>
</h4>
<h4 class="t-14 t-black--light t-normal">
<span class="visually-hidden">Durée d’emploi</span>
<span class="pv-entity__bullet-item-v2">3 mois</span>
</h4>
</div>
<h4 class="pv-entity__location t-14 t-black--light t-normal block">
<span class="visually-hidden">Lieu</span>
<span>London, Royaume-Uni</span>
</h4>
<!---->
</div>
</a>
<div id="ember272" class="pv-entity__extra-details t-14 t-black--light ember-view"><p style="line-height:2rem;max-height:8rem;" id="ember273" class="pv-entity__description t-14 t-black t-normal inline-show-more-text inline-show-more-text--is-collapsed ember-view">Initially, I had the opportunity to use small and large data sets to reconcile, analyse and present to key stakeholders – developing strong excel capability in the process.<br><br>Further developing these skills, I had the opportunity to deliver a project of work (end-to-end) from developing communications for data requests, clean data collection and storage processes and then developing a methodology for estimating market size metrics in a sustainable reporting process to the Network 2020 programme.<br><br>Summary of milestones:<br>• Reconciling, analysing and presenting information to key stakeholders<br>• Request, store and model market estimations<br>• Researching, collection and storing and communicating key business metrics
<span class="inline-show-more-text__link-container-collapsed">
<span>…</span>
<button class="inline-show-more-text__button link" aria-expanded="false" data-ember-action="" data-ember-action-343="343">
voir plus
</button>
</span>
<!----></p><!----></div>
</div>
<div class="pv-entity__actions">
<a data-control-name="edit_position" href="/in/gregoire-de-kermel/edit/position/813743952/" id="ember274" class="pv-profile-section__edit-action pv-profile-section__hoverable-action artdeco-button artdeco-button--tertiary artdeco-button--circle ember-view"> <li-icon type="pencil-icon" role="img" aria-label="Modifier le poste Intern Analyst - Network 2020"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" width="24" height="24" focusable="false">
<path d="M21.71 5L19 2.29a1 1 0 00-.71-.29 1 1 0 00-.7.29L4 15.85 2 22l6.15-2L21.71 6.45a1 1 0 00.29-.74 1 1 0 00-.29-.71zM6.87 18.64l-1.5-1.5L15.92 6.57l1.5 1.5zM18.09 7.41l-1.5-1.5 1.67-1.67 1.5 1.5z"></path>
</svg></li-icon>
</a><!----> </div>
</div>
</section>
</li> </ul>
<div id="ember275" class="pv-experience-section__see-more pv-profile-section__actions-inline ember-view"><button class="pv-profile-section__see-more-inline pv-profile-section__text-truncate-toggle link link-without-hover-state" aria-expanded="false">Afficher 1 expérience de plus
<li-icon aria-hidden="true" type="chevron-down-icon" class="pv-profile-section__toggle-detail-icon" size="small"><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" data-supported-dps="16x16" fill="currentColor" width="16" height="16" focusable="false">
<path d="M8 9l5.93-4L15 6.54l-6.15 4.2a1.5 1.5 0 01-1.69 0L1 6.54 2.07 5z"></path>
</svg></li-icon></button>
<!----></div>
I saved the excerpt to an HTML file and opened it as follows:
from io import StringIO
from lxml import etree

def parse_html_file(filename):
    # Read the saved excerpt and parse it leniently with lxml's HTML parser
    with open(filename, encoding="utf8") as f:
        content = f.read()
    parser = etree.HTMLParser()
    return etree.parse(StringIO(content), parser)

tree = parse_html_file('test.html')
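As a side note, lxml's etree.parse also accepts a filename directly, so the StringIO round trip is optional. The sketch below assumes the file is UTF-8 and passes that to etree.HTMLParser so accented text such as "Expérience" decodes correctly; the tiny sample HTML and the temporary file are stand-ins for test.html, used only so the example runs on its own.

```python
import os
import tempfile

from lxml import etree

def parse_html_file(filename):
    # etree.parse accepts a path directly; the parser is told the file's
    # encoding so non-ASCII characters (é, ’) are decoded correctly.
    parser = etree.HTMLParser(encoding="utf-8")
    return etree.parse(filename, parser)

# Hypothetical stand-in for test.html so the sketch is self-contained.
sample = '<h2>Expérience</h2><ul><li><h3 class="t-16 t-black t-bold">Data Scientist</h3></li></ul>'
with tempfile.NamedTemporaryFile("w", suffix=".html", encoding="utf-8", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

tree = parse_html_file(path)
os.remove(path)
print(tree.xpath("//h2/text()"), tree.xpath("//h3/text()"))
```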
My list "pv-profile-section__section-info section-info pv-profile-section__section-info--has-more" contains 5 items.
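Selecting those 5 <li> items with an exact @class="..." comparison only works when the attribute string is identical character for character. A common XPath 1.0 idiom matches a single class token instead; the sketch below compares the two on a small hypothetical sample (the second <li> has the same classes in a different order).

```python
from lxml import etree

html = """
<ul class="pv-profile-section__section-info section-info">
  <li class="pv-entity__position-group-pager pv-profile-section__list-item ember-view">A</li>
  <li class="pv-profile-section__list-item ember-view pv-entity__position-group-pager">B</li>
  <li class="something-else">C</li>
</ul>
"""
root = etree.fromstring(html, etree.HTMLParser())

# Exact match: the whole class attribute must be identical.
exact = root.xpath(
    '//li[@class="pv-entity__position-group-pager pv-profile-section__list-item ember-view"]')

# Token match: finds the class name wherever it appears in the attribute.
token = root.xpath(
    '//li[contains(concat(" ", normalize-space(@class), " "), '
    '" pv-entity__position-group-pager ")]')

print(len(exact), len(token))  # 1 2
```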
The goal is to extract the job title, the company name and the contract type.
So far I have done the following:
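For a single position, all three fields can be read relative to its <li>. In this markup the company name is the direct text node of the secondary-title <p>, sitting before the nested contract <span>, so text()[0] picks it up without the contract type. This is a sketch on a trimmed copy of the first item, assuming the class names shown in the excerpt stay as they are.

```python
from lxml import etree

item_html = """
<li class="pv-entity__position-group-pager pv-profile-section__list-item ember-view">
  <h3 class="t-16 t-black t-bold">Data Scientist</h3>
  <p class="pv-entity__secondary-title t-14 t-black t-normal">
    Reputation Squad
    <span class="pv-entity__secondary-title separator">Contrat en alternance</span>
  </p>
</li>
"""
li = etree.fromstring(item_html, etree.HTMLParser()).find(".//li")

# Every path starts with "." so it is evaluated relative to this <li>.
job = li.xpath('string(.//h3[@class="t-16 t-black t-bold"])').strip()
# The company name is the <p>'s own first text node, before the nested <span>.
company = li.xpath(
    './/p[@class="pv-entity__secondary-title t-14 t-black t-normal"]/text()')[0].strip()
contract = li.xpath('string(.//span[@class="pv-entity__secondary-title separator"])').strip()

print(job, "|", company, "|", contract)
```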
job_location = tree.xpath(
    './/li[@class="pv-entity__position-group-pager pv-profile-section__list-item ember-view"]')
di = {}
for i in job_location:
    try:
        di['name'] = tree.xpath(
            '//h3[@class="t-16 t-black t-bold"]/text()')
    except:
        di['name'] = 'None'
    try:
        di['name'] = tree.xpath(
            '//h3[@class="t-16 t-black t-bold"]/text()')
    except:
        di['name'] = 'None'
    try:
        di['contract'] = tree.xpath(
            '//span[@class="pv-entity__secondary-title separator"]/text()')
    except:
        di['contract'] = 'None'
print(di)
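The loop above never uses i: tree.xpath('//...') searches the whole page, so every pass stores the same full lists and only the 2 contract spans that exist are returned. Note that calling i.xpath('//h3') would not help either, because a path starting with // is always absolute; only a leading ".//" makes it relative to the element. A minimal demonstration on a hypothetical two-item list:

```python
from lxml import etree

html = """
<ul>
  <li><h3>First</h3></li>
  <li><h3>Second</h3></li>
</ul>
"""
root = etree.fromstring(html, etree.HTMLParser())
first_li = root.xpath("//li")[0]

# "//h3" starts at the document root, even when called on an element.
print(first_li.xpath("//h3/text()"))   # ['First', 'Second']
# ".//h3" starts at the element the method is called on.
print(first_li.xpath(".//h3/text()"))  # ['First']
```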
It seems to work, but right now the "job" and "company" lists have length 5 while "contract_type" has length 2. Inside the original loop, I would like to record that an item has no contract_type attribute, as is the case for the last items. When there is nothing, I would like to display "None" for the contract type.
What I get:
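The try/except blocks cannot produce a "None" placeholder, because xpath() does not raise when nothing matches; it just returns an empty list. One option, sketched here on hypothetical sample data, is lxml's findtext, which returns a default value when the relative path finds no element. Python's None is used as the default below; a string such as "None" or "" works the same way.

```python
from lxml import etree

html = """
<ul>
  <li><h3>Data Scientist</h3><span class="pv-entity__secondary-title separator">Contrat en alternance</span></li>
  <li><h3>Business Financial Analyst</h3></li>
</ul>
"""
root = etree.fromstring(html, etree.HTMLParser())

rows = []
for li in root.iter("li"):
    # findtext returns the default when no element matches, instead of raising.
    contract = li.findtext('.//span[@class="pv-entity__secondary-title separator"]', default=None)
    rows.append((li.findtext(".//h3"), contract))

print(rows)
```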
{'name': ['Data Scientist', 'Data Scientist', 'Python & React Native developer junior', 'Business Financial Analyst', 'Intern Analyst - Network 2020'], 'contract': ['Contrat en alternance', 'Contrat en alternance']}
What I would like to get:
{'name': ['Data Scientist', 'Data Scientist', 'Python & React Native developer junior', 'Business Financial Analyst', 'Intern Analyst - Network 2020'], 'contract': ['Contrat en alternance', 'Contrat en alternance', '', '', '']}
Could you give me a hint on how to do this?
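To get exactly that shape (parallel lists with '' for a missing contract), one approach is to query each <li> relatively and wrap the path in XPath's string() function, which evaluates to an empty string when nothing matches. This is a sketch on a shortened sample; the class "item" on the <li> elements is a hypothetical simplification of the real class attribute.

```python
from lxml import etree

html = """
<ul>
  <li class="item"><h3 class="t-16 t-black t-bold">Data Scientist</h3>
      <span class="pv-entity__secondary-title separator">Contrat en alternance</span></li>
  <li class="item"><h3 class="t-16 t-black t-bold">Data Scientist</h3>
      <span class="pv-entity__secondary-title separator">Contrat en alternance</span></li>
  <li class="item"><h3 class="t-16 t-black t-bold">Python &amp; React Native developer junior</h3></li>
</ul>
"""
root = etree.fromstring(html, etree.HTMLParser())

di = {"name": [], "contract": []}
for li in root.xpath('//li[@class="item"]'):  # hypothetical class name
    # string(...) yields '' when the relative path matches nothing,
    # so every <li> adds exactly one entry to each list.
    di["name"].append(li.xpath('string(.//h3[@class="t-16 t-black t-bold"])').strip())
    di["contract"].append(
        li.xpath('string(.//span[@class="pv-entity__secondary-title separator"])').strip())

print(di)
```

Because both lists grow once per item, their indexes stay aligned even when a field is absent.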