Python request is missing part of the content
0 votes
/ 21 November 2018

I am scraping the job description from a website (https://www.104.com.tw/job/?jobno=66wee). When I send a request, only part of the content in the 'p' element is returned. I want to get the whole div class="content" part.

My code:

  import requests
  from bs4 import BeautifulSoup

  payload = {'jobno': '66wee'}
  headers = {'user-agent': 'Mozilla/5.0 (Macintosh Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
  r = requests.get('https://www.104.com.tw/job/', params=payload, headers=headers)
  soup = BeautifulSoup(r.text, 'html.parser')
  contents = soup.findAll('div', {'class': 'content'})
  description = contents[0].findAll('p')[0].text.strip()
  print(description)

The result (part of the job description is missing):

4. Develop tools and systems that optimize analysis process efficiency and report quality.ion tools.row and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.

But the HTML of this part is:

    <div class="content">
      <p>Appier is a technology company that makes it easy for businesses to use artificial intelligence to grow and succeed in a cross screen era. Appier is formed by a passionate team of computer scientists and engineers with experience in AI, data analysis, distributed systems, and marketing. Our colleagues come from Google, Intel, Yahoo, as well as renowned AI research groups in Harvard University and Stanford University. Headquartered in Taiwan, Appier serves more than 500 global brands and agencies from offices in international markets including Singapore, Japan, Australia, Hong Kong, Vietnam, India, Indonesia and South Korea.
<br>
<br>Job Description
<br>1. Perform data analysis to help Appier teams to answer business or operational questions.
<br>2. Interpret trends or patterns from complex data sets by using statistical and visualization tools.
<br>3. Conduct data analysis reports to illustrate the results and insight
<br>4. Develop tools and systems that optimize analysis process efficiency and report quality.</p>

Answers [ 3 ]

0 votes
/ 21 November 2018
import requests
from bs4 import BeautifulSoup

payload = {'jobno': '66wee'}
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'}
r = requests.get('https://www.104.com.tw/job/',
                 params=payload, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
contents = soup.findAll('div', {'class': 'content'})
for content in contents[0].findAll('p')[0].text.splitlines():
    print(content)
0 votes
/ 21 November 2018

There are other tags inside the first content-class element, but assuming you want everything up to the end of item 4, i.e. the first child p tag, you can use a descendant combinator with a class selector for the parent and a type selector for the child. Drop the p from the selector if you really want everything in the div.

import requests
from bs4 import BeautifulSoup

url = 'https://www.104.com.tw/job/?jobno=66wee'
res = requests.get(url)
soup = BeautifulSoup(res.content, "lxml")
s = soup.select_one('.content p').text
print(s)
0 votes
/ 21 November 2018

You are only accessing the first p element because of the second indexing with [0]:

description = contents[0].findAll('p')[0].text.strip()

You should iterate over all the p elements instead:

description = ""
for p in contents[0].findAll('p'):
    description += p.text.strip()

print(description)
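A side note on the loop above: `.text` discards the line breaks implied by the `<br>` tags, and concatenating the fragments runs them together. A minimal sketch (using an inline HTML snippet rather than the live page) of `get_text(separator='\n')`, which inserts a separator at each tag boundary and so keeps each `<br>`-delimited line:

```python
from bs4 import BeautifulSoup

# Inline stand-in for the page's <div class="content"> markup.
html = '<div class="content"><p>Intro<br>1. First item<br>2. Second item</p></div>'
soup = BeautifulSoup(html, 'html.parser')

# separator='\n' restores one line per <br>-separated chunk of text.
text = soup.select_one('.content p').get_text(separator='\n')
print(text)
# Intro
# 1. First item
# 2. Second item
```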