Automation with Selenium and web scraping with BeautifulSoup in Python
0 votes
/ May 22, 2019

I'm trying to automatically click "Show More" on a website using Selenium, and then I want to scrape the content using BeautifulSoup.

My code runs, but it doesn't give the results I want. I know I'm doing something wrong, but I can't find it.

For Selenium: my code clicks the "Show More" button, but not consistently. Sometimes it clicks 5 times, sometimes 10. I want it to keep going until the last "Show More". I don't understand what I'm doing wrong.

For BeautifulSoup: along with loading "Show More", I want to scrape the title of every article, but my code stops after the first click.

import time
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

base = "https://www.nytimes.com"
browser = webdriver.Safari(executable_path = '/usr/bin/safaridriver')
browser.get('https://www.nytimes.com/search?endDate=20190331&query=cybersecurity&sort=newest&startDate=20180401')

soup = BeautifulSoup(browser.page_source,'lxml')

for link in soup.select(".css-138we14 a"):
    resp = requests.get(base + link.get("href"))
    sauce = BeautifulSoup(resp.text, "lxml")
    title = sauce.select_one("h1.css-1j5ig2m.e1h9rw200").text
    print(title)

    while True:
        try:
            show_more = browser.find_element_by_xpath('//button[@type="button"][contains(.,"Show More")]').click()
        except Exception as e:
            print(e)
            break

print("Complete")
time.sleep(10)
browser.quit()

As I said, I want the code to keep running until the last "Show More" button, and I want to scrape the titles of all the articles (335 in total).

1 Answer

1 vote
/ May 22, 2019

As already mentioned, you probably want the script to wait until the button is clickable before clicking it.

So something like this:

import time
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

base = "https://www.nytimes.com"
# Selenium 3 style call; Selenium 4 expects the driver path to be passed via a Service object.
browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
wait = WebDriverWait(browser, 10)
browser.get('https://www.nytimes.com/search?endDate=20190331&query=cybersecurity&sort=newest&startDate=20180401')

# Keep clicking "Show More" until the wait times out, i.e. the button no longer appears.
while True:
    try:
        time.sleep(1)
        show_more = wait.until(EC.element_to_be_clickable((By.XPATH, '//button[@type="button"][contains(.,"Show More")]')))
        show_more.click()
    except Exception as e:
        print(e)
        break

# Parse the fully expanded page once, after all the clicks are done.
soup = BeautifulSoup(browser.page_source,'lxml')
search_results = soup.find('ol', {'data-testid':'search-results'})

# Each result link holds the headline in an <h4>; the date is the next <time> element.
links = search_results.find_all('a')
for link in links:
    title = link.find('h4').text
    date = link.find_next('time').text
    print(date + ': '+ title)

print("Complete")

browser.quit()
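
One thing the loop above cannot tell apart is "the button is gone" from "something else went wrong", because it catches every Exception. Below is a minimal sketch of the same click-until-gone idea with an explicit stop condition and a safety cap. The names click_until_gone, wait_for_button, max_clicks, stop_on and FakePage are made up for illustration and are not part of Selenium; FakePage only stands in for a browser so the sketch runs without one.

```python
def click_until_gone(wait_for_button, max_clicks=500, stop_on=(Exception,)):
    """Click whatever wait_for_button() returns until it stops appearing.

    wait_for_button: callable that returns a clickable element, or raises
        one of the exception types in stop_on when no button shows up in time.
    max_clicks: safety cap so a page that never runs out cannot loop forever.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            button = wait_for_button()
        except stop_on:
            # Only the "no button" signal ends the loop quietly.
            break
        # Errors raised by click() itself are not swallowed here.
        button.click()
        clicks += 1
    return clicks


class FakePage:
    """Hypothetical stand-in for a page with a limited number of 'Show More' loads."""

    def __init__(self, remaining):
        self.remaining = remaining

    def wait_for_button(self):
        if self.remaining == 0:
            raise TimeoutError("no 'Show More' button")
        return self

    def click(self):
        self.remaining -= 1


if __name__ == "__main__":
    page = FakePage(3)
    print(click_until_gone(page.wait_for_button, stop_on=(TimeoutError,)))

# With Selenium, the wiring would presumably look like this (untested here):
#   from selenium.common.exceptions import TimeoutException
#   click_until_gone(
#       lambda: wait.until(EC.element_to_be_clickable((By.XPATH, xpath))),
#       stop_on=(TimeoutException,),
#   )
```

Stopping only on the timeout means that a stale or covered button surfaces as an error instead of silently ending the run early, which may help explain why the number of clicks varies from run to run.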

Output:

March 31: Bezos’ Security Consultant Accuses Saudis of Hacking the Amazon C.E.O.’s Phone
March 29: In Ukraine, Russia Tests a New Facebook Tactic in Election Tampering
March 29: Huawei Shrugs Off U.S. Clampdown With a $100 Billion Year
March 28: N.S.A. Contractor Arrested in Biggest Breach of U.S. Secrets Pleads Guilty
March 28: Grindr Is Owned by a Chinese Firm, and the U.S. Is Trying to Force It to Sell
March 28: DealBook Briefing: Saudi Arabia Wanted Cash. Aramco Just Obliged.
March 28: Huawei Security ‘Defects’ Are Found by British Authorities
March 25: As Special Counsel, Mueller Kept Such a Low Profile He Seemed Almost Invisible
March 22: Quotation of the Day: In New Age of Digital Warfare, Spies for Any Nation’s Budget
March 22: Coast Guard’s Top Officer Pledges ‘Dedicated Campaign’ to Improve Diversity
March 21: A New Age of Warfare: How Internet Mercenaries Do Battle for Authoritarian Governments
March 21: Facebook Did Not Securely Store Passwords. Here’s What You Need to Know.
March 18: Homeland Security Chief Cites Top Threat to U.S. (It’s Not the Border)
March 18: Nielsen Warns Against ‘Cyberthugs and Hackers’
March 17: U.S. Campaign to Ban Huawei Overseas Stumbles as Allies Resist
March 13: Vietnam’s Communist Party Ousts Historian Who Criticized Its China Policy
March 11: With Trump’s Budget Out, Democrats Must Now Show Their Cards
March 10: U.S. and China Near Currency Deal, but Provisions May Not Be New
March 8: Facebook Announces Plan to Curb Vaccine Misinformation
March 7: DealBook Briefing: Facebook Prioritizes Privacy. Can It Deliver?
March 7: Locking More Than the Doors as Cars Become Computers on Wheels
March 7: Huawei Sues U.S. Government Over What It Calls an Unfair Ban
March 6: Trump’s 5G Plan Is More Than a Gift to His Base
March 4: China, Huawei, Michael Jackson: Your Tuesday Briefing
March 4: Alphabet’s Security Start-Up Wants to Offer History Lessons
March 4: Huawei Said to Be Preparing to Sue the U.S. Government
March 4: Venezuela, India, North Korea: Your Monday Briefing
March 3: As Trump and Kim Met, North Korean Hackers Hit Over 100 Targets in U.S. and Ally Nations
March 2: Who’s Investigating Justin Trudeau — and What Do They Hope to Find?
March 1: The Week in Tech: How Can America Make the World Shun Huawei?
March 1: After Unpredictable Trump Meeting, Kim Returns to Scripted Form in Vietnam
Feb. 27: As Huawei’s Influence in Canada Grows, Some Fear Spying. Others Just Want Fast Internet.
Feb. 26: Was Russia Treason Trial About U.S. Election Meddling or a Convict’s Revenge?
Feb. 26: U.A.E. to Use Equipment From Huawei Despite American Pressure
Feb. 22: The Week in Tech: Chinese and Iranian Hackers Have Returned
Feb. 22: The Media Is Not the Enemy
Feb. 21: How Israel’s Moon Lander Got to the Launchpad
Feb. 20: Huawei Risks to Britain Can Be Blunted, U.K. Official Says, in a Rebuff to U.S.
Feb. 20: Russian Hackers Targeted European Research Groups, Microsoft Says
Feb. 18: Australia’s Prime Minister Blames ‘Sophisticated State Actor’ for Parliament Hack
Feb. 18: Chinese and Iranian Hackers Renew Their Attacks on U.S. Companies
Feb. 14: Can Berkeley Boycott Amazon?
Feb. 13: The Strange Experience of Being Australia’s First Tech Billionaires
Feb. 13: Turkey, Huawei, Migration: Your Wednesday Briefing
Feb. 12: Huawei Was a Czech Favorite. Now? It’s a National Security Threat.
Feb. 12: Hong Kong, North Korea, U.S.-China Trade: Your Wednesday Briefing
Feb. 11: DealBook Briefing: Brace for Another Government Shutdown
Feb. 10: These 50 Start-Ups May Be the Next ‘Unicorns’
Feb. 10: India, Jeff Bezos, Grammys: Your Monday Briefing
Feb. 8: Huawei Threatens Lawsuit Against Czech Republic After Security Warning
Feb. 8: DealBook Briefing: Jeff Bezos, Blackmail and ‘Below the Belt’ Selfies
Feb. 7: Key Senator Warns of Dangers of Chinese Investment in 5G Networks
Feb. 4: How to Safeguard Your Tech, and Your Money, While Traveling
Jan. 31: Russia’s Playbook for Social Media Disinformation Has Gone Global
Jan. 31: Securing Our Data
Jan. 30: Learning With: ‘In 5G Race With China, U.S. Pushes Allies to Fight Huawei’
Jan. 29: Cybersecurity, Polar Vortex, Kamala Harris: Your Tuesday Evening Briefing
Jan. 29: No People. No Process. No Policy.
Jan. 28: The Case of the Bumbling Spy: A Watchdog Group Gets Him on Camera
Jan. 28: Two-Factor Authentication Might Not Keep You Safe
Jan. 27: Another Side of #MeToo: Male Managers Fearful of Mentoring Women
Jan. 27: In 5G Race With China, U.S. Pushes Allies to Fight Huawei
Jan. 25: The Week in Tech: Silicon Valley Hobnobs in Davos
Jan. 23: World Leaders at Davos Call for Global Rules on Tech
Jan. 23: Lessons for Corporate Boardrooms From Yahoo’s Cybersecurity Settlement
Jan. 22: Did Australia Hurt Phone Security Around the World?
Jan. 22: How Huawei Wooed Europe With Sponsorships, Investments and Promises
Jan. 21: If 5G Is So Important, Why Isn’t It Secure?
Jan. 18: D.N.C. Says It Was Targeted Again by Russian Hackers After ’18 Election
Jan. 17: Facebook Identifies Russia-Linked Misinformation Campaign
Jan. 17: Only One House Republican Represents the Borderland, and He Opposes a Wall
Jan. 15: Hacker for Hire
Jan. 11: E.T.F.s Try to Lure Investors Into Ever Narrower Niches
Jan. 11: Poland Arrests 2, Including Huawei Employee, Accused of Spying for China
Jan. 11: El Chapo Trial: Why His I.T. Guy Had a Nervous Breakdown
Jan. 9: A Border Wall to Stop Terrorists? Experts Say That Makes Little Sense
Jan. 8: DealBook Briefing: A Model to Alleviate Student Debt Gains Traction
Jan. 8: German Man Confesses to Hacking Politicians’ Data, Officials Say
Jan. 8: No Tuition, but You Pay a Percentage of Your Income (if You Find a Job) 
Jan. 7: Democrats Faked Online Push to Outlaw Alcohol in Alabama Race
Jan. 6: Who Wants a Market Downturn? These Investors Actually Do
Jan. 5: Is America’s Political Future in San Antonio?
Jan. 4: Marriott Concedes 5 Million Passport Numbers Lost to Hackers Were Not Encrypted
Jan. 4: Hackers Leak Details of German Lawmakers, Except Those on Far Right
Jan. 3: Devices That Will Invade Your Life in 2019 (and What’s Overhyped)
Jan. 2: Why the World Needs America and China to Get Along
Jan. 2: DealBook Briefing: What Could Go Wrong in 2019? Plenty
Dec. 27, 2018: LinkedIn Co-Founder Apologizes for Deception in Alabama Senate Race
Dec. 27, 2018: Our Cellphones Aren’t Safe
Dec. 21, 2018: In 2018, Did Business Get Too Big?
Dec. 21, 2018: The Week in Tech: Hostages in the U.S.-China Tech Cold War
Dec. 20, 2018: U.S. Accuses Chinese Nationals of Infiltrating Corporate and Government Technology
Dec. 19, 2018: Google’s Marketing of Children’s Apps Misleads Parents, Consumer Groups Say
Dec. 19, 2018: ‘I Can English Understand,’ New Official Says. The Swiss Have Their Doubts.
Dec. 19, 2018: DealBook Briefing: Inside Facebook’s Huge Data Giveaway to Its Big Tech Brethren
Dec. 18, 2018: Michael Flynn, Shutdown, China Trade: Your Tuesday Evening Briefing
Dec. 18, 2018: How You Can Help Fight the Information Wars
Dec. 18, 2018: President Xi, K-Pop, Huawei: Your Wednesday Briefing
Dec. 18, 2018: DealBook Briefing: Did Big Tech Lie to Congress About Russian Interference?
Dec. 18, 2018: Russian Trolls Came for Instagram, Too
Dec. 18, 2018: Sprint, T-Mobile Deal Gets Green Light From U.S. Regulators
Dec. 18, 2018: Yes, Russian Trolls Helped Elect Trump
Dec. 18, 2018: Facebook, Twitter and YouTube Withheld Russia Data, Reports Say
Dec. 17, 2018: What We Now Know About Russian Disinformation
Dec. 17, 2018: Five Takeaways From New Reports on Russia’s Social Media Operations
Dec. 17, 2018: How to Make the Trade War Even Worse
Dec. 17, 2018: Voter Suppression and Racial Targeting: In Facebook’s and Twitter’s Words
Dec. 17, 2018: Russian 2016 Influence Operation Targeted African-Americans on Social Media
Dec. 12, 2018: Cohen Sentencing, Brexit, China Trade: Your Wednesday Evening Briefing
Dec. 12, 2018: Theresa May, China, Michael Cohen: Your Thursday Briefing
Dec. 12, 2018: DealBook Briefing: How Trump Plans to Keep China In Line on Trade
Dec. 12, 2018: China Says Detained Canadian Worked for Group Without Legal Registration
Dec. 11, 2018: Marriott Data Breach Is Traced to Chinese Hackers as U.S. Readies Crackdown on Beijing
Dec. 7, 2018: The Week in Tech: Facebook Is in the News. Again.
Dec. 7, 2018: U.S.-China Friction Threatens to Undercut the Fight Against Climate Change
Dec. 6, 2018: Teenagers in The Times: November 2018
Dec. 5, 2018: Rudy Giuliani Says Twitter Sabotaged His Tweet. Actually, He Did It Himself.
Dec. 4, 2018: House Republican Campaign Committee Says It Was Hacked This Year
Dec. 3, 2018: Kicked Out of Port Authority, Bieber Bus Got a Prime Stop on a Crowded Curb
Nov. 30, 2018: G-20, Marriott, Immigration: Your Friday Evening Briefing
Nov. 30, 2018: Marriott Hacking Exposes Data of Up to 500 Million Guests
Nov. 29, 2018: DealBook Briefing: The Fed’s Chair Sent the Markets Soaring
Nov. 29, 2018: N.Y. Today: Trump vs. Cuomo, Not So Much
Nov. 29, 2018: After a Hiatus, China Accelerates Cyberspying Efforts to Obtain U.S. Technology
Nov. 28, 2018: Iranians Accused in Cyberattacks, Including One That Hobbled Atlanta
Nov. 28, 2018: A Plan to Turn New York Into a Capital of Cybersecurity
Nov. 22, 2018: Time to Make the Donates!
Nov. 22, 2018: How Facebook’s P.R. Firm Brought Political Trickery to Tech
Nov. 21, 2018: Manufacturers Remain Slow to Recognize Cybersecurity Risks
Nov. 20, 2018: A Perfect Target for Cybercriminals 
Nov. 19, 2018: DealBook Briefing: Nissan’s Chairman Faces Criminal Charges Over Secret Compensation
Nov. 16, 2018: Justin Trudeau’s Official Fixer-Upper
Nov. 16, 2018: What Facebook Knew and Tried to Hide
Nov. 16, 2018: Brexit, Macedonia, Facebook: Your Friday Briefing
Nov. 15, 2018: Brexit, Saudi Arabia, Chinese Hospitals: Your Friday Briefing
Nov. 15, 2018: Minister in Charge of Japan’s Cybersecurity Says He Has Never Used a Computer
Nov. 14, 2018: Learning to Attack the Cyberattackers Can’t Happen Fast Enough
Nov. 14, 2018: How Do You Get Students to Think Like Criminals?
Nov. 13, 2018: Georgia’s Shaky Voting System
Nov. 13, 2018: DealBook Briefing: WeWork Might Be Too Big to Fail
Nov. 11, 2018: How a Former Canadian Spy Helps Wall Street Mavens Think Smarter
Nov. 11, 2018: This Week’s Wedding Announcements
Nov. 11, 2018: Ioanna Kefalas, Alexander Niejelow
Nov. 8, 2018: DealBook Briefing: Why Corporate America Is Content With the Midterms
Nov. 7, 2018: The Mad Dash to Find a Cybersecurity Force
Nov. 7, 2018: Russian Trolls Were at It Again Before Midterms, Facebook Says
Nov. 7, 2018: Antonio Delgado Upsets John Faso as 3 House Republicans Fall to N.Y. Democrats
Nov. 6, 2018: Russians Meddling in the Midterms? Here’s the Data
Nov. 6, 2018: Georgia Governor’s Race Is Hurtling Toward Election Day, and Passions Are Rising
Nov. 4, 2018: Consulting Firms Keep Lucrative Saudi Alliance, Shaping Crown Prince’s Vision
Nov. 1, 2018: Mystery of the Midterm Elections: Where Are the Russians?
Nov. 1, 2018: ‘I Am Not an Internet Troll’
Oct. 30, 2018: Chinese Military May Gain From Western University Ties, Report Warns
Oct. 25, 2018: 4 Women Try to Unseat House Republicans in N.Y.; Donors and Celebrities Take Notice
Oct. 24, 2018: Workforce Trends Impacting Deals: Are You Ready?
Oct. 23, 2018: Hack of Saudi Petrochemical Plant Was Coordinated From Russian Institute
Oct. 23, 2018: U.S. Begins First Cyberoperation Against Russia Aimed at Protecting Elections
Oct. 22, 2018: Trump May Revive the Cold War, but China Could Change the Dynamics
Oct. 22, 2018: DealBook Briefing: It’s Tough to Quit Saudi Arabia
Oct. 21, 2018: This Week’s Wedding Announcements
Oct. 21, 2018: Elena Welt, Jason Burke
Oct. 20, 2018: America’s Elections Could Be Hacked. Go Vote Anyway.
Oct. 19, 2018: Saudi Arabia Says Jamal Khashoggi Was Killed in Consulate Fight
Oct. 19, 2018: Five Artificial Intelligence Insiders in Their Own Words
Oct. 16, 2018: Why It’s So Hard to Punish Companies for Data Breaches
Oct. 15, 2018: IBM Takes Cybersecurity Training on the Road
Oct. 15, 2018: A Genocide Incited on Facebook, With Posts From Myanmar’s Military
Oct. 12, 2018: U.S. Stocks Became Expensive. Are Other Countries Better Bets?
Oct. 12, 2018: Facebook Hack Included Search History and Location Data of Millions
Oct. 11, 2018: Internet Hacking Is About to Get Much Worse
Oct. 10, 2018: New U.S. Weapons Systems Are a Hackers’ Bonanza, Investigators Find
Oct. 10, 2018: DealBook Briefing: Sears May Be on the Brink of Bankruptcy
Oct. 9, 2018: She’s a Gun-Owning Democrat. Her Opponent Calls Her an Extreme Liberal.
Oct. 8, 2018: Google Plus Will Be Shut Down After User Information Was Exposed
Oct. 8, 2018: The S.E.C. Dusts Off a Never-Used Cyber Enforcement Tool
Oct. 8, 2018: Australia Should Reverse Its Huawei 5G Ban
Oct. 6, 2018: Hackers, Good and Bad
Oct. 5, 2018: Cybersecurity Risks Should Weigh on Investors’ Minds More Often
Oct. 5, 2018: Will China Hack the U.S. Midterms?
Oct. 4, 2018: Kavanaugh, China, the Nobel Peace Prize: Your Friday Briefing
Oct. 3, 2018: Setting Up Your Tech on the Assumption You’ll Be Hacked
Oct. 3, 2018: DealBook Briefing: How Trump Reaped Riches From His Father
Oct. 2, 2018: Trump’s Reckless Cybersecurity Strategy
Sept. 30, 2018: This Week’s Wedding Announcements
Sept. 30, 2018: Jennifer Berry, Travis Jarae
Sept. 28, 2018: Facebook Security Breach Exposes Accounts of 50 Million Users
Sept. 27, 2018: Your Thursday News Briefing: Child Poverty, Brett Kavanaugh, United Nations
Sept. 26, 2018: Our Investigative Reporters Explain the Trump-Russia Story 
Sept. 26, 2018: DealBook Briefing: Trump Rails Against Globalism
Sept. 26, 2018: Brett Kavanaugh, Bill Cosby, Dunkin’ Donuts: Your Wednesday Briefing
Sept. 26, 2018: The Crisis of Election Security
Sept. 25, 2018: Is a New Russian Meddling Tactic Hiding in Plain Sight?
Sept. 24, 2018: When Reporting on Defcon, Avoid Stereotypes and A.T.M.s
Sept. 22, 2018: For Hackers, Anonymity Was Once Critical. That’s Changing.
Sept. 22, 2018: Billionaire Backer of Maria Butina Had Russian Security Ties
Sept. 21, 2018: Tran Dai Quang, Hard-Line Vietnamese President, Dies at 61
Sept. 21, 2018: DealBook Briefing: Does Bank of America Care About Investment Banking?
Sept. 20, 2018: The Plot to Subvert an Election: Unraveling the Russia Story So Far
Sept. 20, 2018: The Plot to Subvert an Election: Unraveling the Russia Story So Far
Sept. 19, 2018: Inside Facebook’s Election ‘War Room’
Sept. 17, 2018: Can Ethiopia’s New Leader, a Political Insider, Change It From the Inside Out?
Sept. 10, 2018: Role Models Tell Girls That STEM’s for Them in New Campaign
Sept. 7, 2018: A Security Expert Tied to WikiLeaks Vanishes, and the Internet Is Abuzz
Sept. 5, 2018: AnchorFree, Maker of a Top Online Privacy App, Raises $295 Million
Sept. 5, 2018: ‘Five Eyes’ Nations Quietly Demand Government Access to Encrypted Data
Sept. 4, 2018: Australia Wants to Take Government Surveillance to the Next Level
Aug. 31, 2018: Once Bipartisan, an Election Security Bill Collapses in Rancor
Aug. 29, 2018: The Fourth Season of ‘Mr. Robot’ Will Be Its Last
Aug. 28, 2018: In Melbourne tech firms take the first crack at tomorrow
Aug. 28, 2018: Corrections: August 28, 2018
Aug. 26, 2018: This Week’s Wedding Announcements
Aug. 26, 2018: Evita Almassi, Christopher Main
Aug. 25, 2018: For a Working-Mom Reporter, ‘The Juggle’ Is Real
Aug. 24, 2018: The Week in Tech: Democracy Under Siege
Aug. 24, 2018: California Today: A Rare Look Inside Steve Jobs’s Family
Aug. 23, 2018: Jeff Sessions, Hawaii, Reality Winner: Your Thursday Evening Briefing
Aug. 23, 2018: Malcolm Turnbull, Trade War, Amazon Tribe: Your Friday Briefing
Aug. 23, 2018: Google Deletes 39 YouTube Channels Linked to Iranian Influence Operation
Aug. 23, 2018: Attempted Hacking of Voter Database Was a False Alarm, Democratic Party Says
Aug. 23, 2018: Paul Manafort, Hawaii, Urban Meyer: Your Thursday Briefing
Aug. 23, 2018: How FireEye Helped Facebook Spot a Disinformation Campaign
Aug. 22, 2018: Democratic Party Says It Thwarted Attempted Hack of Voter Database
Aug. 22, 2018: Donald Trump, Duncan Hunter, Hawaii: Your Wednesday Briefing
Aug. 22, 2018: Facebook Identifies New Influence Operations Spanning Globe
Aug. 21, 2018: New Russian Hacking Targeted Republican Groups, Microsoft Says
Aug. 17, 2018: The Week in Tech: When to Tweet
Aug. 15, 2018: Hold the Phone! My Unsettling Discoveries About How Our Gestures Online Are Tracked
Aug. 14, 2018: Uber Picks N.S.A. Veteran to Fix Troubled Security Team
Aug. 13, 2018: Tesla Board Surprised by Elon Musk’s Tweet on Taking Carmaker Private
Aug. 11, 2018: Brian Kemp, Enemy of Democracy 
...
...
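
A note on the parsing step in the code above: link.find('h4').text raises AttributeError for any link inside the results list that has no h4 child. The sketch below is a more defensive variant, assuming the same 'ol' / 'h4' / 'time' structure the answer relies on. SAMPLE_HTML is invented, simplified markup rather than a copy of the real page, and extract_results is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Invented, simplified markup modelled on the selectors used in the answer.
SAMPLE_HTML = """
<ol data-testid="search-results">
  <li><a href="/a"><h4>First headline</h4></a><time>March 31</time></li>
  <li><a href="/b"><h4>Second headline</h4></a><time>March 29</time></li>
  <li><a href="/c">Link without a headline</a></li>
</ol>
"""


def extract_results(html):
    """Return (date, title, href) tuples for every result link that has an <h4>."""
    soup = BeautifulSoup(html, "html.parser")
    results = soup.find("ol", {"data-testid": "search-results"})
    if results is None:
        # The results list is missing, e.g. the page layout changed.
        return []
    rows = []
    for link in results.find_all("a"):
        heading = link.find("h4")
        if heading is None:
            # Skip links that carry no headline instead of crashing.
            continue
        stamp = link.find_next("time")
        date = stamp.get_text(strip=True) if stamp else ""
        rows.append((date, heading.get_text(strip=True), link.get("href")))
    return rows


if __name__ == "__main__":
    for date, title, href in extract_results(SAMPLE_HTML):
        print(date + ": " + title)
```

With the live page, the same function could be called as extract_results(browser.page_source) once all the "Show More" clicks are done. Comparing len() of the result with the expected 335 would be a quick check that nothing was missed.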