Selenium isn't needed here. Just fetch the page, iterate over the listings to clean them up, and print:
import requests
import bs4

url = "http://www.mortgagenewsdaily.com/directory/mortgage/alabama"
data = requests.get(url)
soup = bs4.BeautifulSoup(data.text, 'html.parser')

# Each listing sits in its own 'BusinessListingUser' div
page = soup.find_all('div', class_="BusinessListingUser")
for each in page:
    # Split the listing text into lines and drop the blank ones
    content = each.find('div', class_='ListingDetails').text.split('\n')
    content = [text.strip() for text in content if text.strip() != '']
    for strings in content:
        print(strings)
    print('\n')
Output:
Tyler Tullis
-
Montgomery, Alabama 36117
| (334) 322-3707
Nathan Stotlar
Mortgage Production Manager - PrimeLending, a PlainsCapital Company
Fitchburg, Wisconsin 53717
phone: (608) 467-4249
nathanstotlar.com
Anna Mendonca
Mortgage Loan Originator - CrossCountry Mortgage, Inc
Wakefield , Massachusetts 01880
phone: (781) 618-3154 | (781) 290-6383
myccmhomeloan.com/Default.aspx
Pouyan Broukhim
Owner - Probate Funding, Inc.
Los Angeles, California 90048
phone: (323) 935-5577
probatefunding.com
...
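To see what the split-and-strip step does without requesting the site, here is a minimal sketch on a hard-coded string. The helper name `clean_lines` and the sample text are made up for illustration; the sample only imitates the shape of the text inside one 'ListingDetails' div.

```python
def clean_lines(text):
    """Split text on newlines and drop blank or whitespace-only lines."""
    return [line.strip() for line in text.split('\n') if line.strip()]

# Hypothetical sample, shaped like the raw text of one listing
sample = (
    "\n  Nathan Stotlar \n\n"
    " Mortgage Production Manager - PrimeLending, a PlainsCapital Company\n"
    "  Fitchburg, Wisconsin 53717\n\n"
)

for line in clean_lines(sample):
    print(line)
```

The empty strings come from consecutive newlines in the markup, which is why the filter is needed before printing.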
ADDITIONAL:
import requests
import bs4
import pandas as pd

url = "http://www.mortgagenewsdaily.com/directory/mortgage/alabama"
data = requests.get(url)
soup = bs4.BeautifulSoup(data.text, 'html.parser')
page = soup.find_all('div', class_="BusinessListingUser")

columns = ['name', 'company', 'location', 'phone', 'website']
rows = []
for each in page:
    content = each.find('div', class_='ListingDetails').text.split('\n')
    content = [text.strip() for text in content if text.strip() != '']
    # Fields are read by position; a listing with fewer lines gets 'N/A'
    try:
        name = content[0]
    except IndexError:
        name = 'N/A'
    try:
        company = content[1]
    except IndexError:
        company = 'N/A'
    try:
        location = content[2]
    except IndexError:
        location = 'N/A'
    try:
        phone = content[3]
    except IndexError:
        phone = 'N/A'
    try:
        website = content[4]
    except IndexError:
        website = 'N/A'
    rows.append([name, company, location, phone, website])

# DataFrame.append was removed in pandas 2.0, so build the frame once from the rows
results = pd.DataFrame(rows, columns=columns)
# Writing .xlsx requires the openpyxl package to be installed
results.to_excel('C:/file.xlsx', index=False)
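The five try/except blocks above all do the same thing, so they could be collapsed by padding the cleaned list to a fixed length. This is only a sketch: the helper name `to_record` is hypothetical, and it keeps the same assumption as the code above, namely that the lines always appear in the order name, company, location, phone, website.

```python
COLUMNS = ['name', 'company', 'location', 'phone', 'website']

def to_record(content, columns=COLUMNS, missing='N/A'):
    """Map cleaned lines to column names by position, padding short listings."""
    # Keep at most len(columns) items, then fill whatever is missing
    padded = list(content[:len(columns)])
    padded += [missing] * (len(columns) - len(padded))
    return dict(zip(columns, padded))

# Example using the first listing from the output shown earlier (no website line)
record = to_record(['Tyler Tullis', '-', 'Montgomery, Alabama 36117', '| (334) 322-3707'])
print(record['website'])
```

A list of such dictionaries can be passed straight to `pd.DataFrame(...)`, which takes the column names from the keys.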