String text to CSV

0 votes
/ October 1, 2019

I want to format a string as CSV. I scraped data from a website with BeautifulSoup and got the full text as a single string.

Scraped result:

Business Objective\n
464 Wholesale of household goods\n
Main Business Activities\n
46493 Wholesale of stationery, books, magazines and newspapers\n

I have tried several approaches, such as:

  1. result = re.findall(r'(?==Business Objective=)(.*)(?=Main Business Activities=)', string)

  2. using join

  3. using string replace
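
A note on attempt 1: `(?==Business Objective=)` is a lookahead for the literal text `=Business Objective=`, which never occurs in the scraped string, so `findall` returns an empty list. Below is a minimal sketch of a pattern that does capture both values. It assumes each label and its value are separated by exactly one newline, as in the sample above; the names `scraped` and `PAIR` are illustrative only.

```python
import re

# Sample text copied from the scraped result in the question.
scraped = (
    "Business Objective\n"
    "464 Wholesale of household goods\n"
    "Main Business Activities\n"
    "46493 Wholesale of stationery, books, magazines and newspapers\n"
)

# Without re.DOTALL, "." stops at a newline, so each group captures one line.
PAIR = re.compile(r"Business Objective\n(.*)\nMain Business Activities\n(.*)")

# findall returns one (objective, activity) tuple per matched block.
pairs = PAIR.findall(scraped)
print(pairs)
```

If the real text contains extra blank lines or trailing spaces, the pattern would need to be loosened accordingly.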

Code:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time
import re
import numpy
import csv
companyName = "MONUMENT BOOKS CO  LTD"
SourceAppCode = "-- Any register --"
browser = webdriver.Chrome(r"D:\KHIHORT_PROJECTS\YUON_LOTO\chromedriver_win32\chromedriver")
browser.get('https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-companies%26service%3DregisterItemSearch&target=cambodia-master')
browser.find_elements_by_xpath("//input[@name='QueryString']")[0].send_keys(companyName)
time.sleep(0.5)
browser.find_elements_by_xpath("//select[@name='SourceAppCode']")[0].send_keys(SourceAppCode)
time.sleep(0.5)
browser.find_elements_by_xpath("/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/a[3]")[0].click()
time.sleep(0.5)
browser.find_elements_by_xpath("//a[@class='registerItemSearch-results-page-line-ItemBox-resultLeft-viewMenu appMenu appMenuItem appMenuDepth0 noSave appItemSearchResult viewInstanceUpdateStackPush appReadOnly appIndex0']")[0].click()
time.sleep(0.5)
ww=browser.find_elements_by_xpath("/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[7]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]")
time.sleep(0.5) 

My expected result:

Business Objective,Main Business Activities
464 Wholesale of household goods,"46493 Wholesale of stationery, books, magazines and newspapers"
"581 Publishing of books, periodicals and other publishing activities","58110 Publishing of books, brochures and other publications(2)"
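
The quoting in the expected result above, where only fields containing commas are wrapped in double quotes, matches what Python's standard `csv` module does by default. Here is a minimal sketch with the two rows hard-coded; `rows_to_csv` is a hypothetical helper name, and the text is written to memory rather than to a file.

```python
import csv
import io

# The two rows from the expected result, hard-coded for illustration.
rows = [
    ("464 Wholesale of household goods",
     "46493 Wholesale of stationery, books, magazines and newspapers"),
    ("581 Publishing of books, periodicals and other publishing activities",
     "58110 Publishing of books, brochures and other publications(2)"),
]


def rows_to_csv(rows):
    """Return CSV text with a header row followed by the given rows."""
    buffer = io.StringIO()
    # The default QUOTE_MINIMAL setting quotes only fields that need it.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Business Objective", "Main Business Activities"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file instead, the same writer can be given a handle from `open('file.csv', 'w', newline='')`.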

1 Answer

0 votes
/ October 1, 2019

It is better to use Selenium's wait functionality than sleep. But you can pull out those rows, load them into a DataFrame, and write it out as CSV:

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import requests
import time
import re
import numpy
import csv
companyName = "MONUMENT BOOKS CO  LTD"
SourceAppCode = "-- Any register --"
browser = webdriver.Chrome("C:/chromedriver_win32/chromedriver.exe")
browser.get('https://www.businessregistration.moc.gov.kh/cambodia-master/relay.html?url=https%3A%2F%2Fwww.businessregistration.moc.gov.kh%2Fcambodia-master%2Fservice%2Fcreate.html%3FtargetAppCode%3Dcambodia-master%26targetRegisterAppCode%3Dcambodia-br-companies%26service%3DregisterItemSearch&target=cambodia-master')
browser.find_elements_by_xpath("//input[@name='QueryString']")[0].send_keys(companyName)
time.sleep(0.5)
browser.find_elements_by_xpath("//select[@name='SourceAppCode']")[0].send_keys(SourceAppCode)
time.sleep(0.5)
browser.find_elements_by_xpath("/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[1]/div[1]/div[1]/div[2]/div[1]/a[3]")[0].click()
time.sleep(0.5)
browser.find_elements_by_xpath("//a[@class='registerItemSearch-results-page-line-ItemBox-resultLeft-viewMenu appMenu appMenuItem appMenuDepth0 noSave appItemSearchResult viewInstanceUpdateStackPush appReadOnly appIndex0']")[0].click()
time.sleep(0.5)
ww=browser.find_elements_by_xpath("/html[1]/body[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/form[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[5]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/div[7]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]")
time.sleep(0.5) 


soup = BeautifulSoup(browser.page_source, 'html.parser')
ba = soup.find_all('div',{'class':'appRepeaterContent'})[1]

rows = ba.find_all('div',{'class':'appRecordChildren appBlockChildren'})



records = []
for row in rows:
    # The first appAttrValue in a row holds the business objective;
    # the next appAttrValue after it holds the main business activity.
    bo = row.find('div',{'class':'appAttrValue'})
    mba = bo.find_next('div',{'class':'appAttrValue'})
    records.append([bo.text, mba.text])

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0.
results = pd.DataFrame(records, columns=['Business Objective','Main Business Activities'])

results.to_csv('file.csv', index=False)

Output:

print (results)
                                   Business Objective                           Main Business Activities
0                    464 Wholesale of household goods  46493 Wholesale of stationery, books, magazine...
1   581 Publishing of books, periodicals and other...  58110 Publishing of books, brochures and other...
2   581 Publishing of books, periodicals and other...  58120 Publishing of mailing lists, telephone b...
3   581 Publishing of books, periodicals and other...  58130 Publishing of newspapers, journals, maga...
4   581 Publishing of books, periodicals and other...  58190 Publishing of catalogs, photos, engravin...
5                 469 Non-specialized wholesale trade  46900 Wholesale of a variety of goods without ...
6                    464 Wholesale of household goods  46431 Wholesale of pharmaceutical and medical ...
7                         521 Warehousing and storage             52100 Warehousing and storage services
8              421 Construction of roads and railways  42101 Construction of streets, roads, bridges ...
9   681 Real estate activities with own or leased ...  68101 Buying, selling, renting and operating o...
10                                854 Other education                     85499 Other education n.e.c(6)
11                                    731 Advertising                               73100 Advertising(1)
12            551 Short term accommodation activities                     55101 Hotels and resort hotels
13  561 Restaurants and mobile food service activi...   56101 Restaurants and restaurant cum night clubs
14     791 Travel agency and tour operator activities                  79110 Travel agency activities(1)
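
As for the waits mentioned at the start of this answer, here is a minimal sketch of an explicit wait, assuming Selenium 4's `WebDriverWait` and `expected_conditions`. The helper name `wait_for` and the 10-second timeout are illustrative choices, not part of the original script.

```python
def wait_for(browser, xpath, timeout=10):
    """Block until an element matching `xpath` is present, then return it.

    Hypothetical helper: raises selenium's TimeoutException if nothing
    matches within `timeout` seconds.
    """
    # Imported here so the helper can be defined even where Selenium
    # is not installed; normally these would sit at the top of the file.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(browser, timeout).until(
        EC.presence_of_element_located((By.XPATH, xpath))
    )
```

Each `find_elements_by_xpath(...)[0]` plus `time.sleep(0.5)` pair could then become one call, for example `wait_for(browser, "//input[@name='QueryString']").send_keys(companyName)`. Unlike a fixed sleep, this returns as soon as the element appears and fails loudly if it never does.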