Can't get some tags with soup.findAll? - PullRequest
December 7, 2018

This is the HTML; as you can see, it contains two kinds of tags, <code> and <img>. Note that if you scroll a little to the right, you will see a code tag immediately after each img tag.

Problem

I want to get all the code tags, and I am using bs4 for this, but I can't get the code tags that come immediately after the img tags. I don't know why. Any ideas?

<code style="display: none" id="bpr-guid-1535430">
      {&quot;data&quot;:{&quot;mediaConfig&quot;:{&quot;mprConfig&quot;:{&quot;sizes&quot;:[{&quot;width&quot;:60,&quot;height&quot;:30,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:60,&quot;height&quot;:36,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:90,&quot;height&quot;:45,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:90,&quot;height&quot;:54,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:50,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:100,&quot;height&quot;:100,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:120,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:120,&quot;height&quot;:72,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:127,&quot;height&quot;:30,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:127,&quot;height&quot;:46,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:150,&quot;height&quot;:75,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:150,&quot;height&quot;:90,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:191,&quot;height&quot;:45,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:191,&quot;height&quot;:69,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:100,&quot;$type&quot;:&q
uot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:120,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:200,&quot;height&quot;:200,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:254,&quot;height&quot;:60,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:254,&quot;height&quot;:92,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:337,&quot;height&quot;:120,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:400,&quot;height&quot;:400,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:506,&quot;height&quot;:180,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:674,&quot;height&quot;:240,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;},{&quot;width&quot;:750,&quot;height&quot;:750,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorSize&quot;}],&quot;filters&quot;:{&quot;cover&quot;:&quot;https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}&quot;,&quot;contain&quot;:&quot;https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}&quot;,&quot;original&quot;:&quot;https://media.licdn.com/media{+id}&quot;,&quot;fill&quot;:&quot;https://media.licdn.com/mpr/mpr/shrink_{width}_{height}{+id}&quot;,&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorFilters&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaProcessorConfig&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.MediaConfig&quot;},&quot;$type&quot;:&quot;com.linkedin.voyager.common.Configuration&quot;},&quot;included&quot;:[]}
    </code>

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none" class="datalet-bpr-guid-1535430"><code style="display: none" id="bpr-guid-1535431">
  {&quot;data&quot;:{&quot;canBrowseProfiles&quot;:false,&quot;reactivationFeaturesEligible&quot;:false,&quot;canViewJobAnalytics&quot;:false,&quot;canViewWVMP&quot;:false,&quot;premiumFreeTrialEligible&quot;:true,&quot;canViewCompanyInsights&quot;:false,&quot;$type&quot;:&quot;com.linkedin.voyager.premium.FeatureAccess&quot;},&quot;included&quot;:[]}
</code>

<code style="display: none" id="datalet-bpr-guid-1535431">
  {"request":"/voyager/api/premium/featureAccess?name\u003DreactivationFeaturesEligible","status":200,"body":"bpr-guid-1535431"}
</code>

<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" style="display: none" class="datalet-bpr-guid-1535431"><code style="display: none" id="bpr-guid-1535432">
  {&quot;data&quot;:{&quot;companies&quot;:[],&quot;$deletedFields&quot;:[&quot;paidProducts&quot;,&quot;postJobsEnabled&quot;],&quot;memberGroup&quot;:&quot;FREE&quot;,&quot;showStaticLearning&quot;:false,&quot;$type&quot;:&quot;com.linkedin.voyager.common.Nav&quot;,&quot;$id&quot;:&quot;M8x5UY0Zt6eGdBCiy+iKhA&#61;&#61;,root&quot;},&quot;included&quot;:[]}
</code>

<code style="display: none" id="datalet-bpr-guid-1535432">
  {"request":"/voyager/api/nav","status":200,"body":"bpr-guid-1535432"}
</code>

Below is the Python code I am using.

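import sys
from html.parser import HTMLParser

import requests
from bs4 import BeautifulSoup

# HTML parser instance (not used in this snippet)
h = HTMLParser()

# Company name to search for, taken from the first command-line argument
companyname = sys.argv[1]

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
}
url = 'https://www.linkedin.com/search/results/all/?keywords='+companyname+'&origin=GLOBAL_SEARCH_HEADER'
req = requests.get(url, headers=headers)
finding = BeautifulSoup(req.content, 'lxml')

# Print every <code> tag that the parser found
for x in finding.findAll('code'):
    print(x)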
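
One way to narrow this down, offered as a diagnostic sketch and not as a known fix: count the code start tags in the raw response with the standard library's html.parser and compare that with what findAll returns. If both counts match, the missing tags were probably never in the response the script received; if they differ, the parser passed to BeautifulSoup is the more likely suspect. The names CodeTagCollector, collect_code_ids and sample are hypothetical, and sample is a shortened stand-in for the markup shown above, not real response data.

```python
from html.parser import HTMLParser


class CodeTagCollector(HTMLParser):
    """Records the id attribute of every <code> start tag it encounters."""

    def __init__(self):
        super().__init__()
        self.code_ids = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; attrs is a list of (name, value) pairs
        if tag == "code":
            self.code_ids.append(dict(attrs).get("id"))


def collect_code_ids(markup):
    """Return the ids of all <code> tags in markup, in document order."""
    collector = CodeTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.code_ids


# Shortened stand-in for the markup in the question (payloads replaced by {})
sample = (
    '<code style="display: none" id="bpr-guid-1535430">{}</code>\n'
    '<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"'
    ' style="display: none" class="datalet-bpr-guid-1535430">'
    '<code style="display: none" id="bpr-guid-1535431">{}</code>\n'
    '<code style="display: none" id="datalet-bpr-guid-1535431">{}</code>\n'
)

print(collect_code_ids(sample))
```

With the script above, the comparison would be len(collect_code_ids(req.text)) against len(finding.findAll('code')).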