Extracting CDATA from an RSS feed with NodeJS - PullRequest
0 votes
/ September 10, 2018

I'm using feedparser 2.2.9 to parse the feed "https://www.veganlifemag.com/feed/".

The description element of each feed item contains HTML wrapped in a CDATA section, and the content I need is enclosed in tags inside that HTML. Is there a way to extract the CDATA content, or a specific part of it?

Thanks in advance,

Jerry

Sample RSS feed:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
<title>VegNews.com (News)</title>
<description></description>
<link>https://vegnews.com/news</link>
<language>en</language>
<item>
  <title>London Fashion Week Will Be Fur-Free This Year for the First Time</title>
  <category>News</category>
  <pubDate>Mon, 10 Sep 2018 01:50:00 -0700</pubDate>
  <link>https://vegnews.com/2018/9/london-fashion-week-will-be-fur-free-this-year-for-the-first-time</link>
  <guid>https://vegnews.com/2018/9/london-fashion-week-will-be-fur-free-this-year-for-the-first-time</guid>
  <description>
    <![CDATA[<img src="https://vegnews.com/media/W1siZiIsIjEyOTE1L1ZlZ05ld3MuRmFzaGlvbkxvbmRvbi5wbmciXSxbInAiLCJ0aHVtYiIsIjgwMHg0NzMjIix7ImZvcm1hdCI6ImpwZyJ9XSxbInAiLCJvcHRpbWl6ZSJdXQ/VegNews.FashionLondon.png?sha=ec3755007e36522e" /><p>Anticipated event London Fashion Week (LFW) kicks off September 14, this year with no fur in sight. While LFW did not impose a ban on fur, every designer that will present their collections this year has adopted a fur-free policy, including last-minute holdout Burberry. After more than a decade of pressure from animal-rights organizations, including <a href="http://www.hsi.org/" target="_blank" rel="noopener">Humane Society International UK</a> and <a href="https://www.peta.org/" target="_blank" rel="noopener">People for the Ethical Treatment of Animals</a>, Burberry announced this month that it would no longer use fur in its collections and appointed Riccardo Tisci as its new creative director to phase out any remaining fur items. &ldquo;I don&rsquo;t think it is compatible with modern luxury and with the environment in which we live, and Riccardo has a very strong view as well on this,&rdquo; LFW CEO Marco Gobbetti told <a href="https://www.businessoffashion.com/articles/professional/burberry-stops-destroying-product-and-bans-real-fur" target="_blank" rel="noopener"><em>Business of Fashion</em></a>. &ldquo;It&rsquo;s part of what Burberry is today.&rdquo; Similarly, animal fur is falling out of favor in the United States. 
Earlier this year, American designer <a href="https://vegnews.com/2018/3/dkny-and-donna-karan-go-fur-free" target="_blank" rel="noopener">Donna Karan</a> pledged to eliminate the material from her future collections, and the city of <a href="https://vegnews.com/2018/3/san-francisco-bans-fur-sales" target="_blank" rel="noopener">San Francisco</a> joined <a href="https://vegnews.com/2013/9/west-hollywood-says-no-to-real-fur-in-fashion" target="_blank" rel="noopener">West Hollywood</a> and <a href="https://vegnews.com/2017/4/berkeley-prohibits-fur-sales-citywide" target="_blank" rel="noopener">Berkeley</a> in banning fur sales within city limits.</p>]]>
  </description>
</item>
  </channel>
</rss>

1 Answer

0 votes
/ September 11, 2018

CDATA simply means "treat this content as plain text": inside a CDATA section, characters that normally have special meaning in XML (for example <, which normally starts a tag) are taken literally.
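To make this concrete, here is a toy decoder showing that a CDATA section and ordinary entity-escaped text produce the same string once parsed. It is only an illustration: decodeXmlText is a hypothetical name, and it is not a real XML parser (it ignores elements, attributes, and numeric character references and handles only the five predefined XML entities).

```javascript
// Toy illustration of CDATA semantics; NOT a real XML parser.
// decodeXmlText is a hypothetical name. It only handles CDATA sections and the
// five predefined XML entities (no elements, attributes, or &#...; references).
const XML_ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };
const CDATA_OPEN = "<![CDATA[";
const CDATA_CLOSE = "]]>";

function decodeXmlText(raw) {
  let out = "";
  let i = 0;
  while (i < raw.length) {
    const start = raw.indexOf(CDATA_OPEN, i);
    // Ordinary character data up to the next CDATA section: decode entities.
    const plainEnd = start === -1 ? raw.length : start;
    out += raw
      .slice(i, plainEnd)
      .replace(/&(lt|gt|amp|quot|apos);/g, (_, name) => XML_ENTITIES[name]);
    if (start === -1) break;
    // Inside CDATA: copy everything verbatim, "<" and "&" included.
    const bodyStart = start + CDATA_OPEN.length;
    const end = raw.indexOf(CDATA_CLOSE, bodyStart);
    if (end === -1) throw new Error("Unterminated CDATA section");
    out += raw.slice(bodyStart, end);
    i = end + CDATA_CLOSE.length;
  }
  return out;
}

console.log(decodeXmlText("<![CDATA[<p>Hi & bye</p>]]>"));     // <p>Hi & bye</p>
console.log(decodeXmlText("&lt;p&gt;Hi &amp; bye&lt;/p&gt;")); // <p>Hi & bye</p>
console.log(decodeXmlText("<![CDATA[&ldquo;Hi&rdquo;]]>"));   // &ldquo;Hi&rdquo;
```

The last line is why the sample feed uses CDATA: &ldquo; is not one of the five entities XML predefines, so outside CDATA it would be an error unless declared in a DTD. Inside CDATA it passes through XML parsing untouched and is left for the HTML layer to decode.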

The value of description is therefore just a fragment of HTML. If you want to extract specific content from it, run it through an HTML parser.
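As a dependency-free sketch of that step, the code below uses regular expressions in place of a real HTML parser to pull out the image URL and the plain paragraph text. This swap is deliberate and only reasonable for small, predictable fragments like the sample; the helper names decodeHtmlEntities and extractFromDescription are hypothetical, and the entity table covers only a handful of names.

```javascript
// Dependency-free sketch: regular expressions stand in for a real HTML parser.
// Only reasonable for small, predictable fragments like an RSS description.
// decodeHtmlEntities and extractFromDescription are hypothetical helper names.

// A handful of named HTML entities; a real parser knows all of them.
const HTML_ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"],
  ["nbsp", "\u00A0"], ["ldquo", "\u201C"], ["rdquo", "\u201D"],
  ["lsquo", "\u2018"], ["rsquo", "\u2019"],
]);

function decodeHtmlEntities(text) {
  // Single pass, so "&amp;lt;" correctly becomes "&lt;" rather than "<".
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const hex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(hex ? 2 : 1), hex ? 16 : 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown entity names are left as-is.
    return HTML_ENTITIES.has(body) ? HTML_ENTITIES.get(body) : match;
  });
}

function extractFromDescription(html) {
  // src attribute of the first <img> tag, if any (double-quoted values only).
  const img = html.match(/<img\b[^>]*\bsrc="([^"]*)"/i);
  // Strip all tags first, then decode entities, then collapse whitespace.
  const text = decodeHtmlEntities(html.replace(/<[^>]*>/g, ""))
    .replace(/\s+/g, " ")
    .trim();
  return { imageUrl: img ? decodeHtmlEntities(img[1]) : null, text };
}

// Placeholder fragment shaped like the sample feed's description.
const description = `
  <img src="https://example.com/photo.png?a=1&amp;b=2" /><p>&ldquo;It&rsquo;s part of
  what <a href="https://example.com">Burberry</a> is today.&rdquo;</p>`;
const { imageUrl, text } = extractFromDescription(description);
console.log(imageUrl); // https://example.com/photo.png?a=1&b=2
console.log(text);     // “It’s part of what Burberry is today.”
```

A real parser handles what these regexes do not (single-quoted or unquoted attributes, > inside attribute values, the full entity list). For example, with the cheerio package (an assumption; the answer does not name a library), cheerio.load(html) returns a $ function, and $("img").attr("src") and $("p").text() give the image URL and the decoded paragraph text. With feedparser, the HTML string is typically available as item.description on each item read from the parser stream.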

...