How to parse an HTML tag using the DOM parser in Java
2 votes
/ November 15, 2010

I receive a value from a web service in which one attribute contains HTML tags and special characters. Can anyone tell me how to parse this value?

I get this exception when parsing the value with the DOM parser:

org.xml.sax.SAXParseException: Attr.value missing f. WIDOWS: (position:START_TAG <ARTICLE ARTICLE_ID='23221' HIDE_HEADER='0' MIGRATED='0' CITNART_DOC_REGION_INFO='' ISCSUSER='1' ARTICLE_TYPE_ID='31' ARTICLE_TYPE='Mobile- News and Commentary - Europe' CITN_ISSUE_NUMBER='' CITN_ARTICLE_TYPE_ID='' CITN_ARTICLE_TYPE='' SHOW_AUTH='1' LOGO_TYPE='QUEST' TITLE='Elementis - europe' DATE='2010-11-04T11:58:21.387' BODY='<span style=' WIDOWS:='null'>@1:726 in java.io.StringReader@43d85268) 

The values returned by the web service:

<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><getDataResponse xmlns="http://tempuri.org/QuestIPhoneWebService/QuestIPhoneWebService"><getDataResult><ROOT xmlns:sql="urn:schemas-microsoft-com:xml-sql"><ARTICLE ARTICLE_ID="23221" HIDE_HEADER="0" MIGRATED="0" CITNART_DOC_REGION_INFO="" ISCSUSER="1" ARTICLE_TYPE_ID="31" ARTICLE_TYPE="Mobile- News and Commentary - Europe" CITN_ISSUE_NUMBER="" CITN_ARTICLE_TYPE_ID="" CITN_ARTICLE_TYPE="" SHOW_AUTH="1" LOGO_TYPE="QUEST" TITLE="Elementis - europe" DATE="2010-11-04T11:58:21.387" BODY="<span style="WIDOWS: 2; TEXT-TRANSFORM: none; TEXT-INDENT: 0px; BORDER-COLLAPSE: separate; FONT: medium 'Times New Roman'; WHITE-SPACE: normal; ORPHANS: 2; LETTER-SPACING: normal; COLOR: rgb(0,0,0); WORD-SPACING: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: none; -webkit-text-stroke-width: 0px" class="Apple-style-span"><span class="Apple-style-span">
11-15 11:39:09.949: INFO/System.out(224): <p style="LINE-HEIGHT: 11pt" class="MsoNormal"><span lang="EN-US">At the end of 2008, the FTSE350 chemical sector consisted of just two names è Johnston Matthey and Croda. Since then we have had the admission of Victrex and, as of last week, Elementis and Yule Catto. Having met management, we believe that Elementis has all the ingredients for value creation that Croda has so successfully exhibited.</span></p>
11-15 11:39:09.949: INFO/System.out(224): <p style="LINE-HEIGHT: 11pt" class="MsoNormal"><span lang="EN-US">Being promoted into the FTSE250 opens Elementis up to a whole new investment audience. It has not just got there through a cyclical bounce back either. The company has gone through a very sensible rationalisation programme, exited a low-returning business (UK Chromium), is running much more efficient levels of working capital, and crucially, is more exposed to growth markets. To give an idea of managementès resolve, instead of selling the UK Chromium business they decided to effectively bulldoze the site. This will prevent a competitor from interfering in Elementisè position in US Chromium.</span></p>
11-15 11:39:09.961: INFO/System.out(224): <p style="LINE-HEIGHT: 11pt" class="MsoNormal"><span lang="EN-US">During the credit crunch Elementis picked up an Asian-focused speciality chemicals business called Deuchem for Å£38m (Å£45m sales). Deuchem has 12 offices in<?xml:namespace prefix = st1 /><st1:country-region><st1:place>China</st1:place></st1:country-region><span class="Apple-converted-space">&nbsp;</span>and is benefiting as the Chinese customer moves up the quality/performance scale. Previously, Chinese demand was not for sophisticated products è this is changing as we type. Coatings are the main market for speciality products, with Oilfield Chemicals the next biggest category. The cost of Elementisè products per end unit remains small, typically <5%. Yet the relationship with the customer (its largest is Akzo Nobel) is generally one that has been forged over many years (even decades) and required them to work closely together. In short, it is not particularly competitive, but does require consistent delivery and performance from Elementis. We have a very conservative top-line growth forecast of 3% for specialty chemicals, yet would not be surprised if it was nearer 5%. Margin progression here is key and we expect a mid-to-high teens margin up from 9%.</span></p>
11-15 11:39:09.980: INFO/System.out(224): <p style="LINE-HEIGHT: 11pt" class="MsoNormal"><span lang="EN-US">Another growth area is shale gas. Elementis makes the lubricant for the drill bit. Typically, drilling was vertical. But, now drill bits can be turned 90 degrees accessing much more of the shale seam. This requires much more lubricant è hence H1 2010 volumes were double the year before. There is only one competitor in this area. Elsewhere in the US Elementis has its US Chromium business. This is steady, has high<span class="Apple-converted-space">&nbsp;</span><st1:country-region>US</st1:country-region><span class="Apple-converted-space">&nbsp;</span>market shares and has a superior transport advantage to competitors exporting to the<span class="Apple-converted-space">&nbsp;</span><st1:country-region><st1:place>US</st1:place></st1:country-region>. This is a solid business growing at 3% with a 15% operating margin.</span></p>
11-15 11:39:09.980: INFO/System.out(224): <p style="LINE-HEIGHT: 11pt" class="MsoNormal"><span lang="EN-US">Since the credit crunch the CFO has tightened up inventory management and creditor days. This has helped to transfer c.ţ25m of value to shareholders, a vital step in maximizing returns for shareholders. On a separate note management think there is a chance that an EU fine worth ţ21m that Elementis has paid could be reversed.</span></p>
11-15 11:39:09.990: INFO/System.out(224): <p style="LINE-HEIGHT: 11pt" class="MsoNormal"><span lang="EN-US">Weève updated the Modeller approach we used in last monthès CITN note è <a href="http://www.csquest.com/QUEST?uid=MAIL&Tp=Cn&PCF=CNAR&ID=23243" target="_blank">Itès Elementary</a>è¡. Instead of using a<span class="Apple-converted-space">&nbsp;</span><a href="http://www.csquest.com/QUEST?clpg=ART&id=13586&clid=&pg=MDL&spl=&cid=0241854" target="_blank">central valuation (100p)</a><span class="Apple-converted-space">&nbsp;</span>è half way between the<a href="http://www.csquest.com/QUEST?clpg=ART&id=13629&clid=&pg=MDL&spl=&cid=0241854" target="_blank">bull (135p)</a><span class="Apple-converted-space">&nbsp;</span>and bear (67p) scenarios è since seeing management, weère now happier using a valuation halfway between the bull case and the central case. Given this renewed confidence, we think this 118p adjusted valuation is very credible indeed. With 24% upside to Fridayès close, Elementis is a buy.</span></p>
11-15 11:39:10.000: INFO/System.out(224): <p>
11-15 11:39:10.010: INFO/System.out(224): <table style="WIDTH: 345.75pt; BORDER-COLLAPSE: collapse; MARGIN-LEFT: 4pt" class="MsoTableGrid" border="0" cellspacing="0" cellpadding="0" width="461">
11-15 11:39:10.020: INFO/System.out(224): <tbody>
11-15 11:39:10.020: INFO/System.out(224): <tr>
11-15 11:39:10.020: INFO/System.out(224): <td style="PADDING-BOTTOM: 0cm; PADDING-LEFT: 5.4pt; WIDTH: 345.75pt; PADDING-RIGHT: 5.4pt; PADDING-TOP: 0cm" valign="top" width="461">
11-15 11:39:10.029: INFO/System.out(224): <p style="LINE-HEIGHT: 11pt; MARGIN: 0.75pt 0cm 0.75pt -3.95pt" class="MsoNormal"><b><span lang="EN-US">Sales Team</span></b><span lang="EN-US"><span class="Apple-converted-space">&nbsp;</span><a href="mailto:salesteam@collinsstewart.com" target="_blank">salesteam@collinsstewart.com</a>, Tel: +44 (0) 20 7523 8493</span></p></td></tr></tbody></table></p></span></span>" IS_PROTECTED="0" PDF_NAME="" REFERENCE_CITN_ARTICLE_ID="23221" ISNEWARTICLE="5" HYPERLINK="/PATH/23221.pdf"><SUMMARY>Elementis Europe Summary</SUMMARY><AUTHORS/></ARTICLE><ASSOCIATED_COMPANIES ARTICLE_ID="23221"/><COMPANIES_WITH_AUTH context="COMPANIES"/></ROOT>
11-15 11:39:10.029: INFO/System.out(224): </getDataResult></getDataResponse></soap:Body></soap:Envelope>

Can anyone tell me how to parse the special characters and the HTML tags?

Any help would be appreciated.

Answers [ 3 ]

1 vote
/ November 15, 2010

I copied your sample, formatted it, and I think I understand the problem. Your XML is a SOAP response that contains HTML. The HTML is held in the BODY attribute of the ARTICLE tag.
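The diagnosis can be reproduced with a small sketch. It assumes a standard JDK with its built-in JAXP DOM parser; the class name `RawHtmlInAttribute`, the helper `isWellFormed`, and the shortened ARTICLE samples are illustrative and are not part of the original service. The exact error message depends on the parser implementation, so the sketch only checks whether parsing succeeds.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class RawHtmlInAttribute {

    /** Returns true if the DOM parser accepts the string, false if it is not well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser reports malformed input as a SAXException (usually a SAXParseException).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Unexpected parser setup or I/O failure", e);
        }
    }

    public static void main(String[] args) {
        // Shortened stand-in for the ARTICLE element: raw HTML inside the BODY attribute.
        String raw = "<ARTICLE BODY=\"<span style=\"WIDOWS: 2\">text</span>\"/>";
        // The same element with the markup characters escaped.
        String escaped =
            "<ARTICLE BODY=\"&lt;span style=&quot;WIDOWS: 2&quot;&gt;text&lt;/span&gt;\"/>";

        System.out.println(isWellFormed(raw));     // prints false
        System.out.println(isWellFormed(escaped)); // prints true
    }
}
```

The raw variant fails for two reasons at once: the `<` is not allowed inside an attribute value, and the inner double quote ends the BODY value early, which is why the reported exception shows `WIDOWS:` being read as a separate attribute.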

An XML attribute value cannot contain certain characters literally, such as <, & and the quote character that delimits the value. Your content contains many of them because it is HTML. To send HTML you have to escape these characters, that is, replace & with &amp;, < with &lt;, > with &gt;, ' with &apos; and " with &quot;.

I mean do this when you generate the response, not when you parse it! Good luck.

1 vote
/ July 11, 2011

If you are dealing with badly formed HTML, you can try JSoup. However, you should contact whoever provides the web service and ask for clarification about what you are receiving: the HTML has to be embedded properly, otherwise you will never get to the bottom of it.

0 votes
/ November 15, 2010

I suspect your problem is that your HTML is plain HTML, not XHTML, and therefore cannot be parsed by an XML parser.
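A minimal sketch of that escaping step on the generating side, assuming the response is assembled as a string in Java. The class `AttributeEscaper` and the methods `escapeXmlAttribute` and `readBody` are hypothetical names used for illustration; the real service may build its XML in a completely different way. The second method shows that a DOM parser hands the original HTML back once the value has been escaped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class AttributeEscaper {

    /** Escapes the five XML special characters so the text can sit inside an attribute value. */
    static String escapeXmlAttribute(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Parses the XML and returns the BODY attribute of the root element, unescaped by the parser. */
    static String readBody(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("BODY");
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative HTML fragment, much shorter than the real article body.
        String html = "<span style=\"WIDOWS: 2\">Tom & Jerry's</span>";
        String xml = "<ARTICLE BODY=\"" + escapeXmlAttribute(html) + "\"/>";

        System.out.println(xml);
        System.out.println(readBody(xml).equals(html)); // prints true
    }
}
```

Walking the string character by character avoids the ordering problem of chained replacements, where an `&` introduced by an earlier replacement would be escaped a second time. One caveat: XML parsers normalize literal line breaks and tabs inside attribute values to spaces, so multi-line HTML sent this way comes back on a single line unless those characters are also written as character references.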
