Thanks for the hint, Stanislav. Here is my solution:
// Needs: java.io.CharArrayWriter, java.io.IOException,
// javax.swing.text.BadLocationException, javax.swing.text.Document, javax.swing.text.Element
/**
 * Gets the inner HTML of the given element. If the element is named <code>p-implied</code>
 * or <code>content</code>, it returns null.
 * @param e element
 * @param d document containing the given element
 * @return the inner HTML of an HTML tag, or null if e is not a valid HTML tag
 * @throws IOException if writing the element to the buffer fails
 * @throws BadLocationException if the element offsets are not valid positions in d
 */
public String getInnerHtmlOfTag(Element e, Document d) throws IOException, BadLocationException {
    if (e.getName().equals("p-implied") || e.getName().equals("content"))
        return null;
    CharArrayWriter caw = new CharArrayWriter();
    int i;
    final String startTag = "<" + e.getName();
    final String endTag = "</" + e.getName() + ">";
    final int startTagLength = startTag.length();
    // Assumption: the enclosing class provides write(Writer, Document, int, int),
    // e.g. because it extends HTMLEditorKit.
    write(caw, d, e.getStartOffset(), e.getEndOffset() - e.getStartOffset());
    // we have the element, but wrapped as full standalone HTML code beginning with the HTML start tag,
    // thus we need to unpack our element
    StringBuilder str = new StringBuilder(caw.toString());
    while (str.length() > startTagLength) {
        char next = str.charAt(startTagLength); // character right after the candidate tag name
        if (str.substring(0, startTagLength).equals(startTag)
                && (next == '>' || next == '/' || Character.isWhitespace(next)))
            break; // a real match, not e.g. <pre> when we are looking for <p>
        str.deleteCharAt(0); // drop one character at a time so the start tag cannot be skipped
    }
    // we've found the beginning of the tag
    for (i = 0; i < str.length(); i++) { // skip it...
        if (str.charAt(i) == '>')
            break; // we've found the end position of our start tag
    }
    str.delete(0, i + 1); // ...and eat it
    // skip the content up to the first end tag (also found when it is the very last thing in the buffer)
    i = str.indexOf(endTag);
    if (i >= 0)
        str.delete(i, str.length()); // now just remove everything from position i to the end
    return str.toString().trim();
}
This method can easily be adapted to return the outer HTML instead, that is, the markup including the tag itself.
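To illustrate the outer-HTML variant, here is a minimal sketch of just the string-unpacking step. It assumes you already have the serialized fragment as a String (for instance the contents of the CharArrayWriter above). The class name OuterHtml and the method outerHtml are made up for this example and are not part of Swing.

```java
final class OuterHtml {
    private OuterHtml() {}

    /**
     * Cuts the first <tagName ...>...</tagName> span out of serialized HTML,
     * keeping the start and end tags. Returns null if no such span is found.
     */
    static String outerHtml(String serialized, String tagName) {
        String startTag = "<" + tagName;
        String endTag = "</" + tagName + ">";
        int start = indexOfStartTag(serialized, startTag);
        if (start < 0)
            return null;
        // Like the method above, this stops at the first end tag,
        // so same-name nested tags are not handled.
        int end = serialized.indexOf(endTag, start);
        if (end < 0)
            return null;
        return serialized.substring(start, end + endTag.length());
    }

    /** Finds startTag followed by '>', '/' or whitespace, so "<p" does not match "<pre". */
    private static int indexOfStartTag(String s, String startTag) {
        int from = 0;
        while (true) {
            int pos = s.indexOf(startTag, from);
            if (pos < 0)
                return -1;
            int next = pos + startTag.length();
            if (next < s.length()) {
                char c = s.charAt(next);
                if (c == '>' || c == '/' || Character.isWhitespace(c))
                    return pos;
            }
            from = pos + 1; // not a real match, keep searching
        }
    }
}
```

Inside getInnerHtmlOfTag this would roughly amount to returning OuterHtml.outerHtml(caw.toString(), e.getName()) instead of stripping the start and end tags.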