XML transformation: element appears in the wrong place in the document - PullRequest
1 vote
/ April 8, 2010

I'm having trouble with an XML transformation and need some help.

The stylesheet should iterate over all suffix elements and place the content of each one, without the suffix tag, next to the last text node within its first (nearest) ancestor quote-block element (see the desired output). This works when only one suffix is present, but not when there are two: in that case it places both suffixes next to each other at the last text node of the first quote-block.

Any ideas? I have tried restricting the selections to ancestor::quote-block[1] in various places, but that does not have the desired effect.

Source XML

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’</quote-para>
                            <suffix>(Emphasis added.)</suffix>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”</para>
                </item>
            </list>
            <suffix>(emphasis in original)</suffix>
        </quote-block>
    </para>
</paragraph>

Stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://xml.sm.com/schema/cases/report"
    xmlns:sm="http://xml.sm.com/functions" xmlns:saxon="http://saxon.sf.net/"
    xpath-default-namespace="http://sm.com/schema/cases/report"
    exclude-result-prefixes="xs sm" version="2.0">

    <xsl:output method="xml" indent="no"/>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="*">
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- Match quote-blocks with open or close attributes. -->
    <xsl:template match="*[*:quote-block and descendant::*:suffix]">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Match inline quote with open or close attributes -->
    <xsl:template match="*[*:quote and descendant::*:suffix]">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Process the quote block -->
    <xsl:template name="process-quote-block">
        <xsl:variable name="quoteBlockCopy">
            <xsl:copy-of select="."/>
        </xsl:variable>

        <xsl:apply-templates select="$quoteBlockCopy" mode="append-suffix">
            <xsl:with-param name="suffix" select="sm:get-suffix-note(.)"/>
            <xsl:with-param name="end-node" select="sm:get-last-text-node($quoteBlockCopy)"/>
        </xsl:apply-templates>
    </xsl:template>

    <!-- Match quote-blocks with open or close attributes. -->
    <xsl:template match="*[*:quote-block and descendant::*:suffix][ancestor::*:quote-block[1]]" mode="create-copy">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- Match inline quote with open or close attributes -->
    <xsl:template match="*[*:quote and descendant::*:suffix]" mode="create-copy">
        <xsl:call-template name="process-quote-block"/>
    </xsl:template>

    <!-- This will match all elements. Just copy and pass through the parameters. -->
    <xsl:template match="*" mode="append-suffix">
        <xsl:param name="suffix"/>
        <xsl:param name="end-node"/>
        <xsl:copy>
            <xsl:copy-of select="@*"/>
            <xsl:apply-templates mode="append-suffix">
                <xsl:with-param name="suffix" select="$suffix"/>
                <xsl:with-param name="end-node" select="$end-node"/>
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>

    <!-- Apply the text node to the content. If the node is equal to the last node then append the descendants of suffix  -->
    <xsl:template match="text()[normalize-space() != '']" mode="append-suffix">
        <xsl:param name="suffix"/>
        <xsl:param name="end-node"/>
        <xsl:choose>
            <xsl:when test="count(. | $end-node) = 1">
                <xsl:value-of select="."/>
                <xsl:apply-templates select="$suffix"/>
            </xsl:when>
            <xsl:otherwise>
                <!-- Or maybe neither. -->
                <xsl:value-of select="."/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!--  Dont copy suffix as -->
    <xsl:template match="*:suffix" mode="append-suffix"/>

    <xsl:function name="sm:get-suffix-note">
        <xsl:param name="node"/>
        <xsl:sequence select="$node/descendant::*:suffix/node()"/>
    </xsl:function>

    <xsl:function name="sm:get-last-text-node">
        <!--  Finds last non-empty text() node, ignoring <suffix> elements that are a child of this specific quote-block. -->
        <xsl:param name="node"/>

        <xsl:sequence
            select="reverse($node//text()[not(ancestor::*:suffix) and normalize-space() != ''])[1]"/>
    </xsl:function>

</xsl:stylesheet>

Current XML output

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’</quote-para>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”(Emphasis
                        added.)(emphasis in original)</para>
                </item>
            </list>

        </quote-block>
    </para>
</paragraph>

Desired output

<paragraph>
    <para>
        <quote-block>
            <list prefix-rules="specified">
                <item prefix="“B42">
                    <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                        reached an agreement to negotiate towards a direct contract for coal haulage
                        by rail (on a DIY basis), which would replace the previous indirect E2E
                        arrangements that EME had in place with ECSL. An internal EWS e-mail noted: <quote-block>
                            <quote-para>‘We did the deal with Edison Mission yesterday morning for
                                LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                                pending a contract.</quote-para>
                            <quote-para><emphasis strength="strong">Enron are now off our hands so
                                    far as Edison are concerned. The Enron flows we have left are to
                                    British Energy’s station at Eggborough; from Immingham, Redcar
                                    and Hull</emphasis>. Also to Enron’s own power station at Wilton
                                – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                                Eggborough traffic until next April when British Energy will,
                                hopefully take over their own coal procurement. <emphasis
                                    strength="strong">But we have got them out of Fiddlers Ferry and
                                    Ferrybridge – a big step forward</emphasis>.’(Emphasis
                                added.)</quote-para>
                        </quote-block>
                    </para>
                </item>
                <item prefix="B43">
                    <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                        EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                        indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para>
                </item>
            </list>

        </quote-block>
    </para>
</paragraph>

Answers [ 2 ]

1 vote
/ April 8, 2010

Here is a simple transformation that solves exactly this problem. As others have noted, the problem is specified in a very messy way and does not allow a single, unambiguous interpretation.

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:strip-space elements="*"/>

 <xsl:key name="kLastNonSufText"
   match="*[not(self::suffix)]/text()"
   use="generate-id(ancestor::quote-block[1])"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()[ancestor::quote-block]">
  <xsl:copy-of select="."/>

  <xsl:variable name="vQBImmed" select="ancestor::quote-block[1]"/>

  <xsl:variable name="vLastText" select=
   "key('kLastNonSufText', generate-id($vQBImmed))
      [last()]"/>

  <xsl:if test="count(.|$vLastText) = 1">
      <xsl:copy-of select="($vQBImmed//suffix)[last()]/text()"/>
  </xsl:if>
 </xsl:template>

 <xsl:template match="suffix"/>
</xsl:stylesheet>

When this transformation is applied to the provided (very unreadable and badly formatted) source XML document:

<paragraph>
 <para>
  <quote-block>
    <list prefix-rules="specified">
        <item prefix="“B42">
            <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                reached an agreement to negotiate towards a direct contract for coal haulage
                by rail (on a DIY basis), which would replace the previous indirect E2E
                arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
                <quote-block>
                    <quote-para>‘We did the deal with Edison Mission yesterday morning for
                        LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                        pending a contract.</quote-para>
                    <quote-para>
                        <emphasis strength="strong">Enron are now off our hands so
                            far as Edison are concerned. The Enron flows we have left are to
                            British Energy’s station at Eggborough; from Immingham, Redcar
                            and Hull</emphasis>. Also to Enron’s own power station at Wilton
                        – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                        Eggborough traffic until next April when British Energy will,
                        hopefully take over their own coal procurement.
                        <emphasis
                            strength="strong">But we have got them out of Fiddlers Ferry and
                            Ferrybridge – a big step forward</emphasis>.’
                    </quote-para>
                    <suffix>(Emphasis added.)</suffix>
                </quote-block>
            </para>
        </item>
        <item prefix="B43">
            <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                indirect supplies to EME, one of the new generating companies.”</para>
        </item>
    </list>
    <suffix>(emphasis in original)</suffix>
  </quote-block>
 </para>
</paragraph>

the output has the required suffixes appended to the intended text nodes:

<?xml version="1.0" encoding="UTF-16"?><paragraph><para><quote-block><list prefix-rules="specified"><item prefix="“B42"><para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June 2000, EME and EWS
                reached an agreement to negotiate towards a direct contract for coal haulage
                by rail (on a DIY basis), which would replace the previous indirect E2E
                arrangements that EME had in place with ECSL. An internal EWS e-mail noted:
                <quote-block><quote-para>‘We did the deal with Edison Mission yesterday morning for
                        LBT-Fiddlers @ £[…]/tonne as agreed. This rate until 16th September
                        pending a contract.</quote-para><quote-para><emphasis strength="strong">Enron are now off our hands so
                            far as Edison are concerned. The Enron flows we have left are to
                            British Energy’s station at Eggborough; from Immingham, Redcar
                            and Hull</emphasis>. Also to Enron’s own power station at Wilton
                        – 250,000 tonnes/year. I think we are stuck Enron [sic] on the
                        Eggborough traffic until next April when British Energy will,
                        hopefully take over their own coal procurement.
                        <emphasis strength="strong">But we have got them out of Fiddlers Ferry and
                            Ferrybridge – a big step forward</emphasis>.’
                    (Emphasis added.)</quote-para></quote-block></para></item><item prefix="B43"><para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This e-mail is evidence of both
                EWS’s intent and, indeed, its success in stopping ECSL from carrying out
                indirect supplies to EME, one of the new generating companies.”(emphasis in original)</para></item></list></quote-block></para></paragraph>
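
To see which text node the key picks for each quote-block, a small diagnostic stylesheet can help. This is only a sketch for inspection, not part of the solution: it reuses the kLastNonSufText key from above, assumes the elements are in no namespace (as in the sample document), and prints one line per quote-block with the last non-suffix text node found and the first child suffix of that block. The " | suffix: " separator is an arbitrary choice.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- plain-text report, one line per quote-block -->
 <xsl:output method="text"/>
 <xsl:strip-space elements="*"/>

 <!-- same key as in the transformation above: every text node that is
      not a child of suffix, indexed by its nearest quote-block ancestor -->
 <xsl:key name="kLastNonSufText"
   match="*[not(self::suffix)]/text()"
   use="generate-id(ancestor::quote-block[1])"/>

 <xsl:template match="/">
  <xsl:for-each select="//quote-block">
   <!-- key() returns nodes in document order, so [last()] is the final one -->
   <xsl:value-of select=
    "normalize-space(key('kLastNonSufText', generate-id())[last()])"/>
   <xsl:text> | suffix: </xsl:text>
   <!-- in XSLT 1.0 this outputs the first child suffix only -->
   <xsl:value-of select="suffix"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Because the key is indexed by ancestor::quote-block[1], text inside the inner quote-block is never counted for the outer one, which is what keeps the two suffixes apart.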
1 vote
/ April 8, 2010

Dude, you've dug yourself into a hole here. ;-) Here's what I came up with:

<xsl:stylesheet 
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="xml" encoding="utf-8" indent="no"/>

  <!-- key to identify all non-empty, non-suffix text node descendants of
       a quote-block. We'll use that to pull out the "last one" later-on -->
  <xsl:key 
    name ="kQbText" 
    match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]"
    use  ="generate-id(ancestor::quote-block[1])"
  />

  <!-- identity template to copy everything that is not otherwise handled -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*" />
    </xsl:copy>
  </xsl:template>

  <!-- special handling for text nodes that are descendants of quote-blocks -->
  <xsl:template match="quote-block//text()[not(normalize-space() = '' or parent::suffix)]">
    <xsl:variable name="qb" select="ancestor::quote-block[1]" />

    <!-- the text node gets copied regardless -->
    <xsl:copy-of select="." />

    <!-- if it is the last non-empty text node, append all suffixes -->
    <xsl:if test="
      generate-id() 
      = 
      generate-id( key('kQbText', generate-id($qb))[last()] )
    ">
      <xsl:for-each select="$qb/suffix">
        <xsl:value-of select="concat(' ', .)" />
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

  <!-- empty text nodes will be removed (all others are copied) -->
  <xsl:template match="text()[normalize-space() = '']" />

  <!-- suffix nodes will be deleted-->
  <xsl:template match="suffix" />

</xsl:stylesheet>

The above results in (indentation and line breaks added with tidy to make it readable):

<paragraph>
  <para>
    <quote-block>
      <list prefix-rules="specified">
        <item prefix="“B42">
          <para id="0a84d149-91b7-4012-ac6d-9f4eb8ed6c37">In June
          2000, EME and EWS reached an agreement to negotiate
          towards a direct contract for coal haulage by rail (on a
          DIY basis), which would replace the previous indirect E2E
          arrangements that EME had in place with ECSL. An internal
          EWS e-mail noted: 
          <quote-block>
            <quote-para>‘We did the deal with Edison Mission
            yesterday morning for LBT-Fiddlers @ £[…]/tonne as
            agreed. This rate until 16th September pending a
            contract.</quote-para>
            <quote-para>
            <emphasis strength="strong">Enron are now off our hands
            so far as Edison are concerned. The Enron flows we have
            left are to British Energy’s station at Eggborough;
            from Immingham, Redcar and Hull</emphasis>. Also to
            Enron’s own power station at Wilton – 250,000
            tonnes/year. I think we are stuck Enron [sic] on the
            Eggborough traffic until next April when British Energy
            will, hopefully take over their own coal procurement. 
            <emphasis strength="strong">But we have got them out of
            Fiddlers Ferry and Ferrybridge – a big step
            forward</emphasis>.’ (Emphasis added.)</quote-para>
          </quote-block></para>
        </item>
        <item prefix="B43">
          <para id="d64a5a72-0a02-476f-9a7b-7c07bbc93a8a">This
          e-mail is evidence of both EWS’s intent and, indeed, its
          success in stopping ECSL from carrying out indirect
          supplies to EME, one of the new generating companies.”
          (emphasis in original)</para>
        </item>
      </list>
    </quote-block>
  </para>
</paragraph>

The XSLT code here is XSLT 1.0, but you can run it unchanged in a 2.0 processor.
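
Since the stylesheet in the question is already XSLT 2.0, the same idea can also be written without a key, using the XPath 2.0 "is" operator for node identity. The following is a minimal, untested sketch under a few assumptions: the elements are in no namespace, as in the sample (the question's stylesheet declares xpath-default-namespace, so real documents may need that attribute or *:name tests); only suffix elements that are direct children of a quote-block are moved; and several suffixes on one block are joined with a single space. As far as I can tell, the question's version merges both suffixes because sm:get-suffix-note collects descendant::*:suffix from the outermost matching element and only one end node is computed for the whole copy.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <!-- identity template: copy everything not handled below -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- non-empty, non-suffix text nodes inside a quote-block -->
  <xsl:template
    match="quote-block//text()[normalize-space() and not(parent::suffix)]">
    <!-- the nearest enclosing quote-block owns this text node -->
    <xsl:variable name="qb" select="ancestor::quote-block[1]"/>
    <!-- last qualifying text node whose nearest quote-block is $qb -->
    <xsl:variable name="last" select="
      ($qb//text()[normalize-space() and not(parent::suffix)]
                  [ancestor::quote-block[1] is $qb])[last()]"/>

    <xsl:copy-of select="."/>

    <!-- append only the suffixes that belong to this quote-block -->
    <xsl:if test=". is $last">
      <xsl:value-of select="$qb/suffix" separator=" "/>
    </xsl:if>
  </xsl:template>

  <!-- the suffix elements themselves are dropped -->
  <xsl:template match="suffix"/>

</xsl:stylesheet>
```

Unlike the 1.0 version above, this sketch keeps whitespace-only text nodes and adds no leading space before the suffix, which is closer to the desired output shown in the question.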

...