I'm trying to follow the solution for converting DATEX II XML into a pandas DataFrame given in this answer: https://stackoverflow.com/a/47357282/5449497
However, I can't figure out how to set up the XSLT file it requires.
My XML file looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<d2LogicalModel modelBaseVersion="2" xmlns="http://datex2.eu/schema/2/2_0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datex2.eu/schema/2/2_0 http://bast.s3.amazonaws.com/schema/1412764802683/DATEXII_DaV-MDM-001_dyn.xsd" xsi:type="D2LogicalModel">
<exchange>
<supplierIdentification>
<country>de</country>
<nationalIdentifier>DE-MDM-Landesbetrieb Straßenbau NRW, Verkehrszentrale</nationalIdentifier>
</supplierIdentification>
</exchange>
<payloadPublication lang="DE" xsi:type="ElaboratedDataPublication">
<publicationTime>2018-02-17T23:59:42.364+01:00</publicationTime>
<publicationCreator>
<country>de</country>
<nationalIdentifier>DE-MDM-Landesbetrieb Straßenbau NRW, Verkehrszentrale</nationalIdentifier>
</publicationCreator>
<periodDefault>60.0</periodDefault>
<timeDefault>2018-02-17T23:59:42.364+01:00</timeDefault>
<headerInformation>
<confidentiality>noRestriction</confidentiality>
<informationStatus>real</informationStatus>
</headerInformation>
<referenceSettings>
<predefinedNonOrderedLocationGroupReference id="dav.nw.mq" targetClass="PredefinedNonOrderedLocationGroup" version="201610261425"/>
</referenceSettings>
<elaboratedData>
<basicData xsi:type="TrafficFlow">
<pertinentLocation xsi:type="LocationByReference">
<predefinedLocationReference id="mq.MQ_A1.0816_HFB_SW" targetClass="PredefinedLocation" version="201610261425"/>
</pertinentLocation>
<forVehiclesWithCharacteristicsOf>
<vehicleType>car</vehicleType>
</forVehiclesWithCharacteristicsOf>
<vehicleFlow>
<vehicleFlowRate>600</vehicleFlowRate>
</vehicleFlow>
</basicData>
</elaboratedData>
<elaboratedData>
<basicData xsi:type="TrafficFlow">
<pertinentLocation xsi:type="LocationByReference">
<predefinedLocationReference id="mq.MQ_A1.0816_HFB_SW" targetClass="PredefinedLocation" version="201610261425"/>
</pertinentLocation>
<forVehiclesWithCharacteristicsOf>
<vehicleType>lorry</vehicleType>
</forVehiclesWithCharacteristicsOf>
<vehicleFlow>
<vehicleFlowRate>0</vehicleFlowRate>
</vehicleFlow>
</basicData>
</elaboratedData>
<elaboratedData>
<basicData xsi:type="TrafficSpeed">
<pertinentLocation xsi:type="LocationByReference">
<predefinedLocationReference id="mq.MQ_A1.0816_HFB_SW" targetClass="PredefinedLocation" version="201610261425"/>
</pertinentLocation>
<forVehiclesWithCharacteristicsOf>
<vehicleType>car</vehicleType>
</forVehiclesWithCharacteristicsOf>
<averageVehicleSpeed>
<speed>108.0</speed>
</averageVehicleSpeed>
</basicData>
</elaboratedData>
</payloadPublication>
</d2LogicalModel>
My Python code, run in a Jupyter notebook, looks like this:
from io import StringIO
import lxml.etree as et
import pandas as pd
# LOAD XML AND XSL FILES
doc = et.parse('/home/User/Desktop/DataTest/traffic.xml')
xsl = et.parse('/home/User/Desktop/DataTest/traffic.xsl')
# INITIALIZE AND RUN TRANSFORMATION
transform = et.XSLT(xsl)
# CONVERT RESULT TO STRING
result = str(transform(doc))
# IMPORT INTO DATAFRAME
df = pd.read_csv(StringIO(result))
So far I have the following XSLT (traffic.xsl):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:pub="http://datex2.eu/schema/2/2_0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="D2LogicalModel">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="d2LogicalModel">
<xsl:apply-templates select="pub:payloadPublication"/>
</xsl:template>
<xsl:template match="pub:payloadPublication">
<xsl:apply-templates select="pub:elaboratedData"/>
</xsl:template>
<xsl:template match="pub:elaboratedData">
<xsl:value-of select="concat(ancestor::pub:payloadPublication/pub:publicationTime,',',
ancestor::pub:payloadPublication/
pub:elaboratedData/pub:basicData/@xsi:type,',',
descendant::pub:vehicleFlowRate,',',
descendant::pub:averageVehicleSpeed/@numberOfInputValuesUsed,',',
descendant::pub:speed)"/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
And I get the following output:
deDE-MDM-Landesbetrieb Straßenbau NRW Verkehrszentrale2018-02-17T23:59:42.364+01:00 TrafficFlow 600 Unnamed: 4 Unnamed: 5
0 2018-02-17T23:59:42.364+01:00 TrafficFlow 0.0 NaN NaN NaN
1 2018-02-17T23:59:42.364+01:00 TrafficFlow 600.0 NaN NaN NaN
I have no idea how the column names are being generated, or how to get the data I actually want as output:
publicationTime                predefinedLocationReference  vehicleType  vehicleFlowRate  speed
2018-02-17T23:59:42.364+01:00  mq.MQ_A1.0816_HFB_SW         lorry        0                NaN
2018-02-17T23:59:42.364+01:00  mq.MQ_A1.0816_HFB_SW         anyvehicle   600              NaN
2018-02-17T23:59:42.364+01:00  mq.MQ_A1.0816_HFB_SW         car          NaN              108.0
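For reference, the target layout can be made concrete with a minimal sketch that does not use XSLT at all. This is not the approach from the linked answer: it swaps XSLT for Python's standard-library xml.etree.ElementTree and works on a trimmed inline copy of the sample file. The names NS, SAMPLE, COLUMNS and datex_to_csv are hypothetical and only serve the illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns declaration of the sample file.
# The prefix "d" is an arbitrary local choice.
NS = {"d": "http://datex2.eu/schema/2/2_0"}

# Hypothetical column order, matching the desired table.
COLUMNS = ["publicationTime", "predefinedLocationReference",
           "vehicleType", "vehicleFlowRate", "speed"]

# Trimmed copy of the sample: one TrafficFlow and one TrafficSpeed record.
SAMPLE = """\
<d2LogicalModel modelBaseVersion="2" xmlns="http://datex2.eu/schema/2/2_0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <payloadPublication lang="DE" xsi:type="ElaboratedDataPublication">
    <publicationTime>2018-02-17T23:59:42.364+01:00</publicationTime>
    <elaboratedData>
      <basicData xsi:type="TrafficFlow">
        <pertinentLocation xsi:type="LocationByReference">
          <predefinedLocationReference id="mq.MQ_A1.0816_HFB_SW"
              targetClass="PredefinedLocation" version="201610261425"/>
        </pertinentLocation>
        <forVehiclesWithCharacteristicsOf>
          <vehicleType>lorry</vehicleType>
        </forVehiclesWithCharacteristicsOf>
        <vehicleFlow>
          <vehicleFlowRate>0</vehicleFlowRate>
        </vehicleFlow>
      </basicData>
    </elaboratedData>
    <elaboratedData>
      <basicData xsi:type="TrafficSpeed">
        <pertinentLocation xsi:type="LocationByReference">
          <predefinedLocationReference id="mq.MQ_A1.0816_HFB_SW"
              targetClass="PredefinedLocation" version="201610261425"/>
        </pertinentLocation>
        <forVehiclesWithCharacteristicsOf>
          <vehicleType>car</vehicleType>
        </forVehiclesWithCharacteristicsOf>
        <averageVehicleSpeed>
          <speed>108.0</speed>
        </averageVehicleSpeed>
      </basicData>
    </elaboratedData>
  </payloadPublication>
</d2LogicalModel>
"""


def datex_to_csv(xml_text):
    """Return CSV text with a header row and one row per basicData element."""
    root = ET.fromstring(xml_text)
    # publicationTime appears once per publication and is repeated on every row.
    pub_time = root.findtext("d:payloadPublication/d:publicationTime",
                             default="", namespaces=NS)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)  # explicit header, so no data row becomes the header
    for basic in root.iterfind(".//d:elaboratedData/d:basicData", NS):
        loc = basic.find("d:pertinentLocation/d:predefinedLocationReference", NS)
        writer.writerow([
            pub_time,
            loc.get("id") if loc is not None else "",
            basic.findtext("d:forVehiclesWithCharacteristicsOf/d:vehicleType",
                           default="", namespaces=NS),
            # Missing measurements are written as empty fields.
            basic.findtext("d:vehicleFlow/d:vehicleFlowRate",
                           default="", namespaces=NS),
            basic.findtext("d:averageVehicleSpeed/d:speed",
                           default="", namespaces=NS),
        ])
    return out.getvalue()


csv_text = datex_to_csv(SAMPLE)
print(csv_text)
```

Because the first line of the text is an explicit header row, passing it to pd.read_csv(StringIO(csv_text)) should take the column names from that row, and the empty fields would be read as NaN.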
Any help would be greatly appreciated.