Regex for the continuation of a sentence - PullRequest
0 votes
/ March 8, 2010

I have the following text. I need to extract the exception name and the rest of the sentence that follows it from the file, but the sentences in the file run together with no spaces.

??????>?????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
?????????????????????????????????????????????????????????????????????????????????????????B????!48#$%&'+-/0123????5679<=@
>?CA????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????RootEntry?????????F?|?`???__nameid_version10?????????|?`???|?`??__substg10_00020102????????????__substg1
0_00030102????????????????????????????????????????????????????????????????????????????????????????????????!????#$???????
?????????????????+????/????12????456789<=>?@ABCDEFGHIJKLMNOP????RS????UV????????????Z[????????^_????????bc????????fghijk
lmnopqrstuv????????yz?????????????????????????F?F??????????IPMNoteaws-stg-c5-feeds9aws[mazarvoiceSMTPAppender]Applicatio
nmes__substg10_00040102????????????????__substg10_10060102????__substg10_10140102????????__substg10_10150102????????????
__substg10_001A001F????__substg10_0037001F?????????????__substg10_003B0102????$__substg10_003D001F????????????????sageSM
TPBVAPPLICATION@mazarVOICECOM??@??B??+/??/O=ad/OU=ad/CN=RECIPIENTS/CN=_OPERATIONSSUPPORT_OperationsSupport?+????n?T?bvap
plication@mazarvoicecomSMTPbvapplication@mazarvoicecom__substg10_003F0102????P__substg10_0040001F????????????$__substg10
_00410102?????__substg10_0042001F????????????<bvapplication@mazarvoicecom??@??B??+/??/O=ad/OU=ad/CN=RECIPIENTS/CN=_OPERA
TIONSSUPPORT_OperationsSupportEX/O=ad/OU=ad/CN=RECIPIENTS/CN=_OPERATIONSSUPPORTEX/O=ad/OU=ad/CN=RECIPIENTS/CN=_OPERATION
SSUPPORTSMTPbvapplication@mazarvoicecom__substg10_00430102????P__substg10_0044001F????????????$__substg10_00510102????7_
_substg10_00520102????????????7__substg10_0064001F????__substg10_0065001F????????????<__substg10_0070001F?????__substg10
_00710102????????????aws-stg-c5-feeds9aws[mazarvoiceSMTPAppender]Applicationmessage??E????????A??H???EX/O=ad/OU=ad/CN=RE
CIPIENTS/CN=_OPERATIONSSUPPORTEX__substg10_0075001F????__substg10_0076001F????????????f__substg10_0077001F????__substg10
_0078001F????????????f/O=ad/OU=ad/CN=RECIPIENTS/CN=_OPERATIONSSUPPORT?+????n?T?bvapplication@mazarvoicecomSMTPbvapplicat
ion@mazarvoicecombvapplication@mazarvoicecomSMTPBVAPPLICATION@mazarVOICECOMSMTP__substg10_007D001F?????__substg10_0C1901
02????????????"?__substg10_0C1A001F&????%<__substg10_0C1D0102????????????&$MicrosoftMailInternetHeadersVersion20Received
fromdalmailadsolutionscom[17216977]byblrexchadsolutionscomwithMicrosoftSMTPSVC6037903959Sat20Feb2010213019+0530Receivedf
rombarracudaadcom[1721682]bydalmailadsolutionscomwithMicrosoftSMTPSVC6037903959Sat20Feb2010100012-0600X-ASG-Debug-ID1266
681611-1c039f4d0001-azmk4tReceivedfromna3sys009aog114obsmtpcomna3sys009aog114obsmtpcom[74125149211]bybarracudaadcomwithS
MTPidoe0BsQlWiwvBTxEofor<_OperationsSupport@adcom>Sat20Feb2010100011-0600CSTX-Barracuda-Envelope-Frombvapplication@mazar
voicecomReceivedfromsource[241551448]byna3sys009aob114postinicom[7412514812]withSMTPIDDSNKS4AHC703Mnh+uM8i9u1uucP76tMiGb
r6@postinicomSat20Feb2010080012PSTReceivedfrompsmtpcom74125149120byAUSBDCaustinmazarvoicecom100023withMicrosoftSMTPServe
rid821760Sat20Feb2010095958-0600Receivedfromsource[723214889]usingTLSv1byna3sys009amx236postinicom[7412514810]withSMTPSa
t20Feb2010080009PSTReceivedfromaws-build-systemawsaws-build-system[172200110]byc0mailmazarvoicecom8138/8138withESMTPido1
KG08md020220for<dev-log4j@mazarvoicecom>Sat20Feb2010100008-0600Receivedfromaws-stg-c5-feeds9awsaws-stg-c5-feeds9aws[1020
969139]byaws-build-systemawsPostfixwithESMTPid6D6A864294for<dev-log4j@mazarvoicecom>Sat20Feb2010100008-0600CSTReceivedfr
omaws-stg-c5-feeds9awslocalhost[127001]byaws-stg-c5-feeds9awsPostfixwithESMTPid548C1801ABfor<dev-log4j@mazarvoicecom>Sat
20Feb2010100008-0600CSTDateSat20Feb2010100008-0600From<bvapplication@mazarvoicecom>To<dev-log4j@mazarvoicecom>Message-ID
<644663928511266681608162JavaMailtomcat@aws-stg-c5-feeds9aws>X-ASG-Orig-Subjaws-stg-c5-feeds9aws[mazarvoiceSMTPAppender]
ApplicationmessageSubjectaws-stg-c5-feeds9aws[mazarvoiceSMTPAppender]ApplicationmessageMIME-Version10Content-Typemultipa
rt/mixedboundary="----=_Part_51_8002724931266681608155"X-pstn-neptune0/0/000/0X-pstn-levelsS4081870/9990000CV999000FC955
390LC955390R959108P959108M970282C986951X-Auto-Response-SuppressDROOFAutoReplyX-Barracuda-Connectna3sys009aog114obsmtpcom
[74125149211]X-Barracuda-Start-Time1266681611X-Barracuda-URLhttp//17216828000/cgi-mod/markcgiX-Virus-Scannedbybsmtpdatad
comX-Barracuda-Spam-Score001X-Barracuda-Spam-StatusNoSCORE=001usingglobalscoresofTAG_LEVEL=35QUARANTINE_LEVEL=10000KILL_
LEVEL=90tests=BSF_SC0_SA_TO_FROM_DOMAIN_MATCHNO_REAL_NAMEX-Barracuda-Spam-ReportCodeversion32rulesversion32223024Rulebre
akdownbelowptsrulenamedescription----------------------------------------------------------------------------000NO_REAL_
NAMEFromdoesnotincludearealname001BSF_SC0_SA_TO_FROM_DOMAIN_MATCHSenderDomainMatchesRecipientDomainReturn-Pathbvapplicat
ion@mazarvoicecomX-OriginalArrivalTime20Feb20101600120277UTCFILETIME=[C8AD525001CAB245]------=_Part_51_80027249312666816
08155Content-Typetext/plaincharset="us-ascii"Content-Transfer-Encoding7bit------=_Part_51_8002724931266681608155--__subs
tg10_0C1E001F!????'__substg10_0C1F001F????????????<__substg10_0E02001F$????????__substg10_0E03001F????????????????bvappl
ication@mazarvoicecomdev-log4j@mazarvoicecomaws-stg-c5-feeds9aws[mazarvoiceSMTPAppender]Applicationmessage00000002BLREXC
H/O=ad/OU=ad/cn=Recipients/cn=_OperationsSupportMicrosoftExchangeServer__substg10_0E04001F#%????4__substg10_0E1D001F????
?????????__substg10_0E28001F"????-?__substg10_0E29001F????????????0?00000002BLREXCH/O=ad/OU=ad/cn=Recipients/cn=_Operati
onsSupportMicrosoftExchangeServerZxLZFu#O?rcpg125?2CtexA???????PV?U?%Qch??set2?%?3F?03???05"`cP3d36P?0?2-?100a8WARNA?ghi
b?a?utJD@BCExce0iR???SQLHEr`r???S??!a8?????ERROR??OZ?%?rpd?%?URL="jdbcmysql//?stg-c5-m?13306@/bv2?a%0o@?nn?t=t?r__substg
10_1000001F'????"?#__substg10_10090102????????????3^__substg10_1035001F????Q?__substg10_10F3001F????????????T?02-2010000
8WARNorghibernateutilJDBCExceptionReporterSQLError0SQLState0800102-20100008ERRORorghibernateutilJDBCExceptionReporterSQL
exceptionraisedforJDBCURL="jdbcmysql//stg-c5-dbmst13306/bv2?autoReconnect=true&useUnicode=true&characterEncoding=utf-8"!
MESSAGEServerconnectionfailureduringtransactionDuetounderlyingexception'javanetSocketExceptionjavanetConnectExceptionCon
nectiontimedout'BEGINNESTEDEXCEPTIONjavanetSocketExceptionMESSAGEjavanetConnectExceptionConnectiontimedoutSTACKTRACEjava
netSocketExceptionjavanetConnectExceptionConnectiontimedoutatcommysqljdbcStandardSocketFactoryconnectStandardSocketFacto
ryjava156atcommysqljdbcMysqlIO<init>MysqlIOjava284atcommysqljdbcConnectioncreateNewIOConnectionjava2672atcommysqljdbcCon
nection<init>Connectionjava1474atcommysqljdbcNonRegisteringDriverconnectNonRegisteringDriverjava266atorgapachecommonsdbc
pDriverConnectionFactorycreateConnectionDriverConnectionFactoryjava37atorgapachecommonsdbcpPoolableConnectionFactorymake
ObjectPoolableConnectionFactoryjava291atorgapachecommonspoolimplGenericObjectPoolborrowObjectGenericObjectPooljava771ato
rgapachecommonsdbcpPoolingDataSourcegetConnectionPoolingDataSourcejava95atorgapachecommonsdbcpBasicDataSourcegetConnecti
onBasicDataSourcejava548atsunreflectGeneratedMethodAccessor530invokeUnknownSourceatsunreflectDelegatingMethodAccessorImp
linvokeDelegatingMethodAccessorImpljava25atjavalangreflectMethodinvokeMethodjava597atorgspringframeworkaopsupportAopUtil
sinvokeJoinpointUsingReflectionAopUtilsjava310atorgspringframeworkaopframeworkReflectiveMethodInvocationinvokeJoinpointR
eflectiveMethodInvocationjava182atorgspringframeworkaopframeworkReflectiveMethodInvocationproceedReflectiveMethodInvocat
ionjava149atorgspringframeworkaopframeworkadapterThrowsAdviceInterceptorinvokeThrowsAdviceInterceptorjava126atorgspringf
rameworkaopframeworkReflectiveMethodInvocationproceedReflectiveMethodInvocationjava171atorgspringframeworkaopframeworkJd
kDynamicAopProxyinvokeJdkDynamicAopProxyjava204at$Proxy20getConnectionUnknownSourceatorgspringframeworkormhibernate3Loca
lDataSourceConnectionProvidergetConnectionLocalDataSourceConnectionProviderjava82atorghibernatejdbcConnectionManageropen
ConnectionConnectionManagerjava417atorghibernatejdbcConnectionManagergetConnectionConnectionManagerjava144atorghibernate
jdbcAbstractBatcherprepareQueryStatementAbstractBatcherjava105atorghibernateloaderLoaderprepareQueryStatementLoaderjava1
561atorghibernateloaderLoaderdoQueryLoaderjava661atorghibernateloaderLoaderdoQueryAndInitializeNonLazyCollectionsLoaderj
ava224atorghibernateloaderLoaderdoListLoaderjava2145atorghibernateloaderLoaderlistIgnoreQueryCacheLoaderjava2029atorghib
ernateloaderLoaderlistLoaderjava2024atorghibernateloadercriteriaCriteriaLoaderlistCriteriaLoaderjava94atorghibernateimpl
SessionImpllistSessionImpljava1533atorghibernateimplCriteriaImpllistCriteriaImpljava283atorgspringframeworkormhibernate3
HibernateTemplate$36doInHibernateHibernateTemplatejava1061atorgspringframeworkormhibernate3HibernateTemplatedoExecuteHib
ernateTemplatejava419atorgspringframeworkormhibernate3HibernateTemplateexecuteWithNativeSessionHibernateTemplatejava374a
torgspringframeworkormhibernate3HibernateTemplatefindByCriteriaHibernateTemplatejava1051atorgspringframeworkormhibernate
3HibernateTemplatefindByCriteriaHibernateTemplatejava1044atcommazarvoiceccadaohibernateModelDAOHibernatefindModelDAOHibe
rnatejava189atcommazarvoiceccadaohibernateAbstractCriteriaDAOHibernatefindAbstractCriteriaDAOHibernatejava113atcommazarv
oiceccaloggingserviceimplLoggingConfigurationServiceImplupdateFiltersLoggingConfigurationServiceImpljava104atcommazarvoi
ceccaloggingserviceimplLoggingConfigurationServiceImplupdateLoggingConfigurationLoggingConfigurationServiceImpljava95atc
ommazarvoiceccaloggingserviceimplLoggingConfigurationServiceImpl$1runLoggingConfigurationServiceImpljava69ENDNESTEDEXCEP
TIONAttemptedreconnect3timesGivingupP&uU??$???En??g=%0f-8"!?ESSAGE?!`av?+?'?fp?@dqp0?q-?DP1??oun?ly1?''!`'java?+?tSock?%
?!`4wP+?5i6?'??%@?`t'"?"?BEGIEN/0TED!X?%?PTIO9?9?4%?"?/'67/89??ACKTRB?/??<?5???@?9%??$?0mcD?o?3D?F-??y+?JoKsD1?56H?I?Mr@
?<?it>P?M?284N/I?GXK?o$??@S?Qt6??Q?STP?UN14?7Q?I?N"g??aD/?K?\?]?U??N$Ra??%Iq`??!p]?GXKG?T?GXb?M37`O?a_bcPo??ld??KF?D?Obj
Li??f<291g?hO`?fpPG?qc?k?k??!!wk?p??M?77noiH]R$??aD?p%?gd?sv?M?95toubcB>a?wOd?|?M?54?8Z?s3flP?p?$?dMhp?Ac%??5?ppnvoD??kn
?r0?}d???HDelP?g$?a?|Ip????????Qtz^D`p?W?t7???%?9g/$asp?1rPa?w?k{??ob??pbA?0?U%Ab@??Jo?`??tU?p@??e???f?P?????????/??tI??
c?!2Q?}?????M?8V~?o????????D??d?/???49?o???aL?0]?Thr!sAdv??e???%?????o??uM?V@????????????]tO?O?_Jdk?Dy$?m???P`^xK?????mv
0Z?$????w??????I??$?3L??l}GX??????w?????M??O?$\S?Mpaw?]??0??d???M?4????/?Ow??Z&Z???J@A?b?-?|?t{?]???%??eQPK?!??????M?0z_
$k????]???????????M????????do?3??`??{????A3??P?1i?zT?LazyP?l??s??22???????L]??Z????????w?Ig????C{???????????????]?]"D0C?
U?J???y???$kp?`??????M????????_Qu???o?H$?T???Q?$?Q??????????%??`%?r?%0?????????o'WP??hN?!/???_y?g??o???3pdB???U??5????!?
?%??Wrb??a-?????K???bP????M???AO#?"??P?_???&/'??&6?????1~O-Q??pg\??qs`??pt???5#F1`g}???P?5???p??!0F?0?!???67?$?2?3?4?6?9
?8'???DA?oz?<??=?>?C7?$1?P?0B?C??c?M?ENDNESTEOEXCEPTUNN?M?At3???d?]?3???`b@G]?w?p?4}Tp<644663928511266681608162JavaMailt
omcat@aws-stg-c5-feeds9aws>aws-stg-c5-feeds9aws%3A[mazarvoiceSMTPAppender]ApplicationmessageEMLt<????G???k???__substg10_
300B0102+-????W__substg10_3FF8001F????????????X<__substg10_3FF901020????Y?__substg10_3FFA001F????????????\<bvapplication
@mazarvoicecom?+????n?T?bvapplication@mazarvoicecomSMTPbvapplication@mazarvoicecombvapplication@mazarvoicecom?+????n?T?b
vapplication@mazarvoicecomSMTPbvapplication@mazarvoicecom__substg10_3FFB0102/2????]?__substg10_8000001F????????????`2__s
ubstg10_8001001F14????a?__substg10_80020102????????????d2MicrosoftExchangeServer00000002BLREXCH/O=ad/OU=ad/cn=Recipients
/cn=_OperationsSupportMAPI//00000002/00000000@00?z?`??@00?z?`????J>???j4y?f??6__properties_version10035????e?__recip_ver
sion10_#00000000????????9?|?`???|?`??__substg10_0FF60102????????????w__substg10_0FFF010268????x?&67?@9??E??$??P?@&A??B>C
P?D&Q7?R7?de>p?q?uvhwxh}???>$?>@?+??E?????#^?5???????0????o??????>??????>???????@@@@v@y@?4???2?=???+????n?T?dev-log4j@ma
zarvoicecomSMTPdev-log4j@mazarvoicecomdev-log4j@mazarvoicecomSMTPdev-log4j@mazarvoicecomSMTPDEV-LOG4J@mazarVOICECOMdev-l
og4j@mazarvoicecom__substg10_3001001F????????????{2__substg10_3002001F7????|__substg10_3003001F????????????}2__substg10_
300B0102<????~__substg10_3A20001F????=????2__properties_version100??????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????
?????????????040040?4@??0

I need to extract the following keywords: ConnectException, javanetConnectException, and any other exception, along with its exception type.

Could you please help me write a regular expression for this?

1 Answer

0 votes
/ March 20, 2010

It is very hard to come up with a regular expression when there is no simple way to tell where your sentence ends. This regex matches either of your two exception names, followed by one uppercase letter and then any number of lowercase letters:

(ConnectException|javanetConnectException)[A-Z][a-z]*

When I copy and paste the sample from your question, the pasted text contains line breaks and a space at the end of each line. If those also occur in your actual file, remove them first by searching for \s+ and replacing it with nothing.
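
As a minimal sketch of that cleanup step, assuming Python and its standard `re` module (the question does not say which tool is used), the search-and-replace could look like this. The `pasted` string is a made-up fragment that imitates a line wrapped mid-word with a trailing space; `strip_whitespace` is a hypothetical helper name.

```python
import re


def strip_whitespace(text: str) -> str:
    """Remove every run of whitespace (spaces, tabs, line breaks)."""
    return re.sub(r"\s+", "", text)


# Hypothetical fragment: wrapped mid-word, with a trailing space per line.
pasted = "javanetConnectExceptionCon \nnectiontimedout' \n"
cleaned = strip_whitespace(pasted)
print(cleaned)
```

Note that this also removes any whitespace that was genuinely part of the file, which seems acceptable here because the sample has none between words anyway.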

In your sample, my regex finds 3 matches once the spaces and line breaks have been removed:

javanetConnectExceptionConnectiontimedout
javanetConnectExceptionConnectiontimedout
javanetConnectExceptionConnectiontimedoutatcommysqljdbc
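
The result above can be reproduced with a short sketch, again assuming Python's `re` module. The `fragment` string is copied from three lines of the sample in the question, with a trailing space and line break added to each line to imitate the pasted text; the variable names are illustrative only.

```python
import re

# The regex from the answer, unchanged.
PATTERN = re.compile(r"(ConnectException|javanetConnectException)[A-Z][a-z]*")

# Three consecutive lines of the sample, wrapped the way the paste wraps them.
fragment = (
    "Duetounderlyingexception'javanetSocketExceptionjavanetConnectExceptionCon \n"
    "nectiontimedout'BEGINNESTEDEXCEPTIONjavanetSocketExceptionMESSAGEjavanetConnectExceptionConnectiontimedoutSTACKTRACEjava \n"
    "netSocketExceptionjavanetConnectExceptionConnectiontimedoutatcommysqljdbcStandardSocketFactoryconnect \n"
)

# Remove spaces and line breaks first, then search.
cleaned = re.sub(r"\s+", "", fragment)

# finditer + group(0) returns the whole match; findall would return
# only the text captured by the parenthesised group.
matches = [m.group(0) for m in PATTERN.finditer(cleaned)]
for match in matches:
    print(match)
```

Each match ends at the first character that is not a lowercase letter, which is why the third one runs on into `atcommysqljdbc`: without spaces there is nothing in the text that marks the end of the sentence.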