Regex to extract a numeric value between matching strings
0 votes
/ April 27, 2020

I have a character vector that may contain several instances of the following tag pattern, each wrapping a numeric value between an opening and a closing tag. Here is an example:

form4_2  <- "<transactionPricePerShare><value>31.43</value>"

I would like to collect the value 31.43, along with the numeric values from any other occurrences of the matching string in the character vector, and then build a data frame from the result. Any help would be appreciated.

library('stringr')
form4_3 <- form4_2[which(str_detect(form4_2,"</transactionPricePerShare>")=='TRUE')-1]
form4_3 <- str_remove(form4_3,'<value>')
form4_3 <- str_remove(form4_3,'</value>')
form4_4 <- data.frame(as.numeric(form4_3))
colnames(form4_4) <- "Transacted Price ($)"

Updated dput

"<SEC-DOCUMENT>0001179110-20-004802.txt : 20200408<SEC-HEADER>0001179110-20-004802.hdr.sgml : 20200408<ACCEPTANCE-DATETIME>20200408162604ACCESSION NUMBER:\t\t0001179110-20-004802CONFORMED SUBMISSION TYPE:\t4PUBLIC DOCUMENT COUNT:\t\t1CONFORMED PERIOD OF REPORT:\t20200406FILED AS OF DATE:\t\t20200408DATE AS OF CHANGE:\t\t20200408REPORTING-OWNER:OWNER DATA:COMPANY CONFORMED NAME:\t\t\tSTAFFORD JOHN S IIICENTRAL INDEX KEY:\t\t\t0001218981FILING VALUES:FORM TYPE:\t\t4SEC ACT:\t\t1934 ActSEC FILE NUMBER:\t001-36182FILM NUMBER:\t\t20782220MAIL ADDRESS:STREET 1:\t\t230 SOUTH LASALLE STREET 400CITY:\t\t\tCHICAGOSTATE:\t\t\tILZIP:\t\t\t60604ISSUER:COMPANY DATA:COMPANY CONFORMED NAME:\t\t\tXencor IncCENTRAL INDEX KEY:\t\t\t0001326732STANDARD INDUSTRIAL CLASSIFICATION:\tPHARMACEUTICAL PREPARATIONS [2834]IRS NUMBER:\t\t\t\t201622502STATE OF INCORPORATION:\t\t\tDEFISCAL YEAR END:\t\t\t1231BUSINESS ADDRESS:STREET 1:\t\t111 WEST LEMON AVECITY:\t\t\tMONROVIASTATE:\t\t\tCAZIP:\t\t\t91016BUSINESS PHONE:\t\t626-305-5900MAIL ADDRESS:STREET 1:\t\t111 WEST LEMON AVECITY:\t\t\tMONROVIASTATE:\t\t\tCAZIP:\t\t\t91016</SEC-HEADER><DOCUMENT><TYPE>4<SEQUENCE>1<FILENAME>edgar.xml<DESCRIPTION>FORM 4 -<TEXT><XML><?xml version=1.0?><ownershipDocument><schemaVersion>X0306</schemaVersion><documentType>4</documentType><periodOfReport>2020-04-06</periodOfReport><notSubjectToSection16>1</notSubjectToSection16><issuer><issuerCik>0001326732</issuerCik><issuerName>Xencor Inc</issuerName><issuerTradingSymbol>XNCR</issuerTradingSymbol></issuer><reportingOwner><reportingOwnerId><rptOwnerCik>0001218981</rptOwnerCik><rptOwnerName>STAFFORD JOHN S III</rptOwnerName></reportingOwnerId><reportingOwnerAddress><rptOwnerStreet1>350 N. 
ORLEANS STREET</rptOwnerStreet1><rptOwnerStreet2>SUITE 2N</rptOwnerStreet2><rptOwnerCity>CHICAGO</rptOwnerCity><rptOwnerState>IL</rptOwnerState><rptOwnerZipCode>60654-1975</rptOwnerZipCode><rptOwnerStateDescription></rptOwnerStateDescription></reportingOwnerAddress><reportingOwnerRelationship><isDirector>0</isDirector><isOfficer>0</isOfficer><isTenPercentOwner>1</isTenPercentOwner><isOther>0</isOther><officerTitle></officerTitle><otherText></otherText></reportingOwnerRelationship></reportingOwner><nonDerivativeTable><nonDerivativeTransaction><securityTitle><value>Common Stock</value></securityTitle><transactionDate><value>2020-04-06</value></transactionDate><transactionCoding><transactionFormType>4</transactionFormType><transactionCode>S</transactionCode><equitySwapInvolved>0</equitySwapInvolved></transactionCoding><transactionAmounts><transactionShares><value>44771</value></transactionShares><transactionPricePerShare><value>31.84</value><footnoteId id=F1/></transactionPricePerShare><transactionAcquiredDisposedCode><value>D</value></transactionAcquiredDisposedCode></transactionAmounts><postTransactionAmounts><sharesOwnedFollowingTransaction><value>1206005</value></sharesOwnedFollowingTransaction></postTransactionAmounts><ownershipNature><directOrIndirectOwnership><value>I</value></directOrIndirectOwnership><natureOfOwnership><value>By Ronin Trading, LLC</value></natureOfOwnership></ownershipNature></nonDerivativeTransaction><nonDerivativeTransaction><securityTitle><value>Common Stock</value></securityTitle><transactionDate><value>2020-04-06</value></transactionDate><transactionCoding><transactionFormType>4</transactionFormType><transactionCode>S</transactionCode><equitySwapInvolved>0</equitySwapInvolved><footnoteId id=F2/></transactionCoding><transactionAmounts><transactionShares><value>600000</value></transactionShares><transactionPricePerShare><value>27.10</value><footnoteId 
id=F2/></transactionPricePerShare><transactionAcquiredDisposedCode><value>D</value></transactionAcquiredDisposedCode></transactionAmounts><postTransactionAmounts><sharesOwnedFollowingTransaction><value>606005</value></sharesOwnedFollowingTransaction></postTransactionAmounts><ownershipNature><directOrIndirectOwnership><value>I</value></directOrIndirectOwnership><natureOfOwnership><value>By Ronin Trading, LLC</value></natureOfOwnership></ownershipNature></nonDerivativeTransaction></nonDerivativeTable><derivativeTable></derivativeTable><footnotes><footnote id=F1>This transaction was executed in multiple trades at prices ranging from $31.48 to $32.30. The price reported above reflects the weighted average purchase price. The reporting person hereby undertakes to provide upon request to the SEC staff, the issuer or a security holder of the issuer full information regarding the number of shares and prices at which the transactions were effected.</footnote><footnote id=F2>The transaction was executed in a single, privately negotiated transaction with an institutional buyer.</footnote></footnotes><ownerSignature><signatureName>/s/ John S. Stafford, III</signatureName><signatureDate>2020-04-08</signatureDate></ownerSignature></ownershipDocument></XML></TEXT></DOCUMENT></SEC-DOCUMENT>"

Answers [ 2 ]

1 vote
/ April 27, 2020

UPDATE in response to several comments:

You can extract the price with str_extract_all, using a positive lookbehind (?<=...) and a positive lookahead (?=...), store the result as a vector, and use that vector as a column in a data frame:

library(stringr)

Transacted_price <- str_extract_all(form4_2,
                    "(?<=(<transactionPricePerShare><value>))\\d+\\.\\d+(?=(</value>))")
df <- data.frame(unlist(Transacted_price))

Result (for the one-line example from the question):

df
  unlist.Transacted_price.
1                    31.43

Data:

form4_2 <- "<SEC-DOCUMENT>0001179110-20-004802.txt : 20200408<SEC-HEADER>0001179110-20-004802.hdr.sgml : 20200408<ACCEPTANCE-DATETIME>20200408162604ACCESSION NUMBER:\t\t0001179110-20-004802CONFORMED SUBMISSION TYPE:\t4PUBLIC DOCUMENT COUNT:\t\t1CONFORMED PERIOD OF REPORT:\t20200406FILED AS OF DATE:\t\t20200408DATE AS OF CHANGE:\t\t20200408REPORTING-OWNER:OWNER DATA:COMPANY CONFORMED NAME:\t\t\tSTAFFORD JOHN S IIICENTRAL INDEX KEY:\t\t\t0001218981FILING VALUES:FORM TYPE:\t\t4SEC ACT:\t\t1934 ActSEC FILE NUMBER:\t001-36182FILM NUMBER:\t\t20782220MAIL ADDRESS:STREET 1:\t\t230 SOUTH LASALLE STREET 400CITY:\t\t\tCHICAGOSTATE:\t\t\tILZIP:\t\t\t60604ISSUER:COMPANY DATA:COMPANY CONFORMED NAME:\t\t\tXencor IncCENTRAL INDEX KEY:\t\t\t0001326732STANDARD INDUSTRIAL CLASSIFICATION:\tPHARMACEUTICAL PREPARATIONS [2834]IRS NUMBER:\t\t\t\t201622502STATE OF INCORPORATION:\t\t\tDEFISCAL YEAR END:\t\t\t1231BUSINESS ADDRESS:STREET 1:\t\t111 WEST LEMON AVECITY:\t\t\tMONROVIASTATE:\t\t\tCAZIP:\t\t\t91016BUSINESS PHONE:\t\t626-305-5900MAIL ADDRESS:STREET 1:\t\t111 WEST LEMON AVECITY:\t\t\tMONROVIASTATE:\t\t\tCAZIP:\t\t\t91016</SEC-HEADER><DOCUMENT><TYPE>4<SEQUENCE>1<FILENAME>edgar.xml<DESCRIPTION>FORM 4 -<TEXT><XML><?xml version=1.0?><ownershipDocument><schemaVersion>X0306</schemaVersion><documentType>4</documentType><periodOfReport>2020-04-06</periodOfReport><notSubjectToSection16>1</notSubjectToSection16><issuer><issuerCik>0001326732</issuerCik><issuerName>Xencor Inc</issuerName><issuerTradingSymbol>XNCR</issuerTradingSymbol></issuer><reportingOwner><reportingOwnerId><rptOwnerCik>0001218981</rptOwnerCik><rptOwnerName>STAFFORD JOHN S III</rptOwnerName></reportingOwnerId><reportingOwnerAddress><rptOwnerStreet1>350 N. 
ORLEANS STREET</rptOwnerStreet1><rptOwnerStreet2>SUITE 2N</rptOwnerStreet2><rptOwnerCity>CHICAGO</rptOwnerCity><rptOwnerState>IL</rptOwnerState><rptOwnerZipCode>60654-1975</rptOwnerZipCode><rptOwnerStateDescription></rptOwnerStateDescription></reportingOwnerAddress><reportingOwnerRelationship><isDirector>0</isDirector><isOfficer>0</isOfficer><isTenPercentOwner>1</isTenPercentOwner><isOther>0</isOther><officerTitle></officerTitle><otherText></otherText></reportingOwnerRelationship></reportingOwner><nonDerivativeTable><nonDerivativeTransaction><securityTitle><value>Common Stock</value></securityTitle><transactionDate><value>2020-04-06</value></transactionDate><transactionCoding><transactionFormType>4</transactionFormType><transactionCode>S</transactionCode><equitySwapInvolved>0</equitySwapInvolved></transactionCoding><transactionAmounts><transactionShares><value>44771</value></transactionShares><transactionPricePerShare><value>31.84</value><footnoteId id=F1/></transactionPricePerShare><transactionAcquiredDisposedCode><value>D</value></transactionAcquiredDisposedCode></transactionAmounts><postTransactionAmounts><sharesOwnedFollowingTransaction><value>1206005</value></sharesOwnedFollowingTransaction></postTransactionAmounts><ownershipNature><directOrIndirectOwnership><value>I</value></directOrIndirectOwnership><natureOfOwnership><value>By Ronin Trading, LLC</value></natureOfOwnership></ownershipNature></nonDerivativeTransaction><nonDerivativeTransaction><securityTitle><value>Common Stock</value></securityTitle><transactionDate><value>2020-04-06</value></transactionDate><transactionCoding><transactionFormType>4</transactionFormType><transactionCode>S</transactionCode><equitySwapInvolved>0</equitySwapInvolved><footnoteId id=F2/></transactionCoding><transactionAmounts><transactionShares><value>600000</value></transactionShares><transactionPricePerShare><value>27.10</value><footnoteId 
id=F2/></transactionPricePerShare><transactionAcquiredDisposedCode><value>D</value></transactionAcquiredDisposedCode></transactionAmounts><postTransactionAmounts><sharesOwnedFollowingTransaction><value>606005</value></sharesOwnedFollowingTransaction></postTransactionAmounts><ownershipNature><directOrIndirectOwnership><value>I</value></directOrIndirectOwnership><natureOfOwnership><value>By Ronin Trading, LLC</value></natureOfOwnership></ownershipNature></nonDerivativeTransaction></nonDerivativeTable><derivativeTable></derivativeTable><footnotes><footnote id=F1>This transaction was executed in multiple trades at prices ranging from $31.48 to $32.30. The price reported above reflects the weighted average purchase price. The reporting person hereby undertakes to provide upon request to the SEC staff, the issuer or a security holder of the issuer full information regarding the number of shares and prices at which the transactions were effected.</footnote><footnote id=F2>The transaction was executed in a single, privately negotiated transaction with an institutional buyer.</footnote></footnotes><ownerSignature><signatureName>/s/ John S. Stafford, III</signatureName><signatureDate>2020-04-08</signatureDate></ownerSignature></ownershipDocument></XML></TEXT></DOCUMENT></SEC-DOCUMENT>"
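
A minimal sketch of the same lookaround idea applied to a string with more than one price, written in base R so that it runs without stringr. The excerpt is shortened from the document in the question; `form4_excerpt`, `price_pattern` and `price_df` are illustrative names, and the optional decimal part in the pattern is an assumption that whole-number prices may occur.

```r
# Shortened excerpt of the filing, keeping only the share and price elements
form4_excerpt <- paste0(
  "<transactionShares><value>44771</value></transactionShares>",
  "<transactionPricePerShare><value>31.84</value><footnoteId id=F1/></transactionPricePerShare>",
  "<transactionShares><value>600000</value></transactionShares>",
  "<transactionPricePerShare><value>27.10</value><footnoteId id=F2/></transactionPricePerShare>"
)

# Lookbehind anchors on the price tag, lookahead on the closing </value>;
# only the digits in between become part of the match
price_pattern <- "(?<=<transactionPricePerShare><value>)\\d+(?:\\.\\d+)?(?=</value>)"

# gregexpr finds every match; regmatches pulls out the matched text
matches <- regmatches(form4_excerpt, gregexpr(price_pattern, form4_excerpt, perl = TRUE))
prices <- as.numeric(unlist(matches))

# One row per transaction price, with the column name used in the question
price_df <- data.frame(prices)
colnames(price_df) <- "Transacted Price ($)"
print(price_df)
```

Because the lookbehind names the price tag explicitly, the share counts inside the other <value> elements are skipped. With perl = TRUE the pattern is handled by PCRE, which requires a fixed-length lookbehind, as is the case here.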
0 votes
/ April 27, 2020

This extracts whatever sits between <value> and </value>. You can wrap the result in as.double() if you want real numbers to work with.

text <- "<transactionPricePerShare><value>31.43</value>"

sub(".*<value>(.*)</value>.*","\\1",text)

"31.43"

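A short sketch of the as.double() wrapping mentioned above, plus one behaviour worth knowing: sub() returns a single result per input string, and because the leading .* is greedy, a string with several <value> elements yields only the last one. The string `two_values` is a hypothetical example built from prices in the question's document.

```r
text <- "<transactionPricePerShare><value>31.43</value>"

# Wrap the extracted text in as.double() to get a number instead of a string
price <- as.double(sub(".*<value>(.*)</value>.*", "\\1", text))
print(price)

# Hypothetical string with two <value> elements
two_values <- "<value>31.84</value><footnoteId id=F1/><value>27.10</value>"

# The greedy leading .* runs up to the last <value>, so only 27.10 comes back
last_only <- sub(".*<value>(.*)</value>.*", "\\1", two_values)
print(last_only)
```
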
Edit

This is quite specific and can't easily be adapted to get other variables.

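# "<transactionPricePerShare>" is 26 characters and "<value>" is 7, so the price
# starts 33 characters after each match; the price is assumed to be 5 characters long.
PlacesFound <- gregexpr("<transactionPricePerShare>", form4_2)
ExtractedNumbers <- sapply(PlacesFound[[1]], function(x) as.numeric(substr(form4_2, x + 33, x + 37)))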

> ExtractedNumbers
[1] 31.84 27.10
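
One way to loosen the hard-coded offsets is sketched below, under the assumption that every element of interest has the form <tag><value>...</value>. The helper `extract_tag_values` and the string `form4_excerpt` are hypothetical names, not part of the original answer. The offset is computed with nchar() instead of being fixed at 33, and the end of each value is found by searching for the next </value> instead of assuming five characters.

```r
# Hypothetical helper: return the text of every <tag><value>...</value> in doc
extract_tag_values <- function(doc, tag) {
  opening <- paste0("<", tag, "><value>")
  # fixed = TRUE treats the opening tag as a literal string, not a regex
  starts <- gregexpr(opening, doc, fixed = TRUE)[[1]]
  if (starts[1] == -1) return(character(0))  # tag not present
  vapply(starts, function(s) {
    # Everything after the opening tag, then cut at the first closing </value>
    rest <- substring(doc, s + nchar(opening))
    end <- as.integer(regexpr("</value>", rest, fixed = TRUE))
    substr(rest, 1, end - 1)
  }, character(1))
}

# Shortened excerpt of the document from the question
form4_excerpt <- paste0(
  "<transactionShares><value>44771</value></transactionShares>",
  "<transactionPricePerShare><value>31.84</value><footnoteId id=F1/></transactionPricePerShare>",
  "<transactionShares><value>600000</value></transactionShares>",
  "<transactionPricePerShare><value>27.10</value><footnoteId id=F2/></transactionPricePerShare>"
)

shares <- as.numeric(extract_tag_values(form4_excerpt, "transactionShares"))
prices <- as.numeric(extract_tag_values(form4_excerpt, "transactionPricePerShare"))
print(data.frame(shares, prices))
```

Passing a different tag name, such as "sharesOwnedFollowingTransaction", reuses the same helper without recounting characters.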
...