Need to convert an HTML string to text with Python
0 votes
/ May 25, 2020

Here is what I have. I need code that takes this entire string and returns only its text content. This is not a web page; it is just a string containing HTML, like an HTML page saved with a .txt extension. The other solutions I have found all use Beautiful Soup with a URL, but I have no web page to point it at. Any help would be greatly appreciated.

b'<!DOCTYPE HTML>\r\n
<html>
   \r\n
   <head>
      \r\n
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      \r\n
      <title>TalentHire - Simplified Recruiting and Staffing</title>
      \r\n
   </head>
   \r\n        \r\n
   <body leftmargin="0" rightmargin="0" topmargin="0" bottommargin="0">
      \r\n        
      <div style="width:100%; overflow:auto; float:left; margin: auto;">
         \r\n        
         <table cellpadding="0" cellspacing="0" border="0" style="width:100%; min-width:300px;">
            \r\n                        
            <tr>
               \r\n                
               <td style=" border:none;">
                  \r\n                \t
                  <table cellpadding="0" cellspacing="0" style="width:100%; min-width:280px; margin:0 auto; border:none;">
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif !important; font-size:15px !important; color:#333 !important; line-height:22px; border:none;">
                           \r\n                                
                           <div id="EditorSalutationID">
                              \r\n
                              <p>Position:&nbsp; Azure Architect</p>
                              \r\n\r\n
                              <p>Location: San Antonio, Texas</p>
                              \r\n\r\n
                              <p><br />\r\nResponsibilities-</p>
                              \r\n\r\n
                              <p>Customer is implementing a new POS solution and this program is all about&nbsp; doing the integration work for the new POS along with data migration and some new web app development.<br />\r\nAll the integration and web development work will be done using azure PaaS components.<br />\r\nResponsibilities are:<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp; Provide Inputs to enterprise solution Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Design secure integration solutions/Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Implement best practices when using azure components<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Work with 3rd party vendor architects on behalf of Customer to design integration solution<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Provide recommendation to optimize azure cost<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Recommendation and best practices on using various azure resources<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Hands on set up of azure components and design patterns for development teams to follow. Hands on to .Net Technologies</p>
                              \r\n\r\n
                              <p><br />\r\nResponsible for technical solutioning and design the integration Solution in AZURE. Design, develop, and construct detailed Azure architecture. Understand current state gaps and propose secured solutions to ensure roadmap can adapt to changes and integrate with existing environment or propose changes to existing environment. Work with vendors and customers to understand new solutions&rsquo; limitations and capabilities. Work with internal delivery teams to ensure solutions align with roadmap and architecture. Lead a team of engineers and developers to design and build solutions."</p>
                              \r\n\r\n
                              <p>Regards,</p>
                              \r\n\r\n
                              <p>Manish Kumar</p>
                              \r\n\r\n
                              <p><a href="http://http/" onclick="return Webmail.Widgets.Email.Message.evLinkClick(this);" rel="noopener noreferrer" target="_blank" title="This external link will open in a new window">Email-ID:manish.kumar1@idctechnologies.com</a></p>
                              \r\n\r\n
                              <p>Desk NO:315-994-1244</p>
                              \r\n
                           </div>
                           \r\n\r\n
                           <div id="EditorSignatureID">&nbsp;</div>
                           \r\n                             
                        </td>
                        \r\n                        
                     </tr>
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif; font-size:14px; line-height:normal; color:#333; border:none">\r\n                                                           </td>
                        \r\n                        
                     </tr>
                     \r\n                    
                  </table>
                  \r\n                
               </td>
               \r\n            
            </tr>
            \r\n            \r\n                \t
         </table>
         \r\n        
         <p style="border:none; padding-left:10px; font-size:11px; font-family:Arial, Helvetica, sans-serif; color:#6b6c72; text-align:left; line-height:18px;text-transform: uppercase;"> To unsubscribe from future emails or to update your email preferences<a href="http://unsubscribe.idctechnologies.com/users/request_unsubscribe/217a2089eed1fd0f407ea853a29608b1cbaf9bb2/f40908d9c9fddff08cbeeb44f5678cbf48a9a840/YkgrQnRETjZscTQvT0taSDc5dzBFR0p0WXY5dmNQYjJRVDZaWnpac2Exdz0=/" style="color:#0077c5; text-decoration:underline"><b>click here </b></a>.</p>
      </div>
      \r\n<img width="1px" height="1px" alt="" src="http://clicks.mg.idctechnologies.com/o/eJwVzDsOwyAMANDTNCOyifkNLEj0GhXFJkEKRUp6f7XZ3vQ4BiL7xqVHDRrAaIOEZkWFKuVsvHM5pBSMz88HwdhU5_qVun_mMbcul6pzLHu07AkAC3CrWEKzIkTNIgmWlcEtp7RX52jd7XgKU50s_3IbpR_38gNSeihY">
   </body>
   \r\n
</html>
\r\n'

1 Answer

0 votes
/ May 25, 2020
from bs4 import BeautifulSoup

# The value from the question, pasted verbatim (b'...' wrapper included) into a regular string.
data = """
b'<!DOCTYPE HTML>\r\n
<html>
   \r\n
   <head>
      \r\n
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      \r\n
      <title>TalentHire - Simplified Recruiting and Staffing</title>
      \r\n
   </head>
   \r\n        \r\n
   <body leftmargin="0" rightmargin="0" topmargin="0" bottommargin="0">
      \r\n        
      <div style="width:100%; overflow:auto; float:left; margin: auto;">
         \r\n        
         <table cellpadding="0" cellspacing="0" border="0" style="width:100%; min-width:300px;">
            \r\n                        
            <tr>
               \r\n                
               <td style=" border:none;">
                  \r\n                \t
                  <table cellpadding="0" cellspacing="0" style="width:100%; min-width:280px; margin:0 auto; border:none;">
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif !important; font-size:15px !important; color:#333 !important; line-height:22px; border:none;">
                           \r\n                                
                           <div id="EditorSalutationID">
                              \r\n
                              <p>Position:&nbsp; Azure Architect</p>
                              \r\n\r\n
                              <p>Location: San Antonio, Texas</p>
                              \r\n\r\n
                              <p><br />\r\nResponsibilities-</p>
                              \r\n\r\n
                              <p>Customer is implementing a new POS solution and this program is all about&nbsp; doing the integration work for the new POS along with data migration and some new web app development.<br />\r\nAll the integration and web development work will be done using azure PaaS components.<br />\r\nResponsibilities are:<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp; Provide Inputs to enterprise solution Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Design secure integration solutions/Architecture<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Implement best practices when using azure components<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Work with 3rd party vendor architects on behalf of Customer to design integration solution<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Provide recommendation to optimize azure cost<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Recommendation and best practices on using various azure resources<br />\r\n&middot; &nbsp; &nbsp; &nbsp; &nbsp;Hands on set up of azure components and design patterns for development teams to follow. Hands on to .Net Technologies</p>
                              \r\n\r\n
                              <p><br />\r\nResponsible for technical solutioning and design the integration Solution in AZURE. Design, develop, and construct detailed Azure architecture. Understand current state gaps and propose secured solutions to ensure roadmap can adapt to changes and integrate with existing environment or propose changes to existing environment. Work with vendors and customers to understand new solutions&rsquo; limitations and capabilities. Work with internal delivery teams to ensure solutions align with roadmap and architecture. Lead a team of engineers and developers to design and build solutions."</p>
                              \r\n\r\n
                              <p>Regards,</p>
                              \r\n\r\n
                              <p>Manish Kumar</p>
                              \r\n\r\n
                              <p><a href="http://http/" onclick="return Webmail.Widgets.Email.Message.evLinkClick(this);" rel="noopener noreferrer" target="_blank" title="This external link will open in a new window">Email-ID:manish.kumar1@idctechnologies.com</a></p>
                              \r\n\r\n
                              <p>Desk NO:315-994-1244</p>
                              \r\n
                           </div>
                           \r\n\r\n
                           <div id="EditorSignatureID">&nbsp;</div>
                           \r\n                             
                        </td>
                        \r\n                        
                     </tr>
                     \r\n                        
                     <tr>
                        \r\n                            
                        <td style="font-family: calibri,sans-serif; font-size:14px; line-height:normal; color:#333; border:none">\r\n                                                           </td>
                        \r\n                        
                     </tr>
                     \r\n                    
                  </table>
                  \r\n                
               </td>
               \r\n            
            </tr>
            \r\n            \r\n                \t
         </table>
         \r\n        
         <p style="border:none; padding-left:10px; font-size:11px; font-family:Arial, Helvetica, sans-serif; color:#6b6c72; text-align:left; line-height:18px;text-transform: uppercase;"> To unsubscribe from future emails or to update your email preferences<a href="http://unsubscribe.idctechnologies.com/users/request_unsubscribe/217a2089eed1fd0f407ea853a29608b1cbaf9bb2/f40908d9c9fddff08cbeeb44f5678cbf48a9a840/YkgrQnRETjZscTQvT0taSDc5dzBFR0p0WXY5dmNQYjJRVDZaWnpac2Exdz0=/" style="color:#0077c5; text-decoration:underline"><b>click here </b></a>.</p>
      </div>
      \r\n<img width="1px" height="1px" alt="" src="http://clicks.mg.idctechnologies.com/o/eJwVzDsOwyAMANDTNCOyifkNLEj0GhXFJkEKRUp6f7XZ3vQ4BiL7xqVHDRrAaIOEZkWFKuVsvHM5pBSMz88HwdhU5_qVun_mMbcul6pzLHu07AkAC3CrWEKzIkTNIgmWlcEtp7RX52jd7XgKU50s_3IbpR_38gNSeihY">
   </body>
   \r\n
</html>
\r\n'
"""

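# Parse the string with Python's built-in HTML parser; no URL or network access is needed.
soup = BeautifulSoup(data, 'html.parser')

# .text returns the text of every node joined together, with the tags stripped.
print(soup.text)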

Output:

b'



TalentHire - Simplified Recruiting and Staffing










Position:  Azure Architect
Location: San Antonio, Texas

Responsibilities-
Customer is implementing a new POS solution and this program is all about  doing the integration work for the new POS along with data migration and some new web app development.
All the integration and web development work will be done using azure PaaS components.
Responsibilities are:
·         Provide Inputs to enterprise solution Architecture
·        Design secure integration solutions/Architecture
·        Implement best practices when using azure components
·        Work with 3rd party vendor architects on behalf of Customer to design integration solution
·        Provide recommendation to optimize azure cost
·        Recommendation and best practices on using various azure resources  
·        Hands on set up of azure components and design patterns for development teams to follow. Hands on to .Net Technologies

Responsible for technical solutioning and design the integration Solution in 
AZURE. Design, develop, and construct detailed Azure architecture. Understand current state gaps and propose secured solutions to ensure roadmap can adapt to changes and integrate with existing environment or propose changes to existing environment. Work with vendors and customers to understand new solutions’ limitations and capabilities. Work with internal delivery teams to ensure 
solutions align with roadmap and architecture. Lead a team of engineers and developers to design and build solutions."
Regards,
Manish Kumar
Email-ID:manish.kumar1@idctechnologies.com
Desk NO:315-994-1244












 To unsubscribe from future emails or to update your email preferencesclick here .





'
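
Note that the output above still starts with `b'` and ends with `'`, because the printed representation of the bytes object was pasted into a regular string. If the data is held in an actual `bytes` variable, a minimal sketch along these lines may work better: decode it first, then let `get_text` drop the blank lines. The `raw` value here is a shortened, hypothetical stand-in for the full message, and UTF-8 is assumed from the `charset` in its `<meta>` tag.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the bytes value from the question.
raw = (
    b'<!DOCTYPE HTML>\r\n<html>\r\n<head>\r\n'
    b'<title>TalentHire - Simplified Recruiting and Staffing</title>\r\n'
    b'</head>\r\n<body>\r\n'
    b'<p>Position:&nbsp; Azure Architect</p>\r\n'
    b'<p>Location: San Antonio, Texas</p>\r\n'
    b'</body>\r\n</html>\r\n'
)

# Decode the bytes to str first (assumption: the content is UTF-8).
html = raw.decode('utf-8')

soup = BeautifulSoup(html, 'html.parser')

# separator='\n' puts each text fragment on its own line;
# strip=True trims every fragment and skips whitespace-only ones.
text = soup.get_text(separator='\n', strip=True)
print(text)
```

With `separator='\n'`, text inside inline tags (such as the `<b>click here </b>` link in the footer) also ends up on its own line, so a different separator may suit the full message better.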

...
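
If installing Beautiful Soup is not an option, roughly the same result can be had with the standard library's `html.parser` module. The sketch below is only an illustration: `TextExtractor` and `html_to_text` are made-up names, the sample input is a shortened stand-in for the real message, and UTF-8 is assumed as the encoding.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags, skipping <script> and <style> bodies."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; and &middot;.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty fragments that are outside script/style.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(raw, encoding='utf-8'):
    """Accepts bytes or str and returns the text, one fragment per line."""
    html = raw.decode(encoding) if isinstance(raw, bytes) else raw
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return '\n'.join(parser.parts)


# Hypothetical sample input, shortened from the question.
sample = b'<html><body><p>Responsibilities-</p>\r\n<p>Location: San Antonio, Texas</p></body></html>'
print(html_to_text(sample))
```

The function takes the string (or bytes) directly, so no URL is involved at any point.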