I wrote a script that scrapes a site and puts the content into a text file. My problem is that, as in the HTML below, there are two groups of paragraphs (one under each h2), and I want to get the text from both groups, but separately. So my question is: is there a way to search only for the paragraphs between two specific h2 elements, or how else can I solve this?
HTML:
<h2 class="pt-3" id="mitigation">Mitigation</h2>
<p>Access tokens are an integral part of the security system within Windows and cannot be turned off. However, an attacker must already have administrator level access on the local system to make full use of this technique; be sure to restrict users and accounts to the least privileges they require to do their job.</p><p>Any user can also spoof access tokens if they have legitimate credentials. Follow mitigation guidelines for preventing adversary use of <a href="/techniques/T1078">Valid Accounts</a>. Limit permissions so that users and user groups cannot create tokens. This setting should be defined for the local system account only. GPO: Computer Configuration > [Policies] > Windows Settings > Security Settings > Local Policies > User Rights Assignment: Create a token object. <span id="scite-ref-19-a" class="scite-citeref-number" data-reference="Microsoft Create Token"><sup><a href="https://docs.microsoft.com/windows/device-security/security-policy-settings/create-a-token-object" target="_blank" data-hasqtip="18" aria-describedby="qtip-18">[19]</a></sup></span> Also define who can create a process level token to only the local and network service through GPO: Computer Configuration > [Policies] > Windows Settings > Security Settings > Local Policies > User Rights Assignment: Replace a process level token. <span id="scite-ref-20-a" class="scite-citeref-number" data-reference="Microsoft Replace Process Token"><sup><a href="https://docs.microsoft.com/windows/device-security/security-policy-settings/replace-a-process-level-token" target="_blank" data-hasqtip="19" aria-describedby="qtip-19">[20]</a></sup></span></p><p>Also limit opportunities for adversaries to increase privileges by limiting Privilege Escalation opportunities.</p>
<h2 class="pt-3" id="detection">Detection</h2>
<p>If an adversary is using a standard command-line shell, analysts can detect token manipulation by auditing command-line activity. Specifically, analysts should look for use of the <code>runas</code> command. Detailed command-line logging is not enabled by default in Windows. <span id="scite-ref-21-a" class="scite-citeref-number" data-reference="Microsoft Command-line Logging"><sup><a href="https://technet.microsoft.com/en-us/windows-server-docs/identity/ad-ds/manage/component-updates/command-line-process-auditing" target="_blank" data-hasqtip="20" aria-describedby="qtip-20">[21]</a></sup></span></p><p>If an adversary is using a payload that calls the Windows token APIs directly, analysts can detect token manipulation only through careful analysis of user network activity, examination of running processes, and correlation with other endpoint and network behavior. </p><p>There are many Windows API calls a payload can take advantage of to manipulate access tokens (e.g., <code>LogonUser</code> <span id="scite-ref-22-a" class="scite-citeref-number" data-reference="Microsoft LogonUser"><sup><a href="https://msdn.microsoft.com/en-us/library/windows/desktop/aa378184(v=vs.85).aspx" target="_blank" data-hasqtip="21" aria-describedby="qtip-21">[22]</a></sup></span>, <code>DuplicateTokenEx</code> <span id="scite-ref-23-a" class="scite-citeref-number" data-reference="Microsoft DuplicateTokenEx"><sup><a href="https://msdn.microsoft.com/en-us/library/windows/desktop/aa446617(v=vs.85).aspx" target="_blank" data-hasqtip="22" aria-describedby="qtip-22">[23]</a></sup></span>, and <code>ImpersonateLoggedOnUser</code> <span id="scite-ref-24-a" class="scite-citeref-number" data-reference="Microsoft ImpersonateLoggedOnUser"><sup><a href="https://msdn.microsoft.com/en-us/library/windows/desktop/aa378612(v=vs.85).aspx" target="_blank" data-hasqtip="23" aria-describedby="qtip-23">[24]</a></sup></span>). 
Please see the referenced Windows API pages for more information.</p><p>Query systems for process and thread token information and look for inconsistencies such as user owns processes impersonating the local SYSTEM account. <span id="scite-ref-3-a" class="scite-citeref-number" data-reference="BlackHat Atkinson Winchester Token Manipulation"><sup><a href="https://www.blackhat.com/docs/eu-17/materials/eu-17-Atkinson-A-Process-Is-No-One-Hunting-For-Token-Manipulation.pdf" target="_blank" data-hasqtip="2" aria-describedby="qtip-2">[3]</a></sup></span></p>
Code:
import time

import requests
from bs4 import BeautifulSoup
from docx import Document


def linkgenerator_getlink():
    link = "https://attack.mitre.org/techniques/"
    for i in range(1001, 1224):
        fullurl = link + "T" + str(i) + "/"
        source = requests.get(fullurl).text
        time.sleep(15)  # pause between requests
        soup = BeautifulSoup(source, 'lxml')
        document = Document()
        document.add_heading(soup.find('h1').text.strip(), 0)
        # Every <p> on the page is written out, with no split by section
        p = soup.find_all("p")
        for x in p:
            document.add_paragraph(x.text)
        document.save('C:\\Users\\XXX\\Desktop\\script\\' + "T%s.docx" % i)
        print("========== %s-es szamu doksi is ready ==========" % i)


linkgenerator_getlink()
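
One possible way to get the paragraphs of each section separately, offered as a minimal sketch rather than a definitive solution: find the h2 by its id, walk its following siblings, and stop at the next h2. This assumes the h2 and p elements are siblings, as they appear to be in the HTML above. `SAMPLE_HTML` is a shortened, made-up stand-in for the real page, `paragraphs_after_heading` is a hypothetical helper name, and `html.parser` is used here only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Shortened, made-up sample that mirrors the structure shown in the question.
SAMPLE_HTML = """
<h2 class="pt-3" id="mitigation">Mitigation</h2>
<p>First mitigation paragraph.</p><p>Second mitigation paragraph.</p>
<h2 class="pt-3" id="detection">Detection</h2>
<p>First detection paragraph.</p>
"""


def paragraphs_after_heading(soup, heading_id):
    """Collect the text of each <p> that follows the <h2> with the given id.

    Hypothetical helper: it assumes the paragraphs are siblings of the
    heading and stops as soon as the next <h2> is reached.
    """
    heading = soup.find("h2", id=heading_id)
    if heading is None:
        return []
    texts = []
    # With no arguments, find_next_siblings() yields the following sibling tags.
    for sibling in heading.find_next_siblings():
        if sibling.name == "h2":
            break  # the next section starts here
        if sibling.name == "p":
            texts.append(sibling.get_text().strip())
    return texts


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
mitigation = paragraphs_after_heading(soup, "mitigation")
detection = paragraphs_after_heading(soup, "detection")
print(mitigation)
print(detection)
```

In the script above, the two lists could then be written to the document one after the other, for example under their own headings, instead of looping over every p on the page.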