Beautiful Soup (HTML parser)

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the package is named after this kind of "tag soup"). It builds a parse tree for the parsed pages that can be used to extract data from HTML, ^[1] which is useful for web scraping. ^[2] ^[3]
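As a minimal sketch of how malformed markup is handled, the snippet below parses a fragment whose tags are never closed. The fragment and the variable names are illustrative assumptions, and the exact repaired tree can differ between parser backends; this sketch uses Python's built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: neither <p> nor <b> is ever closed.
broken_html = "<p>Hello <b>world"

# Use the parser from the standard library, so no extra backend is needed.
soup = BeautifulSoup(broken_html, "html.parser")

# The open tags are closed when the parse tree is built,
# so the elements can be queried like well-formed ones.
bold_text = soup.b.get_text()
paragraph_text = soup.p.get_text()

print(bold_text)
print(paragraph_text)
print(soup)
```

Printing the soup object serializes the tree again, which makes it easy to see how the unclosed tags were repaired.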

Beautiful Soup was started by Leonard Richardson, who continues to contribute to the project. ^[4] It is also supported by Tidelift, a paid subscription service for open-source maintenance. ^[5]

Code example

Beautiful Soup represents parsed data as a tree that can be searched and traversed with ordinary Python loops. ^[6] The following example uses urllib ^[7] from the Python standard library to load Wikipedia's main page, then uses Beautiful Soup to parse the document and find all the links in it.

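#!/usr/bin/env python3
# Anchor extraction from an HTML document
from urllib.request import urlopen

from bs4 import BeautifulSoup

# The response object is file-like, so it can be passed straight to the parser.
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
    soup = BeautifulSoup(response, 'html.parser')
    # Print the href of every <a> tag, falling back to '/' when it is missing.
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))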
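The same searching and traversing can be tried without network access. The following is a minimal sketch that parses an inline HTML string; the markup, URLs, and variable names are hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page content used instead of a downloaded document.
html = """
<html><body>
<h1>Links</h1>
<ul>
  <li><a href="/wiki/Python">Python</a></li>
  <li><a href="https://example.org/">Example</a></li>
  <li><a name="no-href">Anchor without href</a></li>
</ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Search: keep only <a> tags that carry an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# Traverse upward: names of all ancestors of the first link.
first_link = soup.find("a")
ancestors = [parent.name for parent in first_link.parents]

# Traverse downward: text of each <li> directly under the <ul>.
items = [li.get_text(strip=True) for li in soup.ul.find_all("li", recursive=False)]

print(hrefs)
print(ancestors)
print(items)
```

Filtering with href=True skips anchors that have no link target, whereas the example above substitutes '/' for them; which behavior is preferable depends on what the extracted links are used for.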

[1] Hajba, Gábor László (2018), Hajba, Gábor László (ed.), "Using Beautiful Soup", Website Scraping with Python: Using BeautifulSoup and Scrapy (in English), Apress, pp. 41–96, doi:10.1007/978-1-4842-3925-4_3, ISBN 978-1-4842-3925-4
[2] "Beautiful Soup website". Retrieved 18 April 2012. Beautiful Soup is licensed under the same terms as Python itself
[3] Python, Real. "Beautiful Soup: Build a Web Scraper With Python – Real Python". realpython.com (in English). Retrieved 2023-06-01.
[4] "Code : Leonard Richardson". Launchpad (in American English). Retrieved 2020-09-19.
[5] Tidelift. "beautifulsoup4 | pypi via the Tidelift Subscription". tidelift.com (in English). Retrieved 2020-09-19.
[6] "How To Scrape Web Pages with Beautiful Soup and Python 3 | DigitalOcean". www.digitalocean.com (in English). Retrieved 2023-06-01.
[7] Python, Real. "Python's urllib.request for HTTP Requests – Real Python". realpython.com (in English). Retrieved 2023-06-01.
