Question
tarifa
Hello, dear all,
I am fairly new to bs4, for that matter, but I'm trying to scrape a little chunk of information from a site. However, it keeps printing "None", as if the title (or any other tag I replace it with) doesn't exist.
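One possible cause of the "None" symptom (only a guess, since the full code isn't shown) is passing the URL string to BeautifulSoup instead of the downloaded HTML. bs4 then parses the URL as if it were markup, finds no tags, and every lookup returns None. The snippet below reproduces this offline. The HTML string is a hard-coded stand-in, not the real page.

```python
import warnings

from bs4 import BeautifulSoup

url = "https://wordpress.org/plugins/ninja-forms"

# Pitfall: the URL string itself is parsed as "HTML", so there is no <title>
with warnings.catch_warnings():
    # bs4 may warn that the input looks like a URL rather than markup
    warnings.simplefilter("ignore")
    wrong_soup = BeautifulSoup(url, "html.parser")
print(wrong_soup.title)  # None

# Correct: parse the HTML the server sent back (stand-in markup here)
html = "<html><head><title>Ninja Forms</title></head><body></body></html>"
right_soup = BeautifulSoup(html, "html.parser")
print(right_soup.title.string)  # Ninja Forms
```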
The project consists of two parts:
- the looping part (which seems to be pretty straightforward);
- the parser part, where I have some issues (see below).
I'm trying to loop through an array of URLs and scrape the data below from a list of WordPress plugins. See my loop below:
from bs4 import BeautifulSoup
import requests

# array of URLs to loop through, will be larger once I get the loop working correctly
plugins = ['https://wordpress.org/plugins/wp-job-manager', 'https://wordpress.org/plugins/ninja-forms']
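The loop body itself is not shown above. A minimal sketch of one could look like this, assuming the goal for now is just the page title of each plugin. The helper names `get_title` and `scrape_titles` are hypothetical, and `timeout=10` is an arbitrary choice.

```python
from typing import Dict, Iterable, Optional

import requests
from bs4 import BeautifulSoup

plugins = [
    "https://wordpress.org/plugins/wp-job-manager",
    "https://wordpress.org/plugins/ninja-forms",
]


def get_title(html: str) -> Optional[str]:
    """Return the text of the page's <title>, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    # find() returns None when nothing matches, so check before using it
    return title.get_text(strip=True) if title is not None else None


def scrape_titles(urls: Iterable[str]) -> Dict[str, Optional[str]]:
    """Download each URL and map it to its page title."""
    titles = {}
    # A Session reuses its connection pool across requests to the same host
    with requests.Session() as session:
        for url in urls:
            response = session.get(url, timeout=10)
            # Fail loudly on 4xx/5xx instead of parsing an error page
            response.raise_for_status()
            titles[url] = get_title(response.text)
    return titles
```

Calling `scrape_titles(plugins)` performs the real HTTP requests and returns a dict from each URL to its title. Keeping the parsing in its own function also lets you test it on saved HTML without touching the network.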
This can be done like so:
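# page_soup: the BeautifulSoup object for one plugin page
ttt = page_soup.find("div", {"class": "plugin-meta"})
# [:-1:2] keeps every second <li>, starting with the first and never including the last
text_nodes = [node.text.strip() for node in ttt.ul.findChildren('li')[:-1:2]]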
The output of text_nodes:
['Version: 1.9.5.12', 'Active installations: 10,000+', 'Tested up to: 5.6 ']
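The `[:-1:2]` slice depends on the exact position of each `<li>`, so it breaks silently if the page layout shifts. A variant that looks items up by their label and returns None instead of crashing when `div.plugin-meta` is missing might look like this. The `SAMPLE` markup is an assumed, simplified stand-in for the real page, and `parse_plugin_meta` is a hypothetical helper.

```python
from typing import Dict, Optional

from bs4 import BeautifulSoup


def parse_plugin_meta(html: str) -> Optional[Dict[str, str]]:
    """Map each 'Label: value' item in div.plugin-meta to a dict entry."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("div", {"class": "plugin-meta"})
    if meta is None or meta.ul is None:
        return None  # the page does not have the expected layout
    result = {}
    for li in meta.ul.find_all("li"):
        # Join the item's text pieces with spaces, e.g. "Version: 1.9.5.12"
        text = li.get_text(" ", strip=True)
        label, sep, value = text.partition(":")
        if sep:  # skip items without a colon, such as plain links
            result[label.strip()] = value.strip()
    return result


SAMPLE = (
    '<div class="plugin-meta"><ul>'
    "<li>Version: <strong>1.9.5.12</strong></li>"
    "<li>Active installations: <strong>10,000+</strong></li>"
    '<li><a href="#">Advanced View</a></li>'
    "</ul></div>"
)
print(parse_plugin_meta(SAMPLE))
# {'Version': '1.9.5.12', 'Active installations': '10,000+'}
```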
But what if we want to fetch the data for all WordPress plugins and then sort them to show, let us say, the 50 most recently updated ones? That would be an interesting task:
- first of all, we need to fetch the URLs;
- then we fetch the information and sort out the _newest_ ones.
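The two steps above can be sketched as follows. This is a rough outline under stated assumptions, not a finished scraper. `extract_plugin_urls` assumes plugin pages are linked as `https://wordpress.org/plugins/<slug>/`, and that rough filter may also match non-plugin pages. `fetch_record` is a hypothetical callable you would supply: it downloads one page and turns its update date into a `datetime`, and this sketch does not handle that conversion. A thread pool is used here instead of asyncio/aiohttp so the only third-party dependency stays bs4.

```python
import heapq
import re
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from typing import Callable, Iterable, List, Tuple

from bs4 import BeautifulSoup

# Assumed shape of a single-plugin URL: https://wordpress.org/plugins/<slug>/
PLUGIN_URL = re.compile(r"^https://wordpress\.org/plugins/([a-z0-9-]+)/?$")

# (plugin URL, last-updated timestamp)
Record = Tuple[str, datetime]


def extract_plugin_urls(listing_html: str) -> List[str]:
    """Step 1: collect unique plugin URLs from a listing page, in page order."""
    soup = BeautifulSoup(listing_html, "html.parser")
    slugs = []
    for link in soup.find_all("a", href=True):
        match = PLUGIN_URL.match(link["href"])
        if match:
            slugs.append(match.group(1))
    # dict.fromkeys drops duplicates while preserving first-seen order
    return [f"https://wordpress.org/plugins/{slug}" for slug in dict.fromkeys(slugs)]


def newest_plugins(
    urls: Iterable[str],
    fetch_record: Callable[[str], Record],
    n: int = 50,
    max_workers: int = 8,
) -> List[Record]:
    """Step 2: fetch every record concurrently and return the n newest."""
    # Page downloads are I/O-bound, so worker threads overlap the waiting time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        records = list(pool.map(fetch_record, urls))
    # nlargest returns the n records with the latest timestamps, newest first
    return heapq.nlargest(n, records, key=lambda record: record[1])
```

`heapq.nlargest` is equivalent to sorting by date in descending order and slicing off the first `n`, but it avoids a full sort. `max_workers=8` is an arbitrary cap meant to keep the load on the server modest.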
https://www.neowin.net/forum/topic/1394106-asyncio-web-scraping-fetching-multiple-urls-with-aiohttp-doable/