Project Solution Review
Get the solution to all components of the "Web Crawler" project, along with a detailed explanation of each.
Solution: Add URLs to queue
In this component, you were required to create a Celery queue and add tasks to it so that they can later be fetched and processed by a worker.
The following is the solution:
import celery
import requests

app = celery.Celery('celery-proj',
                    broker='redis://localhost',
                    backend='redis://localhost')

@app.task()
def getURL(url_to_crawl):
    dic = {}
    r = requests.get(url=url_to_crawl)
    text = r.text
    dic['data'] = text
    dic['status_code'] = r.status_code
    return dic

if __name__ == '__main__':
    urls = ["http://educative.io", "http://example.org/", "http://example.com"]
    results = []
    for url in urls:
        results.append(getURL.delay(url))
    for result in results:
        print("Task state: %s" % result.state)

Solution: Create Celery queue and add tasks (URLs to fetch)
Solution explanation
The solution to this component can be divided into three sub-parts:
- On lines 4 - 6, a Celery queue is created that has Redis as the backend and broker. We also start the Redis server in the background using the command redis-server --daemonize yes.
- On lines 8 - 15, a Celery task is defined. This includes the task declaration @app.task() followed by the function definition that the workers execute; see the sketch after this list for how a task call reaches a worker.
- On lines 17 - 23, the list of seed URLs is iterated over and each URL is added to the queue by calling getURL.delay(url); the state of each resulting task is then printed.
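To make the queuing mechanics concrete, the following is a minimal sketch (hypothetical usage, again assuming the solution code is saved as a module named crawler.py) contrasting a direct call to the task with queuing it via delay() / apply_async():

from crawler import getURL  # hypothetical module name for the solution code

# Calling the task function directly runs it synchronously in this process;
# no broker, queue, or worker is involved.
page = getURL("http://example.com")
print(page['status_code'])

# delay() (shorthand for apply_async()) serializes the arguments, pushes a
# message onto the Redis queue, and immediately returns an AsyncResult handle;
# a separate worker process performs the actual HTTP request.
async_result = getURL.delay("http://example.com")
print(async_result.state)  # typically PENDING until a worker picks it up

# apply_async() is the longer form and accepts extra options, e.g. a countdown.
getURL.apply_async(args=["http://example.org/"], countdown=5)  # run about 5 s later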