Skip to content
Snippets Groups Projects
Commit 4c932b38 authored by Cole Walton's avatar Cole Walton
Browse files

Might use wget to capture the data fromm CC instead

parent 894ad47a
No related branches found
No related tags found
No related merge requests found
import requests
import json
def main():
resp = requests.get('http://index.commoncrawl.org/CC-MAIN-2023-23-index?url=http%3A%2F%2Fcommoncrawl.org%2Ffaqs%2F&output=json')
pages = [json.loads(x) for x in resp.content.strip().split('\n')]
for page in pages:
print(page)
if __name__ == "__main__":
main()
\ No newline at end of file
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment