Programming and Algorithms: Week 2


Google Search

What are we doing this week?


This week we are going to look at the FILE ANALYSIS part of GOOGLE SEARCH in PYTHON. We'll look at CHARACTER COUNT, WORD COUNT, and LINE COUNT, then at how to measure WORD FREQUENCY, and finally at full FILE ANALYSIS. We are also going to look at how to open URLs and join URLs in PYTHON, and at the code for a WEB CRAWLER.
 

Powerpoint: File Analysis



Powerpoint: Web Crawling


Powerpoint: More on Content-Type

Total running time of videos is 55 minutes.

Google
Matt Cutts: How Search Works


TED 
Eli Pariser: Beware online "filter bubbles"





Python Page Spider Web Crawler Tutorial



Scrape Websites with Python + Beautiful Soup 4 + Requests





Links
Think Python: Word Frequency Analysis
http://greenteapress.com/thinkpython/html/thinkpython014.html

Learn Python the Hard Way: Dictionaries, Oh Lovely Dictionaries
http://learnpythonthehardway.org/book/ex39.html

Python Docs: Brief Tour of the Standard Library
https://docs.python.org/2/tutorial/stdlib.html


Scrapy - A Fast and Powerful Scraping and Web Crawling Program
http://scrapy.org/

HTML Scraping - The Hitchhiker's Guide to Python
http://docs.python-guide.org/en/latest/scenarios/scrape/



Online Interpreter:
Python Interpreter
Sample Code:
String Pre-Processing * File Statistics * Word Frequency * Full File Analysis
Sample Files:
StarWarsScript.txt * CompleteShakespeare.txt
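
As a rough illustration of the File Statistics and Word Frequency ideas, here is a minimal sketch. It is not the course's sample code: the function names (file_statistics, word_frequency, analyse_file) are made up for this example, and it assumes Python 3 even though some of the links on this page point at Python 2 documentation.

```python
import string
from collections import Counter


def file_statistics(text):
    """Return (characters, words, lines) for a string of text."""
    characters = len(text)          # every character, including newlines
    words = len(text.split())       # split on any run of whitespace
    lines = len(text.splitlines())  # one entry per line of text
    return characters, words, lines


def word_frequency(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    counts = Counter()
    for raw_word in text.split():
        # Strip punctuation from both ends, then lower-case the word.
        word = raw_word.strip(string.punctuation).lower()
        if word:
            counts[word] += 1
    return counts


def analyse_file(filename):
    """Read a text file and return its statistics and word counts."""
    with open(filename, encoding="utf-8", errors="replace") as infile:
        text = infile.read()
    return file_statistics(text), word_frequency(text)


sample = "Use the Force, Luke.\nThe Force will be with you.\n"
print(file_statistics(sample))
print(word_frequency(sample).most_common(3))
```

To try it on one of the sample files, something like analyse_file("StarWarsScript.txt") should work, assuming the file has been saved in the same folder as the program.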

Replacing a word in a file:
File Word Replace * Input_file.txt
Filtering a String:
Sample String Filtering
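
The two ideas above (replacing a word and filtering a string) could be sketched roughly as follows. This is only one possible approach, assuming Python 3; the function names are hypothetical, and the choice to keep only letters, digits and spaces when filtering is an assumption made for this example.

```python
import re


def replace_word(text, old_word, new_word):
    """Replace whole-word occurrences of old_word with new_word."""
    # \b matches a word boundary, so "the" does not match inside "then".
    pattern = r"\b" + re.escape(old_word) + r"\b"
    # A function is used as the replacement so new_word is inserted literally.
    return re.sub(pattern, lambda match: new_word, text)


def replace_word_in_file(in_path, out_path, old_word, new_word):
    """Read in_path, replace the word, and write the result to out_path."""
    with open(in_path, encoding="utf-8") as infile:
        text = infile.read()
    with open(out_path, "w", encoding="utf-8") as outfile:
        outfile.write(replace_word(text, old_word, new_word))


def filter_string(text):
    """Keep only letters, digits and spaces, and lower-case the result."""
    kept = [ch for ch in text if ch.isalnum() or ch == " "]
    return "".join(kept).lower()


print(replace_word("the cat sat on the mat, then left", "the", "a"))
print(filter_string("Hello, World! 123"))
```

Writing to a separate output file, rather than overwriting the input, means the original file is still there if the replacement goes wrong.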

URL Open * URL Join * Web Crawler
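
To show how URL Open, URL Join and a Web Crawler fit together, here is a small sketch using only the Python 3 standard library. It is an illustration under assumptions, not the course's crawler: the names LinkCollector, extract_links, fetch and crawl are invented for this example, and the fetch_page parameter exists only so the crawler can be tried without a network connection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return the links found in html as absolute URLs."""
    collector = LinkCollector()
    collector.feed(html)
    # URL Join: turn relative links like "a.html" into full URLs.
    return [urljoin(base_url, link) for link in collector.links]


def fetch(url):
    """URL Open: return the body of a page as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=10, fetch_page=fetch):
    """Visit pages breadth-first from start_url, up to max_pages pages."""
    to_visit = [start_url]
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.pop(0)
        if url in visited:
            continue
        visited.append(url)
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be opened
        for link in extract_links(url, html):
            if link not in visited and link not in to_visit:
                to_visit.append(link)
    return visited


print(urljoin("http://example.com/dir/page.html", "../other.html"))
```

The max_pages limit matters: without it a crawler that follows every link can keep running for a very long time.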


More on the HTML Content-Type:

Powerpoint: More on Content-Type

Content Type Checker * HTMLChecker * Improved URL Open * Improved Web Crawler
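
A Content-Type check might look something like the sketch below, assuming Python 3. The function names is_html and url_is_html are hypothetical, and treating both text/html and application/xhtml+xml as "HTML" is a choice made for this example rather than something taken from the course code.

```python
from urllib.request import urlopen


def is_html(content_type):
    """Return True if a Content-Type header value describes an HTML page."""
    if not content_type:
        return False  # header missing or empty
    # A header such as "text/html; charset=UTF-8" has parameters after ";".
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def url_is_html(url):
    """Open a URL and check its Content-Type header (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return is_html(response.headers.get("Content-Type"))


print(is_html("text/html; charset=UTF-8"))
print(is_html("image/png"))
```

A crawler can use a check like this to skip images, PDFs and other files that contain no links to follow.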



Lab #2
Lab #2 is about adding options to the FULL FILE ANALYSIS program, and
about adding options to the WEB CRAWLER program.


If you have any suggestions, corrections, or comments, please feel free to e-mail me at:
Damian.X.Gordon(a)tudublin.ie