Showing posts from July, 2015

Webscraping with AutoHotkey and Python

While searching for novel data for a big data Power BI showcase, I came across Bilzonen.dk and Bilbasen.dk.

How do we extract data from a site without any knowledge of its API?
Python, urllib and BeautifulSoup are commonly described as the standard tools for webscraping, although they lack the features and rendering capabilities of a real browser. There seems to be a race to build the ultimate Python webscraping package, yet the web is constantly changing and most of the data is plain text.



In the sections below I will cover three different data acquisition strategies.
My conclusion, in short: combine tools, and be prepared to learn the intricacies of regular expressions.

We will use AutoHotkey, spreadsheets and Python with packages such as re, requests and csv.
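To give a feel for how the Python packages fit together, here is a minimal sketch rather than the script used in this post. The HTML snippet, the regular expression, the column names and the helper functions (fetch, extract_rows, rows_to_csv) are all assumptions made up for illustration; the real markup on the car sites will differ, so the pattern would have to be adapted. The idea is simply: get the page text with requests, pull the fields out with re, and write the rows with csv.

```python
import csv
import io
import re

# Hypothetical markup for illustration only; real listing pages will differ.
SAMPLE_HTML = """
<div class="listing"><span class="model">VW Golf</span><span class="price">129.900 kr.</span></div>
<div class="listing"><span class="model">Toyota Yaris</span><span class="price">84.500 kr.</span></div>
"""

# Assumed pattern: one model span immediately followed by one price span.
LISTING_RE = re.compile(
    r'<span class="model">(?P<model>[^<]+)</span>'
    r'<span class="price">(?P<price>[\d.]+) kr\.</span>'
)


def fetch(url):
    """Download a page as text (needs the third-party requests package)."""
    import requests  # imported here so the rest runs without it installed

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def extract_rows(html):
    """Return one dict per listing found by the regular expression."""
    rows = []
    for match in LISTING_RE.finditer(html):
        # Danish prices use "." as thousands separator: "129.900" -> 129900
        price = int(match.group("price").replace(".", ""))
        rows.append({"model": match.group("model").strip(), "price_dkk": price})
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["model", "price_dkk"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

On a live page you would call extract_rows(fetch(url)) instead of using the sample string. A regular expression like this breaks as soon as the site changes its markup, which is part of why it pays to learn them well.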


Scraping Bilbasen.dk: AutoHotkey, Python packages requests, re and csv

You can download the script-based automation software from autohotkey.com. Once installed, you write an AutoHotkey script in a text editor and save it as a file with postf…