Focused scraping component for the Statistical Scraping concept in Official Statistics.
This repo is part of the WEB-FOSS-NL project on statistical scraping. More info on statistical scraping here
- Install all required packages using
pip install -r requirements.txt
- Activate the environment
- Run the following command to install the modules in src as packages so that imports resolve correctly
pip install -e .
- Create a config.yaml file using config_template.yaml
- In the config file specify the input files:
  - urls: the filename with the given urls, see also urls_template.txt
  - keywords: the filename with the target keywords, see also keywords_template.txt
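As an illustration of how the two config entries might be consumed, here is a minimal sketch. It is not the project's actual loader: the function names read_entries, load_inputs and load_config are hypothetical, the input files are assumed to hold one entry per line, and PyYAML is assumed to be available for parsing config.yaml.

```python
from pathlib import Path


def read_entries(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    Assumption: the urls and keywords files list one entry per line.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def load_inputs(config, base_dir="."):
    """Read the files named by the `urls` and `keywords` config entries.

    `config` is the parsed config mapping; filenames are resolved
    relative to `base_dir`.
    """
    base = Path(base_dir)
    return {
        "urls": read_entries(base / config["urls"]),
        "keywords": read_entries(base / config["keywords"]),
    }


def load_config(path="config.yaml"):
    """Parse the YAML config file (assumes PyYAML is installed)."""
    import yaml  # imported lazily so the helpers above work without PyYAML

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh)
```

With a config.yaml in the working directory, load_inputs(load_config()) would return a dict with the list of urls and the list of keywords.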
- wip: main.py only fetches a few example pages; it does not yet do what the project intends
- wip: the crawling module is not fully tested, its imports are not sorted out, and the test code does not yield results yet