custom-scrape

Custom web scraper that screenshots item listings on the sites defined in a JSON config file.

Usage

By default, running:

python custom_scrape.py

will scrape the sites defined in scrape_configs.json and save screenshots of the items it finds in the screenshots folder.

scrape_configs.json must contain a list of objects in the following format:

[
    {
        "name": "Amiami Figure Preorders",
        "urls": ["something.com", "other.com"],
        "item_class": "item",
        "wait_for_class": "item-list"
    }
]

name: Name of the site(s) you are scraping. It is only used to label them in the final output and can be anything.

urls: List of URLs to scrape.

item_class: CSS class you want to scrape and screenshot on the page.

Optional

wait_for_class: Needed only if you want to wait on a specific CSS class before scraping. This defaults to item_class.
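A minimal sketch of how a config in this format could be loaded and checked in Python, assuming only the fields documented above. The `SiteConfig` dataclass and `load_configs` function are hypothetical names chosen for illustration, not the project's actual code. The sketch applies the documented rule that `wait_for_class` falls back to `item_class` when it is omitted.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteConfig:
    """One entry of scrape_configs.json (hypothetical representation)."""
    name: str
    urls: List[str]
    item_class: str
    wait_for_class: Optional[str] = None

    def __post_init__(self) -> None:
        # Documented default: wait on item_class when wait_for_class is omitted.
        if not self.wait_for_class:
            self.wait_for_class = self.item_class


def load_configs(path: str = "scrape_configs.json") -> List[SiteConfig]:
    """Read the config file and return one SiteConfig per entry."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    if not isinstance(raw, list):
        raise ValueError("config file must contain a JSON list of objects")
    configs = []
    for entry in raw:
        # Missing required keys raise a KeyError naming the key.
        configs.append(
            SiteConfig(
                name=entry["name"],
                urls=list(entry["urls"]),
                item_class=entry["item_class"],
                wait_for_class=entry.get("wait_for_class"),
            )
        )
    return configs
```

Validating the file up front like this surfaces a missing name, urls, or item_class before any browser is started, rather than partway through a scrape.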

Arguments

Name of the Discord text channel to send results to. Requires the DISCORD_TOKEN environment variable to be set.

--discord_notification_channel general

Change the config file to use (default is scrape_configs.json):

--config_file "some_other_file.text"

Show the web browser while it is scraping (i.e. make it non-headless):

--no_headless

Disable JavaScript:

--js_disabled

Adjust how long the scraper waits for items to appear, in seconds (default is 30):

--timeout_secs 15

Adjust how long the scraper waits before searching each page, in seconds (default is 3):

--page_wait_secs 10
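The flags above map naturally onto Python's standard argparse module. The following is a hedged sketch that assumes only the flag names and defaults documented here; `build_parser` and `get_discord_token` are hypothetical helpers, and the real custom_scrape.py may parse its arguments differently. It also assumes whole-number seconds for the two timing flags. The token check reflects the documented requirement that DISCORD_TOKEN be set whenever --discord_notification_channel is used.

```python
import argparse
import os
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(
        description="Screenshot item listings from the sites in a JSON config."
    )
    parser.add_argument("--discord_notification_channel", default=None,
                        help="Discord text channel to send results to (needs DISCORD_TOKEN).")
    parser.add_argument("--config_file", default="scrape_configs.json",
                        help="Path to the scrape config file.")
    # Boolean switches take no value, so store_true fits them.
    parser.add_argument("--no_headless", action="store_true",
                        help="Show the browser window while scraping.")
    parser.add_argument("--js_disabled", action="store_true",
                        help="Disable JavaScript in the browser.")
    # Assumption: timings are whole seconds.
    parser.add_argument("--timeout_secs", type=int, default=30,
                        help="Seconds to wait for items to appear.")
    parser.add_argument("--page_wait_secs", type=int, default=3,
                        help="Seconds to wait before searching each page.")
    return parser


def get_discord_token(args: argparse.Namespace) -> Optional[str]:
    """Return DISCORD_TOKEN if a channel was requested; exit early if it is missing."""
    if args.discord_notification_channel is None:
        return None
    token = os.environ.get("DISCORD_TOKEN")
    if not token:
        raise SystemExit("DISCORD_TOKEN must be set when --discord_notification_channel is used")
    return token


if __name__ == "__main__":
    args = build_parser().parse_args()
    token = get_discord_token(args)
    print(args, "discord enabled:", token is not None)
```

Checking for the token right after parsing means a missing DISCORD_TOKEN fails immediately instead of after a full scrape has run.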

danielloera/custom-scrape
