When bootstrapping Nestpick in 2016, I decided to build a web scraping and search playground so I could learn and prototype the features we would later add to the product. Nestpick is an aggregator for mid-term apartment rentals that allows users to search several providers at once. The technical solution relies on collecting listings from these partners, either through their APIs or by scraping their websites, and indexing that data in a search engine.
The project is structured in three parts:
- A web scraper, built with Scrapy, that fetches the content of all links bookmarked in Pinboard.in.
- An index stored in the Solr search engine.
- A search frontend, built with the Python Flask framework, that works as a Solr client.
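To give a rough idea of what "working as a Solr client" means in the third part, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names, the `bookmarks` core name, and the local Solr address are assumptions made for illustration. It builds a request URL for Solr's standard `select` handler and pulls the matching documents out of a JSON response.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, user_query, rows=10):
    """Build a URL for Solr's /select handler (hypothetical helper).

    An empty query falls back to "*:*", which matches every document.
    """
    params = {"q": user_query or "*:*", "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def extract_docs(payload):
    """Return the list of matching documents from a decoded Solr JSON response."""
    return payload.get("response", {}).get("docs", [])


# Assumed values: a local Solr instance and a core named "bookmarks".
url = build_solr_select_url("http://localhost:8983/solr", "bookmarks", "web scraping")

# A hand-written payload in the shape Solr returns for wt=json.
sample_payload = {
    "responseHeader": {"status": 0},
    "response": {"numFound": 1, "start": 0, "docs": [{"id": "1", "title": "Scrapy docs"}]},
}
docs = extract_docs(sample_payload)
```

In a Flask view, the URL would be fetched with an HTTP client and the decoded JSON passed to `extract_docs` before rendering the results in a template.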
The project README shows how to set it up yourself, so feel free to use it if you need to play around with search engines and web scraping.