Apify Discord Mirror

Updated 5 months ago

compass/crawler-google-places optimization

At a glance
The community member is interested in using a scraper more effectively and efficiently to find all places that sell alcohol, such as bars, pubs, and nightclubs. They are considering whether to do multiple runs per location with different categories or to slim down the category list to do a single run per location. They are also wondering at what point there will be no new results, as they have noticed many duplicate or missing results when searching for categories like "bar", "brewery", and "brewpub". Another question is whether data will be duplicated if they abort a run with too many categories and start a new run with a smaller set of categories in the same location. A community member suggests using one run per location with the most generic categories like "bar", "restaurant", and "nightclub" to avoid duplicates, and testing the categories on a small location first to see if the results are the same.
I like the scraper and would like to use it more frequently and effectively, and I plan on upgrading to a paid plan soon. I understand I can use the orchestrator to execute multiple runs at the same time to best utilize my resources, but I want to make sure that I am optimizing the individual runs. I have a wide range of categories that I would like to query, and I am wondering whether I should do multiple runs per location with different categories on each run, or slim the list down so that I can do a single run per location.

Also, at what point will there be no new results? For instance, I have gone through all the categories available as input and picked out things like bar, brewery, brewpub, etc., but I noticed that most of the log entries say either that there is no data for the search term or that all the data is duplicate. Are there certain categories that are better than others, or that encompass others?

Another question: if I abort a run because I had too many categories in the input, and then start another run in the same location with a smaller set of categories, will data be duplicated, or will these places be skipped because I already have them stored?

My end goal is essentially to have all the places that sell alcohol (bars, pubs, nightclubs, etc.), and I want to do this as efficiently as possible. Sorry for the long message; hopefully somebody has some insight for me.
1 comment
Hi, for efficiency I would use one run per location and the most generic categories: bar, restaurant, nightclub... Using both "bar" and "pub" might find many duplicates. I'd test them on a small location (a city center?) and see whether they give you the same results or not.

Also, at what point will there be no new results?...

If you enter N categories in the input, then the scraper performs N searches in the specified area, one for each category. When you search for "bar", "brewery" and "brewpub", from what I observed, the scraper finds most of the breweries and brewpubs when searching for "bar", so the "brewery" and "brewpub" searches will find mostly duplicates. Hopefully this makes sense 🙂
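
To check this on a small test area, one option is to count how many new places each category adds beyond the ones before it. This is only a rough sketch: the function name and the sample IDs are made up for illustration, and it assumes you have already collected a list of place identifiers per category from your own test runs.

```python
def new_places_per_category(results_by_category):
    """For each category, in input order, count the place IDs it adds
    that no earlier category already returned."""
    seen = set()
    contribution = {}
    for category, place_ids in results_by_category.items():
        unique_ids = set(place_ids)
        contribution[category] = len(unique_ids - seen)
        seen |= unique_ids
    return contribution


# Hypothetical IDs for illustration only; not real scraper output.
sample = {
    "bar": ["p1", "p2", "p3", "p4"],
    "brewery": ["p3", "p4"],
    "brewpub": ["p4", "p5"],
}

contribution = new_places_per_category(sample)
for category, added in contribution.items():
    print(f"{category}: {added} new place(s)")
```

A category that adds close to zero new places in the test area is a candidate to drop from the full run.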

...will data be duplicated, or will these places be passed over because I already have them stored?

The runs are independent, so the second run will scrape places regardless of whether they were scraped in the previous run.
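
Since the runs do not know about each other, any deduplication across runs has to happen on your side after you download the results. A minimal sketch, assuming each result item is a dict carrying a unique identifier under a `placeId` key (check the field name against your own dataset; `merge_runs` and the sample items are hypothetical):

```python
def merge_runs(*runs, key="placeId"):
    """Merge result lists from several runs, keeping the first item seen
    for each identifier and skipping items that lack the key."""
    merged = {}
    for items in runs:
        for item in items:
            item_key = item.get(key)
            if item_key is None:
                continue  # cannot deduplicate without an identifier
            merged.setdefault(item_key, item)
    return list(merged.values())


# Hypothetical items for illustration only; not real scraper output.
aborted_run = [
    {"placeId": "p1", "title": "Example Bar"},
    {"placeId": "p2", "title": "Example Pub"},
]
second_run = [
    {"placeId": "p2", "title": "Example Pub"},
    {"placeId": "p3", "title": "Example Nightclub"},
]

places = merge_runs(aborted_run, second_run)
print(len(places), "unique places")
```

The items from the aborted run are still usable this way; the second run will scrape the overlapping places again, and the merge keeps just one copy of each.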