So you want to grow your business’s list of email addresses by scraping some data from the web? Great news! Or well, is it?
Scraping email addresses to quickly and easily get your hands on hundreds (if not thousands) of valuable leads sure sounds enticing. But email scraping doesn’t come without its digital hurdles and roadblocks.
Many consider email extraction an illegal, black hat practice. And once you send your little scraper out there, you can be sure that sites will treat your bot as such. But not all email scraping is bad, nor is it impossible. So how can you scrape email addresses to grow your mailing list?
You’ll find out below.
Web scraping 101
You might know a bit about web scraping already, and if so, you can skip this part. But before diving into email extraction, it’s important to understand the basics of web scraping.
Data scraping or harvesting is an automated way of gathering and extracting information from a web page (or group of web pages). Data scraping is done with a robot – a web crawler – which visits a given URL, fetches the desired data and extracts it into a database or other format (like a spreadsheet).
The crawling part of this process is what search engine robots (like Bingbot and Googlebot) do to retrieve website information they use to index pages in the search engine. Web scraping goes a step further by downloading and extracting data from pages.
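To make the distinction concrete, here is a minimal sketch in Python using only the standard library. Collecting links stands in for the crawling part, and pulling out the visible text stands in for the scraping part. The names PageParser, parse_page and fetch are made up for this illustration, and a real scraper would also need error handling, rate limiting and respect for a site’s robots.txt.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects links (the crawling part) and visible text (the scraping part)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text a visitor would actually see on the page.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    """Return (links, text) found in an HTML string."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


def fetch(url):
    """Download a page as text; which URL to visit is up to the caller."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A crawler would call fetch on a URL, pass the result to parse_page, and then queue the returned links for later visits. A scraper would additionally keep the returned text and store whatever it needs from it.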
For example, you might want to gather some data from Google’s reverse image search. Unfortunately, Google doesn’t offer a reverse image search API that you can use to easily retrieve the data you want. Instead, you can use a web scraper to crawl Google Images and extract that data for you. If you’re looking for a tool that does exactly that, visit SERPMaster’s web page and give it a try.
Whether this extraction should be allowed is a matter of debate (and yes, it has even resulted in court cases, like eBay v. Bidder’s Edge). After all, isn’t the bot technically trespassing and accessing someone else’s data without their consent?
And when it comes to personal data like email addresses, this potentially turns into an even bigger privacy breach.
But we’ll get to the legality of email scraping in a bit. Let’s first look at how email scraping works.
How email scraping works
Email scraping starts with creating your bot like you would any other web scraper. The only difference is that you create one that can detect email addresses on websites. When used like this, these bots are sometimes also called harvesting bots or harvesters. So how do they work?
An email address follows a recognizable format: “name”@“domain”.“com”. Addresses published on the web contain both an “@” and at least one “.” in the domain part, which makes them fairly easy to identify in a block of text.
So, you could create a bot that can detect text that fits the format “first_word [@] domain [dot] com”. Once created, you can feed your scraper a bunch of URLs to scrape. The email scraper will extract all text on the pages that fits that format and present it to you in whatever format you’ve chosen (like a spreadsheet).
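The pattern matching described above can be sketched with a regular expression. This is a simplified illustration, not the definitive way to do it: the pattern deliberately ignores rarer valid address forms, and the names EMAIL_PATTERN, extract_emails and to_csv are invented for this example.

```python
import csv
import io
import re

# Simplified pattern: local part, "@", one or more domain labels, ".", top-level domain.
# It deliberately ignores rarer valid forms such as quoted local parts or IP literals.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(text):
    """Return unique, lowercased addresses in order of first appearance."""
    seen = set()
    found = []
    for match in EMAIL_PATTERN.findall(text):
        address = match.lower()
        if address not in seen:
            seen.add(address)
            found.append(address)
    return found


def to_csv(rows):
    """Render (source_url, email) pairs as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source_url", "email"])
    writer.writerows(rows)
    return buffer.getvalue()
```

In use, you would run extract_emails over the text of each page you scraped, pair every address with the URL it came from, and hand those pairs to to_csv to get a file you can open in a spreadsheet.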
More elaborate email scrapers are even programmed to send a test email to the scraped addresses first, just to make sure the scraped address is valid. That’s because your bot might return some false positives as well, as most bots aren’t 100% accurate.
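Sending a test email can’t be shown in a short snippet, but a cheaper first pass against false positives can. The sketch below is an assumption for illustration, not how any particular tool works: it throws out matches that only look like addresses, such as image file names like logo@2x.png. The FILE_EXTENSIONS list and the looks_plausible helper are hypothetical, and passing this check doesn’t prove that a mailbox actually exists.

```python
import re

# Endings that a simple pattern mistakes for a top-level domain but that are
# really file extensions. Illustrative only, not an exhaustive list.
FILE_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "webp", "css", "js"}

# The whole candidate must be: local part, "@", at least two dot-separated labels.
STRICT_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def looks_plausible(address):
    """Return True if a scraped string passes basic sanity checks."""
    if not STRICT_PATTERN.fullmatch(address):
        return False
    local, _, domain = address.rpartition("@")
    # Dots may not start, end, or repeat inside the local part.
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    labels = domain.lower().split(".")
    # Domain labels may not start or end with a hyphen.
    if any(label.startswith("-") or label.endswith("-") for label in labels):
        return False
    tld = labels[-1]
    return tld.isalpha() and len(tld) >= 2 and tld not in FILE_EXTENSIONS
```

Running every scraped string through a filter like this before any test email goes out keeps obviously broken matches off your list.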
And that’s email scraping in a nutshell. You can decide to build your own scraper, which, especially if you’re comfortable with coding, is probably the cheapest option. However, if you’re not that experienced, it might be easier to go with an email scraper tool. Some of these are free, while others require a monthly fee.
How email scraping can help your business
Now that you know a bit more about email scraping, you probably have an idea of how it can help your business.
For most companies, the main benefit of email scraping is saving time and resources. If generating leads – in the form of email addresses – is an integral part of your business, just imagine how much easier it would be to have a robot do the work for you.
Instead of browsing the web hunting for email addresses of people of interest, you can just program an email harvester to do the heavy lifting for you.
The benefits become more questionable, however, if you’re a scammer trying to gather email addresses to send spam emails to. And that’s where the problem (and illegality) of email scraping comes in.
The legality of email harvesting
Email scraping, like web scraping in general, is something of a legal grey area.
When you just want to automate the otherwise manual process of looking for publicly available addresses on company websites, are you really doing any harm? After all, you would otherwise retrieve the exact same information yourself, just manually, right?
But when you create a bot that rapidly extracts thousands of email addresses across the web to send mass spam emails to, is it then still so harmless?
The answer seems obvious. As long as it’s done ethically and without malicious intent, there essentially isn’t anything wrong with email scraping.
But no matter your intent, email scraping remains a legal grey area for now. So if you do plan to harvest some emails, it’s good to be aware of the potential risks.