Scrape Yellow Pages using PHP

Yellow Pages is a business directory website that contains listings for businesses, organizations, and individuals. It was originally created as a printed directory in the late 19th century, with business listings arranged by category and alphabetically within each category. The name “Yellow Pages” comes from the color of the paper used for the directory.

We can use the data available on the website to build a cold-call list for selling our products to target customers. In this tutorial, we will scrape the name and phone number of each restaurant listed on the Los Angeles restaurants page (https://www.yellowpages.com/los-angeles-ca/restaurants) and then save the data to a CSV file for easy access.

Setting up the prerequisites

I am assuming that you have already installed PHP on your machine. Apart from PHP itself, we will only use two extensions that are bundled with PHP (on some systems, cURL has to be installed or enabled separately):

  1. DOMDocument: a class from PHP's built-in DOM extension that provides an object-oriented interface for working with XML and HTML documents. It lets you parse, manipulate, and generate XML and HTML.
  2. cURL: used to make the HTTP request to the website and download the page.
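Before running the scraper, it can help to confirm that both extensions are actually loaded. The sketch below uses PHP's built-in extension_loaded() for this; the helper name missingExtensions is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: return the names of required extensions that are not loaded
function missingExtensions(array $required): array
{
    $missing = [];
    foreach ($required as $name) {
        // extension_loaded() checks the PHP installation running this script
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

$missing = missingExtensions(['curl', 'dom']);
if ($missing) {
    echo "Please enable: " . implode(', ', $missing) . PHP_EOL;
} else {
    echo "All required extensions are loaded." . PHP_EOL;
}
```

Running it from the command line with php is enough; if cURL is reported as missing, install or enable it through your system's package manager or php.ini.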

Downloading raw data from yellowpages.com

Using cURL, we are going to download the raw HTML of the target page.

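<?php
// Initialize a new cURL session
$curl = curl_init();

// Set the URL to fetch
$url = "https://www.yellowpages.com/los-angeles-ca/restaurants";
curl_setopt($curl, CURLOPT_URL, $url);

// Set the option to return the response as a string
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// Follow redirects and send a User-Agent header, since some sites
// reject requests without one (the string below is only an example)
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; PHP scraper)");

// Execute the cURL request and fetch the response
$response = curl_exec($curl);

// Stop with an error message if the request failed
if ($response === false) {
    die("cURL error: " . curl_error($curl));
}

// Close the cURL session
curl_close($curl);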

Once you run this code, $response will contain the HTML content of the target page.
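Note that curl_exec() only returns false when the transfer itself fails; a server that answers with an error status such as 403 or 404 still returns its error page as a string. A slightly more defensive variant, sketched below, also checks the HTTP status code with curl_getinfo(). The function name fetchPage and the timeout values are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: download a URL and return its body, throwing a
// RuntimeException on transport errors or HTTP error status codes
function fetchPage(string $url): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    // Assumed limits: give up after 10 s to connect or 30 s in total
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($curl, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException("Request failed: " . curl_error($curl));
    }

    // HTTP status of the last response (after any redirects)
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("Server answered with HTTP $status");
    }

    // The cURL handle is released when $curl goes out of scope
    return $body;
}
```

With this helper, the download step becomes $response = fetchPage("https://www.yellowpages.com/los-angeles-ca/restaurants"); wrapped in a try/catch block if you want to handle failures instead of stopping the script.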

Before we use DOMDocument to parse the data, let’s inspect the page (for example, with your browser's developer tools) to find where each element is located.

Each listing box is located inside a div tag with the class result. You can check it in the image below.

Next, let’s find where the restaurant name is stored.

Here we can see that the name is stored in an a tag.

Now, let’s see where the phone number is stored.

You can find the phone number inside the div tag with the class phones.

Now we know where each data element we want to extract is located.
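These locations can also be expressed as XPath queries with PHP's built-in DOMXPath class, which avoids looping over every div by hand. The sketch below is an optional alternative to the getElementsByTagName() loop used in this tutorial. It matches class names as whole words, so a div with class="result featured" still counts as a listing. The business-name class on the name link, the helper names, and the sample HTML in the test are assumptions for illustration; check the live markup in your browser before relying on them.

```php
<?php
// Build an XPath 1.0 condition that is true when @class contains $class as a whole word
function classToken(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')";
}

// Hypothetical helper: return [name, phone] pairs for every listing in $html
function extractListings(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    // Collect libxml parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];
    foreach ($xpath->query('//div[' . classToken('result') . ']') as $listing) {
        // The leading "." keeps each search inside the current listing
        $name = $xpath->query('.//a[' . classToken('business-name') . ']', $listing)->item(0);
        $phone = $xpath->query('.//div[' . classToken('phones') . ']', $listing)->item(0);
        $rows[] = [
            $name ? trim($name->textContent) : '',
            $phone ? trim($phone->textContent) : '',
        ];
    }
    return $rows;
}
```

With the page downloaded earlier, $rows = extractListings($response); gives one [name, phone] pair per listing, with an empty string wherever a field is missing.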

<?php
// $response holds the HTML downloaded with cURL in the previous step

// Return the first $tag element inside $parent whose class list contains $class
function findByClass(DOMElement $parent, string $tag, string $class): ?DOMElement
{
    foreach ($parent->getElementsByTagName($tag) as $element) {
        $classes = preg_split('/\s+/', trim($element->getAttribute('class')));
        if (in_array($class, $classes, true)) {
            return $element;
        }
    }
    return null;
}

// Create a new DOMDocument object and load the HTML data,
// suppressing the parser warnings that real-world HTML usually triggers
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($response);
libxml_clear_errors();

// Find all the restaurant listings on the page
foreach ($dom->getElementsByTagName('div') as $listing) {
    if ($listing->getAttribute('class') == 'result') {
        // The name is in an a tag (class "business-name" at the time of writing;
        // verify it in your browser) and the phone number is in a div with class "phones"
        $nameTag = findByClass($listing, 'a', 'business-name');
        $phoneTag = findByClass($listing, 'div', 'phones');

        // Extract the name and phone number of the restaurant
        $name = $nameTag ? trim($nameTag->nodeValue) : '';
        $phone = $phoneTag ? trim($phoneTag->nodeValue) : '';
    }
}

Here we create a new DOMDocument object, load the raw HTML into it, and then loop over every div with the class result to extract each restaurant's name and phone number.
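One pitfall with DOMDocument::loadHTML() is character encoding: unless the HTML declares its charset in a way libxml recognizes, the parser typically falls back to ISO-8859-1, so UTF-8 names such as "Café" can come out garbled. A common workaround, sketched below, is to put a charset declaration in front of the markup before loading it. The helper name loadUtf8Html is hypothetical.

```php
<?php
// Hypothetical helper: parse an HTML string as UTF-8 with DOMDocument
function loadUtf8Html(string $html): DOMDocument
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // The prepended meta tag tells libxml's HTML parser that the input is UTF-8
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

$dom = loadUtf8Html('<p>Café</p>');
echo $dom->getElementsByTagName('p')->item(0)->textContent . PHP_EOL;
```

If the restaurant names in your CSV file look garbled, replacing $dom->loadHTML($response) with $dom = loadUtf8Html($response) is one thing worth trying.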

Let’s now create a CSV file to save the data.

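<?php
// Return the first $tag element inside $parent whose class list contains $class
function findByClass(DOMElement $parent, string $tag, string $class): ?DOMElement
{
    foreach ($parent->getElementsByTagName($tag) as $element) {
        $classes = preg_split('/\s+/', trim($element->getAttribute('class')));
        if (in_array($class, $classes, true)) {
            return $element;
        }
    }
    return null;
}

// Initialize a new cURL session
$curl = curl_init();

// Set the URL to fetch
$url = "https://www.yellowpages.com/los-angeles-ca/restaurants";
curl_setopt($curl, CURLOPT_URL, $url);

// Return the response as a string, follow redirects, and send an
// example User-Agent header (some sites reject requests without one)
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; PHP scraper)");

// Execute the cURL request and stop if it failed
$response = curl_exec($curl);
if ($response === false) {
    die("cURL error: " . curl_error($curl));
}

// Close the cURL session
curl_close($curl);

// Create a new DOMDocument object and load the HTML data,
// suppressing the parser warnings that real-world HTML usually triggers
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($response);
libxml_clear_errors();

// Create a new CSV file and write the header row
$fp = fopen('restaurants.csv', 'w');
fputcsv($fp, array('Name', 'Phone Number'));

// Find all the restaurant listings on the page
foreach ($dom->getElementsByTagName('div') as $listing) {
    if ($listing->getAttribute('class') == 'result') {
        // The name is in an a tag (class "business-name" at the time of writing)
        // and the phone number is in a div with the class "phones"
        $nameTag = findByClass($listing, 'a', 'business-name');
        $phoneTag = findByClass($listing, 'div', 'phones');

        // Extract the name and phone number of the restaurant
        $name = $nameTag ? trim($nameTag->nodeValue) : '';
        $phone = $phoneTag ? trim($phoneTag->nodeValue) : '';

        // Write the data to the CSV file
        fputcsv($fp, array($name, $phone));
    }
}

// Close the CSV file
fclose($fp);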

Summary of the complete code:

In this code, we first fetch the HTML data using cURL. We then create a new DOMDocument object and load the HTML data into it using the loadHTML() method.

Next, we create a new CSV file using the fopen() function and write the header row to it using the fputcsv() function.

We then find all the restaurant listings on the page using the getElementsByTagName() method and loop through them. For each listing, we extract the name and phone number of the restaurant using the nodeValue property and the trim() function to remove any extra whitespace.

Finally, we write the data to the CSV file using the fputcsv() function and close the file using the fclose() function.

Once you run the code, you will find each restaurant's name and phone number in restaurants.csv. Because fopen() is given a relative path, the file is created in the current working directory, which is the folder of your scraper file if you run php from that folder.
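fputcsv() takes care of quoting, so names that contain commas or double quotes survive intact. The sketch below illustrates this by writing a few made-up rows to an in-memory stream (php://memory) instead of restaurants.csv and reading them back with fgetcsv(). Passing an empty escape argument (supported since PHP 7.4) turns off PHP's non-standard backslash escaping so that only quote doubling is used.

```php
<?php
// Illustrative rows (made-up data); the last name contains a comma and quotes
$rows = [
    ['Name', 'Phone Number'],
    ['Sample Diner', '(000) 000-0000'],
    ['Sample "Quoted" Cafe, Inc.', '(000) 000-0001'],
];

// Write the rows to an in-memory stream instead of a real file
$fp = fopen('php://memory', 'w+');
foreach ($rows as $row) {
    // Delimiter, enclosure, and an empty escape character
    fputcsv($fp, $row, ',', '"', '');
}

// Print the raw CSV text, as it would appear in restaurants.csv
rewind($fp);
echo stream_get_contents($fp);

// Read the rows back (length 0 means no line-length limit)
rewind($fp);
$readBack = [];
while (($row = fgetcsv($fp, 0, ',', '"', '')) !== false) {
    $readBack[] = $row;
}
fclose($fp);

// Check that every row survived the round trip unchanged
var_dump($readBack === $rows);
```

The printed CSV shows the quoted name wrapped in double quotes with its inner quotes doubled, which spreadsheet applications read back as the original text.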


Conclusion

In conclusion, scraping Yellow Pages using PHP can be an effective way to access a large database of business information, gain insights into market trends and customer preferences, and generate leads for your business. By automating the data collection process, you can save time and resources while collecting valuable data.

With proper planning and execution, scraping Yellow Pages can be a powerful tool for businesses looking to gain a competitive edge and make informed business decisions.


Guest Author Information

Author Name: Manthan Koolwal

Author Bio: Manthan loves to create web scrapers and has been building them for the last 10 years. He has created data pipelines for multiple MNCs in the past. Currently, he is working on Scrapingdog, a web scraping API that can scrape any website at any scale without getting blocked. Feel free to connect with him for any web scraping query.

Author website: https://www.scrapingdog.com

