PHP SEO Code – Google Algorithm Theory

Before I start, I need to explain my thoughts on SEO. I am not a conformist. I do not read articles and take them in as most would. I do not believe everything I read, and certainly in the area of SEO there cannot be set rules to follow. If there are, as many blog authors claim, then I will test them to the max to see if there is any truth in them.

Throughout this post I will refer back to, and compare against, ‘website B‘. Website B belongs to a very well structured website design company in Mansfield, Nottinghamshire, which I have had the pleasure of working with. They have a wide range of clients (large and small), are very well educated in design, coding, programming and marketing, and have been in the industry a lot longer than I have. To date, website B has tens of thousands of back links and a highly authoritative domain, and for their main search key phrase they rank 9th on Google (data from my code, not using the Google API) while I am ranked 5th.

All of the content on my front page (of my previous site; I am writing this post before I change my site) was written 100% using the code below. In places it might read as bad English, or some of the links might not be ‘by the book’, but it proves my theory, which I will explain.

Google Theory Number 1

This section is called ‘Google theory number 1’ because it could change in the future! All sites displayed on Google have to be placed where they are for a reason. There’s no doubt about it: there are no anomalies, incorrect placements or anything along those lines, and that’s fact! It’s fact because it’s an algorithm, a mathematical equation taking input values and producing an output. Simple. If that is true, and there are no fixed rules as such, it made me think for years: why is website B placed where it is, and why is the current number 1 at number 1? I’ll come back to this later…

Single word search phrases are obviously the hardest to optimise your site for, so I thought I would do a quick search to help prove my theory some more. If you search for “graphics” you’ll notice some big contenders: BBC, Wiki, graphics.com (hehe, of course), and then in 7th/8th ‘www.technologystudent.com/designpro/drawdex.htm‘. From a quick scan I can tell you the following:

  • Title: Drawing Index Page
  • Description Tag: none
  • Keyword Tag: none
  • Coding Form: html (table structure)
  • Visual Appeal: 1/10
  • Noticeable Fact: A long list of links going elsewhere
  • Back links: around 150

From the above, I think you’ll agree that the standard SEO rules which we abide by every day don’t seem to be in play. Now the site below it is completely full of graphics, has well structured title, description and keyword tags, and also has over 2 million back links. So why is it there?

I can’t 100% prove why some complete randoms are where they are, but that’s only because I don’t have the ranking trends for every domain name since the random in question was first indexed. So I’m now going to tell you my thoughts, to see if anyone can see any correlation between what I am saying and what they have seen.

I believe that all ranking is based on the current rankings within the top 10 for the specified key phrase, and that each new calculation is based on the averages of these sites.

I think this was a way of purifying results in the late stages of BackRub and the early stages of Google. The only way I can begin to explain what I think is by using some random figures below.

12 - 13 - 23 - 10 - 20 - 13 - 10 - 10 - 15 - 10

Imagine the above figures are the number of words in each title tag. The average is 13.6, so it would be absolutely safe to say that if I built a site with 14 words in the title tag, it would fit in nicely with this trend. Now, I know there’s more to think about (density, relevance etc), but if I averaged out every possible thing that could be associated with the algorithm, I am guaranteed to get on that 1st page; it’s a calculation, so it must be true. So let’s just say I did get in that top 10 at 5th place…

12 - 13 - 23 - 10 - 14 - 20 - 13 - 10 - 10 - 15

That’s lovely: the average is now exactly 14, setting the trend that a well structured title tag should have 14 words in it… but wait, what if I had 20 words, yet still perfectly averaged out everything else that could be included in the algorithm? Surely I would still get on that 1st page, maybe ranked 6th or maybe still ranked 5th, I don’t know. But what I do know is that the average for the title tag will now change, setting a new trend. What I’m trying to say is that it is our own sites that are writing the algorithm. What will really make you think is this: what if I am right, but the so-called rules and regulations had never been posted on blogs or written in articles? The trend would never have been set. The millions of sites that now make up the averages for your key phrase wouldn’t exist or would not be indexed correctly, and Google would not be displaying good, accurate search results.
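The arithmetic behind the two rows of figures can be checked with a short sketch. The function names `average_words()` and `insert_at_rank()` are made up for this illustration, and the numbers are the random title lengths from above, not measured data.

```php
<?php
// Hypothetical title-tag word counts for the current top 10 (the random figures above).
$top10 = array(12, 13, 23, 10, 20, 13, 10, 10, 15, 10);

// Mean word count across a set of results.
function average_words(array $counts) {
	return array_sum($counts) / count($counts);
}

// Place a new site at a given rank (1-based) and drop whoever falls off the first page.
function insert_at_rank(array $counts, $rank, $words) {
	array_splice($counts, $rank - 1, 0, array($words));
	return array_slice($counts, 0, 10);
}

echo average_words($top10) . "\n";        // 13.6, so a 14-word title fits the trend

$with_14 = insert_at_rank($top10, 5, 14);
echo implode(' - ', $with_14) . "\n";     // 12 - 13 - 23 - 10 - 14 - 20 - 13 - 10 - 10 - 15
echo average_words($with_14) . "\n";      // 14

$with_20 = insert_at_rank($top10, 5, 20);
echo average_words($with_20) . "\n";      // 14.6
```

With a 20-word title in 5th place the average moves from 14 to 14.6, which is the kind of shift in the trend described above.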

The Code

Enough of my thoughts, here’s the code which I wrote to work out averages to base my site’s content structure on. There are 2 methods to use: one using the Google API (which is safer to use in commercial or public applications, but results may differ from what you see in Google search results), the other scraping Google results.

Yet again, I had to dust this code down because of the amount of time it’s been lying on my hard drive. Take it with a pinch of salt and use it to base your own investigations on.

<?php

class true_seo {
	
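	// Search phrases and collected results are read from outside the class
	public $phrases, $api_data, $non_api_data;
	// API keys and working state (page being analysed, phrase being measured)
	private $g_key, $y_key, $actual, $current_phrase;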
	
	public function __construct(){
		include_once "simple_html_dom.php";	
	}
	
	public function set_g_key( $key ) {
		$this->g_key = $key;	
	}
	
	public function set_y_key( $key ) {
		$this->y_key = $key;	
	}
	
	public function get_back_links( $domain ) {
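		// $domain is usually a full URL, so it must be encoded before going into the query string
		$yahoourl = "http://search.yahooapis.com/SiteExplorerService/V1/inlinkData?appid=" . urlencode( $this->y_key ) ."&query=" . urlencode( $domain ) . "&entire_site=1&omit_inlinks=domain&results=1&output=php";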
		$data = file_get_contents($yahoourl);
		$results = unserialize($data);
		return $results['ResultSet']['totalResultsAvailable'];
	}
	
	public function density( $word_count, $phrase, $keyword_count ) {
		return sprintf("%01.2f", ( ( ( $keyword_count * str_word_count( $phrase ) ) * 100) / $word_count ) );	
	} 
	
	public function set_phrase( $string ){
		if( is_string ( $string ) ) {
			$string = array( $string );	
		}
		if( is_array ( $string ) ) {
			$this->phrases = $string;
		}else{
			Throw new Exception("Incorrect input for phrase: expected a string or an array");	
		}
	}
	
	public function get_sites_use_spider( $amount ) {
		$main_result = array();
		foreach( $this->phrases as $phrase ) {
			$APIparams = array("key" => $this->g_key, "q" => $phrase, "start" => 0, "maxResults" => $amount, "filter" => true, "restrict" => "", "safeSearch" => false, "lr" => "lang_en", "ie" => "", "oe" => ""); 
			$data = $this->google_search_api( $APIparams, 'http://www.google.co.uk/search', false );
			$html = str_get_html( $data );
			$result = array();
			foreach( $html->find('li.g h3 a') as $g ) {
				$data = $g->parent()->nextSibling();
				$other = $data->find('span a');
				$x = 0;
				foreach( $other as $d ) {
					( $x == 0 ? $cache = $d->href : $simular = $d->href );
					$x++;
				}
				$excess_span = $data->find('span',0)->outertext;
				if( isset( $data->find('div',0)->tag ) ) {
					$excess_div = $data->find('div',0)->outertext;
					$title = str_replace( array( $excess_span, $excess_div, '<em>', '</em>', '<br>', '<b>', '</b>' ), array( '','','','','','','' ), $data->outertext );
				}else{
					$title = str_replace( array( $excess_span, '<em>', '</em>', '<br>', '<b>', '</b>' ), array( '','','','','','' ), $data->outertext );					
				}
				$result[] = array( 'link' => $g->href, 'title' => strip_tags( $title ), 'cache' => $cache, 'simular' => 'http://www.google.co.uk' . $simular );
			}
			$main_result[$phrase] = $result;
			$html->clear();
		}
		$this->non_api_data = $main_result;
	}

	public function get_sites_use_api( $amount ) {
		$arr = array();
		foreach( $this->phrases as $phrase ) {
			if( $amount > 4 ) {
				$times = $amount / 4;
			}else{
				$times = 1;	
			}
			$arg = array();
			for($x = 0; $x < $times; $x++ ) {
				$APIparams = array("key" => $this->g_key, "q" => $phrase, "start" => ($x * 4), "maxResults" => 4, "filter" => true, "restrict" => "", "safeSearch" => false, "lr" => "lang_en", "ie" => "", "oe" => ""); 
				if( $data = true_seo::google_search_api( $APIparams, 'http://ajax.googleapis.com/ajax/services/search/web' ) ) {
					$arg = array_merge($arg, $data->responseData->results);	
				}else{
					Throw new exception("Request error: no results returned from Google.");
				}
			}
			// The API returns results in batches of 4, so trim any surplus from the end
			$arg = array_slice( $arg, 0, $amount );
			foreach( $arg as $g ) {
				$result = array( 'link' => $g->url, 'title' => strip_tags( $g->content ), 'cache' => $g->cacheUrl, 'simular' => 'na' );
				$arr[$phrase][] = $result;
			}
		}
		$this->api_data = $arr;
	}
	
	public function google_search_api($args, $url, $api = true){
		if ( !array_key_exists('v', $args) ) {
			$args['v'] = '1.0';
		}
		$url .= '?'.http_build_query($args, '', '&');
		if( $result = @file_get_contents($url) ) {
			if( $api == true ) {
				return json_decode($result);	
			}else{
				return $result;	
			}
		}else{
			Throw new exception("No data returned from url: $url");	
		}
	} 
	
	public function set_get_actual( $string ) {
		// Fetch the page once and keep its URL and HTML for get_actual()
		$this->actual = new stdClass();
		$this->actual->name = $string;
		$this->actual->data = file_get_contents( $string );	
	}
	
	private function get_dom( $str = false ){
		new simple_html_dom();
		if( ! $str ) {
			return str_get_html( $this->actual->data );
		}else{
			return str_get_html( $str );	
		}
	}
	
	private function defineVars(){
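		// Empty result object for tags that are missing from the page
		$foo = new stdClass();
		$foo->plaintext = false;
		$foo->word_count = false;
		$foo->highlight_key_words = false;
		$foo->phrase_count = array();
		return $foo;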
	}
	
	private function make_list($array, $length) {
  		return true_seo::make_list_rec("", $array, $length);
	}

	private function make_list_rec($prefix, $array, $length) {
  		$ret = array();
  		$append = array();
  		foreach ($array as $a) {
			$ret[] = "$prefix $a\n";
			if ($length === 1) continue;
			$new_length = $length - 1;
			$new_prefix = $prefix . " " . $a;
			$append = array_merge($append, true_seo::make_list_rec($new_prefix, $array, $new_length));
		}
		return array_merge($ret, $append);
	}
	
	private function sort_em($a,$b){
		return strlen($b)-strlen($a);
	}
	
	private function highlight( $text, $phrases ) {
		$count = array();
		usort($phrases, array( $this, 'sort_em' ));
		foreach( $phrases as $phrase ) {
			$phrase = trim( $phrase );
			$class = 'highlight' . str_word_count( $phrase );
			if( preg_match_all( "#" . preg_quote( $phrase, '#' ) . "#is", $text, $matches ) ) {
				$text = preg_replace( "#" . preg_quote( $phrase, '#' ) . "#is", "<span class=\"" . $class . "\">$0</span>", $text );
				$count[$phrase] = count($matches[0]);	
			}
		}
		return array($text, $count);
	}
	
	private function get_all_phrases() {
		if( ! isset( $this->current_phrase ) ) {
			Throw new Exception( 'No phrase set: call set_current_phrase() first' );
		}
		$phrases_array = explode( " ", $this->current_phrase );
		return $this->make_list($phrases_array, count( $phrases_array ) ); 		
	}
	
	public function set_current_phrase( $phrase ) {
		$this->current_phrase = $phrase;	
	}
	
	public function ob_from_str( $str ) {
		$html = true_seo::get_dom( $str );
		return true_seo::get_objects( $html );	
	}
	
	private function get_objects( $ob ) {
		if( isset( $ob->content ) ) {
			$plain = $ob->content;
		}else{
			$plain = $ob->plaintext;	
		}
		$foo = new stdClass();
		$foo->plaintext = $plain;
		$foo->word_count = str_word_count( $plain );
		$hData = true_seo::highlight( $plain, true_seo::get_all_phrases() );
		$foo->highlight_key_words = $hData[0];
		$foo->phrase_count = $hData[1]; 
		return $foo;	
	}
	
	public function get_actual( $tag ) {
		switch( $tag ) {
			case 'keywords' :
				$str = 'head meta[name=keywords]';
				break;
			case 'description' :
				$str = 'head meta[name=description]';
				break;
			case 'title' :
				$str = 'title';
				break;
			case 'h1' :
				$str = 'h1';
				break;
			default :
				Throw new Exception('invalid argument');
		}
		$html = true_seo::get_dom();
		if( $html->find($str,0) ) {
			$bar = $html->find($str,0);
			return true_seo::get_objects( $bar );
		}else{
			return true_seo::defineVars();
		}	
	}
}


?>
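The least obvious part of the class is `make_list_rec()`, which builds the list of phrases that `highlight()` then searches for. Here is a minimal stand-alone sketch of the same idea. The name `phrase_variants()` is hypothetical, and unlike the original it trims each phrase as it goes rather than leaving that to `highlight()`.

```php
<?php
// Stand-alone version of the idea behind make_list_rec(): every sequence of
// up to $length words drawn from $words, repeats included.
function phrase_variants(array $words, $length, $prefix = '') {
	$current = array();
	$longer  = array();
	foreach ($words as $word) {
		$phrase    = trim($prefix . ' ' . $word);
		$current[] = $phrase;
		if ($length > 1) {
			$longer = array_merge($longer, phrase_variants($words, $length - 1, $phrase));
		}
	}
	return array_merge($current, $longer);
}

$words    = explode(' ', 'web design');
$variants = phrase_variants($words, count($words));

// Longest first, so "web design" is matched before "web" on its own.
usort($variants, function ($a, $b) {
	return strlen($b) - strlen($a);
});

print_r($variants);
```

For the two-word phrase “web design” this gives six variants: web, design, web web, web design, design web and design design. The list grows quickly (a three-word phrase already produces 39), so this approach only suits short key phrases.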

And then use it like so:

<?php
error_reporting( E_ALL & ~E_NOTICE );
// Requires a Google API key and a Yahoo API Key for backlinks
try{
	require "./classes/class_true_seo.php";
	$seo = new true_seo();
	$seo->set_g_key('ABQIAAAAsWzmZ4RXdIk0a-qpqKCBRSl_WmKnmsXGmN0kkjN2wkrfEOY-hT2sL-_x5v4NtT3DgElKNsR7FDJDQ');
	$seo->set_y_key('dj0yJmk9WWZPb29TZ3dlZ1ZLJmQ9WVdrOVFYHRNVzlsTXpnbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD03MA--');
	$seo->set_phrase(array("graphics"));
	$seo->get_sites_use_api(10);
	ob_start();
	foreach( $seo->api_data as $key => $phrase_return ){
		$seo->set_current_phrase( $key );
		echo "<h2>" . $key . "</h2>";
		foreach( $phrase_return as $rank => $results ){
			$seo->set_get_actual( $results['link'] );
			echo "<p class=\"url accordionButton\"><strong>#" . ( $rank + 1 ) . "</strong> <a class=\"urla\" href=\"" . $results['link'] . "\">" . $seo->ob_from_str( $results['link'] )->highlight_key_words . "</a></p>";
			echo "<div class=\"accordionContent\">\n";
			$get_data_array = array('title', 'description','keywords','h1');
			foreach( $get_data_array as $tag ) {
				$ob = $seo->get_actual( $tag ); 	
				echo "<div class=\"wrap-" . $tag . "\">\n";
				echo "	<h3>$tag | Word count: " . $ob->word_count . "</h3>";
				echo "	<p>" . $ob->highlight_key_words . "</p>\n";
				echo "<ul class=\"density\">\n";
				foreach( $ob->phrase_count as $ph => $am ) {
					echo "<li><small><strong>$ph</strong> density: " . $seo->density( $ob->word_count, $ph, $am ) . "%</small></li>\n"; 	
				}
				echo "</ul>\n";
				echo "<div class=\"clear\"></div>";
				echo "</div>\n";				
			}
			echo "<p class=\"backlinks\">Total backlinks: " . $seo->get_back_links( $results['link'] ) . "</p>";
			 echo "</div>";
		}
	}
	$api_return = ob_get_clean();
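}catch(Exception $err){
	// Discard any partial output and show the error where the results would have been
	if( ob_get_level() > 0 ) {
		ob_end_clean();
	}
	$api_return = "<p class=\"error\">" . htmlspecialchars( $err->getMessage() ) . "</p>";
}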

?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="" />
<title>Untitled Document</title>
<link href="css/cssReset.css" rel="stylesheet" type="text/css" />
<link href="css/main.css" rel="stylesheet" type="text/css" />
<link href="style/format.css" rel="stylesheet" type="text/css" />
<link href="style/text.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"> </script>
<script type="text/javascript" src="includes/javascript.js"> </script>
</head>
<body>
<div id="wrapper">
	<div id="left-content" style="width:auto;">
    	<h2 class="cont-head">Google results using Google Search AJAX API</h2>
        <?php echo $api_return; ?>
    </div>
</div>
</body>
</html>
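The page above lists word counts and densities for each result, but leaves the averaging itself to you. Here is a minimal sketch of that last step. The measurements in `$titles` are made up for illustration, and the stand-alone `density()` function simply repeats the formula from the class without the `sprintf()` formatting.

```php
<?php
// Same formula as true_seo::density(): the share of a tag's words taken up by the phrase.
function density($word_count, $phrase, $keyword_count) {
	return ($keyword_count * str_word_count($phrase) * 100) / $word_count;
}

// Made-up measurements for the title tag of three results; in practice these
// would come from get_actual('title') for each site in $seo->api_data.
$titles = array(
	array('word_count' => 10, 'phrase_count' => 1),
	array('word_count' => 8,  'phrase_count' => 2),
	array('word_count' => 12, 'phrase_count' => 3),
);

$phrase = 'web design';

$word_counts = array();
$densities   = array();
foreach ($titles as $t) {
	$word_counts[] = $t['word_count'];
	$densities[]   = density($t['word_count'], $phrase, $t['phrase_count']);
}

$avg_words   = array_sum($word_counts) / count($word_counts);
$avg_density = array_sum($densities) / count($densities);

printf("Average title length: %.1f words\n", $avg_words);
printf("Average density of \"%s\": %.2f%%\n", $phrase, $avg_density);
```

With these sample figures the densities are 20%, 50% and 50%, so the targets come out at 10 words and a 40% density. The same loop can be repeated for the description, keywords and h1 tags.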

This code also requires the Simple HTML DOM Parser class written by S.C. Chen, to whom I must give maximum credit, as I have used this class in many of my projects.
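If you want to experiment without installing that class, the tag lookup done by `get_actual()` can be approximated with PHP’s bundled DOM extension. This is a swapped-in technique, not what the class above does, and the function name `extract_tags()` is hypothetical. The sample page reuses the title from the “graphics” example, which had no description or keyword tags.

```php
<?php
// Pull the same four tags get_actual() looks at, using DOMDocument and XPath
// in place of Simple HTML DOM.
function extract_tags($html) {
	$doc = new DOMDocument();
	// Real-world pages are rarely valid, so keep libxml's parse warnings quiet.
	libxml_use_internal_errors(true);
	$doc->loadHTML($html);
	libxml_clear_errors();

	$xpath   = new DOMXPath($doc);
	$queries = array(
		'title'       => 'string(//title)',
		'description' => 'string(//meta[@name="description"]/@content)',
		'keywords'    => 'string(//meta[@name="keywords"]/@content)',
		'h1'          => 'string(//h1)',
	);

	$tags = array();
	foreach ($queries as $name => $query) {
		// string() yields the first match, or an empty string when the tag is missing
		$text = trim($xpath->evaluate($query));
		$tags[$name] = array('text' => $text, 'word_count' => str_word_count($text));
	}
	return $tags;
}

$sample = '<html><head><title>Drawing Index Page</title></head>'
        . '<body><h1>Graphics and drawing</h1></body></html>';

$tags = extract_tags($sample);
echo $tags['title']['text'] . ' (' . $tags['title']['word_count'] . " words)\n";
```

Note that the XPath attribute match is case-sensitive, so a page that writes `name="Description"` would be missed by this sketch.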

This article was written by Luke Snowden

All content and code written within this article are the property of Luke Snowden unless stated otherwise, in which case credit will be given.
