Create a custom RSS 2.0 XML Feed for your WordPress blog

Earlier this year, I needed a way to expose some specific WordPress post data (using a custom query) to only a select audience. In this case, the audience was a content ingestor that would come knocking on a semi-private door, using custom request headers. Behind the door, post-specific data.

The semi-private door was created using a WP plugin so that my content ‘scraper’ (not a crawler, mind you) could scrape up the post data for a content processing entity. The scraper was whipped up using Python and some great RSS modules.
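For a rough idea of the scraper's side of that handshake, here is a minimal sketch using only Python's standard library. The header names `Myscraper`, `Mypage`, and `Myposts` are inferred from the `$_SERVER['HTTP_…']` keys the plugin reads (PHP exposes a request header `X` as `HTTP_X`). The blog URL, the token value, and the `build_feed_request` helper are placeholders invented for the example, not the scraper's real code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_feed_request(blog_url, page, posts_per_page, token="1"):
    """Build (but do not send) a request for one page of the private feed."""
    # the plugin only checks that the 'myscraper' query parameter is present
    url = blog_url.rstrip("/") + "/?" + urlencode({"myscraper": token})
    headers = {
        "Myscraper": token,              # arrives in PHP as HTTP_MYSCRAPER
        "Mypage": str(page),             # arrives in PHP as HTTP_MYPAGE
        "Myposts": str(posts_per_page),  # arrives in PHP as HTTP_MYPOSTS
    }
    return Request(url, headers=headers)


req = build_feed_request("https://example.com", page=1, posts_per_page=10)
print(req.full_url)
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would then fetch the feed. A plain browser visit carries no such headers, so it never reaches the feed code.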

Note: the following code snippets are all part of the plugin – the breaks are for formatting only.

First, the plugin PHP…

add_action('init', 'myplugin_hook_crawler', 1);

function myplugin_hook_crawler() {
    // only answer requests that carry both the custom header and the query flag
    if (!isset($_SERVER['HTTP_MYSCRAPER']) || !isset($_GET['myscraper'])) {
        return;
    }
    global $wpdb;
    set_error_handler('my_errhandle'); // my_errhandle must be defined elsewhere in the plugin
    // paging arrives via custom headers; a missing posts header yields an empty result
    $page = max(1, (int) ($_SERVER['HTTP_MYPAGE'] ?? 1));
    $per_page = max(0, (int) ($_SERVER['HTTP_MYPOSTS'] ?? 0));
    $lower = ($page - 1) * $per_page; // zero-based offset, so page 2 of 10 starts at row 10
    $upper = $per_page; // number of rows to return
    // let prepare() substitute the integers instead of interpolating them into the SQL
    $posts = $wpdb->get_results($wpdb->prepare("SELECT * FROM $wpdb->posts WHERE post_type = 'post' AND post_status = 'publish' ORDER BY post_date DESC LIMIT %d, %d", $lower, $upper));
    $post_max = count($posts); // number of posts actually returned
    $tag_counter = 0; // reset tag counter
    header("Content-type: text/xml; charset=utf-8");
?>
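
The paging headers end up as the two arguments of MySQL's `LIMIT offset, row_count`, where the offset is zero-based. A quick Python sketch of that mapping, with `limit_clause` as a hypothetical helper that is not part of the plugin:

```python
def limit_clause(page, posts_per_page):
    """Return (offset, row_count) for a 1-based page number."""
    if page < 1 or posts_per_page < 1:
        raise ValueError("page and posts_per_page must be positive")
    offset = (page - 1) * posts_per_page  # rows to skip before this page
    return offset, posts_per_page


for page in (1, 2, 3):
    print(page, limit_clause(page, 10))
# 1 (0, 10)
# 2 (10, 10)
# 3 (20, 10)
```

Page 2 of a 10-post feed therefore starts at offset 10, not 11. An offset of 11 would silently skip one post on every page after the first.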

Next, set up your RSS 2.0 compliant XML output…

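<?php echo '<?xml version="1.0" encoding="utf-8"?>' . "\n"; ?>
<?php /* title, link, and description are required by RSS 2.0; lastBuildDate and postCount are assumed element names */ ?>
<rss version="2.0">
<channel>
<title>myscraper</title>
<link><?=get_bloginfo('url')?></link>
<description>myscraper makes your WP post data available for an awesome content ingestion engine</description>
<lastBuildDate><?=date("r");?></lastBuildDate>
<postCount><?=$post_max?></postCount>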

Add a PHP loop to process the results of the wpdb query…

<?php foreach ($posts as $post) {
    // reset for every post so a post without tags does not reuse the previous post's tags
    $post_tags = '';
    $tags = get_the_tags($post->ID); // array of terms, or false when the post has no tags
    if (is_array($tags)) {
        // join the tag names with a pipe, e.g. "php|rss|wordpress"
        $post_tags = implode('|', wp_list_pluck($tags, 'name'));
    }
?>

XML format for each post…

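<item>
<pubDate><?php echo $post->post_date;?></pubDate>
<?php /* other per-post elements go here; item, pubDate, and category are assumed element names */ ?>
<category><![CDATA[<?php echo $post_tags;?>]]></category>
</item>
<?php } ?>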

Close out the plugin code…

</channel>
</rss>
<?php
    restore_error_handler();
    exit;
}

Important note: mind the use of CDATA tags. That tip and the suggestion to use a valid RSS 2.0 format were both time- and life-saving, and both came from @ka

Using a standard, RSS-compliant feed format saved time on the scraper side because I wrote the scraper in Python, which has great RSS processing modules available.
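
As a rough illustration of the consuming side, the sketch below parses a feed of this general shape with Python's built-in `xml.etree.ElementTree`. The sample document, the `pubDate` and `category` element names, and the `parse_feed` helper are all assumptions made for the example. The RSS modules the real scraper used are not named in this post.

```python
import xml.etree.ElementTree as ET

# invented sample data in the general shape of the plugin's output
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>myscraper</title>
<link>https://example.com</link>
<description>sample feed</description>
<item>
<pubDate>2011-06-01 12:00:00</pubDate>
<category><![CDATA[python|rss|wordpress]]></category>
</item>
<item>
<pubDate>2011-05-28 09:30:00</pubDate>
<category><![CDATA[]]></category>
</item>
</channel>
</rss>"""


def parse_feed(xml_text):
    """Return a list of {'date': str, 'tags': [str, ...]} dicts, one per item."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    posts = []
    for item in root.iter("item"):
        raw_tags = item.findtext("category", default="")
        posts.append({
            "date": item.findtext("pubDate", default=""),
            # an empty CDATA section means the post had no tags
            "tags": raw_tags.split("|") if raw_tags else [],
        })
    return posts


for post in parse_feed(SAMPLE_FEED):
    print(post["date"], post["tags"])
```

Because the feed is ordinary RSS-shaped XML, any standard parser can walk it without a custom format handler. The pipe-delimited tags need only a single `split`.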

Using the CDATA tags was life-saving because of the sheer joy involved in mixing Python and Unicode characters.
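
To make the CDATA point concrete, here is a small Python sketch with sample strings invented for illustration. Text containing a raw `&` or `<` makes the document ill-formed unless it is escaped or wrapped in a CDATA section. Non-ASCII characters are fine either way, as long as the bytes match the declared encoding.

```python
import xml.etree.ElementTree as ET

TAGS = "R&D|café|<b>bold</b>"  # characters that commonly trip up parsers

bare = "<category>" + TAGS + "</category>"
wrapped = "<category><![CDATA[" + TAGS + "]]></category>"

try:
    ET.fromstring(bare.encode("utf-8"))
    bare_ok = True
except ET.ParseError:
    bare_ok = False  # the raw '&' is not a valid entity reference

# inside CDATA the same text is taken literally, markup characters and all
parsed = ET.fromstring(wrapped.encode("utf-8")).text

print(bare_ok)
print(parsed == TAGS)
```

One caveat: a CDATA section ends at the first `]]>`, so content that contains that sequence still needs to be split or escaped.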
