Building an RT Twitterbot

Creating a ReTweeting Twitterbot used to be easy. My colleague Peter Smith originally pointed me towards Tweetalert, a service that scanned Twitter Search for the hashtag of your choice, then retweeted the matching posts via your own Twitter account, or another you'd created for that specific purpose. This is how the GCPEDIA and GCconnex Twitter accounts originally operated.

With the demise of this service, I needed a replacement. Tweetalert recommends RSS2twitter, which, as you would guess, is a service for converting an RSS feed to a tweetstream. As a simple solution, this can work. Go to Twitter Search, type in your keyword of interest, and wait for the page of results. Then, copy the link of the RSS feed ("Feed for this query") and provide that as the link for RSS2twitter to broadcast when you sign up for an account there. You might also want to look at my "Connecting a ReTweeting Service" section at the end of this article.

But for my bot, I wanted something more intricate.  If you don't, then stop reading now.

My main reason for going the more complicated route is to reduce duplicate tweets. I want to capture all the first-hand mentions of a keyword/hashtag, but none of the retweets or personal messages between users where the term is also mentioned. ReTweeting services also have limits. Twitterfeed (which I've chosen over RSS2twitter) will rebroadcast tweets at a maximum rate of 5 per 30 minutes. Removing the duplicate tweets through pre-processing is a great way to make the most of that cap without losing any first-run content.
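To put that cap in perspective, here is a small worked calculation in Python. It is only a sketch: the 5-per-30-minutes figure comes from the paragraph above, while the function names and the sample duplicate count are made up for illustration.

```python
# Twitterfeed's cap as stated above: at most 5 tweets per 30-minute window.
TWEETS_PER_WINDOW = 5
WINDOW_MINUTES = 30


def daily_cap(tweets_per_window: int = TWEETS_PER_WINDOW,
              window_minutes: int = WINDOW_MINUTES) -> int:
    """Maximum number of tweets the service can rebroadcast in 24 hours."""
    windows_per_day = (24 * 60) // window_minutes
    return tweets_per_window * windows_per_day


def slots_left(cap: int, duplicates: int) -> int:
    """Slots remaining for first-run content if duplicates are not filtered out."""
    return max(cap - duplicates, 0)


print(daily_cap())           # 240
print(slots_left(240, 100))  # 140 (the 100 duplicates are a hypothetical figure)
```

Every retweet or reply that slips through uses up one of those slots, which is why the filtering is done before the feed reaches the ReTweeting service.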

I found a tutorial on YouTube. It's extremely useful, but the spinning graphics make it a bit difficult to follow. Watch the video to orient yourself to the process, then follow my steps below.



According to Yahoo!, Pipes is "a powerful composition tool to aggregate, manipulate, and mashup content from around the web".  It's a great way to take a raw RSS feed and process it to remove unwanted content.

Creating a Yahoo! Pipe

Above, I described how to grab the RSS feed from a Twitter Search. Copy that link.

Now, from the Yahoo! Pipes main page, click the "Create a pipe" button. You'll be placed on a blank graph paper-style canvas.  A variety of object modules are in the left-hand column.

Drag the "Fetch feed" module onto the canvas (anywhere, but perhaps starting in the upper left corner) and paste your Twitter Search RSS feed address in the URL box.
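If you would rather build the feed address than copy it from the results page, the following Python sketch shows one way to do it. The base address is the one used in this article's semantic.php example and may have changed since; the function name is my own, not part of any Twitter or Pipes tool.

```python
from urllib.parse import urlencode

# Base address taken from the example URL in this article; it may differ today.
SEARCH_FEED_BASE = "http://search.twitter.com/search.atom"


def search_feed_url(term: str) -> str:
    """Return the Twitter Search feed URL for a keyword or hashtag."""
    # urlencode escapes characters such as "#" so a hashtag survives in the query.
    return SEARCH_FEED_BASE + "?" + urlencode({"q": term})


print(search_feed_url("semantic"))
print(search_feed_url("#GCconnex"))
```

The escaping matters for hashtags: an unescaped "#" would be read as the start of a URL fragment and the search term would be dropped.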


OPTIONAL STEP: Currently, Twitter limits the number of times Yahoo! Pipes can contact it for updates. If you have access to web hosting with PHP, you can create a file that queries Twitter Search and provides updated results for your ReTweeting Service (below) without that limitation.

For example, if I wanted to ReTweet any mention of the term "semantic", I could create a file called semantic.php and upload it to my server. The file would contain:

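<?php
// Fetch the Twitter Search Atom feed for the keyword and relay it unchanged.
$url = "http://search.twitter.com/search.atom?q=semantic";
$ch = curl_init($url);
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
if ($curl_scraped_page === false) {
    // The request failed; report an error instead of sending an empty feed.
    http_response_code(502);
    exit;
}
header("Content-Type: application/atom+xml; charset=utf-8");
echo $curl_scraped_page;
?>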

Now, the URL to this file would be used as the address I would enter in Pipes' Fetch Feed box, instead of the direct query to Twitter shown in the graphic above.

Once you've filled in the Fetch Feed address, we can begin the filtration process...

Click on the Operators heading to reveal those modules. Drag the "Filter" module to the canvas and drop it somewhere near the feed module, leaving a thumb's width of space in between. We'll have to connect all these modules together later with "pipes".


This filter prevents multiple retweets of the same content by different users, ignores personal communication between users where the term from the fetched feed ("GCconnex") is mentioned, and prevents the bot (also named "GCconnex") from retweeting itself. Change item.author.uri to match the Twitter account name that you'll be using to retweet from.
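For readers who think better in code, here is a rough Python sketch of what the filter is meant to achieve. It is not the Pipes configuration itself: the three rules are my assumptions about how to detect each case (an "RT @" marker, a leading "@", and an author address ending in the bot's name), and the flat field names and sample tweets are hypothetical.

```python
BOT_ACCOUNT = "GCconnex"  # the account the bot tweets from


def keep_item(item: dict, bot_account: str = BOT_ACCOUNT) -> bool:
    """Return True for first-hand tweets, False for items the filter should block."""
    text = item["content"].strip()
    author_uri = item["author_uri"]
    # Assumed rule: a retweet carries the "RT @" marker somewhere in its text.
    if "RT @" in text:
        return False
    # Assumed rule: a message aimed at another user starts with "@".
    if text.startswith("@"):
        return False
    # Block the bot's own tweets so it never retweets itself.
    if author_uri.rstrip("/").lower().endswith("/" + bot_account.lower()):
        return False
    return True


# Hypothetical feed items, one for each case described above.
items = [
    {"author_uri": "http://twitter.com/alice", "content": "Just joined #GCconnex"},
    {"author_uri": "http://twitter.com/bob", "content": "RT @alice: Just joined #GCconnex"},
    {"author_uri": "http://twitter.com/carol", "content": "@alice see you on #GCconnex"},
    {"author_uri": "http://twitter.com/GCconnex", "content": "Welcome to #GCconnex"},
]

kept = [item["content"] for item in items if keep_item(item)]
print(kept)  # ['Just joined #GCconnex']
```

Only the first-hand mention survives, which is the behaviour the Filter module is set up to produce.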

Now, some additional processing:

Again, from under the Operators heading, grab the "Loop" module. It's hollow on the inside, so we need to grab another module to fill it.

Click on the String heading, to reveal those modules. Drag the "String Builder" module and drop it inside the "Loop" module. Then, fill in the boxes:


The strings are:
item.author.uri
item.y:published.year
item.content.content
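In code terms, the Loop and String Builder pair does something like the Python sketch below: for every item, join the three fields in the order listed. The nested dictionary layout mirrors the dotted field names above, but the sample values (the author address, the year, and the tweet text) are hypothetical.

```python
def build_string(item: dict) -> str:
    """Concatenate author URI, publication year, and tweet text, in that order."""
    return (item["author"]["uri"]
            + str(item["y:published"]["year"])
            + item["content"]["content"])


def loop(items: list) -> list:
    """Apply build_string to every item, as the Loop module does."""
    return [build_string(item) for item in items]


# A hypothetical feed item using the same field names as the String Builder.
sample = {
    "author": {"uri": "http://twitter.com/alice"},
    "y:published": {"year": "2010"},
    "content": {"content": "Trying out #GCconnex"},
}

print(loop([sample]))
# ['http://twitter.com/alice2010Trying out #GCconnex']
```

The result is deliberately ugly at this point. The year sits between the author and the text only so that a later step has something predictable to replace.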

Finally, we'll reformat the tweet so that Twitterfeed receives a clean, organized string to retweet:

Again, under the Operators heading, grab "Regex" and populate it as I've done. Click on the graphic if you need to enlarge it.


At this stage, Twitter's URL is stripped and replaced with the standard RT @, to attribute authorship to the original tweeter. The second line replaces the date with a colon and space (": "). You obviously can't see the space in this screen cap, so make sure you whack the spacebar once after typing your colon, or there will be no space between the user's name and their tweet.

Because the second rule matches the year that String Builder inserted, this bit of code will need to be updated once a year to reflect the date change. It's a tiny hassle, but the results are worth it.
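The two replacements can be sketched in Python as follows. Treat it as an illustration only: I am assuming the author address has the form http://twitter.com/username, and the year "2010" and the sample string are hypothetical. The exact patterns entered in the Regex module may differ.

```python
import re

YEAR = "2010"  # hypothetical; must match the year String Builder inserted


def reformat(built: str, year: str = YEAR) -> str:
    """Turn 'profile-URL + year + text' into 'RT @user: text'."""
    # Rule 1: swap the assumed profile-URL prefix for "RT @".
    out = re.sub(r"^https?://twitter\.com/", "RT @", built)
    # Rule 2: replace the first occurrence of the year with a colon and a space.
    out = re.sub(re.escape(year), ": ", out, count=1)
    return out


print(reformat("http://twitter.com/alice2010Trying out #GCconnex"))
# RT @alice: Trying out #GCconnex
```

Note that the year is a fixed string here, just as in the pipe, so the same yearly update applies. A username that happens to contain the year would also trip the second rule.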

Connect the modules in the order we've created them, with the "Regex" module exiting to "Pipe Output". You connect modules by clicking and dragging the bottom circle of the module toward the top circle of an adjacent one to create a pipeline.
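Wiring the modules together amounts to function composition: each module's output becomes the next module's input. The Python sketch below shows only that idea. The three stage functions are simplified stand-ins that I made up for the demonstration, not the real Filter, Loop, and Regex behaviour.

```python
from functools import reduce


def connect(*stages):
    """Chain stages so each one's output feeds the next, like wiring modules."""
    def pipe(feed):
        return reduce(lambda data, stage: stage(data), stages, feed)
    return pipe


# Simplified stand-ins for the three modules (hypothetical behaviour).
def filter_stage(items):
    return [text for text in items if not text.startswith("@")]


def loop_stage(items):
    return ["[built] " + text for text in items]


def regex_stage(items):
    return [text.replace("[built] ", "RT ") for text in items]


# Same order as on the canvas: Filter, then Loop, then Regex, then output.
pipe_output = connect(filter_stage, loop_stage, regex_stage)
result = pipe_output(["hello #GCconnex", "@bob see #GCconnex"])
print(result)  # ['RT hello #GCconnex']
```

The order matters: if the Regex stage ran before the Loop stage, there would be no built string for it to clean up.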

Finally, make sure you save your pipe!


Congratulations. The hard part is done. The good news is that now that you've saved this as a prototype, you can easily create additional bots by cloning it. The GCconnex bot is a clone of the GCPEDIA bot, with only minor alterations to update the raw RSS feed and filter criteria.

Connecting a ReTweeting Service

I use Twitterfeed in this example, but RSS2twitter, Twitterlive and similar services should do just fine and the configuration will be similar. Sign up for an account, then begin by giving it the processed RSS feed from Pipes.
  • You'll need to find your Pipe's feed. Click on "My Pipes", select its name from the list, then right click and copy the link location of the "Get as RSS" link. 
  • From Twitterfeed, go to the Feed Dashboard and select "Create New Feed". 
  • Name the feed, and paste the URL you copied in the first step.
  • Open the Advanced Settings, and make the following changes:

Make sure the feed checkbox is active, complete the steps, and save!

Be patient. It can take a bit of time to start, and I've noticed that Twitter Search will sometimes refuse to talk to Pipes temporarily if it has been queried too often, but I can personally attest that this system works well.

Cheers.

Update 2011: Using Feedburner to tweet my processed RSS feeds.
Update 2012: Still using Feedburner, but also If This Then That to tweet RSS feeds.