Translate An RSS Feed To Another Language in Python

A reader suggested that it might be useful to have a script that could fetch an RSS feed, translate it into another language, and republish that feed somewhere else. Thankfully, that’s pretty easy to do in Python.

I wrote this script by taking bits and pieces from some of the other scripts that I’ve posted on this blog in the past. It’s surprising just how much of a resource this site has turned into.

It uses the Google Translate service to translate the RSS feed content from one language to another and simply prints the new RSS feed to standard output. If you want to republish the content, you can redirect the output to a file and upload that file to your web server.
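Shell redirection (`python translateRSS.py > feed.xml`) is the simplest way to capture the feed. If you would rather have Python write the file itself, here is a minimal sketch, assuming Python 3. The helper name `write_feed` and the file paths are hypothetical. It writes to a temporary file in the same directory and then renames it over the target, so a web server reading that file should never see a half-written feed.

```python
import os
import tempfile


def write_feed(xml, path, encoding="iso-8859-1"):
    """Write an XML string to `path` via a temporary file (hypothetical helper).

    `encoding` should match the one named in the XML declaration;
    iso-8859-1 is what the example output declares.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # Characters the encoding cannot represent become &#NNNN; references.
        with os.fdopen(fd, "w", encoding=encoding,
                       errors="xmlcharrefreplace") as f:
            f.write(xml)
        # mkstemp creates the file readable only by its owner; widen that so
        # a web server running as another user can read the feed.
        os.chmod(tmp_path, 0o644)
        # os.replace() swaps the file in with a single rename (atomic on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

With the script below, a call might look like `write_feed(translate_rss('en', 'fr', 'http://www.halotis.com/feed/'), 'feed_fr.xml')`, after which you upload `feed_fr.xml` as before.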

Example Usage:

$ python translateRSS.py
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"><channel><title>HalOtis Marketing</title><link>http://www.halotis.com</link><description>Esprit d&amp;#39;entreprise dans le 21ème siècle</description>
.....
</channel></rss>
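In the output above, the apostrophe in the description appears as `d&amp;#39;entreprise`. That pattern suggests the translation service returned HTML-escaped text (`&#39;`), which was then escaped a second time when the XML was written. One possible fix, sketched here for Python 3 with a hypothetical helper `clean_translation`, is to decode the entities with the standard library's `html.unescape()` before passing the text to the feed generator, since an XML writer escapes `&`, `<` and `>` on its own.

```python
import html
from xml.sax.saxutils import escape

# Text as a translation service might return it, with the apostrophe
# already encoded as an HTML character reference.
translated = "Esprit d&#39;entreprise dans le 21ème siècle"


def clean_translation(text):
    """Decode HTML entities in translated text (hypothetical helper)."""
    return html.unescape(text)


# Escaping the raw string for XML turns "&#39;" into "&amp;#39;",
# the same double-escaped form seen in the example output.
print(escape(translated))  # Esprit d&amp;#39;entreprise dans le 21ème siècle

# Decoding first leaves a plain apostrophe, which XML text does not need escaped.
print(escape(clean_translation(translated)))  # Esprit d'entreprise dans le 21ème siècle
```

In `translate_rss()` this would mean wrapping each `translate(...)` call, for example `clean_translation(translate(sl, tl, x.title))`.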

Here’s the Script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import feedparser  # available at feedparser.org
from translate import translate  # available at http://www.halotis.com/2009/07/20/translating-text-using-google-translate-and-python/
import PyRSS2Gen  # available at http://www.dalkescientific.com/Python/PyRSS2Gen.html
 
import datetime 
import re
 
def remove_html_tags(data):
    # Strip anything that looks like an HTML tag, e.g. <p> or <a href="...">
    p = re.compile(r'<.*?>')
    return p.sub('', data)
 
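def translate_rss(sl, tl, url):
    """Fetch the feed at url and return it as RSS 2.0 XML, with the item titles,
    item descriptions and feed description translated from language sl to tl
    (language codes such as 'en' or 'fr')."""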
 
    d = feedparser.parse(url)
 
    # Unfortunately feedparser doesn't output RSS, so we build the new feed with PyRSS2Gen.
    # Each entry's title and summary are translated; links and dates are copied as-is.
    items = [PyRSS2Gen.RSSItem(
        title = translate(sl, tl, x.title),
        link = x.link,
        description = translate(sl, tl, remove_html_tags(x.summary)),
        guid = x.link,
        # modified_parsed is a time.struct_time; its first six fields are
        # year, month, day, hour, minute and second
        pubDate = datetime.datetime(*x.modified_parsed[:6]))
        for x in d.entries]
 
    rss = PyRSS2Gen.RSS2( 
        title = d.feed.title, 
        link = d.feed.link, 
        description = translate(sl, tl, d.feed.description), 
        lastBuildDate = datetime.datetime.now(), 
        items = items) 
    #emit the feed 
    xml = rss.to_xml()
 
    return xml
 
if __name__ == '__main__':
    # Translate the HalOtis feed from English ('en') to French ('fr')
    feed = translate_rss('en', 'fr', 'http://www.halotis.com/feed/')
    print(feed)
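The regular expression in `remove_html_tags()` is quick but fragile: `.` does not match a newline unless `re.DOTALL` is set, so a tag that wraps across lines is left in place, and a `>` inside a quoted attribute value ends the match early. A sturdier alternative, sketched here for Python 3 with hypothetical names (`TextExtractor`, `strip_html`), is to let the standard library's `html.parser.HTMLParser` tokenize the markup and keep only the text. With `convert_charrefs=True` the parser also decodes entities such as `&amp;` and `&#39;`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, dropping all tags."""

    # Elements whose contents are code or styling rather than readable text.
    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True makes the parser decode &amp;, &#39;, etc.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(fragment):
    """Return the plain text of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.parts)


print(strip_html('<p class="intro">Hello,\n<a href="/x" title="a>b">world</a> &amp; friends</p>'))
```

You could then call `strip_html(x.summary)` in place of `remove_html_tags(x.summary)` inside `translate_rss()`.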