
Ok, so this isn’t my script, but it’s a much nicer version of the one I wrote, which scrapes the actual Google Translate website to do the same thing. I’d like to thank Ashish Yadav for writing and sharing it.

Translating text is an easy way to create variations of content that the search engines recognize as unique. As part of a bigger SEO strategy this can make a big impact on your traffic. It could also be used to translate your website into another language automatically.

# -*- coding: utf-8 -*-
 
import re
import sys
import urllib
import simplejson
 
baseUrl = "http://ajax.googleapis.com/ajax/services/language/translate"
 
def getSplits(text,splitLength=4500):
    '''
    The Translate API limits the text translated per request to 4500 characters,
    so yield the text in slices of at most splitLength characters.
    '''
    return (text[index:index+splitLength] for index in xrange(0,len(text),splitLength))
 
 
def translate(text,src='', to='en'):
    '''
    A Python Wrapper for Google AJAX Language API:
    * Uses Google Language Detection when no source language is given with the text
    * Splits up text if it's longer than 4500 characters, a limit imposed by the API
    '''
 
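    params = {'langpair': '%s|%s' % (src, to),
              'v': '1.0'}
    retText = ''
    # use a separate loop variable so the text argument is not overwritten
    for chunk in getSplits(text):
        params['q'] = chunk
        # passing data makes urlopen send the parameters as a POST request
        resp = simplejson.load(urllib.urlopen(baseUrl, data=urllib.urlencode(params)))
        # a response without translatedText raises here and propagates to the caller
        retText += resp['responseData']['translatedText']
    return retText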
 
 
def test():
    msg = "      Write something you want translated to English,\n"\
        "      Press Ctrl+C to exit"
    print msg
    while True:
        text = raw_input('#>  ')
        retText = translate(text)
        print retText
 
 
if __name__=='__main__':
    try:
        test()
    except KeyboardInterrupt:
        print "\n"
        sys.exit(0)
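
One thing to be aware of: getSplits cuts the text every 4500 characters no matter where that falls, so a word can end up split across two requests. Here is a rough sketch of a variation that tries to break at whitespace instead. The name split_on_whitespace is my own for illustration and is not part of Ashish's script; I'm assuming that breaking on spaces and newlines is good enough for your text.

```python
def split_on_whitespace(text, limit=4500):
    """Yield chunks of at most `limit` characters, breaking at whitespace when possible.

    Hypothetical helper (not part of the original script).
    """
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # look for the last space or newline inside the current window
            cut = max(text.rfind(' ', start, end), text.rfind('\n', start, end))
            if cut > start:
                end = cut + 1  # keep the whitespace with the left-hand chunk
        # if no whitespace was found, fall back to a hard cut at `limit`
        yield text[start:end]
        start = end
```

Joining the chunks gives back the original text, so it should be usable in place of getSplits inside translate().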

A reader suggested that it might be useful to have a script that could fetch an RSS feed, translate it to another language, and republish that feed somewhere else. Thankfully that’s pretty easy to do in Python.

I wrote this script by taking bits and pieces from some of the other scripts that I’ve posted on this blog in the past. It’s surprising just how much of a resource this site has turned into.

It uses the Google Translate service to convert the RSS feed content from one language to another and simply prints the new RSS content to standard output. If you want to republish the content, you can redirect the output to a file and upload that to your web server.

Example Usage:

$ python translateRSS.py
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0"><channel><title>HalOtis Marketing</title><link>http://www.halotis.com</link><description>Esprit d&amp;#39;entreprise dans le 21ème siècle</description>
.....
</channel></rss>
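
The d&amp;#39;entreprise in the description above looks like the translation came back with an HTML entity (&#39;) that was then escaped a second time when the feed was written out. That's my guess from the output, not something I've confirmed. If it bothers you, a minimal sketch of a fix is to decode entities in the translated text before building the feed. The helper name clean_translation is made up for this example.

```python
try:
    from html import unescape  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 fallback
    unescape = HTMLParser().unescape


def clean_translation(text):
    """Decode HTML entities such as &#39; in translated text.

    Hypothetical helper; assumes the translation service returns
    entity-encoded text, as the sample output suggests.
    """
    return unescape(text)
```

You would wrap each translate() call with it, for example clean_translation(translate(...)), and let the RSS generator do the one round of escaping the XML actually needs.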

Here’s the Script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# (C) 2009 HalOtis Marketing
# written by Matt Warren
# http://halotis.com/
 
import feedparser  # available at feedparser.org
# available at http://www.halotis.com/2009/07/20/translating-text-using-google-translate-and-python/
# Called below as translate(sl, tl, text); a wrapper with the signature
# translate(text, src, to) needs its arguments reordered.
from translate import translate
import PyRSS2Gen  # available at http://www.dalkescientific.com/Python/PyRSS2Gen.html
 
import datetime 
import re
 
def remove_html_tags(data):
    p = re.compile(r'<.*?>')
    return p.sub('', data)
 
def translate_rss(sl, tl, url):
 
    d = feedparser.parse(url)
 
    #unfortunately feedparser doesn't output rss so we need to create the RSS feed using PyRSS2Gen
    items = [PyRSS2Gen.RSSItem( 
        title = translate(sl, tl, x.title), 
        link = x.link, 
        description = translate(sl, tl, remove_html_tags(x.summary)), 
        guid = x.link, 
        # the first six fields of the parsed time are year, month, day, hour, minute, second
        pubDate = datetime.datetime(*x.modified_parsed[:6])) 
        for x in d.entries]
 
    rss = PyRSS2Gen.RSS2( 
        title = d.feed.title, 
        link = d.feed.link, 
        description = translate(sl, tl, d.feed.description), 
        lastBuildDate = datetime.datetime.now(), 
        items = items) 
    #emit the feed 
    xml = rss.to_xml()
 
    return xml
 
if __name__ == '__main__':
    feed = translate_rss('en', 'fr', 'http://www.halotis.com/feed/')
    print feed
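
The script reads x.modified_parsed straight off every entry, so a feed item that has no date would most likely break it. Here is a small sketch of a safer conversion, assuming you would rather fall back to a default date than fail. The name to_datetime is mine, and fetching the value with x.get('modified_parsed') is an assumption that relies on feedparser entries behaving like dictionaries.

```python
import datetime
import time


def to_datetime(parsed_time, default=None):
    """Convert a time.struct_time (or 9-tuple) to a datetime.

    Hypothetical helper: returns `default`, or the current time when
    no default is given, if the entry had no parsed date.
    """
    if parsed_time is None:
        return default if default is not None else datetime.datetime.now()
    # the first six fields are year, month, day, hour, minute, second
    return datetime.datetime(*parsed_time[:6])
```

In translate_rss the pubDate argument could then become to_datetime(x.get('modified_parsed')).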