Translating Twitter
Note: Google deprecated the API used in this article. See the new article that uses Bing Translate instead.
Note: I've made an update on January 6th to this article after a comment by a reader.
As the author of Xdebug I am interested in finding out what people think of it, and whether they have problems or compliments. I've set up a Twitter account for Xdebug, @xdebug, and my Twitter client Haunt also shows me all tweets that match the search term xdebug.
However, sometimes I get tweets in a language I can't read; for example Brazilian Portuguese:
Debugando aplicações PHP com Xdebug e Eclipse PDT: http://bit.ly/ffJC4G
-
junichi_y
or Japanese:
@pomu0325 ありがとうございます!このXdebugの書き方と各場所を調べてたんですよ!こんなふうに書くんですね。
-
Ken
Once in a while, I would send these tweets through Google's language tools but then my friend Elizabeth tweeted:
Hey Lazyweb, is there a twitter client that lets me filter tweets by language?
-
Elizabeth Naramore
Instead of manually copying and pasting tweets into the language tools, I thought it'd be nice to embed translation directly into the client, so that it happens on request.
Sadly, tweets don't have a language associated with them, so the first step is to actually find out which language a tweet is in. Google provides a web service called "Language Detect". To use this service, you only have to query a specific URL containing the text whose language you want to guess, and parse the returned JSON structure. The Google website has an example, which basically boils down to requesting the following URL: https://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=Hola,+mi+amigo
It returns the following JSON struct:
{
    "responseData":
    {
        "language":"es",
        "isReliable":false,
        "confidence":0.08829542
    },
    "responseDetails": null,
    "responseStatus": 200
}
If the responseStatus is 200, then it worked. responseData->language contains the detected language, and responseData->isReliable and responseData->confidence describe how sure Google is that the detected language is actually correct. The longer the text, the easier the language is to detect, of course. In this case, although the confidence is low, the language is guessed correctly: es, for Spanish.
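The detection step can be sketched in Python as follows. This is only an illustration, not Haunt's implementation: the helper names build_detect_url and parse_detect_response are hypothetical, and because the service has since been deprecated the example parses the sample reply shown above instead of making a live request.

```python
import json
from urllib.parse import urlencode

DETECT_ENDPOINT = "https://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_url(text):
    """Build the Language Detect request URL (hypothetical helper)."""
    # urlencode takes care of escaping spaces and punctuation in the query.
    return DETECT_ENDPOINT + "?" + urlencode({"v": "1.0", "q": text})


def parse_detect_response(body):
    """Return (language, is_reliable, confidence), or None on failure."""
    reply = json.loads(body)
    if reply.get("responseStatus") != 200:
        return None
    data = reply["responseData"]
    return data["language"], data["isReliable"], data["confidence"]


# The sample reply from the article, used in place of a live request.
sample = """{"responseData": {"language": "es", "isReliable": false,
             "confidence": 0.08829542},
            "responseDetails": null, "responseStatus": 200}"""

print(build_detect_url("Hola, mi amigo"))
print(parse_detect_response(sample))
```

Returning None for any status other than 200 keeps the caller's check simple: a client can just skip translation when detection fails.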
Now that we have the language, we can use another web service from Google to translate the text from the guessed language to our target language, which in my case is English. This Translate service wants the text and a language pair for the translation. Google suggests you add a key and a userip, but this is not strictly necessary. The language pair has the format source-language-code|destination-language-code, which in our case is es|en. The service is again very simple to use, as you can see in this example. It again boils down to requesting a URL, such as: https://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=Hola, mi amigo!&langpair=es|en
It returns the following JSON struct:
{
    "responseData":
    {
        "translatedText":"Hello, my friend!"
    },
    "responseDetails": null,
    "responseStatus": 200
}
If the responseStatus is 200, then it worked and responseData->translatedText contains the translated text.
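As a rough Python sketch of the translation step (again with hypothetical helper names, and parsing the sample reply rather than calling the deprecated service), the language pair is joined with a | and then URL-encoded along with the text:

```python
import json
from urllib.parse import urlencode

TRANSLATE_ENDPOINT = "https://ajax.googleapis.com/ajax/services/language/translate"


def build_translate_url(text, source, target):
    """Build the Translate request URL (hypothetical helper)."""
    # The language pair has the form source|target, for example es|en.
    langpair = source + "|" + target
    query = urlencode({"v": "1.0", "q": text, "langpair": langpair})
    return TRANSLATE_ENDPOINT + "?" + query


def parse_translate_response(body):
    """Return the translated text, or None if the request failed."""
    reply = json.loads(body)
    if reply.get("responseStatus") != 200:
        return None
    return reply["responseData"]["translatedText"]


# The sample reply from the article, used in place of a live request.
sample = """{"responseData": {"translatedText": "Hello, my friend!"},
            "responseDetails": null, "responseStatus": 200}"""

print(build_translate_url("Hola, mi amigo!", "es", "en"))
print(parse_translate_response(sample))
```

Note that urlencode escapes the | as %7C, so the URL it prints looks slightly different from the hand-written one above.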
A screenshot shows the translation feature of Haunt in action:
You can find the implementation here. Haunt also has a project page.
Update
A reader of this article, Jan, pointed out that you don't actually have to do the translation in two steps. If you simply leave out the language before the | when you pass it to language/translate, then the web service will automatically guess the original language. This URL demonstrates this: https://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=Hola, mi amigo!&langpair=|no
This translates the input text "Hola, mi amigo!" to Norwegian (language code no). The returned JSON struct has an extra element in this case too:
{
    "responseData":
    {
        "translatedText":"Hei, min venn!",
        "detectedSourceLanguage":"es"
    },
    "responseDetails": null,
    "responseStatus": 200
}
The responseData->detectedSourceLanguage element shows which language Google thought the original text was in (es in our case). It does not, however, state its confidence level. I've also updated Haunt.
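The one-step variant might look like this in Python. It is a sketch under the same assumptions as before: the helper names are hypothetical, this is not Haunt's code, and the sample reply stands in for a live request to the deprecated service.

```python
import json
from urllib.parse import urlencode

TRANSLATE_ENDPOINT = "https://ajax.googleapis.com/ajax/services/language/translate"


def build_autodetect_url(text, target):
    """Build a Translate URL that leaves the source language empty."""
    # An empty source before the | asks the service to guess the language.
    query = urlencode({"v": "1.0", "q": text, "langpair": "|" + target})
    return TRANSLATE_ENDPOINT + "?" + query


def parse_autodetect_response(body):
    """Return (translated_text, detected_language), or None on failure."""
    reply = json.loads(body)
    if reply.get("responseStatus") != 200:
        return None
    data = reply["responseData"]
    # detectedSourceLanguage is only present when the source was left out.
    return data["translatedText"], data.get("detectedSourceLanguage")


# The sample reply from the article, used in place of a live request.
sample = """{"responseData": {"translatedText": "Hei, min venn!",
             "detectedSourceLanguage": "es"},
            "responseDetails": null, "responseStatus": 200}"""

print(build_autodetect_url("Hola, mi amigo!", "no"))
print(parse_autodetect_response(sample))
```

This saves one request per tweet, at the cost of losing the confidence value that the separate detection call returns.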
Comments
There are already tools for that http://www.aboutonlinetips.com/twitter-translation-tools/
You can skip the language detection step and let Google do it itself: http://code.google.com/apis/language/translate/v1/using_rest_translate.html#required_args
"To use the auto-detect source feature, leave off the source language and only specify the vertical bar followed by the destination langauge (sic) as in: langpair=%7Ces."
PS: "Please follow the reStructured Text format"? A quick overview or a link to said format would be nice.
Great work. There are lots of people who suffer from the same problem.
@Jan!: Ah, I didn't know that. I'll update the code (and article). As for "reStructured Text", I've added a link to it that should show up with all new posts (and updated posts).
@umpirsky: None of those work from within my twitter client though...
Of course, when you skip the step you don't get the confidence-level in detected language. You might want to decide if you want to translate or not based on a confidence-threshold.
Shortlink
This article has a short URL available: https://drck.me/tt-8hx