Wednesday, October 2, 2013

Additional Language Resources For The Everyday Analyst

Ethnologue is one of the additional language resources mentioned in this article

To conclude the three-part Linguistics blog saga, preceded by the Top 11 Online Language Learning Resources and the Top 10 Online Translation Services, here are a few excellent language resources for the intelligence analyst!
  • Linguist or not, this one is important! The Ethnologue is the international authority on living languages, maintained by SIL International, a Christian linguistics group that originally founded the site in 1951 to translate bibles into local (and lesser-known) languages. Oh, how the site has grown since then! Search by language, search by region or search by country and you will find the full linguistic breakdown of almost any geographic area.
  • Note: If you're interested in language mapping, World GeoDatasets provides the World Language Mapping System, the most up-to-date maps and GIS information on international language distribution. It is a collaboration between Global Mapping International and Ethnologue, but be warned, it is not cheap!
  • The World Atlas of Language Structures (WALS) is a comprehensive collection of linguistic sources searchable by language (every language you could possibly think of) or by language feature. You can search for Arabic, for example, (but which one? - this site lists 21 different dialects!) or something a bit more interesting, like Achuar or Waropen (they exist, I promise, and they aren't the strangest languages on this website). You could also search for language features like Subject-Verb-Object (SVO) word order, for example, or optional double negation. Either way, your search output will be a comprehensive list of scholarly sources written and published about the language you select.
  • Omniglot is my third favorite online language resource. It is an encyclopedia of writing systems and languages. For many different languages (yes, all the strange languages you just encountered in the WALS, plus Tengwar (!), J. R. R. Tolkien's Elvish script), it provides the alphabet and the corresponding International Phonetic Alphabet (IPA) symbols. In addition to phonetic and transcription information, Omniglot provides a list of links to further information about each language, such as resources for learning the language and/or the writing system it employs.
  • Voice of America's Pronunciation Guide is ideal if you are trying to learn how to pronounce foreign names and places correctly. It won't help much with words, but if you want to meet the standard for intelligence briefings, you have to know how to say the names of places and people correctly, and the VOA takes away your last excuse.

Monday, September 30, 2013

Top 10 Free Online Translation Services

Ed. Note:  Since last week's language learning resource blog post was so well-received, consider this a second linguistically-inspired and - hopefully - equally well-received post (of which, I have no doubt, there will be many more). 

The reality of the situation is that this is what the world looks like. We live in an increasingly multilingual society and, as analysts, this inarguably affects our jobs. Daily. 

Figure 1: Languages of the World - Source: This wonderful data visualization
So in light of this increasingly multilingual operating environment, how do we make our jobs easier? 

Below is the answer (an answer, at least) to this question: A compendium of (free!) online machine translation services rank ordered by way of a little linguistics experiment!

Top 10 Online Translation Services
(rank ordered by number of languages translated and evaluated on a five point scale in terms of error rate - 5/5 is best)

1. Google Translate. 3/5
  • I would be remiss in this post if I did not include the infamous Google Translate (for all the flak it gets, I really don't think it is too horrible). And besides, where else do you plan to translate from Azerbaijani to English? With 72 languages, it has the largest repository of the online translation services, but... then again, it's Google.
2. Translator. 3/5
  • The translator is a close second with 53 translation languages, Icelandic and Maltese among them. The only downside is that there is a 300 character limit to translation so, if you're planning on dumping paragraphs into the translator (which you really shouldn't do anyway), this one is probably not for you. 
3. Bing. 3/5
  • The Bing translator maintains a 44 language repository with a user interface (UI) as appealing as Google's, and the recent buzzing in the blogosphere gives Bing the edge over Google when it comes to translation services. 
4. Free Translation. 3/5
  • Free Translation (using SDL translation services) translates from 41 languages (Bengali among them!), but only into the big five (English, Spanish, Italian, French and Portuguese).
5. BabelXl. 3/5
  • Babelxl translates 36 languages.
  • This one is my personal favorite! As an avid linguist and frequent translator, I consider this to be the most exciting find of the day. This site translates both into and out of 35 languages across many different translation engines! Your output is the same translation provided by SYSTRAN, linguatec, PROMT and others. This allows you to compare translations for greater comprehension and accuracy. Also, the language repository includes unexpected languages such as Breton (go ahead ... look it up), Esperanto, Kazakh and Occitan.
7. Babylon. 3/5
8. SYSTRANet. 0/5
  • A little history: SYSTRAN is one of the oldest machine translation systems (dating back to 1968) and, what do you know, they now have an online translator, SYSTRANet. It translates out of 15 languages but only into the big five. Google used SYSTRAN until circa 2007 and it is the current translation software behind the dashboard translator app for Mac OS X.
9. Babelfish. 4/5
  • Babelfish (not to be confused with Yahoo! Babel Fish, which is now the Bing translator) translates 14 languages both ways. Again, the 300 character limit applies.
10. Reverso. 2/5
  • Reverso is an interesting site. It makes up for a clunky UI and limited language capacity (9 languages) with the option to have your full translation read to you in the target language simply by clicking a button. 
A Linguistics Experiment

Just in case you were wondering, I didn't arbitrarily invent the number out of five (after all, who am I to judge; as I said before, I use Google Translate...)

In order to evaluate the 10 translation services listed above, I took a snippet from an article written in Spanish and plugged it into all 10 translation engines. The ratings out of five shown above were calculated by subtracting the number of errors from five.
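
For anyone who wants to repeat the experiment with their own text, the scoring arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and flooring the score at zero is an assumption, since the post does not say how more than five errors would be handled.

```python
def rating_out_of_five(error_count: int) -> int:
    """Return a 0-5 rating: five minus the number of errors counted."""
    if error_count < 0:
        raise ValueError("error_count cannot be negative")
    # Assumption: a translation with more than five errors still scores 0,
    # rather than going negative.
    return max(0, 5 - error_count)


# Example: count the errors in each translation by hand, then score it.
for errors in (1, 2, 5):
    print(f"{errors} error(s) -> {rating_out_of_five(errors)}/5")
```

Counting the errors is still a manual, human judgment; the code only turns that count into the score.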

This was a simple example and should in no way preclude you from trying all of these translation services at some point to determine your personal favorites, but based on this experiment, Babelfish returned the best translation.
This is what was translated into English: "Lingüistas de la Universidad de Glasgow han demostrado de que ver la televisión de manera activa puede cambiar rápidamente un determinado acento. Tal y como publican en la revista especializada Language, sus conclusiones se basan en el análisis de los efectos de la telenovela británica EastEnders, emitida por la cadena británica BBC, sobre el modo de hablar de los escoceses."
Common errors included:
  • Failing to translate the infinitive Spanish verb "ver" into the gerund English equivalent watching (as opposed to watch). 
  • Incorrect syntactical word order of adverbs in the first sentence such as actively ("de manera activa") and quickly ("rápidamente").
  • No service except for Google Translate managed to successfully translate "revista especializada" to journal as opposed to specialized magazine.
  • Mistranslation of the past participle "emitida" as emitted or issued rather than broadcast.
  • No service successfully translated the final clause "sobre el modo de hablar de los escoceses," which means "about the Scottish way of speaking" or, more colloquially, "about the Scottish accent." Common translations were: "the way to speak of the Scotts," "the way of speaking about the Scotts," and "how to talk about the Scotts." This is because in this clause, "de" can be interpreted to mean either "of" (correct) or "about" (incorrect), so this error was due to a mistranslation of the preposition.
**Note: Most U.S.-based translation services translate English-Spanish better than other language combinations because that is the combination historically in the most demand within the United States. In short, the more obscure the language, the less accurate the translation. This translation experiment is probably a best-case scenario.

For those interested in reading the article from which the sample text came, it is from Muy Interesante (the New Science Magazine of Latin America) and is titled Ver la television puede cambiar tu acento! (Watching TV Can Change Your Accent).

Don't speak Spanish? Use one of your new-found translation services!

Know of a trusted translation service not listed here? Leave a comment!

And finally, to conclude this linguistics saga of blog posts, check back on Wednesday for a post containing other language-related resources particularly relevant to analysis!