Blog Essay: A multilingual world and its plurilingual population

The online world is populated by users with many different linguistic backgrounds and practices. This blog essay explores the multiplicity of languages online, their relative popularity, and the technical language problems faced by developers of multilingual websites. It also briefly covers some solutions to these challenges.

To begin, I must define the terms multilingualism and plurilingualism for those of you hearing them for the first time, and explain the difference between the two, as this is integral to understanding the topic. Multilingualism is the state of multiple languages existing alongside each other in a particular geographical location or within an organisation or institution. The term multilingual is also commonly used to refer to individuals who are proficient in multiple languages. Some sources state that plurilingual is a more suitable term for such individuals, as plurilingualism refers to people who are competent enough in several languages to switch between them without issue. As a result, in some places you will find multilingualism and plurilingualism used interchangeably, while in others a distinction is made between the two terms, with multilingual used for inanimate things and plurilingual referring to human capability in language use. For consistency and clarity, I will use the latter distinction for the remainder of this essay.

It is undeniable that the online world has a lingua franca: English. English has served as a lingua franca in the physical world since globalisation first began to take root, and the advent of the internet gave it a new foothold in intercultural and international communication online. The majority of resources and websites found online are written in English (or, more accurately, cater to English speakers; the websites themselves are mostly written in HTML). Thankfully, English does not have a total monopoly on online spaces, and many other languages exist online in various forms. Indeed, many notable websites offer their users a choice of languages, whether to broaden their appeal or to make their content clearer.

Internet users. Source: Internet World Stats

Take Wikipedia, for example: many language editions of the site have evolved in parallel to accommodate speakers of different languages.

(It is also worth briefly mentioning that international organisations use multiple languages both internally and in their online resources, two of the most notable examples being the European Union and the United Nations.)

Two issues faced by multilingual technologies that try to accommodate plurilingual users are word sense disambiguation and entity linking. We will look at each issue in turn and discuss the steps certain technologies take to address them. Entity linking, in simple terms, is the process of linking a named entity that appears in a text to its corresponding entry in a database. Word sense disambiguation, again in simple terms, is the problem of working out which sense of a word is intended when that word has multiple senses.

Let me expand on why entity linking is an issue in the technological world. Entity linking is a complex challenge because many entities online share the same name. For example, the public refers to two different products as ‘Dove’: one is Dove toiletries and the other is Dove chocolate. This is an example of polysemy, meaning that a single name may be shared by many different entities, leading to confusion. Another issue is that one entity may be referred to by different names or aliases. To return to our previous example, Dove chocolate is in fact sold under the name ‘Galaxy’ in the British Isles. The chocolate is the same and is manufactured by the same American-owned company, but it has two different names for marketing purposes. The same applies to people, particularly public figures. This ambiguity can cause accuracy problems in many areas, particularly in database information retrieval, where search engine results may contain false positives, false negatives, or both. When we consider the use of multiple languages online, the issue becomes even more complicated. To examine this further, I will look at how these issues affect the popular database Wikipedia.
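The two kinds of ambiguity described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny knowledge base, the entity IDs and the function names are all hypothetical, and a real system would draw on a far larger resource such as Wikipedia.

```python
from collections import defaultdict

# Hypothetical mini knowledge base: entity ID -> names the entity is known by.
KNOWLEDGE_BASE = {
    "dove_toiletries": ["Dove"],
    "dove_chocolate": ["Dove", "Galaxy"],
}

def build_name_index(kb):
    """Invert the knowledge base: lower-cased name -> sorted list of entity IDs."""
    index = defaultdict(set)
    for entity_id, names in kb.items():
        for name in names:
            index[name.lower()].add(entity_id)
    return {name: sorted(ids) for name, ids in index.items()}

def candidates(mention, index):
    """Return every entity a mention could refer to (empty list if unknown)."""
    return index.get(mention.lower(), [])

NAME_INDEX = build_name_index(KNOWLEDGE_BASE)

# Polysemy: one name, several possible entities.
print(candidates("Dove", NAME_INDEX))
# Aliases: a different name that leads to one of the same entities.
print(candidates("Galaxy", NAME_INDEX))
```

Looking up ‘Dove’ returns both products, while ‘Galaxy’ returns only the chocolate, so a search for either name has to decide which entry the user actually meant.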

As a free online encyclopaedia, Wikipedia receives a great deal of traffic from internet users of many languages and nationalities. Wikipedia itself is a semi-structured database, and the knowledge bases it uses are built so that URIs (uniform resource identifiers) map one-to-one to entries. Because it is such a large and extensive database, used by so many people who speak so many languages, entity linking and word sense disambiguation are important considerations for Wikipedia.

Graph Based entity linking. Source: Strise Platform

Several approaches to entity linking have been proposed, including:

  • Mathematical methods 
  • Graphing methods  
  • Text-based methods 

Let’s briefly talk about one such approach: the text-based one. Some scholars have suggested a ranking method based on a two-step algorithm. Entities would first be linked to candidates within a given knowledge base, and a linguistically informed program would then pick the best option. This narrows down the options and brings clarity to named entity searches.
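To make the two-step idea more concrete, here is a minimal sketch in Python. It is not the algorithm from the literature: I am assuming that step one looks up candidates by name and that step two ranks them by how many words their description shares with the text around the mention. The knowledge base, its descriptions and all function names are hypothetical.

```python
import re

# Hypothetical knowledge base: entity ID -> (known names, short description).
KB = {
    "dove_toiletries": (["dove"], "soap shampoo and skin care toiletries brand"),
    "dove_chocolate": (["dove", "galaxy"], "milk chocolate bar confectionery brand"),
}

def tokenize(text):
    """Lower-case a text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def generate_candidates(mention):
    """Step 1: collect every entity whose names include the mention."""
    m = mention.lower()
    return [eid for eid, (names, _) in KB.items() if m in names]

def rank_candidates(candidate_ids, context):
    """Step 2: score each candidate by word overlap with the context."""
    context_words = tokenize(context)
    scored = [(len(context_words & tokenize(KB[eid][1])), eid)
              for eid in candidate_ids]
    # Highest score first; ties are broken alphabetically by entity ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored

def link(mention, context):
    """Return the best entity ID, or None if nothing in the context supports one."""
    ranked = rank_candidates(generate_candidates(mention), context)
    if not ranked or ranked[0][0] == 0:
        return None
    return ranked[0][1]

print(link("Dove", "I bought a Dove chocolate bar"))
print(link("Dove", "Dove soap is gentle on skin"))
```

Word overlap is a deliberately crude stand-in for the linguistic evidence a real system would use, but it shows how the surrounding text lets the program choose between the two ‘Dove’ entries.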

One way of resolving word sense disambiguation, as discussed by Wu, He and Hu in their 2018 paper, is a corpus clustering method. This involves grouping words by their properties and then clarifying and assigning a meaning to any new additions to the corpus. This is useful because it works with corpora that are not sense-annotated; in other words, entries in these corpora do not need to be given specific sense identifiers.
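The general idea of clustering without sense labels can be sketched as follows. This is a simplified stand-in of my own and not the method from the paper: I am assuming that sentences containing an ambiguous word are grouped greedily by how many context words they share (Jaccard similarity), and that a new sentence is assigned to the most similar group. The stop-word list, the threshold and the example sentences are all hypothetical.

```python
import re

# Small hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "is", "was",
             "to", "and", "i", "my", "by", "at"}

def context_words(sentence, target):
    """Words surrounding the target word, minus stop words and the target itself."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return words - STOPWORDS - {target}

def jaccard(a, b):
    """Shared words divided by total distinct words (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_contexts(sentences, target, threshold=0.1):
    """Greedy one-pass clustering; each cluster is the set of words seen in it."""
    clusters = []
    for sentence in sentences:
        words = context_words(sentence, target)
        best = max(clusters, key=lambda c: jaccard(words, c), default=None)
        if best is not None and jaccard(words, best) >= threshold:
            best.update(words)          # join the most similar existing cluster
        else:
            clusters.append(set(words)) # otherwise start a new sense cluster
    return clusters

def assign_sense(sentence, target, clusters):
    """Index of the cluster most similar to a new, unlabelled sentence."""
    if not clusters:
        raise ValueError("no clusters to assign to")
    words = context_words(sentence, target)
    scores = [jaccard(words, c) for c in clusters]
    return scores.index(max(scores))

CORPUS = [
    "She deposited money at the bank",
    "The bank approved my loan and money transfer",
    "We sat on the river bank fishing",
    "The river bank was muddy after the water rose",
]
CLUSTERS = cluster_contexts(CORPUS, "bank")
print(len(CLUSTERS))
print(assign_sense("He took out a loan from the bank", "bank", CLUSTERS))
print(assign_sense("Fishing by the river bank", "bank", CLUSTERS))
```

Note that none of the sentences carries a sense label: the two groups (money and river) emerge from the shared vocabulary alone, which is the property that makes clustering attractive for corpora that are not sense-annotated.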

These are examples of possible solutions to a few common issues found in the online plurilingual and multilingual world. We have briefly covered some of the common issues that websites and online resources face in an ever-changing global space. I hope that those of you reading this will find the sources listed below useful for further reading on the topics discussed here.

Bibliography:

B. Hachey, W. Radford, J. Nothman, M. Honnibal and J. R. Curran, “Evaluating Entity Linking with Wikipedia.” Artificial Intelligence, vol. 194, pp. 130-150, 2013, https://www.sciencedirect.com/science/article/pii/S0004370212000446. Accessed 3 January 2023.

G. Wu, Y. He and X. Hu, “Entity Linking: An Issue to Extract Corresponding Entity With Knowledge Base.” IEEE Access, vol. 6, pp. 6220-6231, 2018, https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8246707. Accessed 16 December 2022.

H. Zhang, Y. Wu and Z. Xie, “Diversity or Division: Language Choices on International Organizations’ Official Websites.” IEEE Transactions on Professional Communication, vol. 63, no. 2, pp. 139-154, 2020, https://ieeexplore-ieee-org.ucc.idm.oclc.org/document/9099942. Accessed 18 December 2022.

O. García and R. Otheguy, “Plurilingualism and translanguaging: commonalities and divergences.” International Journal of Bilingual Education and Bilingualism, vol. 23, no. 1, pp. 17-35, 2020, https://www-tandfonline-com.ucc.idm.oclc.org/doi/full/10.1080/13670050.2019.1598932. Accessed 18 December 2022.

Unknown, “Internet World Users by Language.” Internet World Stats, Miniwatts Marketing Group, 2020, https://www.internetworldstats.com/stats7.htm. Accessed 17 December 2022.

