
Comparing Monolingual and Bilingual Social Robots as Conversational Practice Companions in Language Learning

Authors

  • Alireza M. Kamelabad
    Division of Speech, Music and Hearing
    KTH Royal Institute of Technology
    Stockholm, Sweden
    ORCID: 0000-0001-7890-936X

  • Elin Inoue
    KTH Royal Institute of Technology
    Stockholm, Sweden
    Email: einoue@kth.se

  • Gabriel Skantze
    Division of Speech, Music and Hearing
    KTH Royal Institute of Technology
    Stockholm, Sweden
    ORCID: 0000-0002-8579-1790

Abstract

This study explores the impact of monolingual and bilingual robots in Robot-Assisted Language Learning (RALL) for non-native Swedish learners. In a within-group design, 47 participants interacted with a social robot under two conditions: a monolingual robot that communicated exclusively in Swedish and a bilingual robot capable of switching between Swedish and English. Each participant engaged in multiple role-play scenarios designed to match their language proficiency level, and their experiences were assessed through surveys and behavioral data. The results show that participants generally favored the bilingual robot, which led to a more relaxed and enjoyable experience. Perceived learning improved by the end of the experiment regardless of condition. These findings suggest that incorporating bilingual support in language-learning robots may enhance user engagement and effectiveness, particularly for lower-proficiency learners.

Introduction

Mastery of a language involves two essential components: comprehension and production. Comprehension refers to the ability to understand language when receiving input, while production involves converting thoughts into spoken or written language, with speaking being the primary mode of language production. Consequently, conversational practice is a critical aspect of learning a new language.

In traditional language instruction, teachers typically employ one of two strategies. The first strategy involves using only the target language for all aspects of instruction, including grammar explanations, error correction, vocabulary introduction, and conversational practice. The second strategy incorporates the learner’s native language (L1), or another language the learner speaks fluently, to varying extents, especially to explain more complex concepts or clarify meanings. The literature on language pedagogy has thus extensively examined these two approaches: monolingual teaching and translanguaging. Both methods have demonstrated benefits and drawbacks, with research suggesting that the effectiveness of each largely depends on the specific needs and characteristics of the learner (Cenoz & Gorter, 2020; Cummins, 2007; Crawford, 2004; Araujo, 2024; Pulinx et al., 2017).

Figure 1: Experiment setup

While these strategies are well-studied in traditional language pedagogy, their application within Human-Robot Interaction (HRI) remains underexplored (missing reference). A significant portion of Robot-Assisted Language Learning (RALL) research has focused on exclusively using L2, or mainly L1 with a limited introduction of L2 vocabulary. To make RALL systems as natural and effective as possible, it is crucial to assess various dimensions of language instruction in this new context. This is particularly important given the diverse environments language learners encounter. For example, some learners study in classrooms in their home countries, where teachers often share the same native language, making translanguaging a feasible option (missing reference). In contrast, refugees or migrants may attend language classes in their target country, where a shared native language is unlikely, so educators often adopt a monolingual approach (missing reference).

The introduction of social robots into language learning environments offers new possibilities for flexibility in instructional approaches. Recent advancements in Artificial Intelligence (AI) and the emergence of multilingual large language models (LLMs) enable the development of RALL systems that can accommodate multiple languages simultaneously (missing reference). Consequently, understanding user experience and learner attitudes toward these systems becomes increasingly important.

This study seeks to investigate the impact of monolingual and bilingual approaches in RALL, particularly focusing on the learner’s experience and perception of each approach when deployed on a robot. We developed a conversational practice algorithm for the Furhat robot designed for Swedish learners, offering two versions: one monolingual and one bilingual. These systems were evaluated through a within-subject user study, where participants with varying levels of Swedish proficiency interacted with both versions of the robot.

Research Questions

  1. What are the differences in users’ experience and learning expectations between monolingual and bilingual social robots in a language learning context?
  2. Are there any differences in the use of the bilingual feature between participants with varying L2 proficiency levels?

Background

Translanguaging Pedagogy

Translanguaging pedagogy in language learning refers to an instructional strategy where students draw from their entire linguistic repertoire to learn and use multiple languages, rather than keeping languages rigidly separated (Cenoz & Gorter, 2020). In a translanguaging conversational practice task, a teacher might engage students in a dialogue where they discuss a topic in the target language (e.g., Swedish) but allow them to switch to their native language (e.g., Persian) to clarify complex ideas or express something challenging. For example, a student could explain a cultural tradition in Swedish but switch to Persian when describing nuanced details, helping them maintain the flow of conversation while deepening their understanding of both languages.

Translanguaging pedagogy, which allows for the flexible use of both L1 and L2 in the classroom, has been shown to support learners by scaffolding comprehension and facilitating interaction. Multiple studies suggest that while lower proficiency learners benefit more from using L1 as a bridge to understand L2, higher proficiency learners tend to rely predominantly on L2 (Lo, 2015; De La Campa & Nassaji, 2009). The selective use of L1, particularly for managing classroom interaction and explaining complex concepts, engages students and enhances their overall learning experience (Bartlett, 2017; Hall & Cook, 2012).

Learner preferences further support the incorporation of L1 in the classroom, with several studies highlighting that L1 use makes the learning process more meaningful and enjoyable (Brooks-Lewis, 2009; Kucukali, 2021). This strategy not only boosts cognitive engagement but also contributes to creating a supportive environment, helping to mitigate language anxiety (Dryden et al., 2021). In (Han & Park, 2017), the bilingual method was shown to be more effective and preferred, with the monolingual group expressing negative views on target-language-only instructions. However, some research cautions that excessive reliance on L1 could hinder L2 immersion, suggesting that a balanced approach is necessary to optimize learning outcomes (Araujo, 2024; Zulfikar, 2019).

The role of the L1 in L2 classrooms has been a central debate in second language education. Early perspectives, shaped by approaches like Communicative Language Teaching (CLT), often advocated for minimizing or even eliminating L1 use to maximize L2 exposure, arguing that high-quality and high-quantity L2 input fosters language acquisition (Duff & Polio, 1990). However, this strict L2-only ideology has faced significant scrutiny, with recent research challenging the notion that completely excluding L1 is necessary or effective. In fact, various studies suggest that strategic L1 use can facilitate L2 learning, especially in specific areas like vocabulary acquisition (Zhao & Macaro, 2016; Tian & Macaro, 2012).

Proponents of a more flexible approach argue that L1 can play a beneficial role in the classroom. For instance, using L1 to explain difficult concepts or provide translations can help learners form clearer mental representations of L2 words and concepts (Macaro, 2009). Moreover, codeswitching—the alternation between L1 and L2—occurs naturally among bilinguals and can serve as an effective communication and learning strategy in formal education (Raschka et al., 2009). Several studies have demonstrated that L1 use, when employed strategically, can enhance vocabulary retention and comprehension, particularly among lower-proficiency learners (Lee & Levine, 2020).

Educator perspectives are also shifting towards favoring translanguaging practices, recognizing the benefits of crosslinguistic pedagogy in improving classroom dynamics and fostering inclusivity (Woll, 2020). Overall, the literature points to translanguaging as a valuable tool that, when used strategically, can improve both emotional and cognitive outcomes for language learners.

Robot-Assisted Language Learning

Robot-Assisted Language Learning (RALL) presents a unique opportunity to integrate translanguaging in language education, especially since social robots have the capacity to interact in multiple languages (van den Berghe, 2022). The benefits of social robots in language learning contexts are already well-documented, particularly their ability to enhance motivation and engagement due to their physical and social presence (Belpaeme et al., 2018; Lee & Lee, 2022). However, few RALL studies have implemented translanguaging strategies effectively. A review of 83 RALL studies found that the majority of interactions were either exclusively in the L1 or L2, and only a small number of studies utilized a mixed approach where both languages were employed strategically (van den Berghe, 2022).

A review by van den Berghe (2022) categorizes existing RALL studies based on the balance of L1 and L2 usage during student-robot interactions. Studies such as those by Saerbeck et al. (2010) and Tanaka and Matsuzoe (2012) fall under the “Student-robot interaction completely in L1; target vocabulary (or structures) in L2” category, where robots interact primarily in the L1 with only key vocabulary presented in the L2, often targeting novice learners. Similarly, studies like (missing reference) are examples of a “Mix of L1 and L2 in student-robot interaction,” where robots use a bilingual approach, such as having the robot speak in one language while a teacher uses another. In contrast, the majority of studies (Kamelabad & Skantze, 2023; Wang et al., 2013; Kwon et al., 2010) use “Student-robot interaction completely in L2,” implementing an immersion approach where only the target language (L2) is used during interactions.

Although the majority of studies employ this monolingual strategy, our research aligns with the less common category of “Student-robot interaction mostly in L2; some support in L1,” which has not been explored much before, with a few exceptions (You et al., 2006; Tanaka et al., 2015). This distinction is significant given that the literature on translanguaging pedagogy prominently advocates for the benefits of a bilingual approach, suggesting that L1 support can enhance L2 learning, as seen in broader educational research (García & Wei, 2015).

Integrating translanguaging in RALL holds the potential to address challenges that traditional language education faces, such as the teacher’s lack of proficiency in students’ home languages. By using robots capable of switching languages, students can receive L1 support in learning new L2 vocabulary or concepts, thereby enhancing comprehension and retention (Leeuwestein et al., 2021). The potential for robots to serve as multilingual agents in education presents an exciting avenue for future research, particularly in promoting inclusivity and improving learning outcomes for linguistically diverse students (Kim et al., 2021; Özcan et al., 2014).

System

We implemented a fully autonomous conversational robot-assisted language learning system, focusing on verbal interaction with users (Beer et al., 2014). Two versions of the system were developed: a monolingual robot that communicated exclusively in Swedish and a bilingual robot that could converse in both Swedish and English. Both versions facilitated conversation practice through role-play exercises, providing real-time feedback to users on their spoken input. The Furhat social robot (Al Moubayed et al., 2012), equipped with an animated back-projected face, was used to create a human-like interaction environment that simulates natural verbal exchanges.

To enhance the quality of Automatic Speech Recognition (ASR), an external microphone was used during interactions. The system used the Google Cloud ASR service to convert users’ speech into text. For synthesizing the robot’s speech, the Amazon Polly Text-to-Speech (TTS) service with the “Elin-Neural” voice was selected due to its natural-sounding output in both Swedish and English.

The interaction began with the robot initiating a role-play scenario, where the user was prompted to respond verbally. Each user utterance was processed by a modular pipeline. The system first detected the language of the user’s input using the OpenAI GPT-4o model, which classified the input as either Swedish or not Swedish. In the bilingual version, this language detection step was essential for determining when to offer support in English. The GPT-4o model then analyzed the input for grammatical and contextual correctness, classifying it as “strange,” “incorrect,” or “correct.” The “strange” tag was used for grammatically correct but semantically odd utterances, often arising from potential ASR recognition errors. For incorrect utterances, GPT-4o generated the proper form in Swedish, which the robot used to provide feedback to the user. All prompts for the language model can be found in (M. Kamelabad et al., 2025).
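The per-utterance pipeline can be summarized in a short sketch. This is a schematic illustration, not the authors' code: both model calls are stubbed as plain functions (in the actual system they were GPT-4o prompts, published with the paper), and all names here are hypothetical.

```python
from typing import Callable, Optional, Tuple

def process_utterance(
    text: str,
    detect_language: Callable[[str], str],              # -> "sv" or "other"
    judge: Callable[[str], Tuple[str, Optional[str]]],  # -> (tag, correction)
) -> dict:
    """Run one utterance through language detection, then correctness judging."""
    lang = detect_language(text)
    if lang != "sv":
        # Bilingual version only: this branch would trigger English support.
        return {"language": lang, "tag": None, "correction": None}
    # tag is one of "correct", "incorrect", "strange"
    tag, correction = judge(text)
    return {"language": "sv", "tag": tag, "correction": correction}
```

The stubs make the control flow explicit: non-Swedish input short-circuits before any correctness judgment, mirroring the order of the two GPT-4o steps described above.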

Feedback was provided in a semi-implicit manner to handle potential inaccuracies in ASR output. The robot repeated what it “heard” before delivering corrections, minimizing the risk of falsely identifying user mistakes due to ASR errors. For example, if the user said, “I’m sick last week,” the robot would respond, “I heard, ‘I’m sick last week.’ Did you mean to say, ‘I was sick last week’?” This approach allowed users to modify their utterances without feeling accused of making mistakes. The system refrained from giving feedback on utterances longer than 14 words to maintain the natural flow of conversation. Additionally, after providing feedback, the system temporarily ignored further mistakes for three subsequent exchanges to prevent users from getting stuck in a feedback loop.
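The feedback-gating rules above can be captured in a few lines. The following is an illustrative sketch under the stated constraints (semi-implicit phrasing, no feedback beyond 14 words, a three-exchange cooldown after each correction), not the authors' implementation.

```python
from typing import Optional

MAX_FEEDBACK_WORDS = 14   # no feedback on longer utterances
COOLDOWN_EXCHANGES = 3    # exchanges to skip after giving feedback

class FeedbackGate:
    def __init__(self) -> None:
        self.cooldown = 0  # exchanges during which feedback stays suppressed

    def maybe_feedback(self, utterance: str, correction: Optional[str]) -> Optional[str]:
        """Return a semi-implicit correction prompt, or None to stay silent."""
        if self.cooldown > 0:
            self.cooldown -= 1
            return None
        if correction is None:  # utterance judged correct
            return None
        if len(utterance.split()) > MAX_FEEDBACK_WORDS:
            return None  # keep long turns flowing
        self.cooldown = COOLDOWN_EXCHANGES
        return f"I heard, '{utterance}'. Did you mean to say, '{correction}'?"
```

Repeating the recognized utterance before the correction is what makes the feedback semi-implicit: if the ASR misheard, the user can simply restate rather than being told they made a mistake.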

In the bilingual system, the robot used English selectively. English was employed to introduce tasks, explain and clarify concepts, give feedback, and provide praise. This use of English was modeled on second language teaching practices, aiming to make explanations more comprehensible and reduce user anxiety. However, during role-play scenarios, the robot primarily communicated in Swedish to encourage users to practice the target language.

To support bilingual speech synthesis, the system inserted Speech Synthesis Markup Language (SSML) tags into the text to facilitate language switches between Swedish and English. The Lingua library analyzed the language of each word, and English words were enclosed in SSML tags. The robot’s speech rate was adapted to the user’s self-reported Swedish proficiency, speaking slower for lower-level learners to ensure comprehension.
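The per-word tagging can be sketched as follows. This is a minimal illustration: the real system used the Lingua library for word-level language detection, so here the detector is passed in as a function to keep the example self-contained.

```python
def to_ssml(words, detect):
    """Wrap runs of English words in an SSML <lang> tag; Swedish is assumed
    to be the surrounding voice language. `detect(word)` returns 'en' or 'sv'
    (played by the Lingua detector in the actual system)."""
    out, buf = [], []

    def flush():
        # Close a pending run of English words into one <lang> span.
        if buf:
            out.append('<lang xml:lang="en-US">' + " ".join(buf) + "</lang>")
            buf.clear()

    for word in words:
        if detect(word) == "en":
            buf.append(word)
        else:
            flush()
            out.append(word)
    flush()
    return "<speak>" + " ".join(out) + "</speak>"
```

The standard SSML `<lang>` element, which Amazon Polly supports, is the mechanism assumed here; grouping consecutive English words into one span avoids re-switching the voice on every word.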

The language model and TTS processing time resulted in response delays of about 3.5 seconds. To improve turn-taking, the robot’s LED ring turned blue to indicate it was listening and switched to a soft white pulse while processing the input, signaling to the user that the robot was “thinking.” These indicators were crucial for maintaining user engagement and conveying that the system was actively processing their responses.

Methodology

The experiment was designed to evaluate the effectiveness of the monolingual and bilingual robot-assisted language learning systems. A within-group design was adopted to minimize the influence of individual differences such as language proficiency or attitudes towards technology. Each participant interacted with both the monolingual and bilingual systems, allowing for direct comparisons in user experience and perceived learning outcomes across conditions.

Participants

The study involved 47 non-native Swedish learners ranging in proficiency from A1 to C1, according to the Common European Framework of Reference for Languages (CEFR). Participants were recruited to reflect a diverse spectrum of language proficiency, ensuring generalizability across skill levels. Of the participants, 25 were female (53%), 21 were male (45%), and 1 identified as non-binary (2%). Ages ranged from 17 to 53 years, with a median age of 27.

Participants represented a variety of native languages, with German being the most common (5 participants). The amount of time spent studying Swedish varied from 0 to 60 months, with an average of 17.4 months. A majority (60%) reported attending or having attended formal Swedish courses, and 57% used mobile applications to aid their learning. Most participants were more proficient in English than Swedish, although proficiency levels in Swedish varied: 23% at A1, 21% at A2, 26% at B1, 21% at B2, and 9% at C1.

Experiment Setup & Procedure

The experiment was conducted in a controlled environment designed to offer participants a comfortable and private space. The Furhat robot was positioned directly in front of the participants, and a tablet was placed on the table for interaction control. Participants could not see the experimenter, creating a natural setting for language practice.

The tablet allowed participants to switch languages (English/Swedish) in the bilingual condition, terminate the session, and complete surveys. A minimum interaction period of one minute was enforced before sessions could be terminated. Audio from all interactions was recorded and logged for analysis.

The robot, introduced as “Astrid,” explained the experiment rules and scenarios. Participants alternated between monolingual and bilingual conversational sessions. Each session lasted a maximum of 7.5 minutes, with participants able to end sessions early using the tablet. Unused session time was carried over, allowing for additional alternating sessions while ensuring total interaction time did not exceed 30 minutes.
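The session-time bookkeeping described above reduces to simple arithmetic; the sketch below is an assumed formulation (not the authors' code) of the two caps: 7.5 minutes per session, 30 minutes in total, with unused time rolling over.

```python
SESSION_CAP_MIN = 7.5   # maximum length of a single session, in minutes
TOTAL_CAP_MIN = 30.0    # maximum total interaction time, in minutes

def next_session_budget(total_elapsed_min: float) -> float:
    """Minutes available for the next session, given total minutes used so far."""
    remaining = TOTAL_CAP_MIN - total_elapsed_min
    return max(0.0, min(SESSION_CAP_MIN, remaining))
```

A participant who ends a session early simply leaves more of the 30-minute budget for later alternating sessions, which is how the carry-over falls out of the total cap.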

Role-play scenarios were tailored to participants’ Swedish proficiency. Beginner-level participants (A1) engaged in simple tasks like ordering food at a restaurant, while advanced-level participants (B2 and C1) encountered complex scenarios like job interviews or trip planning. Interaction order and scenarios were randomized to minimize order effects.

Measures

Demographics

Participants completed a registration form at the start of the experiment, providing demographic information and language background details. Information collected included:

  • Native language(s)
  • Proficiency in Swedish and English
  • Time spent studying Swedish
  • Methods used for learning Swedish
  • General demographic details (age, gender, occupation)

Condition Perception Survey

Participants completed a survey assessing their experiences with the monolingual and bilingual robot conditions. The survey was administered four times: after the first and final sessions with each robot. It included eight Likert-scale questions and one binary question to evaluate perceived learning, the helpfulness of explanations, emotional responses, and overall satisfaction. This allowed comparisons of participant feedback at different points in the experiment and across conditions.

Survey questions included:

  • learnt: “How much did you learn?”
  • future_learning: “How much do you think your Swedish would improve if you practiced more with this version of the robot?”
  • explanations_help: “Did the robot’s explanations help you understand?”
  • discouraging_comforting: “Discouraging-Comforting”
  • frustrating_enjoyable: “Frustrating-Enjoyable”
  • demanding_effortless: “Demanding-Effortless”
  • anxious_relaxed: “Anxious-Relaxed”
  • experience: “How would you rate your overall experience during this interaction?”
  • willingness (binary): “Would you choose to practice Swedish with this version of the robot if it was available?”

Godspeed

Participants also completed a Godspeed questionnaire (Bartneck et al., 2009), commonly used in human-robot interaction studies to assess:

  • Anthropomorphism
  • Animacy
  • Likeability
  • Perceived intelligence
  • Perceived safety

This data provided a general view of how participants perceived the robot and allowed comparisons with other studies, rather than being used to compare conditions.

Results

We performed our analysis using R 4.4.0 (R Core Team, 2024), with the data and analysis scripts made available (anonymized). We adhered to the conventional significance level of \(\alpha = 0.05\), and all reported Confidence Intervals (CIs) are at 95%.

Survey Data Analysis

The eight Likert-scale questions were initially designed with two overarching themes: perceived learning and general experience. Before running separate analyses on each question, we aimed to capture the general construct behind the answers to better understand overarching effects. To achieve this, we conducted a correlation analysis to explore relationships between the survey items.

Principal Component Analysis (PCA)

We began with a Principal Component Analysis (PCA) of the survey data. The PCA revealed that:

  • Dim.1 explained 54.4% of the variance.
  • Dim.2 and Dim.3 explained an additional 16.0% and 9.2% of the variance, respectively.
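As an illustration of where such figures come from: the proportions of variance explained are given by the eigenvalues of the survey items' correlation matrix. The data below are synthetic (the study's real data ship with the published analysis scripts), so the dimensions and sample size merely mirror the study's 47 × 8 layout.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(47, 8))              # 47 participants x 8 Likert items (synthetic)
corr = np.corrcoef(X, rowvar=False)       # 8 x 8 item correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigenvalues, sorted descending
explained = eigvals / eigvals.sum()       # proportion of variance per component
```

With real, correlated survey responses the leading component would absorb a large share of the variance, as Dim.1 does here (54.4%).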

To interpret the data more meaningfully, we applied a Promax rotation, resulting in three Rotated Components (RCs):

  • RC1 (Positive Experience/Comfort):
    • Explained 37.6% of the variance.
    • Captured participants’ overall experience with the robot.
    • Included strong loadings from variables such as:
      • discouraging_comforting
      • demanding_effortless
      • frustrating_enjoyable
      • experience
      • explanations_help.
    • High RC1 scores indicate a positive, comfortable, and enjoyable interaction.
  • RC2 (Learning and Understanding):
    • Explained 26.7% of the variance.
    • Related to participants’ perceived learning and understanding during the interaction.
    • Included strong loadings from variables such as:
      • future_learning
      • learnt.
  • RC3 (Stress/Relaxation):
    • Explained 12.4% of the variance.
    • Focused on participants’ emotional state, particularly relaxation versus anxiety.
    • anxious_relaxed had the strongest loading on this component.

Aggregated Scores

From these components, we derived aggregated scores for:

  • Experience
  • Perceived Learning
  • Relaxation

These scores provided a broader understanding of participant experiences beyond analyzing each question independently. Additionally, separate analyses were performed for each of the eight survey questions to identify specific differences between the monolingual and bilingual conditions.

Mixed-Effects Models

To analyze the aggregated scores, we fitted linear mixed-effects models to examine the effects of:

  1. Condition: Monolingual vs. Bilingual.
  2. Phase: Start vs. End of the interaction.

Random intercepts were included for participant ID. Since RC3 (Stress/Relaxation) correlated strongly with only one variable (anxious_relaxed), we used participants’ actual responses to this question rather than the RC-aggregated scores. The detailed results of these models are presented in Table 1.
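For concreteness, the fitted models can be written in standard mixed-model notation (our formulation of the description above, not taken from the paper):

\[
y_{ij} = \beta_0 + \beta_1\,\mathrm{Condition}_{ij} + \beta_2\,\mathrm{Phase}_{ij} + \beta_3\,(\mathrm{Condition} \times \mathrm{Phase})_{ij} + u_j + \varepsilon_{ij},
\]

where \(y_{ij}\) is the aggregated score for participant \(j\) in measurement \(i\), \(u_j \sim \mathcal{N}(0, \sigma_u^2)\) is the per-participant random intercept, and \(\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)\) is the residual error.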

**Table 1**: *Results of Linear Mixed Models for User Experience and Learning.*

| **Model**           | **Effect**              | **β**                     | **SE** | **p**     |
|---------------------|-------------------------|---------------------------|--------|-----------|
| **User Experience** | Intercept               | \(-2.78 \times 10^{-16}\) | 0.119  | 1.000     |
|                     | Condition (Monolingual) | -0.125                    | 0.048  | **0.011** |
|                     | Phase (Start)           | 0.013                     | 0.048  | 0.786     |
|                     | Condition × Phase       | -0.077                    | 0.048  | 0.116     |
| **Learning**        | Intercept               | \(-1.81 \times 10^{-16}\) | 0.123  | 1.000     |
|                     | Condition (Monolingual) | 0.012                     | 0.045  | 0.780     |
|                     | Phase (Start)           | -0.128                    | 0.045  | **0.005** |
|                     | Condition × Phase       | -0.047                    | 0.045  | 0.294     |

**Note**: **β**: effect's estimate, **SE**: standard error, **p**: significance level. Significance (\(p < 0.05\)) is indicated as **bold**. The table summarizes the fixed effects from the linear mixed models for User Experience and Learning. Condition refers to the comparison between the bilingual and monolingual robot, while Phase refers to the comparison between the start and end sessions.

We conducted separate analyses for the individual questions within each theme. Each question was analyzed using a cumulative link mixed-effects model (CLMM), with condition (monolingual vs. bilingual) and phase (start vs. end) as fixed effects and participant ID as a random effect. The results of these analyses are shown in Table 2.

**Table 2**: *Results for the individual questions across all dimensions.*

| **Dimension**           | **Row** | **Question**                | **β (Condition)** | **SE (Condition)** | **p (Condition)** | **β (Phase)** | **SE (Phase)** | **p (Phase)** |
|-------------------------|---------|-----------------------------|-------------------|--------------------|-------------------|---------------|----------------|---------------|
| **User Experience**     | 1       | Overall Experience          | 0.199             | 0.153              | 0.193             | -0.0003       | 0.152          | 0.998         |
|                         | 2       | Demanding vs. Effortless    | 0.427             | 0.149              | **0.004**         | 0.039         | 0.144          | 0.788         |
|                         | 3       | Frustrating vs. Enjoyable   | 0.267             | 0.148              | *0.071*           | -0.111        | 0.147          | 0.450         |
|                         | 4       | Discouraging vs. Comforting | 0.255             | 0.149              | *0.086*           | 0.097         | 0.148          | 0.510         |
| **Perceived Learning**  | 5       | Learnt                      | 0.011             | 0.143              | 0.937             | 0.634         | 0.153          | **<0.001**    |
|                         | 6       | Future Learning             | -0.119            | 0.151              | 0.430             | 0.237         | 0.151          | 0.117         |
|                         | 7       | Explanations Help           | 0.283             | 0.143              | **0.048**         | 0.077         | 0.142          | 0.589         |
| **Anxious vs. Relaxed** | 8       | Anxious vs. Relaxed         | 0.321             | 0.153              | **0.036**         | 0.297         | 0.154          | *0.054*       |

**Note**: **β**: effect's estimate, **SE**: standard error, **p**: significance level. Significance (\(p < .05\)) is indicated as **bold**, and marginal effects (\(p < .10\)) as *italic*.

User Experience

The model revealed a significant main effect of condition, \(\chi^2(1) = 6.64\), \(p = 0.010\), indicating that participants had significantly different experiences depending on whether they interacted with the monolingual or bilingual version of the robot. Specifically, the estimated effect of the monolingual condition was \(\beta = -0.125\), \(SE = 0.048\), \(t(138) = -2.58\), \(p = 0.011\), suggesting that the monolingual condition was associated with a lower overall experience score compared to the bilingual condition.

There was no significant main effect of phase (\(\chi^2(1) = 0.07\), \(p = 0.786\)), nor was the interaction between condition and phase significant (\(\chi^2(1) = 2.50\), \(p = 0.114\)). These results suggest that participants’ experience did not significantly differ between the start and end phases of the experiment, and the difference between conditions did not vary across the phases.

The analysis of individual questions for the User Experience dimension (Table 2, Rows 1–4) revealed:

  • Overall Experience (Row 1): No significant effects of condition or phase, indicating no notable differences between conditions or across phases.
  • Demanding vs. Effortless (Row 2): A significant effect of condition, with participants finding the bilingual condition more effortless than the monolingual condition. No significant effect of phase was observed.
  • Frustrating vs. Enjoyable (Row 3): A marginal effect of condition suggested that participants found the bilingual condition slightly more enjoyable than the monolingual condition. No significant phase effect was detected.
  • Discouraging vs. Comforting (Row 4): A marginal effect of condition indicated that participants found the bilingual condition more comforting than the monolingual condition. Phase had no significant effect.

In summary, participants found the bilingual condition significantly more effortless, and marginally more enjoyable and more comforting, than the monolingual condition.


Perceived Learning

A linear mixed-effects model was used to analyze the perceived learning dimension (Learning), with condition (monolingual vs. bilingual) and phase (start vs. end) as fixed effects and participant ID as a random effect. The model did not reveal a significant main effect of condition (\(\chi^2(1) = 0.078\), \(p = 0.780\)), indicating that perceived learning did not significantly differ between the monolingual and bilingual conditions.

However, a significant main effect of phase was observed (\(\chi^2(1) = 8.22\), \(p = 0.004\)), suggesting that participants’ perception of their learning improved from the start to the end of the experiment (\(\beta = -0.128\), \(SE = 0.045\), \(t(138) = -2.87\), \(p = 0.005\)).

The analysis of individual questions (Table 2, Rows 5–7) revealed:

  • Learnt (Row 5): No significant effect of condition, but a significant main effect of phase indicated that participants perceived greater learning by the end of the experiment.
  • Future Learning (Row 6): No significant effects of condition or phase, suggesting that beliefs about potential future learning did not differ across conditions or phases.
  • Explanations Helpfulness (Row 7): A significant effect of condition showed that participants found the bilingual robot’s explanations more helpful compared to the monolingual. No significant phase effect was found.

These results emphasize the role of phase in shaping perceived learning and highlight the added benefit of the bilingual robot’s explanations.


Anxious-Relaxed

Given that the third rotated component (Relaxed) strongly correlated with the anxious-relaxed question alone, we focused on the CLMM for this individual question rather than the principal component scores. The CLMM revealed a significant effect of condition (\(\beta = 0.321\), \(SE = 0.153\), \(z = 2.10\), \(p = 0.036\)), indicating that participants felt more relaxed interacting with the bilingual robot.

A marginal effect of phase (\(\beta = 0.297\), \(SE = 0.154\), \(z = 1.93\), \(p = 0.054\)) suggested that participants tended to feel more relaxed by the end of the experiment, regardless of condition.


Use of Bilingual Feature

We calculated the mean proportional time that users spent speaking English in the bilingual condition and grouped the means by Swedish proficiency. As shown in Table 3, participants with lower proficiency (A1, A2) spent a markedly larger proportion of their speaking time in English than those with higher proficiency (B1, B2, C1).

**Table 3**: *Mean time proportion of English usage and standard deviation at each Swedish proficiency level.*

| **Swedish Proficiency** | **Mean (%)** | **SD (%)** |
|-------------------------|--------------|------------|
| A1                      | 8.31         | 10.21      |
| A2                      | 6.12         | 6.51       |
| B1                      | 3.18         | 5.77       |
| B2                      | 4.50         | 4.71       |
| C1                      | 0.51         | 1.01       |
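The grouped means in Table 3 can be derived with a short aggregation; the sketch below assumes a simple record format (one `(CEFR level, English-time percentage)` pair per participant), which is our illustration rather than the authors' data layout.

```python
from collections import defaultdict
from statistics import mean, stdev

def english_use_by_level(records):
    """records: iterable of (cefr_level, english_time_percent) pairs.
    Returns {level: (mean_percent, sd_percent)}, rounded to two decimals."""
    groups = defaultdict(list)
    for level, percent in records:
        groups[level].append(percent)
    return {
        level: (round(mean(vals), 2), round(stdev(vals), 2) if len(vals) > 1 else 0.0)
        for level, vals in groups.items()
    }
```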

Godspeed Questionnaire Results

The Godspeed Questionnaire Scale (GQS) was administered to evaluate participants’ perceptions of the robot across five key dimensions: Anthropomorphism, Animacy, Likeability, Intelligence, and Safety. As the GQS was applied at the end of the experiment to assess the entire system, it cannot be used to compare the two conditions. However, it provides a basis for comparison with other systems.

The mean scores and standard deviations for each dimension are reported in Table 4. Additionally, Cronbach’s Alpha was calculated for each category to assess the internal consistency of the items in each dimension. Most dimensions exhibited high internal consistency (\(\alpha > 0.8\)), except for the Safety dimension, which showed lower reliability (\(\alpha = 0.42\)). This suggests that participants’ perceptions of safety may have been influenced by factors beyond the questionnaire items alone.
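Cronbach's alpha, as reported in Table 4, follows the standard formula; the implementation below is illustrative, and the sample-variance convention is our assumption since the paper does not specify it.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of responses per questionnaire item (equal lengths,
    one value per participant).
    alpha = k/(k-1) * (1 - sum of item variances / variance of sum scores)."""
    k = len(items)
    item_var_sum = sum(variance(item) for item in items)
    totals = [sum(vals) for vals in zip(*items)]  # per-participant sum score
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))
```

Perfectly correlated items yield an alpha of 1; weakly related items (as apparently happened for the Safety dimension) drive it down.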

**Table 4**: *Godspeed Questionnaire Results for Each Dimension.*

| **Dimension**    | **Mean** | **SD** | **Cronbach's α** |
|------------------|----------|--------|------------------|
| Anthropomorphism | 3.15     | 0.93   | 0.87             |
| Animacy          | 3.31     | 0.90   | 0.88             |
| Likeability      | 3.97     | 0.69   | 0.84             |
| Intelligence     | 3.86     | 0.70   | 0.84             |
| Safety           | 3.50     | 0.63   | 0.42             |
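Cronbach's alpha has a simple closed form: α = k/(k−1) · (1 − Σ item variances / variance of the summed score), where k is the number of items. A minimal sketch with made-up response data (not the study's ratings):

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-respondent item-score rows."""
    k = len(scores[0])                      # number of items
    cols = list(zip(*scores))               # item-wise columns
    item_var_sum = sum(variance(c) for c in cols)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

# Perfectly consistent items (every respondent gives all items the
# same rating) yield alpha = 1. Data is illustrative only.
consistent = [(1, 1, 1), (3, 3, 3), (5, 5, 5)]
print(round(cronbach_alpha(consistent), 2))  # 1.0
```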

The highest mean score was observed in the Likeability dimension (M = 3.97), reflecting participants’ generally positive perception of the robot’s friendliness and pleasantness. Intelligence also received a high score (M = 3.86), indicating that participants viewed the robot as competent and sensible.

Figure 1: Radar Chart of the GQS Responses.

Conversely, Anthropomorphism (M = 3.15) and Animacy (M = 3.31) received lower scores, suggesting that while the robot was seen as somewhat lifelike, participants did not perceive it as highly humanlike. The Safety dimension had a moderate score (M = 3.50), but its lower internal consistency (α = 0.42) indicates that participants’ perceptions of safety were less uniform across the questionnaire items.

Overall, these results suggest that the robot was generally well-received, especially in terms of Likeability and Intelligence. The moderate ratings for Anthropomorphism and Animacy imply that the robot’s design could still be perceived as more mechanical or artificial compared to more anthropomorphized systems.

Discussion

The results of this study provide insights into the role of monolingual and bilingual social robots in language learning, particularly within the context of translanguaging pedagogy in Robot-Assisted Language Learning (RALL). While both systems showed promise in facilitating conversational practice, several key differences emerged that highlight the potential benefits of a bilingual translanguaging approach in language learning.

First, the bilingual robot was consistently rated more favorably across multiple dimensions of user experience. Participants found the bilingual interaction to be more effortless, enjoyable, and comforting compared to the monolingual condition. These findings align with previous research suggesting that incorporating the learner’s native language can alleviate stress and enhance the overall learning experience, especially when dealing with complex or unfamiliar language content (Han & Park, 2017). The significant effect of condition on the “demanding-effortless” dimension supports the hypothesis that bilingual systems can reduce cognitive load during language practice by providing clarifications and explanations in a familiar language.

In terms of perceived learning, both systems were effective, with participants reporting improvements by the end of the experiment regardless of the condition. However, while no significant differences were observed between conditions in perceived learning, the bilingual robot was rated higher in terms of the helpfulness of its explanations. This suggests that the bilingual robot’s ability to switch languages allowed it to provide more effective feedback and clarification, particularly benefiting lower-proficiency learners. This finding is consistent with translanguaging pedagogy research, which emphasizes that switching between languages can make input more comprehensible and reduce learner anxiety (Dryden et al., 2021).

The significant effect of phase on perceived learning highlights an important outcome: participants’ confidence in their ability to learn from the robot increased by the end of the experiment. Initially, participants were cautious about how much their L2 speaking skills would improve through robot interaction. By the end, they expressed a stronger belief in the robot’s potential as a language learning tool, demonstrating the system’s ability to foster optimism about using conversational robots for language practice.

Another key finding is the bilingual robot’s impact on participants’ emotional states. Participants felt more relaxed interacting with the bilingual robot, which is crucial for language learning. Anxiety often hinders language acquisition by inhibiting learners’ willingness to engage in conversation. The bilingual robot’s ability to create a more relaxed environment could enhance learners’ confidence and increase their willingness to practice the target language.

The analysis of participants’ speech patterns revealed that English usage in the bilingual condition was limited. Most participants maintained the target language (L2) during role-play scenarios, suggesting that the bilingual robot’s support in English (L1) was primarily used for clarifications and feedback rather than as a crutch. This balance highlights the potential for bilingual robots to serve as effective language learning companions, particularly for lower-proficiency learners, who relied more on English as a reference. Importantly, the effects of the conditions did not vary based on proficiency levels, indicating consistent results across participants.

The Godspeed Questionnaire Scale (GQS) results indicate that the robot was generally well-received, particularly in Likeability and Intelligence. While ratings for Anthropomorphism and Animacy were lower, these aspects are less critical for the robot’s effectiveness as a language learning tool. High Likeability scores suggest participants felt comfortable interacting with the robot, fostering engagement in conversational practice. Similarly, positive Intelligence ratings reflect participants’ perception of the robot as a competent conversational partner capable of providing meaningful feedback.


Conclusion

This study explored the impact of monolingual and bilingual social robots on conversational practice for language learners. Using a within-subject design, 47 participants with varying Swedish proficiency levels interacted with both a monolingual robot and a bilingual robot. The bilingual robot was generally preferred, providing a more relaxed and enjoyable experience. Both systems positively influenced participants’ perceptions of their language learning progress by the end of the experiment.

The bilingual robot was particularly beneficial for lower-proficiency learners, who relied more on English for support. However, the effects of both conditions were consistent across proficiency levels, suggesting the system’s broad effectiveness. Notably, participants’ confidence in learning from the robot increased over time, highlighting the potential of such technology to enhance language acquisition. Future research should explore how bilingual robots can be further optimized for diverse learning environments.


References

  1. Cenoz, J., & Gorter, D. (2020). Pedagogical translanguaging: An introduction. System, 92, 102269. https://doi.org/10.1016/j.system.2020.102269
  2. Cummins, J. (2007). Rethinking monolingual instructional strategies in multilingual classrooms. Canadian Journal of Applied Linguistics, 10(2), 221–240.
  3. Crawford, J. (2004). Language choices in the foreign language classroom: Target language or the learners’ first language? RELC Journal, 35(1), 5–20.
  4. Araujo, A. B. V. de. (2024). Longitudinal study on limiting the use of L1 in L2 learning. Caderno Pedagógico, 21(7), e5920–e5920. https://doi.org/10.54033/cadpedv21n7-195
  5. Pulinx, R., Van Avermaet, P., & Agirdag, O. (2017). Silencing linguistic diversity: The extent, the determinants and consequences of the monolingual beliefs of Flemish teachers. International Journal of Bilingual Education and Bilingualism, 20(5), 542–556.
  6. Lo, Y. Y. (2015). How much L1 is too much? Teachers’ language use in response to students’ abilities and classroom interaction in Content and Language Integrated Learning. International Journal of Bilingual Education and Bilingualism, 18(3), 270–288. https://doi.org/10.1080/13670050.2014.988112
  7. De La Campa, J. C., & Nassaji, H. (2009). The Amount, Purpose, and Reasons for Using L1 in L2 Classrooms. Foreign Language Annals, 42(4), 742–759. https://doi.org/10.1111/j.1944-9720.2009.01052.x
  8. Bartlett, K. A. (2017). The use of L1 in L2 classrooms in Japan: a survey of university student preferences. Kwansei Gakuin University Humanities Review, 22, 71–80. https://research.usq.edu.au/item/q48q6/the-use-of-l1-in-l2-classrooms-in-japan-a-survey-of-university-student-preferences
  9. Hall, G., & Cook, G. (2012). Own-language use in language teaching and learning. Language Teaching, 45(3), 271–308. https://doi.org/10.1017/S0261444812000067
  10. Brooks-Lewis, K. A. (2009). Adult Learners’ Perceptions of the Incorporation of their L1 in Foreign Language Teaching and Learning. Applied Linguistics, 30(2), 216–235. https://doi.org/10.1093/applin/amn051
  11. Kucukali, E. (2021). Benefits and Issues of Translanguaging Pedagogies on Language Learning: Students’ Perspective.
  12. Dryden, S., Tankosić, A., & Dovchin, S. (2021). Foreign language anxiety and translanguaging as an emotional safe space: Migrant English as a foreign language learners in Australia. System, 101, 102593. https://doi.org/10.1016/j.system.2021.102593
  13. Han, J., & Park, K. (2017). Monolingual or Bilingual Approach: The Effectiveness of Teaching Methods in Second Language Classroom. Purdue Linguistics, Literature, and Second Language Studies Conference. https://docs.lib.purdue.edu/plcc/purduelanguagesandculturesconference2017/translationalideas/2
  14. Zulfikar, Z. (2019). Rethinking the use of L1 in L2 classroom. Englisia: Journal of Language, Education, and Humanities, 6(1), 42–51. https://doi.org/10.22373/ej.v6i1.2514
  15. Duff, P. A., & Polio, C. G. (1990). How Much Foreign Language Is There in the Foreign Language Classroom? The Modern Language Journal, 74(2), 154–166. https://doi.org/10.1111/j.1540-4781.1990.tb02561.x
  16. Zhao, T., & Macaro, E. (2016). What works better for the learning of concrete and abstract words: teachers’ L1 use or L2-only explanations? International Journal of Applied Linguistics, 26(1), 75–98. https://doi.org/10.1111/ijal.12080
  17. Tian, L., & Macaro, E. (2012). Comparing the effect of teacher codeswitching with English-only explanations on the vocabulary acquisition of Chinese university students: A Lexical Focus-on-Form study. Language Teaching Research, 16(3), 367–391. https://doi.org/10.1177/1362168812436909
  18. Macaro, E. (2009). Chapter 2. Teacher Use of Codeswitching in the Second Language Classroom: Exploring ‘Optimal’ Use. In M. Turnbull & J. Dailey-O’Cain (Eds.), First Language Use in Second and Foreign Language Learning (pp. 35–49). Multilingual Matters. https://doi.org/10.21832/9781847691972-005
  19. Raschka, C., Sercombe, P., & Chi-Ling, H. (2009). Conflicts and tensions in codeswitching in a Taiwanese EFL classroom. International Journal of Bilingual Education and Bilingualism, 12(2), 157–171. https://doi.org/10.1080/13670050802153152
  20. Lee, J. H., & Levine, G. S. (2020). The effects of instructor language choice on second language vocabulary learning and listening comprehension. Language Teaching Research, 24(2), 250–272. https://doi.org/10.1177/1362168818770910
  21. Woll, N. (2020). Towards crosslinguistic pedagogy: Demystifying pre-service teachers’ beliefs regarding the target-language-only rule. System, 92, 102275. https://doi.org/10.1016/j.system.2020.102275
  22. van den Berghe, R. (2022). Social robots in a translanguaging pedagogy: A review to identify opportunities for robot-assisted (language) learning. Frontiers in Robotics and AI, 9. https://doi.org/10.3389/frobt.2022.958624
  23. Belpaeme, T., Kennedy, J., Ramachandran, A., Scassellati, B., & Tanaka, F. (2018). Social robots for education: A review. Science Robotics, 3(21), eaat5954. https://doi.org/10.1126/scirobotics.aat5954
  24. Lee, H., & Lee, J. H. (2022). The effects of robot-assisted language learning: A meta-analysis. Educational Research Review, 35, 100425. https://doi.org/10.1016/j.edurev.2021.100425
  25. Saerbeck, M., Schut, T., Bartneck, C., & Janse, M. D. (2010). Expressive robots in education: varying the degree of social supportive behavior of a robotic tutor. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1613–1622. https://doi.org/10.1145/1753326.1753567
  26. Tanaka, F., & Matsuzoe, S. (2012). Children teach a care-receiving robot to promote their learning: field experiments in a classroom for vocabulary learning. J. Hum.-Robot Interact., 1(1), 78–95. https://doi.org/10.5898/JHRI.1.1.Tanaka
  27. Kamelabad, A. M., & Skantze, G. (2023). I Learn Better Alone! Collaborative and Individual Word Learning With a Child and Adult Robot. Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, 368–377. https://doi.org/10.1145/3568162.3577004
  28. Wang, Y. H., Young, S. S.-C., & Jang, J.-S. R. (2013). Using Tangible Companions for Enhancing Learning English Conversation. Journal of Educational Technology & Society, 16(2), 296–309. https://www.jstor.org/stable/jeductechsoci.16.2.296
  29. Kwon, O.-H., Koo, S.-Y., Kim, Y.-G., & Kwon, D.-S. (2010). Telepresence robot system for English tutoring. 2010 IEEE Workshop on Advanced Robotics and Its Social Impacts, 152–155. https://doi.org/10.1109/ARSO.2010.5679999
  30. You, Z.-J., Shen, C.-Y., Chang, C.-W., Liu, B.-J., & Chen, G.-D. (2006). A Robot as a Teaching Assistant in an English Class. Sixth IEEE International Conference on Advanced Learning Technologies (ICALT’06), 87–91. https://doi.org/10.1109/ICALT.2006.1652373
  31. Tanaka, F., Isshiki, K., Takahashi, F., Uekusa, M., Sei, R., & Hayashi, K. (2015). Pepper learns together with children: Development of an educational application. 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids), 270–275. https://doi.org/10.1109/HUMANOIDS.2015.7363546
  32. García, O., & Wei, L. (2015). Translanguaging, Bilingualism, and Bilingual Education. In The Handbook of Bilingual and Multilingual Education (pp. 223–240). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781118533406.ch13
  33. Leeuwestein, H., Barking, M., Sodacı, H., Oudgenoeg-Paz, O., Verhagen, J., Vogt, P., Aarts, R., Spit, S., de Haas, M., de Wit, J., & Leseman, P. (2021). Teaching Turkish-Dutch kindergartners Dutch vocabulary with a social robot: Does the robot’s use of Turkish translations benefit children’s Dutch vocabulary learning? Journal of Computer Assisted Learning, 37(3), 603–620. https://doi.org/10.1111/jcal.12510
  34. Kim, Y., Marx, S., Pham, H. V., & Nguyen, T. (2021). Designing for robot-mediated interaction among culturally and linguistically diverse children. Educational Technology Research and Development, 69(6), 3233–3254. https://doi.org/10.1007/s11423-021-10051-2
  35. Özcan, B., Kok, N., Wallenburg, J., & Sijbrands, E. J. G. (2014). [Diabetologist 2.0; patients perform diabetic tests using a multilingual robot]. Nederlands Tijdschrift Voor Geneeskunde, 158, A8451.
  36. Beer, J. M., Fisk, A. D., & Rogers, W. A. (2014). Toward a framework for levels of robot autonomy in human-robot interaction. Journal of Human-Robot Interaction, 3(2), 74–99. https://doi.org/10.5898/JHRI.3.2.Beer
  37. Al Moubayed, S., Beskow, J., Skantze, G., & Granström, B. (2012). Furhat: A Back-Projected Human-Like Robot Head for Multiparty Human-Machine Interaction. In A. Esposito, A. M. Esposito, A. Vinciarelli, R. Hoffmann, & V. C. Müller (Eds.), Cognitive Behavioural Systems (pp. 114–130). Springer. https://doi.org/10.1007/978-3-642-34584-5_9
  38. Kamelabad, A. M., Inoue, E., & Skantze, G. (2025). Comparing Monolingual and Bilingual Social Robots as Conversational Practice Companions in Language Learning. OSF. https://doi.org/10.17605/OSF.IO/FZS9P
  39. Bartneck, C., Kulić, D., Croft, E., & Zoghbi, S. (2009). Measurement Instruments for the Anthropomorphism, Animacy, Likeability, Perceived Intelligence, and Perceived Safety of Robots. International Journal of Social Robotics, 1(1), 71–81. https://doi.org/10.1007/s12369-008-0001-3
  40. R Core Team. (2024). R: a language and environment for statistical computing [Manual]. R Foundation for Statistical Computing. https://www.R-project.org/
This post is licensed under CC BY 4.0 by the author.