Annotation Instructions
Annotation
The annotations concern the data collected in the experiment of this study (https://dl.acm.org/doi/10.1145/3568162.3577004). Please read the paper before annotating (at least the methodology section).
The data are divided into two conditions: collaborative (two students and the robot) and individual (one student and the robot).
The annotations are stored in .eaf files, which can be opened with the ELAN annotation software. We mainly use ELAN 6.6 for macOS. You will be provided with laptops that contain the annotation data, and you will be told where to find the annotation files.
Data Folder
The data folder contains sub-folders whose names follow one of two patterns:
- CG21G## (Collaborative)
- CG21P## (Individual)
Each of these folders should contain exactly one subfolder (called either combined or a long name ending in numbers), which you need to open to reach the actual data. The name of the subfolder does not matter; the important part is the .eaf file inside. Once you reach a participant’s data, open the .eaf file to start annotating.
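The folder layout described above can be walked with a short script. This is only a sketch for finding the annotation files, not part of the official workflow. It assumes that "##" stands for exactly two digits, and the function name and the data root path are hypothetical.

```python
import re
from pathlib import Path

# Folder names look like CG21G07 (collaborative) or CG21P12 (individual).
# Assumption: "##" stands for exactly two digits.
GROUP_PATTERN = re.compile(r"^CG21([GP])(\d{2})$")
CONDITIONS = {"G": "collaborative", "P": "individual"}


def find_annotation_files(data_root):
    """Return (group code, condition, path to .eaf) for every group folder."""
    found = []
    for group_dir in sorted(Path(data_root).iterdir()):
        match = GROUP_PATTERN.match(group_dir.name)
        if not group_dir.is_dir() or match is None:
            continue
        # The name of the single subfolder does not matter, so search recursively.
        for eaf_path in sorted(group_dir.rglob("*.eaf")):
            found.append((group_dir.name, CONDITIONS[match.group(1)], eaf_path))
    return found
```

Calling find_annotation_files with the path of the data folder on your laptop returns one entry per .eaf file, which makes it easy to spot a group folder that has no annotation file or more than one.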
.eaf Annotation File Structure
The .eaf
file already contains many of the annotations. Most of these are automatically extracted before (these have annotator name as logs
) so you do not need to touch the settings about them. Below in the structure I will explain exactly which tier is automatic or which one you need to annotate.
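An .eaf file is an XML document, so the automatically extracted tiers can be listed without opening ELAN. The sketch below assumes that the annotator name is stored in the ANNOTATOR attribute of each TIER element, which is where ELAN normally writes it. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def tiers_by_annotator(eaf_source):
    """Map each tier ID to the annotator name stored on its TIER element."""
    root = ET.parse(eaf_source).getroot()
    return {
        tier.get("TIER_ID"): tier.get("ANNOTATOR", "")
        for tier in root.iter("TIER")
    }


def automatic_tiers(eaf_source, auto_name="logs"):
    """Return the IDs of tiers whose annotator is the automatic extractor."""
    return sorted(
        tier_id
        for tier_id, annotator in tiers_by_annotator(eaf_source).items()
        if annotator == auto_name
    )
```

Both functions accept a file path or an open file object. Tiers that do not appear in the result of automatic_tiers are the candidates for manual work.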
Tiers
flowchart TB
subgraph game
subgraph deck
subgraph cards
subgraph changed
end
end
end
subgraph ra
end
end
subgraph left
subgraph twl
subgraph twl_fluency
end
subgraph twl_context
end
end
end
subgraph right
subgraph twr
subgraph twr_fluency
end
subgraph twr_context
end
end
end
subgraph rs
subgraph twf
subgraph attention_l
end
subgraph attention_r
end
subgraph twf_context
end
end
opinion
end
game [Automatic]
Possible Values: alien, trip
Part of the experiment with a particular storyline. Each game contains exactly three decks of cards. There are two games, called Trip and Alien.
deck [Automatic]
Possible Values: Object/Tool, Occupation/Job, Shelter/Shelter
You only need to annotate the data that falls within the deck intervals on the timeline. You can skip the times that are outside the decks; we do not care about them. Each deck contains 5 target words, 3 of which are hard words that are important for the annotation.
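The rule "only annotate inside the decks" amounts to an interval check. A minimal sketch, assuming that times are in milliseconds and that annotations are simple (start, end, value) tuples; both function names are hypothetical.

```python
def inside_any_deck(time_ms, deck_intervals):
    """True if time_ms falls inside at least one (start_ms, end_ms) deck interval."""
    return any(start <= time_ms <= end for start, end in deck_intervals)


def keep_deck_annotations(annotations, deck_intervals):
    """Keep annotations (start_ms, end_ms, value) that overlap a deck interval."""
    return [
        (start, end, value)
        for start, end, value in annotations
        if any(start < deck_end and end > deck_start
               for deck_start, deck_end in deck_intervals)
    ]
```

Anything that keep_deck_annotations drops lies entirely outside the decks and can be ignored during annotation.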
cards [Automatic]
A list of 5 words showing the order of the cards on the table at that moment in time.
changed [Semi-Automatic] [Later Manual Annotation, don’t take action now]
Possible Values: left, right
This tier records when the order of the cards changed and who touched the screen to change it (the participant on the left or on the right).
ra: Robot Attention [Automatic] [Empty for Now]
Possible Values: left, right, screen, other
Where the robot was looking at that moment in time (i.e., where the robot’s attention was).
left and right [Semi-Automatic] [correct only when very different from reality]
Speech Transcription
The speech of the participant on the left and right side. (In the individual condition, “left” is the only participant.)
twl: Target Word Left, twr: Target Word Right [Control] [Semi-Automatic]
Possible Values: mallet, altimeter, funnel, hummock, fuselage, escarpment, spokesperson, welder, meteorologist (one of the 9 target words)
This tier marks when the left (twl) or right (twr) speaker mentions any of the target words while playing a deck; in the individual condition only twl is used. For example, if a participant says “mallet”, which is one of the hard target words, it should appear in this tier with roughly correct timing. The initial annotation file already contains automatic transcriptions of the hard target words, produced by the Whisper automatic speech recognition system. Go through the data, add any target words that are missing, and remove any annotation where the participant said something that is not a target word but the automatic transcription mistakenly marked it as one. Note that the participant does not need to pronounce the word correctly; it is enough that they say something close to the hard target word.
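To give a feeling for what "close to the hard target word" can mean, here is a small sketch that compares a transcribed word with the nine target words by string similarity. It is only an illustration: the automatic annotations come from Whisper, not from this code, and the function name and the 0.7 cutoff are assumptions. Your own judgement while listening always takes priority.

```python
from difflib import get_close_matches

TARGET_WORDS = [
    "mallet", "altimeter", "funnel",
    "hummock", "fuselage", "escarpment",
    "spokesperson", "welder", "meteorologist",
]


def closest_target_word(spoken, cutoff=0.7):
    """Return the most similar target word, or None if nothing is close enough."""
    matches = get_close_matches(spoken.lower(), TARGET_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A spelling-based comparison like this misses many spoken near-misses, so treat it as a hint for where to listen, not as a decision rule.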
twl_detection and twr_detection [Automatic]
Possible Values: yes, empty
Whether the word was detected by the automatic transcription or was transcribed by the annotator (you). Do not touch this tier. It is automatically filled by the system.
twl_fluency and twr_fluency: Disfluency [Manual] [Annotate]
Possible Values: disfluent, fluent
While saying the target word, or shortly before saying it, did the speaker have any fluency problem?
Disfluency in the context of learning a new foreign language refers to interruptions or hesitations in speech when attempting to produce unfamiliar words. It is like hitting a linguistic speed bump: the speaker pauses or stumbles for a moment while searching for the right words. Here is a list of disfluency forms commonly observed when learning a new foreign language:
- Repetition: Repeating a word or phrase to reinforce memory or clarify understanding. (e.g., “I want to go to the store, the store.”, or “The, uh, the restaurant we went to was really good.”)
- Hesitation: Momentary pauses or delays as the speaker searches for the right word or formulates a sentence.
- Filler Words: Inserting filler words like “um,” “uh,” or equivalents in the target language while thinking. (e.g., “I, uh, want to go to the store.”)
- Self-Correction: Interrupting oneself to correct a mistake or choose a more accurate word. (e.g., “She’s, uh, they’re—sorry, I mean, she’s studying French, not Spanish.”)
- False Starts: Initiating a sentence and then restarting it with different wording. (e.g., “I wanted to buy, no, actually, I decided to rent a, um, bicycle instead.”)
- Circumlocution: Describing a word or concept rather than using the exact term, especially when the correct word is momentarily inaccessible. (e.g., “I couldn’t remember the, uh, the word for that thing you wear when it’s cold, you know, the, um, jacket!”)
- Code-Switching: Inadvertently using words or structures from one’s native language in the middle of a foreign language sentence. (e.g., “I was walking down the calle—oops, I mean street, and suddenly I saw a really cool tienda—store.”)
- Pauses for Translation: Pausing to mentally translate a word or phrase from the native language to the target language. (e.g., “He is a very buen chico, um, good guy, you know.”)
- Gesture or Expression: Using non-verbal cues like gestures or facial expressions to convey meaning when struggling with words.
- Phonetic Approximation: Pronouncing a word using sounds from one’s native language due to unfamiliarity with the correct pronunciation in the new language. (e.g., saying “Meteorolog”, the Swedish word for “meteorologist”, while speaking English, or saying anything else that is close in pronunciation.)
twl_context and twr_context: Context [Manual] [Annotate]
Possible Values: asking definition, comparison, use in context, providing definition
This tier records the context in which the target word was used: whether the speaker was asking for a definition, making a comparison, using the word in context, or providing a definition.
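The manual tiers only accept a fixed set of values, so a typo in a label is easy to catch automatically. A minimal sketch, assuming the values are spelled exactly as listed in this document (the spelling in the actual ELAN file may differ, for example with underscores); the function name is hypothetical.

```python
FLUENCY_VALUES = {"disfluent", "fluent"}
CONTEXT_VALUES = {
    "asking definition", "comparison", "use in context", "providing definition",
}

# Allowed values for each manually annotated tier.
ALLOWED_VALUES = {
    "twl_fluency": FLUENCY_VALUES,
    "twr_fluency": FLUENCY_VALUES,
    "twl_context": CONTEXT_VALUES,
    "twr_context": CONTEXT_VALUES,
}


def invalid_values(tier_id, values):
    """Return the values that are not allowed on the given manual tier."""
    allowed = ALLOWED_VALUES[tier_id]
    return [value for value in values if value not in allowed]
```

An empty result means every label on that tier is one of the allowed options.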
Annotating
At the current stage, these are the steps and tiers that need to be annotated. Please follow these steps:
- You are given the link to a spreadsheet. Open that and choose a group that you are going to annotate, and write your name in front of the group code.
- Open the .eaf file in the data folder of the group that you have chosen.
- If anything is wrong with the group annotation file, write that in the spreadsheet.
- If everything is fine, place the time header at the beginning of the first deck and start the annotation.
- Start the annotation of:
  - left and right: The most important thing is the timing of the speech. Check that the transcriptions in the tier have the correct timing and that they represent the speech of the corresponding participant. You do not need to correct the transcription text; do so only if it is very different from what was said.
  - twl, twr: Check that the automatically detected target words are correct, and check their timing as well. If the timing is wrong, correct it. If the word is wrong, delete it. If a word is missing, add it.
  - twl_fluency, twr_fluency: If the target word is produced with disfluency as defined above, mark it as disfluent. If it is fluent, mark it as fluent.
  - Fill in the twl_context and twr_context tiers. There are four options.
- Annotate all the decks, and save the file at the end.
- Upload the annotated version to the cloud.
- After doing all the annotation, mark the steps that you have completed in the spreadsheet.
Do not forget to save the file regularly, for example after each annotation, by using command + s.
When you open a file to annotate, write your name as annotator in the settings.
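Before uploading, it can help to check that every target word has a fluency label and a context label. The sketch below assumes that annotations have been exported as (start_ms, end_ms, value) tuples and that a label belongs to a target word when the two intervals overlap. The function names are hypothetical, and how the child tiers are linked in the actual ELAN file may differ.

```python
def overlaps(first, second):
    """True if two (start_ms, end_ms, value) annotations share some time span."""
    return first[0] < second[1] and second[0] < first[1]


def missing_labels(target_words, labels):
    """Return target-word annotations with no overlapping, non-empty label."""
    return [
        word for word in target_words
        if not any(overlaps(word, label) and label[2].strip() for label in labels)
    ]
```

Running missing_labels once with the fluency annotations and once with the context annotations of the same speaker lists the target words that still need attention.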