Transcribing and coding data
Once you have your video or audio recording, the next step in the analytic process is to convert this ’raw’ data into a written document that will enable you to code and analyse the data in full. This process is called transcription: converting audio and video data into a written transcript. In this chapter, we will briefly discuss why transcription is an important part of the analysis before working through the practical details: how to format the transcript, where to start (and when to stop), and how to add features such as intonation, pauses, overlaps and visual details. We will then consider how to code your data: how to search for and select those instances that you will analyse in detail. Note that if you have textual data — such as from an online discussion board — then you will not need to transcribe, though you will still need to format your data. In that case, you can read Box 5.1, then skip to the ’coding the data’ section of this chapter. For the rest of you with video/audio data, you have a little work to do first, and this will help you to become more familiar with your data and better able to analyse it. A good transcript is the key to a brilliant analysis, and it offers a completely different insight into the choreography of social interaction.
The process of transcription should not be, as many people suspect, a dull and repetitive process. If it is, you are probably doing too much for too long (see ’the golden rules of transcription’). On the contrary, it is the process through which we first really see the benefits of the zoom-lens approach of DP: we can be unashamedly meticulous, focused and geeky about our data. We can examine the minutiae of discourse and interaction and the implications of this for social interaction and psychological concepts. Here we also see DP’s close relationship with conversation analysis and a focus on the organisation as much as the rhetoric and word choice of interaction. Transcription can be considered as similar to travelling by canal boat (also known as a narrow boat, or barge). When you first step onto a slow-moving boat like this — which sometimes travels at no faster than walking pace — it can feel frustratingly, mind-numbingly slow. Once you become accustomed to the speed of travel, however, other things become clearer. You are more likely to notice things in the environment around you that you might have otherwise overlooked, had you been going faster. So it is with transcription. It forces us to slow the interaction down to a speed that will enable us to type up what and how things were said. Because of this practical need to slow down the recording, we then begin to notice features of the interaction (a ’tut’ or a sigh, for example) and aspects that might then become our analytic focus. And we realise just how beautifully messy, but highly effective, spoken interaction really is.
Box 5.1: Formatting textual data
If you are using textual data from the internet (such as a blog or discussion board), you already have the words typed out, and your initial task is to format this data. The first thing to do is to copy and paste all parts of the text that you are going to use as data into a word document. Remember to label the document clearly so that you know where and when you sourced the data. You will need to decide whether to include any graphical or pictorial images that accompany the text — such as emoticons or avatars in discussion boards — as these are often an important part of the local context of the interaction in that setting. The second step is to add line numbers (see notes below) so that the textual data is set out clearly. If your data is taken from a discussion thread, then the initial post and replies might be included in one ’extract’, so that you can see the interaction within that thread as a turn-by-turn sequence. Include the dates and times of posts, as well as the name of the person who wrote each post (these are often included automatically in discussion forums), so that you can track the progression of the interaction, from when it started (and by whom) to when it finished. Your aim with formatting textual data is thus to include as much of the original information as you can, while still creating a clear and readable transcript.
Why transcription is the first step of analysis
Transcription is a theoretical process as well as a practical one. It is theoretical in that we make decisions about what will be transcribed, what will be included in the transcript and how it will be formatted. Even prior to this, we have already made decisions about what research question to use, what counts as ’data’ and what we will record, and then which parts of this recording we will transcribe. Each of these decisions will be based on our assumptions about what it is we are studying and how we are going to make interpretations about the discourse. They also involve practical issues: the position of microphones can impact on the audibility of people’s voices and other sounds, and thus on the quality of the transcription. In Chapter 2, we saw the differences between transcripts that might be used by different forms of discourse analysis and the implications of these for our analytic focus. As such, the transcript has the potential to illuminate certain features and conceal others. If we include pauses but not hand gestures, for example, then we will analyse in terms of what the pauses are doing with little understanding of how these are related to the visual aspects of the interaction. A transcript typically only makes available those people who are talking in the interaction. If someone is present but silent in an interaction, then they are often missing from the transcript, though this is where the inclusion of images in published transcripts has an added benefit. So the transcript is another step in this theoretical and analytical process. This means that we cannot treat it as a neutral, impartial record of the event; it is always bound up with the research context.
In a similar manner, we might also treat the transcript as never really ’complete’, and that it might change depending on what aspects or areas of the data we are focusing on at that moment. For instance, if we are interested in how people take turns in a political debate then we may pay more attention to these in our transcript and overlook other details, such as facial gestures or voice intonation. This does not mean that the transcript is just a subjective interpretation of what is going on, but rather that we need to be clear in how we transcribe as well as making sure that our research process is coherent from start to finish.
It should now be clear that transcription is not a stage that we must hurry past before we can get stuck into the ’real’ analytical work. It is a crucial part of the analytical process: a means by which we can become absorbed in the data, begin to notice features of the interaction and gain an understanding of ’what is there’ in the data. This is also why we cannot work from notes or a rough gist of what was said; these would leave us unable to produce the detailed analysis of DP. An orthographic (words-only) transcript may be easier to read at first, but it, too, can gloss over the features of social interaction that make all the difference in how we analyse and interpret the talk. If we see just the words spoken, but not how they were uttered, then they are more likely to be interpreted in terms of an individual speaker rather than a rich social context. They appear as written, grammatical language, not as live and kicking discourse. We need to see the interaction played out on paper, the nuances of spoken words captured in clear detail. This stage of transcription will prove invaluable later on. Those who have the resources to pay for someone else to transcribe their data may save time initially, but this is often time that must be spent at a later date.
The golden rules of transcription
Before we get started into the practicalities of converting your audio and/or video data into a written transcript, here are some golden rules to save you time and help prevent transcription from taking over your life.
· You do not need to transcribe all of your data. Depending on your time and schedule, therefore, it may be more efficient to make detailed notes (see Box 5.2) on all your data, and transcribe those sections which you have identified as being of analytical interest.
· Create an electronic version of your transcript from the start. Do not be tempted to write it out by hand first, then type it up later; you are simply doubling your transcription time if you do that.
· Allow sufficient time to transcribe. It can take at least six hours to produce an orthographic transcript from one hour of video or audio data. This time can vary according to factors such as how many people are talking in the data, how fast you are as a typist, whether there is a lot of overlapping or unclear speech, and the quality of the recording. To produce a Jefferson transcript, it can take anywhere between 10 and 20 hours for every hour of data. While these estimates might seem eye-watering, the more time you put into transcription, the quicker you will be able to move forward in analysis and identify issues to focus on.
· Use different headphones. This can be really helpful if you are struggling to hear certain sections of the recordings; different headphones can have different sensitivities to sound.
· Work in short bursts and alternate transcript work with other activities. Taking frequent breaks will keep you attuned to the details in the talk and less likely to miss them. Transcription can be a tiring process, and it is always better when you come back to it refreshed.
· Do not use punctuation as you would for written text. While it may feel strange at first not to use commas, full stops and capital letters, it is important that the speech is presented as it is said, not how it might appear grammatically or in written prose.
· A good quality transcript is the key to a good quality analysis. You cannot analyse something that isn’t in your transcript, so you need to make sure that you are thorough and detailed; even the smallest words or features of interaction (such as ’mmms’, ’ohs’ and short bursts of laughter) can be highly consequential for analysis. Note that for a good quality transcript, you need first to have a good quality recording; see Chapter 4 on this issue.
· Paying for a transcriber or transcription service is not the easy option. While this may help if you have vast quantities of data, it will not remove the need to read through and check the transcripts for accuracy, as well as adding in Jefferson details.
· A transcript is never ’complete’ or finished. Don’t despair; this isn’t the same as saying that you will need to transcribe forever. It means that a transcript is always a partial representation of the video/audio recording, just as the recording itself is a partial representation of the interaction it sought to capture. What you are aiming for, then, is a transcript that will be good quality and ’good enough’ for a DP analysis, detailing the features that are relevant to the issue that you are aiming to analyse.
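If it helps with planning, the time estimates in the golden rules above can be turned into a rough budget. The Python sketch below is purely illustrative: the function name is invented, and the 15-hour figure for Jefferson transcription is simply a midpoint of the 10–20 hour range quoted above, not a fixed rate.

```python
# Rough transcription-time budget, using the per-hour estimates quoted in
# 'the golden rules of transcription' (approximate figures, not exact).

def transcription_hours(data_hours, style="orthographic"):
    """Estimate the hours needed to transcribe `data_hours` of recording."""
    # Orthographic: at least 6 hours per recorded hour.
    # Jefferson: roughly 10-20 hours per recorded hour; 15 used as a midpoint.
    rates = {"orthographic": 6, "jefferson": 15}
    return data_hours * rates[style]

print(transcription_hours(4))                     # 4 hours of data, words-only
print(transcription_hours(1, style="jefferson"))  # 1 hour to Jefferson detail
```

Even a crude calculation like this makes clear why transcribing selectively, rather than everything, is usually the sensible choice.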
Making detailed notes on your data
As noted above, you do not always need to transcribe all of your data. While it may provide comfort and satisfaction to have all your data transcribed, unless you have the funds to pay for someone to transcribe for you, this can often take up more time than you really have available. This is a decision you will need to make on the basis of how much data you have recorded, how much time you have and how quickly you can type. Even for a transcription geek like me, there comes a point when transcribing more of the data does not add any extra value to the project. So before you begin transcribing any of your data, consider first the option of making detailed notes on the whole corpus.
Having a written document that details the contents of a video or audio recording, without taking the time to transcribe it in full, can be helpful for two main reasons. First, the process of producing these documents allows us to get a quick overview of our whole data set in approximately the same time that it takes to watch or listen to it in real time. Second, the notes are a really useful resource to search when we are analysing different topics, so the more detail we add, the better; you can always go back to the notes and add further details as you become familiar with your data set. Skim reading through a word document (and using the ’find’ function to search for certain words) can be a lot quicker than trying to watch or listen to your data to find examples of instances where a particular pattern occurs.
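If your data notes are stored as plain-text files rather than word documents, the same kind of searching can be scripted across the whole corpus at once. The sketch below is only an illustration: the folder name (`data_notes`), the `.txt` layout and the search word are all invented assumptions.

```python
# Sketch: search a folder of plain-text data-notes files for a keyword,
# returning each matching line together with the file it came from.
from pathlib import Path

def search_notes(folder, keyword):
    """Return (file name, line) pairs for every note line containing keyword."""
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if keyword.lower() in line.lower():
                hits.append((path.name, line.strip()))
    return hits

# e.g. find every mealtime note that mentions chicken:
for name, line in search_notes("data_notes", "chicken"):
    print(f"{name}: {line}")
```

Because the notes include rough timings (as in Box 5.2), each hit points you straight back to the section of recording to transcribe in full.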
Making notes in this way allows you to be as detailed or as broad as you wish, and this may depend on what stage you’re at with your research. For example, I have been studying family mealtimes for a few years now, and often focus on food assessments, so in my detailed notes I typically write down the specific assessment words that people use (see Box 5.2 for an example). This does, of course, mean that I am likely to miss other things; so bear in mind that this process of making notes on the data, like transcription, is only a partial record of the data. Note that I have also written down rough timings quite frequently, so that I can quickly find the sections that I want to focus on, and transcribe fully, at a later time. The notes do not have to be grammatically correct or neat; the important thing is that I am capturing some features of the talk in a relatively quick manner, to help me identify sections later on. I have also headed the notes with some details about the recording that will help me to log and track across different data sets. Some of these details can then be used in a summary table or spreadsheet for when you store or archive your data.
Box 5.2: Data notes
Example notes from a project on weaning interactions with parents and their infants.
Lewis family: meal 001, 30 June 2014.
Length of recording: 44:28
Present: Mum, Dad, Ellie (infant, 8 months old)
Two video cameras: one on Ellie, one focused on Mum & Dad, who are sitting next to Ellie at the dining room table.
Food eaten: roast chicken and roast vegetables. Parents using baby-led weaning where pieces of food provided for baby who feeds herself.
· Mum brings Ellie to highchair and straps her in. ’Good girl’ said a few times. Mum sits down diagonally opposite Ellie. Puts plastic bib on Ellie. Dad can be heard in background, clattering plates and dishes.
· 1:20 Mum offers water in cup with lid & spout. Ellie drinks. ’Good girl’. Some adjustment of the table on the highchair; this is pulled up to the table.
· 1:50 Mum leaves to get dinner & returns with Ellie’s plate. Mum talks to Dad who is off-camera.
· 2:30 Ellie and Mum/Dad begin eating. ’Is that good?’ Dad comments: ’nice? Yum yum, mmm’, ’we like roast chicken don’t we?’, ’Is that good?’
· 3:30 Mum offers water; Ellie still busy eating. Dad makes lots of comments about ’liking chicken, yum yum, is that nice?’ Dad says there’s plenty more.
How to transcribe (in three steps)
Step 1: Creating the orthographic (also known as basic, verbatim, playscript or words-only) transcript
At its simplest, the transcript provides a written document of the data. It means that you can read through the data as well as watching or listening to the original recordings. An orthographic transcript captures only what words are spoken, by whom, and in what order. You do not have to use the correct spelling, however, if words are spoken in a particular way. This means that you should type out the words as they sound, rather than how they are spelt. For example, ’hello’ is the English spelling for a typical greeting, but it might also sound more like ’ello’, ’hallo’ or just ’lo’. There are occasions, however, when using a standard spelling would enable you to see fairly quickly what is being said — to scan through the transcript at the coding stage, for example — and to use the search functions of word-processing software to collate all instances of a particular word or phrase. An example of this is when I searched for all instances of the ’eugh’ utterance for a study into disgust responses (an utterance which might also sound like ’urgh’, ’ugh’ or ’euw’) in my family mealtime data. If I had initially transcribed it using different spellings, it would have taken much longer to identify all examples in the data corpus. So stay faithful to how things are actually spoken, but be consistent, and be aware of the implications for coding and electronically searching the data later on. This is one example of how the transcript is never really finished; there are always alternative ways in which we could represent the recording.
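The search for spelling variants can also be automated if the transcript is held as plain text. The sketch below is illustrative only: the transcript lines are invented, and the list of variants is just the one given above.

```python
# Sketch: collating spelling variants of a disgust token ('eugh', 'urgh',
# 'ugh', 'euw') from a words-only transcript.
import re

# A words-only fragment, invented for illustration.
transcript = """\
1. Ellie: euw
2. Mum: urgh that one's gone cold
3. Dad: ugh
"""

# \b marks word boundaries, so 'ugh' is not matched inside other words.
pattern = re.compile(r"\b(eugh|urgh|ugh|euw)\b", re.IGNORECASE)
for match in pattern.finditer(transcript):
    print(match.group())
```

A single search over known variants like this is exactly what becomes impossible if each instance was typed with an ad hoc spelling, which is why consistency matters.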
Box 5.3: Checklist for orthographic transcript
· Use line numbers for each line of talk.
· Write the speaker’s pseudonym next to their speech. Start a new line when there is a change of speaker.
· Write out what is said by each person, without ’tidying it up’ or typing what you think they meant to say. Write it exactly as it is spoken.
· Do not use punctuation to create sentences; we might write in sentences, but we do not talk in sentences. The transcript might look like a long string of words at this stage, but this is very typical of spoken interaction.
· Represent noticeable pauses as (pause); this can help you identify particular features of the interaction later on.
· Note when there is overlapping speech (two or more people talking at the same time), by typing ((in overlap)) before the words spoken.
· If you are unsure of what is said, write your best guess in brackets, or else write ((unclear speech)).
· Use single quotation marks to indicate speech that is spoken as if it had been spoken by someone else, or by themselves at a previous time (referred to as ’reported speech’).
See Nikander (2008) if you are working with data that is in a language other than English, and if you will need to present in English at some point.
You first need to decide which parts of your data to transcribe, and to what level of detail. Whatever amount of data you have, it makes sense to start transcribing at the beginning: at the start of the first recording that you made (excluding any pilot or test data you might have collected; you can always return to this later if necessary). Remember that you may not need to transcribe all of your data if you have a substantial amount (and what counts as substantial will depend on the time you have available and your typing skills), but you should aim to create an orthographic transcript of at least half of your data to allow you to examine it more closely. The sections that you work up to a more detailed transcript can then be selected on the basis of particular features of the data that you have noted in the early stages of coding (see step 2, below).
To begin the process of transcription, play a very short section of your recorded data — around five seconds — at a time, and type this into a word document. You may have to play this several times, and possibly on ’loop play’ (where the software automatically plays back a section that you have highlighted), to capture all the words in this part of the talk. Once you are happy that you have included all the words, play back the next five seconds, but include around one second from the previous play-back, so that you are overlapping each section that you listen to. For example, listen first to 0:00:00 — 0:00:05, then from 0:00:04 — 0:00:09 of the recording. This can help to avoid missing out small sections, or mis-hearing words that might cut across the short sections. Once you have transcribed one complete section of data recording, listen to the whole interaction in full while you are reading through the transcript, to check and make sure that you have included everything. This final check can often highlight small sections that you may have missed, and allows you to take a metaphorical step back from the data after the close listening that is required for transcription.
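The overlapping play-back schedule described above can be sketched as a short script, if you want to pre-compute the sections to listen to. The function name is invented, and the defaults simply mirror the five-second sections and one-second overlap from the example.

```python
# Sketch of the overlapping play-back schedule: five-second sections,
# each starting one second before the previous one ended.

def playback_sections(total_seconds, section=5, overlap=1):
    """Yield (start, end) times in seconds for overlapping play-back."""
    start = 0
    while True:
        end = min(start + section, total_seconds)
        yield (start, end)
        if end >= total_seconds:
            break
        start = end - overlap  # rewind one second before the next section

# A 14-second stretch of recording, played in overlapping sections.
for start, end in playback_sections(14):
    print(f"0:00:{start:02d} - 0:00:{end:02d}")
```

The first two sections it produces (0:00:00 – 0:00:05, then 0:00:04 – 0:00:09) match the worked example above; the final section is simply clipped at the end of the recording.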
As you are creating the transcript, it is essential that you anonymise your participants and any identifying features (such as place names or organisations), and the easiest way to do this is to create a list of real names alongside your chosen pseudonyms as you encounter each new speaker. Keep this list in a separate word document in a password-protected computer: this is effectively the key that keeps your data confidential and anonymous, so take care with this. It is not advisable to use initials (e.g., ’A’, ’B’) to refer to participants, as if you later decide to change this to a full name you will have to change each occurrence of the initial manually (you cannot use the ’find and replace’ function in Word as every instance of that letter, in all words, will also be changed). When choosing a pseudonym, it can be helpful to keep the number of syllables consistent between the real name and the pseudonym, and consider if the person is ever referred to by a shortened version of their name. For example, if one of your participants is called Jennifer (three syllables) and sometimes called Jen or Jenny, we could use Gillian (shortened to Gill or Gilly) as the pseudonym. You may also use other techniques to enable you to create pseudonyms, such as keeping the first initial the same or ensuring that gender-neutral names or abstract ’avatar’ names are matched with a comparable replacement.
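The reason single initials cause trouble is that a blind find-and-replace cannot tell a name from a letter inside an ordinary word. If you keep a plain-text working copy of the transcript, whole-word replacement of full pseudonyms is straightforward to script; the names, key and example line below are all invented for illustration.

```python
# Sketch: replace real names with pseudonyms, matching whole words only,
# so that letters inside other words are never touched.
import re

line = "Jennifer asked Jen about the apples"
key = {"Jennifer": "Gillian", "Jenny": "Gilly", "Jen": "Gill"}

# Match whole names only; try longer names first so 'Jennifer' is not
# partially matched as 'Jen'.
names = sorted(key, key=len, reverse=True)
pattern = re.compile(r"\b(" + "|".join(names) + r")\b")
anonymised = pattern.sub(lambda m: key[m.group()], line)
print(anonymised)  # Gillian asked Gill about the apples
```

Note how the shortened form ’Jen’ is replaced by the matching shortened pseudonym ’Gill’, while the ’en’ inside other words would be left alone, which is exactly what a naive find-and-replace on an initial cannot guarantee.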
Now we come to the practicalities of the transcript, in terms of how to format it and add line numbers. This can sometimes be one of the most frustrating parts of creating a transcript, so I will go into quite specific detail here to help guide you through the process. Use Courier font, size 10 (Courier New or Courier Prime). This typeface is what is referred to as a monospaced or non-proportional font, which means that each letter takes up the same horizontal space on the line. Compare that with the proportional font used in this book, where the letter ’m’ takes up a wider space than an ’i’, for example. A non-proportional font is important for a DP transcript so that we can accurately represent where overlapping speech occurs. It is also a clear and easy-to-read font, which is important if we want our transcripts to be easily accessible for publication. Size 10 is also the typical font size used for transcripts (so slightly smaller than the size — 12 — of the surrounding text).
You should also provide wide margins on the left- and right-hand side of the page. There is no set requirement for how wide these should be, but this paragraph gives an indication of what it might look like. The wide margins provide space for analytical notes, though this is less important when working with an electronic document, where notes can be added using ’comment’ functions.
What you are aiming for is three columns: one for the line numbers, one for the speaker names or identifiers, and one for the discourse. Once you add in the line numbers — and see below how to do this — the first column will automatically be created for you. So for now, just focus on writing the names first, and then a second column for the talk. Use the tab function to create a gap between the name and the talk; this will keep the start of the columns in a neat line. If one person’s talk continues over more than one line, press return at the end of the line, then space bar, then tab, to align the talk with the rest of that column. If another person begins speaking, then you can type in their name, tab across, and start typing up their talk. Do not use a table to create a transcript as this can be cumbersome to add in more line numbers.
Below is an example of how a transcript might look, from a section from the ’steak and fish’ mealtime data that we encountered in Chapter 2. The video that corresponds to this transcript can be found here: www.youtube.com/watch?v=OtKaXw6WqYM. Note that I have just typed the words only, with no pauses or intonation, and no punctuation.
Lesley: yours is fat
Bob: mine’s is quite a lot of I don’t like steak
Lesley: no don’t do either
Now you can add line numbers, and you can start this at any point in the transcription process. There are at least two different ways of adding line numbers, and this can cause confusion when creating the transcript or when you want to cut-and-paste sections of it to another document later on. The best way to add line numbers is to use the ’line numbers’ option that can usually be found under ’page layout’ or ’layout’/’text layout’ tabs, or in page set-up in your word document. These usually have the option to choose ’continuous’/’restart each page’/’restart each section’, as well as ’none’, for when you want to remove the line numbers. There should also be further options (often in a separate pop-up box) to specify what number you want to start with, and whether you want to count in 1s, 5s etc. This way of adding line numbers means that the original format of your transcript (including how many words you can fit on each line) remains unchanged. The line numbers are effectively added to a specific section of your word document, so if you have other writing in this document that you do not want numbered, then you will need to create section breaks. This process should also mean that when you cut-and-paste a section of the transcript into another word document, the line numbers are also transferred. The line numbers should appear down the left-hand side of the page like this:
1. Lesley: yours is fat
2. Bob: mine’s is quite a lot of I don’t like steak
3. Lesley: no don’t do either
Another way of adding line numbers is through the use of a ’numbered list’; this is often found under the ’home’ and ’paragraph’ tabs in the document. While this might at first seem easier, in that you can more easily add line numbers to just a section of the text (by using your mouse to select and highlight the section to which you want to add line numbers), it can affect the formatting of your document. It shifts the whole text over to the right (the list numbering takes up space on the left-hand side of the page, bumping everything across), which, if you have a lot of text on one line, can result in this happening:
1. Lesley: yours is fat
2. Bob: mine’s is quite a lot of I don’t like
steak
3. Lesley: no don’t do either
Imagine this occurring on hundreds of pages of transcript, and you have just made yourself more work in terms of tidying it up (you would need to place the cursor at the start of ’steak’, press return — which creates a new line number — then space bar and tab to align the speech with the rest of that third column of text). So you can use this approach to add line numbers, but beware that it may cause some formatting issues.
Each line should have its own number, even if it contains only a pause in the discourse. Start a new line when there is a new speaker, a longish pause (e.g., over one second) or before you reach the right-hand margin (to preserve the wide space at the side of the page). You might also need to adjust the length of a line depending on whether there is any overlapping talk with another speaker, to ensure that this is clearly presented. You will also find, as you add more details to the transcript (as in step 2 below), that you can fit fewer words on each line. As the transcript becomes more detailed, then, it will also become longer.
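If you keep a plain-text working copy of your transcript outside your word processor, continuous line numbers can also be added with a short script rather than through page layout settings. This is only a sketch of the idea; the numbering format (right-aligned, counting in 1s, two spaces before the text) is an assumption, not a convention.

```python
# Sketch: prefix each line of a plain-text transcript with a line number,
# leaving the original line breaks unchanged.

def number_lines(transcript, start=1, step=1):
    """Prefix each line with a right-aligned number and two spaces."""
    lines = transcript.splitlines()
    width = len(str(start + step * (len(lines) - 1)))
    return "\n".join(
        f"{start + i * step:>{width}}  {line}" for i, line in enumerate(lines)
    )

print(number_lines("Lesley: yours is fat\nBob: mine's is quite a lot of"))
```

Because the numbers are computed from the text itself, renumbering after an edit is just a matter of re-running the function, with none of the section-break or numbered-list complications described above.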
Step 2: Creating the ’Jefferson’ transcript
The standard transcription key for DP (and other interactional) research is that developed by Gail Jefferson and is known as the Jefferson transcription system. Unless you have only a small amount of data, it is unlikely that you will transcribe all of your data to full Jefferson level. As noted earlier, this can take anywhere between 10 and 20 hours for each recorded hour of data. What you will need to do is to transcribe at least half (if not all) of your data to words-only level, then begin the coding stage. At that point, you will be in a better position to identify the sections of the transcript that you want to analyse in more detail. So if you are at this stage, skip ahead now to the ’coding the data’ section, then return here for guidance on how to transcribe in more detail. Bear in mind that the process of coding data and creating more detailed transcripts can often go through cycles; you may have to do this more than once. This is because the detailed transcripts can reveal different features of interaction that might not have been noticeable in the first stages of preparing an orthographic transcript and then coding from this.
The process of creating a Jefferson transcript is very similar to that for producing an orthographic transcript; the difference is that you will spend more time on shorter sections of talk and you will use symbols to represent the phonetic features of talk. Box 5.4 details the main transcription symbols from the Jefferson system that you will need for a DP analysis (see Hepburn, 2004, for extra details on how to transcribe crying sounds). You should work through your coded sections of data, one at a time, and play each recording back while working with the orthographic transcript that you have already prepared. You are not starting from scratch each time, therefore, since you already have the words typed out. Listen to short sections of the recording at a time, again using loop-play if available. You will probably need the list of transcription symbols alongside you while you are doing this, until you become familiar with them. Take your time, and stop and check the recording frequently. Add in the symbols to correspond with how the talk is spoken: you are mainly focusing on aspects such as the timing of words (how quickly they are spoken, the silences between words, and whether there is overlap between speakers), the pitch (rising or falling) and the emphases placed on certain words. You can do this by concentrating on one feature of talk (e.g., pauses) at a time, or by including all aspects of one short section at once. Extract 5.2 below shows the same fragment of interaction that we saw in Extract 5.1, with one extra line to illustrate the occurrence of overlapping speech; this is added at the exact point at which the words were heard simultaneously. This section of talk lasted five seconds, so you can see the kind of detail that can be added in just a short space of time.
When using the symbols, always place them before the letter or word to which they apply, although there are occasions, such as when placing overlaps, stretched talk or breaths, when they will be placed in the middle of words. In these cases, place them where you can hear them between the sounds in words. For example, when people appear to be laughing as they are talking, it can look like: ’laugh(h)ing thr(h)ough t(h)alk’, also sometimes referred to as interpolated laughter. The ’.pt’ in Extract 5.2 is a ’lip smack’ sound, and is underlined in the transcript as it was particularly noticeable in the audio. The ’disnae’ is a shortened version of the words ’does not’, spoken with a fairly strong Scottish accent. Focus at first on becoming familiar and confident with using the symbols. Your transcript is a working document, remember, and to be used alongside the recorded interaction, so it does not have to be ’perfect’. Do not worry at this stage as to whether the symbols are relevant for your analysis; what matters here is that you are building a more detailed picture of the interaction in written format so that you can analyse it more completely.
Box 5.4: Jefferson transcription notation for DP (taken from Jefferson, 2004a)
· (.) A micro-pause (less than two-tenths of a second)
· (1.2) A pause or silence, measured in seconds and tenths of seconds
· = Latched talk, where there is no hearable gap between words (can occur within a turn at talk, or between speakers)
· :: Stretched sounds in talk; the more colons, the longer the sound, as in rea::lly l:::ong sounds
· CAPITALS Talk that is noticeably louder in contrast to the surrounding talk (sometimes shouting)
· Underlined Emphasised words, or parts of words, are underlined
· ° Degree symbols enclose noticeably °quieter° talk, with double degree signs indicating °°whispering°°
· > < ’Greater than’ and ’less than’ symbols enclose talk that is at a faster pace (>speeded-up< talk) than the surrounding talk
· < > ’Less than’ and ’greater than’ symbols enclose talk that is at a slower pace (<slowed-down> talk) than the surrounding talk
· ↑ ↓ Upward arrows indicate a rising pitch in talk, downward arrows indicate falling pitch
· £ British pound sign indicates smiley voice or suppressed laughter
· # Hash symbol indicates ’creaky’ voice, such as when someone is upset.
· [ ] Square brackets indicate the start (and end) of overlapping talk
· hh hhs indicate audible breaths. A dot followed by hs (.h) indicates an audible inbreath; without the dot (as in hh) it is an outbreath. Within a word (as in ’ye(h)s’), this indicates laughter while talking (’interpolated laughter’). The more hs, the longer the breath.
· Huh/heh/hah Laughter can be represented with outbreaths that have vowel sounds within them.
· ? Strongly rising intonation (not necessarily when asking a question)
· , . Commas indicate slightly rising intonation; full-stops indicate falling intonation at the end of words
· ’yes’ Single quotation marks are used to indicate reported speech or thought
· (( )) Double brackets (sometimes without italics) contain details about other features that have not been transcribed, e.g., ((waves hand))
· (Unclear) Words in single brackets are the transcriber’s best guess at what was being said, or (unclear) or (inaudible) if it really can’t be heard clearly
[Acknowledgment: From ’Glossary of transcript symbols with an Introduction’. In G. H. Lerner (2004), Conversation analysis: Studies from the first generation, pp. 13-23. With kind permission by John Benjamins Publishing Company, Amsterdam/Philadelphia. www.benjamins.com].
Your first encounter with a Jefferson transcript can be fairly overwhelming and confusing. It can be hard to know what to read first and how to read it. The detail is there for a reason, however, so spend a bit of time getting used to transcripts in this format. Creating a Jefferson transcript for yourself will help you to become familiar with the symbols and how they ’sound’, as will reading the extracts aloud and paying attention to the symbols, so that you can ’hear’ the talk as it was said. Reading journal articles that include Jefferson transcripts will help too.
One final word of advice about pauses before we move on to even more advanced transcription: the timing of pauses can be done in two ways. There is the technical way, where you use the playback tool to measure the length of the gap between sounds (see section on software). Use this if there is a particular section of talk where pauses appear to be crucial to the interaction, or if pausing is a particularly important aspect of your analysis. The non-technical way is sometimes referred to as the Mississippi method of counting seconds (some people also say ’one-one-thousand’, but I prefer Mississippi; it has a nicer rhythm to it). Each syllable of miss-iss-ipp-i counts as two-tenths of a second (represented as (0.2) in our transcripts), so when we add one, two or three to the end of Mississippi, we are counting one second, two seconds, and so on. Use a watch to help you find the correct pace at which to count. Start saying ’Mississippi one, Mississippi two’ (and so on) when someone stops speaking, and stop when the next sound in your data appears; this is the gap (pause) between the two sounds. For example, if you get to ’miss-iss’, then that is 0.4 of a second; ’miss-iss-ipp-i’ would be 0.8 seconds. The point is that this should be a quick and dirty way of timing the pauses between sounds, sufficient to create your transcript in enough detail; you can always go back and check the crucial pauses with the technical method later on.
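The Mississippi rule is simple arithmetic, and it can be sketched in a few lines of code. This is purely an illustration of the counting rule above; the function names are my own and are not part of any transcription software.

```python
# The 'Mississippi' counting rule as arithmetic: each counted syllable
# ("miss-iss-ipp-i") stands for roughly two-tenths of a second.
SECONDS_PER_SYLLABLE = 0.2

def pause_in_seconds(syllables: int) -> float:
    """Convert a count of spoken syllables into an approximate pause length."""
    return round(syllables * SECONDS_PER_SYLLABLE, 1)

def jefferson_pause(syllables: int) -> str:
    """Format the pause as it appears in a transcript, e.g. (0.4)."""
    return f"({pause_in_seconds(syllables):.1f})"

print(jefferson_pause(2))   # 'miss-iss' -> (0.4)
print(jefferson_pause(4))   # 'miss-iss-ipp-i' -> (0.8)
print(jefferson_pause(5))   # 'miss-iss-ipp-i one' -> (1.0)
```

The same logic explains why ’Mississippi one, Mississippi two’ lands on whole seconds: five counted syllables make 1.0 second.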
Step 3: Adding extralinguistic details (visual gestures)
Once you have become familiar with transcribing the linguistic (words spoken) and paralinguistic (aspects of speech delivery, such as intonation) details, the next stage is to include what has been termed extralinguistic detail. This covers the non-verbal aspects of interaction, from eye gaze, to hand or bodily gestures, the bodily orientations of different speakers, and the interaction between objects and people in their environment. This step is obviously only relevant for those with video data, and it is typically reserved for more advanced DP analyses. If you are using DP for an undergraduate (bachelor) class or dissertation, then getting to a Jefferson-level transcript should be sufficient. Unless you’re really keen, of course. Either way, this third step is extremely useful, analytically, but it does take time and effort. If you are not proceeding further, you will simply need to pay closer attention to the visual aspects of the recorded data in the video itself, since these will not be included in your transcript.
Knowing where to start adding in these transcription features can be a daunting process: we could, in theory, include all visual aspects of an interaction. These might involve:
· Eye gaze (e.g., gazing at objects or people)
· Facial movements (e.g., smiling, eyebrow raise, lip curl)
· Hand gestures and touch (e.g., pointing or open palms, touching self or others)
· Posture (e.g., how the body is orientated, leaning, or position of head and shoulders)
· Interaction with objects (e.g., passing an object or pointing to something).
Remember, however, that not all of these will be directly relevant to your analysis, and you may find that you need to return to your transcripts once you’ve begun analysing coded sections in detail, to add some more refined features or to focus on another feature. This is why a transcript may never be ’finished’. Nor should you try to capture everything. There is immense detail and intricacy in social interaction, even in just one minute’s worth. This is one of the ways in which conversation analysis has been particularly influential for DP: it illustrates the orderliness of talk and interaction, and the relevance this has for how we understand discursive practices. What you will find is that it becomes almost impossible to read a transcript if there are too many notes about visual details. Our task, then, is to maintain an awareness of this complexity without being overwhelmed by it. We can do this by focusing on one feature at a time.
Let us illustrate how this ’layering’ of extralinguistic features might look on our transcript, by detailing the eye gaze of Bob and Lesley, who feature most prominently in the short section of mealtime talk that we have been using as an example. The simplest way that we can add in these details is by describing them in words, and using overlap markers to indicate where and when these occur:
The advantage of including detail in this way is that it is easy to write in features as and when we notice them in the video. In written form, these words can then be searched using the ’find’ tool in Word. The transcript begins to look quite cluttered, though, and the frequent placement of overlapping brackets can lead to confusion at times as to whose eye gaze or gestures are being described. It also leaves us open to the risk that we might label behaviour in a particular way, and that this might lead to one interpretation over another. This is less obvious when describing eye gaze, but if we had added descriptions of other facial gestures, such as ’disgust’ face (rather than: ’lip curl, nose wrinkle, downward turn in mouth’), then that might assume or make relevant a particular emotional response. This is not the same as saying these things are not relevant, just that we have to be careful about how we describe them. Here we see how the social constructionist approach to discourse applies to our own practices as well; we need to be aware of how easily we can make one interpretation seem more ’obvious’ than another.
Another way to include extralinguistic features in transcripts is to make use of lines (dotted, dashed and solid lines) to indicate the start and end of a movement, or the shifting of eye gaze from one person to another (or from an object to a person). For example, a series of dots (….) can indicate that a person is moving their eye gaze to look at, or shift their bodily position to face towards another person. A solid line then denotes the period in which that gaze or bodily orientation is fixed on another person. A series of dashes (- - -) can indicate gaze towards an object, and commas (,,,,,) can also be used to indicate a turning or shift of eye gaze away from a person. This way of transcribing visual details was devised mainly by Charles Goodwin in the early 1980s, though it has been adapted and used by others since then. It is often combined with screen shots (or ’frame grabs’) to illustrate how that visual detail works within the interaction as a whole. Now look at how our mealtime extract might look if we transcribed it that way (note also that the line numbering system has been added using the ’section break’ method):
In Extract 5.4, the images are placed immediately below the line in the transcript to which they relate. There is one image embedded within another: the smaller image in the bottom left of the frame is taken from the opposite camera angle. From left to right in the main image are Bob, Edith, Linda and Lesley. So Bob is sitting opposite Lesley, and Edith is sitting opposite Linda; they are seated at a circular table. In the transcript, the solid line on Lesley’s turn indicates her gaze at Bob (her father). Bob’s gaze is then either at Lesley, at his plate, or (briefly) at his wife Linda. Even in this very short, five-second stretch of interaction, adding the eye gaze of just two people (we have not included Linda’s or Edith’s at this point) makes the transcript very complicated. So do examine how people’s discursive practices are bound up with everything else in the interaction, and how capturing visual features can enhance your understanding of just how psychological concepts are invoked and made relevant in social settings, but use these details sparingly in your transcripts, publications and presentations.
Coding the data
Coding your data — and we are referring here to data as both the recorded interaction (the video or audio file) as well as the transcript you create — is the process in which you sort the data into smaller sections for analysis. It is a means of making the data manageable, as even with small amounts of data you are unlikely to be able to (or need to) analyse it all. It is not to be confused with the coding that is done in forms of qualitative research where meanings are attributed to the data and the aim of the code is to capture some ’essence’ of the data. By contrast, the DP coding stage is the process of searching for particular features or areas to be analysed, using your research question as a guide. For instance, you might code data to identify those sections where people invoke a racist or xenophobic identity to examine how and where these are accomplished in your data corpus.
The relationship between transcription, coding and analysis is an iterative one: while we may begin them in that order, more often than not each stage becomes intertwined with the others. For instance, while analysing a section of transcript first identified at an early coding stage, an issue that we might at first have thought to be simple can become more complex. When I first began analysing mealtime conversations, I thought that a focus on ’food assessments’ might be one small part of my PhD. Not long after the analysis on food assessments began, that ’small part’ seemed to explode: suddenly I realised there were many different avenues to explore (such as how assessments are constructed, where they occur in mealtimes, who uses them, and so on). I am still researching this topic now. This phenomenon of a topic or issue ’exploding’ into many other issues (and it does feel like an explosion, in that it can temporarily derail your analytical focus and create a sense of confusion) is a good sign that your analysis is developing and you are identifying new ways of approaching the data. It does mean, however, that you may have to go back through your data and code it for some of the different, new issues that you have recently identified.
To code your data, you first need to decide what feature or aspect of the data to search for. It may be all the ’firsts’ in an interaction — when someone first begins talking to someone else, or when someone enters a room, for example — and in that sense you may identify those instances through their location at particular time points in the interaction. Alternatively, you may want to focus on how a particular psychological notion (such as blame, embarrassment or deference) or a particular social action (such as flirting, criticising or seeking advice) is accomplished. Note that you will see later (in Chapter 6) that psychological notions and social actions are often difficult to separate. In such cases, they may feature at any time in your data corpus, so you will need to search thoroughly and in an organised manner.
As you search through the data, create a new file or folder in which to store those instances that you have coded. This will be a data set: a smaller, sub-set of your whole data corpus. You may want to store just the transcripts, or also include the video or audio clips that accompany these, and software can be used to help with this process through the use of synchronised transcripts (see section below). As you code each instance (or ’extract’), be sure to label it in some way, so that you know exactly where in the data corpus it was taken from. This might include some information on the date of the recording, the participants and the timestamp to indicate exactly where in the recording it is located. This process can be time-consuming, but do not skip over details; this data set may form your final analytical extracts and be used for many weeks or months ahead.
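A labelling scheme like the one just described can be kept consistent with a very small amount of structure. The sketch below is a hypothetical example of such a scheme (the field names and label format are my own invention, not a standard), showing how a recording date, participants and timestamps combine into a traceable label for each coded extract:

```python
# A hypothetical labelling scheme for coded extracts: every extract records
# where in the data corpus it came from (date, participants, timestamps)
# and which feature it was coded for.
from dataclasses import dataclass

@dataclass
class Extract:
    recording_date: str   # e.g. "2015-03-14"
    participants: str     # e.g. "Bob-Lesley"
    start: str            # position in the recording, "MM:SS"
    end: str              # end position, "MM:SS"
    code: str             # the feature coded for, e.g. "food-assessment"

    def label(self) -> str:
        """A file-friendly label so the extract can be traced back."""
        start = self.start.replace(":", "")
        end = self.end.replace(":", "")
        return f"{self.recording_date}_{self.participants}_{start}-{end}_{self.code}"

e = Extract("2015-03-14", "Bob-Lesley", "12:05", "12:10", "food-assessment")
print(e.label())  # 2015-03-14_Bob-Lesley_1205-1210_food-assessment
```

Whatever scheme you adopt, the point is that each extract in the data set can be located in the corpus months later without guesswork.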
When identifying instances to include in a data set, you should be as inclusive as possible. You may come across a section of the data that you are not sure about, and there are often fine lines between what is and what is not a particular social action or psychological concept. Take flirting, for example. It is not always easy to identify what counts as ’just’ a compliment or whether an eyebrow raise or smile, at just the right moment, is flirtatious or not. Always include these borderline cases. It is far easier to remove them from your data set later on than to search through the whole data corpus again. Similarly, when cutting-and-pasting that instance from your transcribed corpus into your data set file (or the section of audio/video), always include a few lines or seconds before and after what you think is the core of the instance. This is important as it helps us to preserve something of the immediate interactional context, and thus enable us to analyse that fragment appropriately.
Once you have your data set, you may well go back to the transcription stage and transcribe these selected instances — those fragments of data — in more detail, if you have not done so already. You can then begin to analyse those fragments. That is, you do not have to wait until you have Jefferson-transcribed them all before you begin analysing any of them. Sometimes it pays to transcribe a few in detail, begin analysis on those, and then decide whether you need to code again, change your analytical focus, or else continue with the transcription and analysis. The key here, therefore, is to take your time when working through these stages. Do not try to rush through to complete and write up the analysis. Be flexible in terms of what your analysis might focus on and what aspects of your data you will code and transcribe. This is the stage at which we really begin to see that our research may produce unexpected results, and that we need to work with what is in the data, not what we expect or hope to find.
Software to aid transcription and coding
There are many different types of software tools that will help you with the transcription and coding processes. These include software that will enable you to:
· playback the audio/video recordings
· edit the recordings
· transcribe the data
· ’automatically’ transcribe (voice recognition software)
· code the data
Some of these are essential, in that the work would be extremely laborious (involving pens, plenty of paper and a photocopier) without them. Just as the tape-recorder sparked a whole new approach to interaction in conversation analysis, so developments in technology are not only facilitating DP research, they are also changing the field. Here, then, is your quick guide to what you need.
Your essential kit here is software that will enable you to play back your video or audio recording (you will not need this, obviously, if you have textual data). There are many free options available, but aim for one that will play back different formats and can be used on different computer operating systems. This will be essential if you are holding a data session (see Chapter 6) or presenting a video or audio clip (see Chapter 8). For video, VLC media player is a good, free, all-round universal player. It has the useful option of allowing you to slow down the playback speed, as well as allowing the video to ’float on top’ of other documents or windows open on your screen. VLC also has some editing functions, such as saving a snapshot image of the video and creating effects to blur or distort the video images (this can help to anonymise participants, though you might lose crucial details, such as eye gaze direction or facial gestures). For the Mac, QuickTime also allows you to play a section of video on a loop. As noted above, this is really useful for transcribing short sections at a time. VLC also allows you to convert your files into different formats. For example, by converting your video file into an audio-only file (such as .wav format) you can open it in audio software such as Audacity (also free). Audacity is a very useful tool, enabling you to listen back to the audio at different speeds or on loop-play, measure the length of pauses (by highlighting the timeline gap between waveforms, i.e., between the end of one sound and the start of the next), and create short audio clips for saving separately.
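The idea behind measuring pauses in a waveform (finding where the amplitude drops to near-silence between the end of one sound and the start of the next) can be sketched programmatically. This is only an illustration of the principle, not a replacement for Audacity; the amplitude threshold and minimum gap length here are arbitrary assumptions that would need adjusting for real recordings, which contain background noise:

```python
# Sketch: find 'pauses' in a 16-bit mono .wav file by locating stretches
# where the amplitude stays below a (here, arbitrary) threshold.
import array
import wave

def silent_stretches(path, threshold=500, min_gap=0.2):
    """Return (start_seconds, duration_seconds) pairs for every stretch
    where the signal stays below `threshold` for at least `min_gap` seconds."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    gaps, start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if start is None:
                start = i          # a quiet stretch begins here
        else:
            if start is not None and (i - start) / rate >= min_gap:
                gaps.append((start / rate, (i - start) / rate))
            start = None
    # handle a quiet stretch that runs to the end of the file
    if start is not None and (len(samples) - start) / rate >= min_gap:
        gaps.append((start / rate, (len(samples) - start) / rate))
    return gaps
```

The returned durations correspond directly to the pause values you would enter in a Jefferson transcript, e.g. a 0.5-second gap becomes (0.5).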
Being able to play back your recorded data — so that you can transcribe and analyse it — is the basic requirement, but being able to edit your data will be your next step for organising and working with it. For example, you might want to create short sections of video or audio, so that you can save these separately to compile a corpus and use them for presentations or data sessions. You might also want to select a single frame or screen shot from your video to use as an image. Two programs — Apple’s iMovie and Windows Movie Maker — are typically part of the basic software packages on computers and laptops; both are user-friendly and offer many features that will do most of what is needed for simple video editing and exporting. For instance, they allow you to cut out sections of the video, apply blurring or edge-detection (to create a line-drawing effect) and merge two videos together (with one inset into the other, as in Extract 5.4, or side by side; this is very useful if you have used two video cameras for the same recording, from different angles). You can also add subtitles to the video and separate out the audio waveform. It is also very useful to find software that will allow you to edit photo images, for use in presentations or publications. ToyViewer, for example, is available to download for free, and can alter a photograph so that only the edges remain (using the contour tools). The effect is more like a line-drawing or cartoon, which can help to anonymise the participants while preserving the facial features and embodied gestures. There are other video-editing packages that can be used, often designed for professionals (such as Apple’s Final Cut Pro and Adobe Premiere), though these can be expensive and will require some training time.
Similarly, image editing tools such as Photoshop, Comic Life and Snagit can provide you with hours of fun in preparing still images for presentations or publications, but don’t let that get in the way of your analysis. A bad analysis cannot be saved by pretty pictures.
The basic software that you need to transcribe is the playback software (see above) and a word-processing document. This is transcription at its simplest, and will suffice for most occasions. I have transcribed this way for many years and find it suits me well. All I need is the word document open in half of my computer screen, and the audio or video playback software open in the other half (or ’floating on top’ of the word document). There are other forms of software, however, that can do this and a bit more. For example, Express Scribe, Audio Notetaker or Inqscribe enable you to load audio or video files into the software and transcribe alongside the playback. They have free and paid versions (you pay a little more for extra features, such as using a foot pedal to pause and rewind playback of the recording while still typing continuously), so you can try out the free version to see how it works. A particularly useful option here is being able to add ’time stamps’ to your transcript, so that you can pinpoint exactly where in the audio or video that part of the transcript occurs. Inqscribe also allows you to export subtitled QuickTime movies, using your transcript as the subtitles, which can then be played on any playback software. If you do decide to use transcription software, remember to check how to export your transcripts into other documents — you may not be able to add Jefferson transcription symbols within the transcription software, for instance — so that you can easily edit and transfer them across different operating systems (e.g., Windows or Apple) or import them into other software packages (such as MAXQDA or ATLAS.ti) without losing any transcription features or timestamps.
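Time stamps are only useful if you can read them back out of the transcript later. As a small illustration of why consistent stamps matter, here is a sketch that converts a stamp into a position in the recording; the "[HH:MM:SS]" format is an assumption for this example, so check what your own transcription software actually exports:

```python
# Convert a "[HH:MM:SS]" time stamp on a transcript line into a position
# (in seconds) within the recording. The stamp format is an assumption;
# different transcription software exports different formats.
import re

STAMP = re.compile(r"\[(\d{1,2}):(\d{2}):(\d{2})\]")

def stamp_to_seconds(line: str):
    """Return the position in seconds of the first time stamp on a line,
    or None if the line carries no stamp."""
    m = STAMP.search(line)
    if not m:
        return None
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds

print(stamp_to_seconds("[00:12:05] Lesley: it disnae matter"))  # 725
print(stamp_to_seconds("a line with no stamp"))                 # None
```

A helper like this makes it trivial to jump from a coded extract straight to the right point in the audio or video file.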
Voice recognition software
For some years now, the allure of voice recognition software has beckoned to those who dread the thought of many hours of transcription. As technology improves, this may be a potential time-saver for those with large data sets, although it does require investment of time and money to use its full potential. The most prominent voice recognition software at the moment is Dragon, and it works by being trained to recognise one person’s voice. This means that you will not be able to simply download your video or audio files and let it transcribe. You may have to ’re-voice’ them first so that all the words spoken are in your voice (which you will need to train the software to recognise). It will provide a words-only transcript, but if trained correctly and using good quality headsets (for a clearly audible voice recording), it may be an option for those seeking a long-term solution.
There are software packages that can assist the process of coding and organising your data to make it easier to find sections from your transcripts and fragments from your audio or video files. Many of these must be purchased, but there are usually reasonable discounted rates for those in education (whether students or tutors) so they can be a useful investment if you are going to be working with recorded data for more than one project. The main ones that are currently available are ATLAS.ti, MAXQDA and NVivo. As with the transcription software noted above, these can allow you to watch (or listen to) the recording alongside the transcribed file, add in timestamps and cut short sections. They do much more than this. You can add codes and labels to organise your data and identify repeating patterns or areas of transcript/recording that are related to a particular topic or aspect of interaction. Synchronising your transcript with your recording also helps not only with locating sections of data, but also with improving your analysis by ensuring that you ’stay close to’ the original recording and do not rely on just the transcript. Note that none of these software packages will do the coding for you, but with a little practice they will be a very useful way of organising, searching and exporting your data.
· Transcription is a theoretical and analytical as well as a practical process.
· You do not have to transcribe all of your data if you have a large volume of data.
· Focus on writing down the words only, then build up by adding phonetic/intonation details, then visual details.
· You can begin to code your data before you transcribe to Jefferson level.
· Coding involves searching and sorting your data for instances of particular features; it provides a way of selecting out sections of data for close analysis.
· Coding can occur multiple times in the transcription and analysis processes.
· Many software packages are freely downloadable and can assist you in the transcription and coding stages of analysis.
Paulus, T. M., Lester, J. N. & Dempster, P. G. (2014). Digital tools for qualitative research. London: Sage.
Rapley, T. (2007). Doing conversation, discourse and document analysis. London: Sage. Particularly Chapter 5.