Report on the MYTILENE Workshop (Mytilini, Greece, 7-9 May 2018)

I     Introduction

In this section we give a general overview of how the workshop was organised and which topics were discussed.

Context

The workshop was given in the context of the ECML – EU RELANG project: Relating language examinations to the common European reference levels of language proficiency: promoting quality assurance in education and facilitating mobility. It was requested by the Regional Education Directorate of the North Aegean.

Aims and intended results of the workshop

In the application for the RELANG workshop the following aims were defined:

  • Enable participants to deepen their knowledge of CEFR descriptors/levels and their illustrative samples
  • Enable participants to:
    • get practical guidelines and revise tasks based on the classroom assessment materials provided by the Greek participants;
    • understand the CEFR model of language use;
    • develop valid items related to the CEFR to be used in the exams.

Specific results were formulated as follows:

  • Development of assessment related to the CEFR for migrants and children;
  • Creation of a forum of discussion for participants from various professional language areas;
  • Collaboration on aspects of teaching/learning and testing related to the CEFR;
  • Discussion on samples of tasks related to the CEFR;
  • Sharing of ideas and the formation of working groups; participants would be able to work as multipliers in their schools, institutions and across the country.

The feasibility of the intended goals/results was described as follows:

  • Valid tests for testing English, French, German and Italian at A2 level;
  • Valid tests for testing English at B1 level;
  • Valid tests for testing English at B2 level;
  • The workshop activities would first link learning outcomes to relevant CEFR descriptors;
  • The workshop would then focus on developing testing tasks that tap the expected learning outcomes/CEFR descriptors.

 

On day one Dr Georgios Angelopoulos, Secretary General at the Ministry of Education, gave a welcoming speech in which he emphasized the importance of languages in the learning of any skill. He was followed by Dr Aristidis Kalargalis, Director of the Regional Education Directorate of the North Aegean. The local organiser, Dr Agapios Oikonomidis, then welcomed the participants.

Ms Sarah Breslin, Director of the European Centre for Modern Languages (ECML) in Graz, then addressed the participants in a video message. She gave an overview of all the activities at the Centre involving “languages at the heart of learning”.

 

Participants

There were about 30 participants, among them academics, school advisers, curriculum developers, foreign language teachers and test developers involved in the development and assessment of school-based tests. The focus was on developments in the North Aegean.

The consultants were impressed by participants’ lively interest and their wish to improve the existing tests and examinations. Participants were eager to share their experiences in the construction of language examinations. Some documents used in the workshop were made available in the Greek language so that the workshop could address construction and linking issues in a Greek/North-Aegean context.

 

Continued support by the ECML

Judging from their responses, the participants greatly appreciated the offer of training and consultancy by the ECML and said they really enjoyed the three-day workshop. They showed a clear interest in continuing the cooperation in the development of valid rating scales for speaking and writing. For the validation of existing tests and the successful completion of the process of linking the examinations to the CEFR, continued ECML support would seem desirable.

II       Issues in the validity and reliability of the present curricula and examinations

In this section of the report we will present some of the topics that were discussed during the workshop.

Linking Ministerial Test Instructions (Article 17) to the CEFR

As was pointed out above, one of the aims of the RELANG workshop was to increase the understanding of the links between the CEFR and its implementation in the classroom. In the Ministerial Instructions specific reference is made to the CEFR levels (A1-B2) rather than to the actual descriptors on which these levels are based. In the case of the curricular goals for foreign languages, alignment to the CEFR would help to facilitate mobility within the EU. The RELANG cooperation between the Council of Europe and the EU Commission is aimed at precisely this: promoting quality assurance in education and facilitating mobility.

There is a certain risk in setting CEFR levels as goals/requirements if they do not match, or only partially match, the corresponding CEFR descriptors. The variety of descriptors tapped in the tests and examinations was thus rather limited, which affects the validity of claims that a test or examination is at a specific CEFR level.

 

Foreign languages taught and assessed at different levels

In most countries the level actually reached in the teaching of foreign languages is not the same for each language. Thus in Greece the level reached for English is expected to be higher than that for languages such as French, German or Italian. The consultants shared the concern of some participants (those responsible for foreign languages other than English) that, given the difference in expected attainment levels, the teaching of those languages will be at a lower level. It would be desirable for the teaching of foreign languages other than English to be supported by extra resources and teaching hours.

 

 

 

III  Test and Item Construction and the CEFR

 

During the workshop general principles of item construction were discussed and sample items were produced by the participants. It was found that some tasks and items were formulated more clearly than others. An overview of preferable item formats was given by the consultants; more specific remarks on item formats are made in the appendix to this report. The consultants referred to the ALTE Manual for Language Test Development and Examining: for use with the CEFR and the ALTE materials for the guidance of test item writers, both downloadable at http://www.alte.org. Another useful source can be found at:

http://www.alte.org/attachments/pdfs/files/alte_munich_2012_innovative_approches_of_the_survey_project_qoyy5.pdf

 

The consultants were pleased to find that the writing of items and the development of examinations are done in teams. Decisions on the final version of an item and on the inclusion of items are group decisions rather than the decision of one person.

 

For the items to be linked to the CEFR, one of the first steps would be to make clear which CEFR descriptors are operationalized in the examinations. On the basis of the way some tasks were phrased and contextualized, it is questionable whether such tasks, and examinations containing them, reflect the CEFR action-oriented approach. Elements in the CEFR approach are:

 

  • Actions performed by persons – individuals and social agents;
  • A range of competences, both general and in particular communicative language competences;
  • Various contexts under various conditions and constraints to engage in language activities;
  • Language processes to produce and/or receive texts in relation to themes in specific domains.

At the moment, in the receptive and productive skills sections of the existing tests and examinations, candidates are sometimes asked to understand or produce texts without being given sufficient relevant context in which to use the foreign language. In the Writing sections, e-mailing and texting are now the more authentic types of communication. It is true, though, that such informal communication is difficult to assess in a valid way, especially in a paper-and-pencil examination. One potential solution might be to change the assessment criteria so as to reflect the real-life characteristics of such forms of communication. For instance, grammaticality may be given much less emphasis, and certain aspects of organization, such as paragraphing, may not be included.

 

The choice of relevant CEFR descriptors

During the workshop it appeared that in the Reading sections of the examinations discussed only a limited number of CEFR descriptors are tapped. The focus seemed to be on Reading for detail. It was suggested that Reading for gist would also be appropriate. Also, some of the texts used to tap understanding of detail hardly lent themselves to reading for detail, if only because they were probably not written for that purpose. The consultants therefore argued for operationalizing more relevant descriptors and for choosing texts that are more relevant and valid for this purpose.

 

The consultants have therefore advised the use of a variety of text types in the Reading section, with different item types tapping different descriptors.

 

Language Constructs

Even if each section of the examinations reviewed stated what was being tested, it was not always clear which language constructs were actually being tested. Candidates were exposed to language without it always being clear why they should read such texts. In some items there was not enough focus on what the item wanted to know from the candidates. This may have resulted in an unnecessary loss of candidates’ testing time.

 

Authenticity

The consultants were pleased to find that the choice of texts was in line with the interests of the intended age group. However, reading texts supposedly taken from a diary would not seem to be particularly authentic. Also, in real life one is not generally supposed to read personal diaries written by others.

The consultants have advised test developers to continue to select authentic materials available in the modern media, to imagine how and when such texts might be of interest to the target population, and to present the texts in a meaningful context. It was felt that this approach would also have a positive washback effect on the teaching of foreign languages. Using authentic texts does not mean that original texts must be used without any changes. Such changes are permissible if the main content, purpose etc. of the text are maintained. However, it is then advisable to indicate that the text has been adapted (e.g. Adapted from an article retrieved from www.ustoday.com).

 

Fairness and ethical concerns

The consultants are convinced that the examinations are organized in such a way that candidates have a fair chance to show their true competences in the various foreign languages. Yet care must be taken that, if the examination (specifically in the Writing section) requires candidates to provide arguments for an opinion or to produce a piece of creative writing, it is not the creativity or the arguments themselves that are assessed but the candidates’ wording of them. Candidates should not be placed in the position of having to write (or speak) about something they may not personally identify with. Thus, controversial (and trivial!) subjects should be avoided. Candidates should be given a choice of at least three subjects to write about and be helped by the provision of key words.

There is also the issue that candidates’ handwriting is occasionally partly or entirely illegible. As this may introduce rater effects/error, the use of open-ended questions and tasks should, if possible, be avoided in such circumstances.

The consultants have expressed some concern about the replacement of locally produced national examinations by examinations produced by international testing organisations. Such a replacement would need to be in line with the curriculum: what is tested should be part of what is taught during regular school hours. Also, access to these international examinations should be equal for all students, irrespective of their social and/or financial background.

Number of items

In the tests and examinations discussed, we counted a total of 15-20 items in the Reading section. Other tasks in this section were tests of grammar and vocabulary rather than of Reading. For a valid estimate of a CEFR level in any skill, the consultants have advised using at least 20-25 items with (preferably) 60-75 MCh options per skill (Reading proper and Listening). Two thirds of the items in the Reading section were matching items. A more even distribution with traditional multiple-choice items would possibly be preferable, certainly when more (shorter) texts with just a few questions are used. The fourth task in the Reading section was considered by all participants to test word formation and grammar rather than Reading.
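By way of illustration of the relationship between test length and reliability, the following minimal sketch (not part of the workshop materials) applies the standard Spearman-Brown prophecy formula; the baseline reliability of 0.70 for a 15-item Reading section is a hypothetical value chosen purely for the example.

    # Minimal sketch: Spearman-Brown prophecy formula.
    # The 0.70 baseline reliability for a 15-item section is a hypothetical value.

    def spearman_brown(reliability: float, old_len: int, new_len: int) -> float:
        """Predict the reliability of a test lengthened from old_len to new_len items."""
        k = new_len / old_len
        return k * reliability / (1 + (k - 1) * reliability)

    if __name__ == "__main__":
        baseline = 0.70  # assumed reliability of a 15-item Reading section
        for n in (15, 20, 25, 40):
            print(f"{n:2d} items -> predicted reliability {spearman_brown(baseline, 15, n):.2f}")

Figures produced in this way also illustrate why the longer Reading section suggested below would yield a noticeably more reliable estimate.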

Testing linguistic competences

In the present language examinations there are sections in which knowledge of structures and vocabulary is assessed. Even though the CEFR acknowledges that linguistic competences play a role in its action-oriented model of language use, it is to be wondered why such competences should be tested discretely and not in the context of the assessment of Reading, Listening, Writing and Speaking. Such assessment would seem to be more valid and more in line with an action-oriented approach. If gap-fill were to be used, there could still be some focus on tenses, word forms, vocabulary etc. The advantage of such a shift would be that the Reading section would contain at least four texts with some 40 items. An assessment of a candidate’s reading proficiency would then be much more reliable and valid, and it would thus be easier to link such an assessment to the CEFR. On the other hand, such a move would potentially upset the weighting of the skill-specific components of the exam. Procedures would therefore need to be devised to guarantee that the intended weighting of skills is preserved.

 

Choice of item formats

For reasons of objectivity, reliability and efficiency, the use of multiple-choice (MCh) items in reading tests is indeed a valid option. However, one of the four MCh formats used in the present Reading sections may need to be reconsidered. In the testing literature the True/False/Not Mentioned format has been shown to cause validity issues:

  • the statement that something is true may be strained: the option is usually a summary of what is said in the text and may thus be perceived as not completely true, i.e. false (reliability);
  • the statement that something is false can likewise be strained; in other types of MCh items candidates are asked to identify the correct option (reliability);
  • the statement that something is not mentioned may coincide with the statement that something is true or false, especially when something is not stated literally but only implied;
  • the efficiency of the T/F two-option format is doubtful owing to potential reliability problems: more items are needed in a test than with three- or four-option items (efficiency), as illustrated in the sketch below.
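The following minimal sketch (not part of the workshop materials) illustrates the efficiency point: with two-option items, guessing alone produces acceptable-looking scores far more often than with four-option items, so more two-option items are needed for a comparably trustworthy result. The 20-item test length and the 12/20 (60%) cut score are assumptions made only for the example.

    # Minimal sketch: probability of reaching a given score by blind guessing,
    # for two-option (True/False) versus four-option items.
    # The 20-item length and the 12/20 cut score are hypothetical values.
    from math import comb

    def p_at_least(n_items: int, n_correct: int, p_guess: float) -> float:
        """Probability of getting at least n_correct items right by guessing alone."""
        return sum(comb(n_items, k) * p_guess**k * (1 - p_guess)**(n_items - k)
                   for k in range(n_correct, n_items + 1))

    if __name__ == "__main__":
        n, cut = 20, 12
        print(f"2 options: P(score >= {cut}/{n} by guessing) = {p_at_least(n, cut, 1/2):.3f}")
        print(f"4 options: P(score >= {cut}/{n} by guessing) = {p_at_least(n, cut, 1/4):.4f}")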

Another observation on the item formats used in the present examinations is that there is a danger that candidates can find the correct answer through test-wiseness rather than through understanding the text. This phenomenon may occur when distractors in an item are too obviously wrong or the correct answer is too obviously right (through echoes in the phrasing of the option). We refer to the appendix for advice on how to avoid this phenomenon.

 

Item difficulty

The consultants advised against introducing very difficult items in a test. Apart from the fact that it is difficult to predict how difficult an item will be in the live test, the focus should be on the desired CEFR level, possibly including the higher, so-called plus levels.

 

Benchmarking

In the course of the consultancy, benchmarking was mentioned as an issue to be addressed in the production of examinations in the near future. Performance samples for Writing and Speaking can be collected from past papers and then used for benchmarking purposes. To demonstrate the actual process of benchmarking, standardized writing samples were judged by the participants, who attempted to justify their judgements on the basis of CEFR descriptors.

 

It is important to realize that for a benchmark to be set at a particular level, we first need to look at the level of the task. A panel of judges will first need to decide whether the task is at the required CEFR level. If the task is judged to be below the required level, a performance cannot be judged to be at that level. It should be noted, however, that candidates can sometimes perform a task at various levels: it is often possible to use more complex language or to elaborate on a topic in a more detailed or more sophisticated manner. Thus, when judging the level of tasks, a range of levels may need to be considered.

 

To demonstrate the process of benchmarking, the Writing tasks and performances from Lithuanian A2/B1 English examinations were used. During this benchmarking session the following observations were made:

  1. It is important to reconsider ratings/judgements when an equal number of points is given for both an A2 and a B1 performance (see the sketch after this list).
  2. Panel members should be encouraged to discuss agreements and disagreements.
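The following is a minimal sketch (with invented panel data) of how such judgements could be tallied so that split cases, such as an equal number of votes for A2 and B1, are flagged for renewed discussion before a benchmark is fixed. All sample labels and votes below are purely illustrative.

    # Minimal sketch: flag benchmarking samples on which the panel is split.
    # All sample names and votes are invented for illustration.
    from collections import Counter

    def is_split(votes: list[str]) -> bool:
        """True when the two most frequent levels received the same number of votes."""
        counts = Counter(votes).most_common()
        return len(counts) > 1 and counts[0][1] == counts[1][1]

    if __name__ == "__main__":
        panel_votes = {
            "sample_1": ["A2", "A2", "B1", "A2"],
            "sample_2": ["A2", "B1", "A2", "B1"],  # split: to be reconsidered
        }
        for sample, votes in panel_votes.items():
            status = "reconsider and discuss" if is_split(votes) else "consensus emerging"
            print(f"{sample}: {dict(Counter(votes))} -> {status}")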

 

The consultants pointed out that benchmarking is now a well-known procedure in many European countries. It was suggested that the school advisers cooperate in creating benchmarks for their own tests and examinations, or benchmarks that, like those produced by the Council of Europe, would be country-independent. The consultants will take up this point with the ECML with the idea that benchmarking sessions will be organized under the aegis of the RELANG project.

 

Standard-setting procedures

As with benchmarking, standard setting needs to be addressed in the context of linking to the CEFR (pass/fail standards are laid down in law). During the workshop it did not become clear how standards are arrived at in the Greek examinations.

General principles of standard setting, including the distinction between test-centred and candidate-centred methods, are presented in the (highlights from the) Manual for linking tests and examinations to the CEFR. The consultants have emphasized the role of empirical data in setting standards, as judges have repeatedly been shown to occasionally hold opinions inconsistent with empirical results. The consultants would like to refer to a number of case studies on standard setting, some in the context of linking examinations to the CEFR.

To demonstrate the process of standard setting, the reading tasks from a Β’ γυμνασίου (8th grade) English examination were used. Participants tried their hand at one specific method of standard setting: the Basket procedure. Participants had earlier solved the reading tasks themselves. Subsequently, they followed the steps of the Basket procedure to set a standard for the task individually. The results of the procedure were then compared and discussed. As a result, participants had the opportunity to experience the difficulties both in arriving at a decision and in providing CEFR-based arguments for such decisions.
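The following minimal sketch (with invented judgements) shows one common formulation of the Basket procedure: each judge assigns to every item the lowest CEFR level at which a candidate can already answer it correctly, and a provisional cut score for a target level is obtained by counting the items placed at or below that level, averaged over judges. All judge names, item labels and level assignments below are hypothetical.

    # Minimal sketch of the Basket procedure (one common formulation).
    # All judge names, item labels and level assignments are hypothetical.

    CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

    def basket_cut_score(judgements: dict, target_level: str) -> int:
        """Cut score = number of items a candidate at target_level is expected to answer."""
        max_rank = CEFR_ORDER.index(target_level)
        return sum(1 for level in judgements.values() if CEFR_ORDER.index(level) <= max_rank)

    if __name__ == "__main__":
        panel = {
            "judge_1": {"item_1": "A2", "item_2": "A2", "item_3": "B1", "item_4": "B2"},
            "judge_2": {"item_1": "A2", "item_2": "B1", "item_3": "B1", "item_4": "B1"},
        }
        cuts = [basket_cut_score(j, "B1") for j in panel.values()]
        print("Individual B1 cut scores:", cuts)
        print("Panel cut score (mean):", sum(cuts) / len(cuts))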

IV  Conclusions and Recommendations for Further Action

The consultants were very pleased to find that the local organiser, Dr Agapios Oikonomidis, had done everything to make the workshop a success. Participants were knowledgeable and motivated. The consultants feel that the general aims set for this workshop have been met. Participants have been made aware of the fundamental differences between the CEFR levels. Given that the exams need to be linked to the CEFR, participants have recognized the importance of constructing items in accordance with the CEFR descriptors. The consultants have provided suggestions on how to improve the existing tests and examinations. Lastly, the consultants have made proposals on how to apply the CEFR model of language use in the development of valid and reliable language tests.

As to the specific results expected from the workshop, the consultants feel that these have been achieved. An overall awareness of the CEFR and of its use in the classroom has been established, judging from the input of the participants. The desired relationship between items and descriptors has been central to this workshop. Discussions have been held on how to explore and apply the CEFR and to make use of it in the classroom, also in relation to the Ministerial Instructions.

 

As to the development of valid foreign language examinations linked to the CEFR, the following actions are recommended:

  • Reconsider the format and the content of some sections of the examinations and their items;
  • Increase the number of items and/or options in the Reading (and Listening) sections for more valid and reliable results;
  • Operationalize a wider choice of CEFR descriptors.

The ECML consultants have indicated that they will be pleased to continue to guide the Ministry and the Innove Foundation during the above stages in the development of valid examinations linked to the CEFR.
