Frequently Asked Questions (FAQ)

Already read the FAQ? Click Here To Go Directly To Graphical Reports

Guide to Use and Interpretation of Course Evaluation Reports | History of Instrument | The Instrument
New Report Functions | Z-Scores | Report Displays




Guide to the Use and Interpretation of Course Evaluation Reports

Stetson’s mission to provide a transformational education means that effective teaching is highly prized. Indeed, under our current tenure and promotion and annual evaluation guidelines, teaching accounts for 60% of a faculty member's evaluation. We therefore long ago adopted practices intended to evaluate teaching and promote faculty development. Understanding that any single form of teaching evaluation is always incomplete, as a faculty we agreed to include student evaluations of our courses among the information provided to faculty, department chairs, and deans. The system for evaluating teaching at Stetson University has recently undergone several changes, resulting in a robust process that relies upon multiple methods and sources of data. The use of multiple evaluation methods helps ensure that an evaluation process is reliable, valid, and fair. At Stetson, these methods include self-evaluation, peer observation, and student course evaluations. Because each method provides a different perspective on teaching, it is vitally important that all data be considered during faculty evaluation.

In Fall 2012, the Course Evaluation Revision Task Force (CERTF), comprising Bob Boozer (SoBA), Dwaine Cochran (Psychology), Michael Rickman (SoM), Mercedes Tichenor (Education), and Alicia Schultheis (Biology and CERTF Chair), began developing a system for creating uniform reports of student course evaluations. These reports are intended to provide one form of evidence for evaluating teaching during the tenure and promotion and annual merit review processes. The guidelines below are intended to help faculty, department chairs, and academic deans interpret these reports.

  1. Beginning with the 2014-2015 academic year, these reports will be required evidence in annual reviews and reviews for tenure and promotion. They do not replace the standard frequency reports and student comments currently available on the IOTA site; rather, both types of reports should be included. CERTF met with the University Tenure and Promotion Committee, which clarified that other types of evidence are still admissible. That is, faculty are encouraged to make their best case regarding progress towards meeting standards for tenure and promotion and during annual merit reviews. However, because these reports are a new type of evidence, both the tenure and promotion policy and the FAR form/instructions should be edited to make clear to faculty that these reports are now required evidence.

  2. At all times, keep in mind that authentic and fair evaluation of teaching should rely upon multiple forms of evidence. Thus, the IOTA-generated reports should always be considered in conjunction with other evidence, namely peer observations and self-evaluations (FARs and narratives).

  3. Because the typical class at Stetson is small enough that an instructor’s average for a given course can be highly influenced by a few students, it is critically important to consider course ratings over time and not to place too much emphasis on the results of a single course offering (particularly those with <20 students). Trends across multiple offerings are a much more appropriate way to evaluate such courses.

  4. Prior to Fall 2013, results from item 6 (The general workload was appropriate for this level course) should be disregarded. Note: Effective Fall 2013, this item was changed to: The class workload was rigorous.

  5. Student evaluation scores for a specific course must be understood in relation to the scores of other similar classes. For instance, the evaluation scores of one section of ENGL 101, of which there are 20 or more sections each semester, should be understood in relation to the average of all ENGL 101 sections that term. We do this by using what statisticians call “z-scores,” which are a way of determining the importance of the difference between a given score and the average. If the average for all similar classes is, for instance, 4.0, and an instructor receives a 4.2 in her course, the z-score will tell us whether that 4.2 is a noteworthy difference (a worked example follows this list). However, z-scores should be interpreted in the appropriate context:

    • Faculty, department chairs, and deans should keep in mind the absolute value of the ratings themselves, not just the z-score. For example, ratings of 4.0 and above indicate that, on average, students ‘Agreed’ with an item, while ratings near 3.0 indicate that students neither agreed nor disagreed. Thus, a faculty member with average ratings near 4.0 but with z-scores near or below -0.5 may not have cause for concern.

    • As always, student comments should be read carefully for explanations of why an instructor received a particular rating (e.g., a comment such as “The professor has high standards but she was willing to work with me to succeed” confirms that a high rating reflects effective teaching).

    • The course sample size should be kept in mind. Small courses are subject to high variance because a few disgruntled (or thrilled) students can have a profound effect on the course mean; in a small course, it takes only one or two disgruntled students to produce a low average (and thus a z-score at or below -0.5).

    • Course mean ratings with a z-score between +0.5 and -0.5 are considered satisfactory and partial evidence for meeting the standards for tenure and promotion. This range indicates 0.5 standard deviations above and below the mean. We used this delineation to analyze one semester of data and found that approximately 80% of Stetson courses fell within this range.

    • Z-scores > +0.5 may be considered exemplary. Using one semester of data, we found that ~10% of course ratings fall above +0.5. Thus, ~90% of course ratings are expected to be satisfactory or exemplary. Z-scores lower than -0.5 may indicate the need for improvement; only ~10% of courses are expected to fall within this range.
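
    To make the ENGL 101 example above concrete, suppose (purely for illustration; these numbers are invented, not actual Stetson data) that the average of all ENGL 101 sections that term is 4.0 with a standard deviation of 0.4. An instructor whose section averaged 4.2 would then have a z-score of (4.2 - 4.0) / 0.4 = +0.5, placing that section half a standard deviation above the comparison group mean, at the upper edge of the satisfactory range described above.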

Back to Top




History of Instrument

Why was this course evaluation instrument and reporting system developed?

As outlined in Stetson’s Tenure and Promotion policy, student course evaluations are an important means of showing progress towards meeting the standards for excellence in teaching. After the tenure and promotion policy was revised in 2010, it was clear that Stetson’s existing course evaluation instrument did not contain questions relevant to all of the new teaching standards. Thus, in 2011, the Course Evaluation Revision Task Force (CERTF) was charged with revising the existing evaluation instrument and reporting process to better align with the new tenure and promotion policy. Shortly thereafter, the Merit Review Task Force recommended that the new course evaluation instrument be used in the annual merit review process. We have therefore approached the reporting function with a dual purpose in mind: to provide partial evidence towards meeting the standards for teaching excellence for tenure and promotion purposes, and to provide a comprehensive yet time-efficient method for annual merit reviews.

How was the instrument developed?

First, we searched the literature to find aspects of teaching, or ‘dimensions’, that aligned with the descriptions of the teaching standards (short and long forms) found in Stetson’s tenure and promotion policy. To be selected, dimensions also had to be correlated with student achievement and overall evaluation (i.e., to be good measures of teaching performance in other contexts). Five dimensions were chosen: 1) Organization and Clarity; 2) Feedback, Grades, and Grading; 3) Intellectual Challenge/Workload; 4) Teaching Effectiveness; and 5) Accessibility. Dimensions were aligned to the T&P standards as shown in Table 1.

    Table 1: Alignment of Standards for Teaching Excellence and Related Dimensions. Bolded standards are associated with dimensions.

Tenure and Promotion to Associate

  Standard                  | Dimension(s)
  Command of subject matter | Not assessed via student course evaluations
  Organization              | Organization and Clarity; Feedback, Grades, and Grading
  Rigor                     | Intellectual Challenge/Workload
  Maturity                  | Cannot be assessed by single student(s) in a single course; assessed via student course evaluations over time
  Engagement                | Teaching Effectiveness; Accessibility

Promotion to Professor

  Standard                  | Dimension(s)
  Command of subject matter | Same as above
  Organization              | Same as above
  Rigor                     | Same as above
  Maturity                  | Same as above
  Engagement                | Same as above
  Impact                    | Teaching Effectiveness

Items related to chosen dimensions (Table 1) were identified from the literature and existing instruments. We chose this approach because using questions that have already been analyzed for validity and reliability in some context increases the likelihood of having a valid and reliable instrument.

Has the instrument been psychometrically evaluated?

Yes. A pilot study was conducted during two terms: Spring and Summer 2012. Analysis of these data indicated construct validity of the five dimensions, and we analyzed the relative weight of each dimension. After the pilot, we removed several questions, leaving a total of 15 standard items on the instrument (3 per dimension). Optional questions are part of the instrument but are not included in this reporting system (because they do not appear on all evaluations).

How did the process for evaluating courses change?

In 2012, the University contracted with an outside vendor, IOTA Solutions, to administer online evaluations and generate reports for faculty and administrative use.

Back to Top



The Instrument

Which items are included on the standard instrument?

The 15 standard items, grouped under the dimension to which they belong, are shown below:

Organization and Clarity

  1. The course material was presented in a clear manner.
  2. The course was organized effectively.
  3. The instructor was prepared for each class.

Intellectual Challenge/Workload

  1. The general workload was appropriate for this level course. (Note, effective Fall 2013, this item was changed to: The class workload was rigorous.)
  2. The course challenged me to think deeply about the subject matter.
  3. This course demanded intellectual effort.

Teaching Effectiveness

  1. I am better able to communicate about the course subject matter as a result of this course.
  2. I learned a lot in this course.
  3. The instructor advanced my knowledge of course content.

Feedback, Grades & Grading

  1. The instructor explained how grades are determined in this course.
  2. Students' work was graded in a reasonable amount of time.
  3. The instructor made helpful comments on exams, assignments, and other performance measures.

Accessibility

  1. The instructor was accessible to individual students.
  2. The instructor welcomed students seeking help with the course.
  3. The instructor was willing to meet with students outside of class.

Back to Top




New Report Functions

What does the new reporting function include?

Reports will still include the standard data, i.e., item-level responses (frequencies, means, and standard deviations) as well as student comments. Reports now also include graphs and tables that allow faculty and administrators to easily evaluate results in a proper comparative framework, without reducing the complexity of the instrument to a single global score, a purpose for which this instrument was explicitly NOT designed.

What is the new ‘comparative framework’?

It is a system that relies on a statistical measure called a ‘z-score’. Z-scores are used to compare individual courses with the average rating of all courses in their comparison group. At Stetson, the comparison groups were determined by the Provost and the Deans or Associate Deans. In the case of Business, Education, Humanities, Natural Sciences, and Social Sciences, pre-existing comparative frameworks were already in place and were duplicated here. For example, in Business, courses are compared within departments and across the School. In the Humanities, courses are compared to other Humanities courses of the same level (e.g., all 100-level courses). Courses in Music are compared by course type (e.g., large ensembles are compared to other large ensemble courses). Reports also show how course ratings change over time (a period of six years will be shown when the data allow).

How do I know which comparative group is being used for my course?

The comparison group is selected automatically when you choose a course; the table headings show the name of the comparison group being used. The only exception is FSEM, JSEM, and HON courses, which should be compared both to similar course types (i.e., other FSEMs) and within the disciplinary grouping (i.e., all 100-level Philosophy courses). In the latter case, you will have to select ‘Humanities’ as the disciplinary grouping.

Can I change my comparative group?

This type of change would require the approval of your Dean in consultation with the Provost.

Are all courses included in the assessment?

Courses with low enrollment (sample sizes smaller than 3) are not included in the reports, although they will continue to be evaluated in anticipation of future reporting possibilities. Also, no reports are generated for summer courses because they fall outside the annual contract period and thus are not evaluated for tenure and promotion purposes.

Back to Top




Z-Scores

What does a Z-score indicate about my performance?

A Z-score of zero indicates that your course mean is equal to the mean of your comparison group. A Z-score of 1.0 indicates that your course mean is one standard deviation above the mean of all such courses. If the Z-score is negative, your course mean is below the comparison group mean.
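
Put as a formula, the Z-score reported for a course is:

    Z = (course mean - comparison group mean) / comparison group standard deviation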

I have no idea what all this statistical jargon means…can you explain a Z-score?

First, let’s review some of the other statistical terms used in this document:

  • A mean is the sum of all responses to an item for that course divided by the number of respondents.
  • The standard deviation is a measure that shows the typical amount by which individual ratings differ from the mean. The standard deviation is an important part of Z-scores, which make comparisons by incorporating information about both the average and the variability of ratings.

Ok, now that we’ve cleared those up, let’s tackle the Z-score:

The Z-score is a measure of performance relative to others teaching similar courses. It measures how much your course mean differs from the mean rating of all similar courses in the disciplinary grouping, and it uses the standard deviation to show how your difference compares with the typical spread among similar courses. Thus, the Z-score indicates the number of standard deviations by which your course mean differs from the mean of ratings from all similar courses.
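
For readers who prefer to see the arithmetic, here is a minimal sketch of the calculation in Python. All of the numbers are invented for illustration (not actual Stetson data), and the sketch assumes the comparison statistics are computed from the course means in the group; the vendor's actual implementation may differ.

    from statistics import mean, stdev

    # Hypothetical item ratings (1-5 scale) from one course's respondents.
    course_ratings = [5, 4, 4, 5, 3, 4]

    # Hypothetical course means for every course in the comparison group
    # (e.g., all 100-level Humanities courses that term).
    group_course_means = [3.8, 4.0, 4.3, 3.6, 4.1, 4.4, 3.9, 4.2]

    course_mean = mean(course_ratings)      # sum of responses / number of respondents
    group_mean = mean(group_course_means)   # average across the comparison group
    group_sd = stdev(group_course_means)    # typical spread around the group mean

    # Z-score: how many standard deviations the course mean sits
    # above (positive) or below (negative) the comparison group mean.
    z = (course_mean - group_mean) / group_sd

    print(f"Course mean: {course_mean:.2f}")  # 4.17
    print(f"Group mean:  {group_mean:.2f}")   # 4.04
    print(f"Z-score:     {z:.2f}")            # about +0.48: satisfactory range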

How should Z-scores be interpreted?

Course mean ratings are considered satisfactory, and partial evidence for meeting the standards for tenure and promotion, if they have a Z-score between +0.5 and -0.5. This range indicates 0.5 standard deviations above and below the mean. We used this delineation to analyze one semester of data and found that approximately 80% of Stetson courses fell within this range, and that ~10% of Z-scores were > +0.5; these may be considered exemplary. Using this system, we therefore expect that ~90% of course ratings will be considered satisfactory or exemplary. Z-scores lower than -0.5 may indicate a need for improvement; only ~10% of courses are expected to fall within this range.

However, it is vital to note that ratings from courses with fewer than 20 students are subject to high variability. For the many Stetson courses with <20 students, it is important to consider performance over time and not to over-interpret results from a single course offering.

Back to Top




Report Displays

What will I see in these reports?

Reports contain three major types of displays:

  1. The first display is a table showing the course mean and standard deviation for each of the five dimensions. This table also contains the means and standard deviations from the disciplinary grouping, which are used to calculate the Z-scores for each dimension. Here is an example:



  2. The second display is a graph in which bars show how far (in Z-score units) each dimension’s mean rating is above or below the comparison mean. Here is an example:



  3. The third display is graphical and shows Z-scores for each dimension for each term. This graph is important for assessing performance in course ratings over time. Here is an example:


For the School of Business, one additional report is available; it compares the mean, standard deviation, and Z-score information for all of an instructor’s courses with all courses in the School that semester.

Back to Top

Click Here To Go To Graphical Reports