General Responsibilities of Ethical Practices
1. Protect the health and safety of all participants
2. Be knowledgeable about and behave in compliance with state and federal laws relevant to the conduct of professional activities
3. Maintain and improve their professional competence in educational assessment
4. Provide assessment services only in areas of their competence and experience, affording full disclosure of their professional qualifications
5. Promote the understanding of sound assessment practices in education
6. Adhere to the highest standards of conduct and promote professionally responsible conduct with educational institutions and agencies that provide educational services
7. Perform all professional responsibilities with honesty, integrity, due care, and fairness.
No Child Left Behind (NCLB) 2002
is a federal mandate requiring schools and states to bring ALL children up to grade level with their peers. It set unrealistic goals and penalized schools that did not make Adequate Yearly Progress (AYP). The act also required that all teachers become "highly qualified". With these new benchmarks and changes to the way states addressed education came increased accountability; failure could mean a decrease in funding or the dissolution of a school/district. Achievement was tied to annual standardized tests administered in grades 3-8 and at least once between grades 9 and 12.
Elementary and Secondary Education Act (ESEA), 1994 Reauthorization
States were required to set challenging standards for student achievement and to develop and administer assessments, given to "all" students, to measure progress toward those standards. All schools were required to make "Adequate Yearly Progress", and "special needs students" were included in the definition of all students. Federal laws such as ESEA and IDEA can be seen as legislated attempts to 'raise the bar.'
Instruction is most effective when
1. Directed toward a clearly defined set of intended learning outcomes.
2. The methods and materials of instruction are congruent with the outcomes to be achieved.
3. The instruction is designed to fit the characteristics and needs of the students.
4. Instructional decisions are based on information that is meaningful, dependable, and relevant.
5. Students are periodically informed concerning their learning progress.
6. Remediation is provided for students not achieving the intended learning.
7. Instructional effectiveness is periodically reviewed and the intended learning outcomes and instruction modified as needed.
Assessment is most effective when
1. Designed to assess a clearly defined set of intended learning outcomes.
2. The nature and function of the assessments are congruent with the outcomes to be assessed.
3. The assessments are designed to fit the relevant student characteristics and are fair to everyone.
4. Assessments provide information that is meaningful, dependable, and relevant.
5. Provision is made for giving the students early feedback of assessment results.
6. Specific learning weaknesses are revealed by the assessment results.
7. Assessment results provide information useful for evaluating the appropriateness of the objectives, the methods, and the materials of instruction.
Authentic Assessments
A title for performance assessments that stresses the importance of focusing on the application of understandings and skills to real problems in real-world, contextual settings.
Achievement Assessment
achievement assessment is a broad category that includes all of the various methods for determining the extent to which students are achieving the intended learning outcomes of instruction
Alternative Assessments
A title for performance assessments that emphasizes that these assessment methods provide an alternative to traditional paper-and-pencil testing.
Content standards
describe what students should know and be able to do at the end of a specified period of learning (e.g., a grade or series of grades). They provide a framework for curriculum development, instruction, and the assessment of student achievement. Various professional organizations have also developed sets of content standards in their particular subject areas. It is hoped that the use of such standards will raise achievement expectations, increase the quality of public education, produce a better-informed citizenry, and make the country more competitive with other countries.
Placement Assessment
(measures entry behavior)
To determine student performance at the beginning of instruction
Example: Unit Pre-test
Performance-Based Tasks
may also be useful for determining entry skills. In the area of writing, for example, obtaining writing samples at the beginning of instruction can establish a base for later assessments of progress. This type of preassessment would be especially valuable if portfolios of student work were to be maintained during the instruction.
Formative Assessments
(monitors learning progress)
To monitor learning progress during instruction
Example: End of lesson quiz
Diagnostic Assessment
(identifies causes of learning problems)
To diagnose learning difficulties during instruction
Example: Test of math computational skills necessary for learning math
Bloom's taxonomy
6. Creating- Putting information together in an innovative way.
5. Evaluating- Making judgments based on a set of guidelines.
4. Analyzing- Breaking the concept into parts and understanding how each part is related to the others.
3. Applying- Using knowledge gained in new ways.
2. Understanding- making sense of what you have learned.
1. Remembering- recalling relevant knowledge from long term memory.
Teachers' Standards for Student Assessment
1. Teachers should be skilled in choosing assessment methods appropriate for instructional decisions. Skill in choosing appropriate, useful, administratively convenient, technically adequate, and fair assessment methods is prerequisite to good use of information to support instructional decisions.
2. Teachers should be skilled in developing assessment methods appropriate for instructional decisions. While teachers often use published or other external assessment tools, the bulk of the assessment information they use for decision making comes from approaches they create and implement.
3. The teacher should be skilled in administering, scoring, and interpreting the results of both externally produced and teacher-produced assessment methods. It is not enough that teachers are able to select and develop good assessment methods; they must also be able to apply them properly.
4. Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement. Assessment results are used to make educational decisions at several levels: in the classroom about students, in the community about a school and a school district, and in society, generally, about the purposes and outcomes of the educational enterprise. Teachers play a vital role when participating in decision making at each of these levels and must be able to use assessment results effectively.
5. Teachers should be skilled in developing valid pupil grading procedures that use pupil assessments. Grading students is an important part of professional practice for teachers. Grading is defined as indicating both a student's level of performance and a teacher's valuing of that performance. The principles for using assessments to obtain valid grades are known and teachers should employ them.
6. Teachers should be skilled in communicating assessment results to students, parents, other lay audiences, and other educators. Teachers must routinely report assessment results to students and to parents or guardians. In addition, they are frequently asked to report or to discuss assessment results with other educators and with diverse lay audiences. If the results are not communicated effectively, they may be misused or not used. To communicate effectively with others on matters of student assessment, teachers must be able to use assessment terminology appropriately and must be able to articulate the meaning, limitations, and implications of assessment results.
7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information. Fairness, the rights of all concerned, and professional ethical behavior must undergird all student assessment activities, from the initial planning for and gathering of information to the interpretation, use, and communication of the results.
Selected Response Tests
we can obtain comprehensive coverage of a content domain and can administer, score, and interpret the test easily, but we sacrifice realism and some types of complexity (students select the response: multiple-choice, matching, and true/false items)
Performance Assessment
-high degree of realism
-high in complexity of the tasks we can assess
- time needed for assessment is frequently excessive and the evaluation of the performance is highly judgmental
The purpose of the assessment device is to direct the observation toward the most important elements of the performance and to provide a place to record the judgments.
assessments provide direct evidence of valued learning outcomes that cannot be adequately assessed by traditional paper-and-pencil testing, but they are time consuming to use and require greater use of judgment in scoring.
Assessments requiring students to demonstrate their achievement of understandings and skills by actually performing a task or set of tasks (e.g., writing a story, giving a speech, conducting an experiment, operating a machine).
Guidelines for Effective Student Assessment
1. Effective assessment requires a clear conception of all intended learning outcomes.
2. Effective assessment requires that a variety of assessment procedures be used.
3. Effective assessment requires that the instructional relevance of the procedures be considered.
4. Effective assessment requires an adequate sample of student performance.
5. Effective assessment requires that the procedures be fair to everyone
6. Effective assessment requires the specifications of criteria for judging successful performance.
7. Effective assessment requires feedback to students that emphasizes strengths of performance and weaknesses to be corrected.
8. Effective assessment must be supported by a comprehensive grading and reporting system
Domain-Referenced Interpretation
Assessment results are interpreted in terms of a relevant and clearly defined set of related tasks (called a domain). The meaning is similar to criterion-referenced interpretation, but the term is used less often, even though it is more descriptive.
Content-Referenced Interpretation
Essentially the same meaning as domain-referenced interpretation when the content domain is broadly defined to include tasks representing both content and process (i.e., reactions to the content). This term is declining in use and being replaced by criterion-referenced interpretation.
Objective-Referenced Interpretation
Assessment results are interpreted in terms of each specific objective that a set of test items represents. This is frequently called criterion-referenced interpretation, but the more limited designation is preferable where interpretation is limited to each separate objective.
Norm-Referenced Interpretation
Principal Use-Survey testing.
Major Emphasis-Measures individual differences in achievement.
Interpretation of Results-Compares performance to that of other individuals.
Content Coverage-Typically covers a broad area of achievement.
Nature of Test Plan-Table of specifications is commonly used.
Item Selection Procedures-Items are selected that provide maximum discrimination among individuals (to obtain a reliable ranking). Easy items are typically eliminated from the test.
Performance Standards-Level of performance is determined by relative position in some known group (e.g., ranks fifth in a group of 20).
according to relative position in some known group
a test or other type of assessment designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group.
Criterion-Referenced Interpretation
Principal Use-Mastery testing.
Major Emphasis-Describes tasks students can perform.
Interpretation of Results-Compares performance to a clearly specified achievement domain.
Content Coverage-Typically focuses on a limited set of learning tasks.
Nature of Test Plan-Detailed domain specifications are favored.
Item Selection Procedures-Includes all items needed to adequately describe performance. No attempt is made to alter item difficulty or to eliminate easy items to increase the spread of scores.
Performance Standards-Level of performance is commonly determined by absolute standards (e.g., demonstrates mastery by defining 90 percent of the technical terms).
according to a specified domain of clearly defined learning tasks
a test or other type of assessment designed to provide a measure of performance that is interpretable in terms of a clearly defined and delimited domain of learning tasks.
Other terms that are less often used but have meanings similar to criterion referenced:
standards based
objective referenced
content referenced
domain referenced
universe referenced.
Supply Response Tests
are higher in realism and the complexity of tasks they can measure (e.g., ability to originate, integrate, and express ideas) than selected-response tests, but they are more time consuming to use and more difficult to score. (students supply the response: short-answer items and essay items)
Standard
a broad statement that describes what students should learn. Provide the framework for curriculum development.
Benchmark
Statements that follow a standard and clarify in broad terms what the standard means.
Instructional Objectives:
Specific statements that describe how students will demonstrate achievement. Instructional objectives describe intended learning outcomes. Instructional objectives clarify what standards and benchmarks mean.
Learning Outcomes:
Terms included in instructional objectives that describe the expected results of instruction.
Summative Assessment
(measures end-of-course achievement)
To assess achievement at the end of instruction
Example: End-of-year State test
Accommodations
Do not change the expectations for learning
Do not reduce the requirements of the task
Modifications
Do change the expectations for learning
Do reduce the requirements of the task (e.g., reduce number of items, alternate assignments, lower-level reading assignments)
For students who require more support or adjustments than accommodations provide.
Accommodations Commonly Used for Students with Disabilities.
Presentation accommodations allow a student with a disability to access information in ways other than standard visual or auditory means (e.g., by reading or listening).
Response accommodations allow students with disabilities to complete instructional assignments or assessments through ways other than typical verbal or written responses.
Setting accommodations allow for a change in the environment or in how the environment is structured.
Timing and scheduling accommodations allow students extra time to complete an activity or a test
Rubrics
an objective set of guidelines that defines the criteria used to score or grade an assignment.
Portfolios
collection of artifacts, or individual work samples, that represent a student's performance over a period of time.
Conferencing with students and parents:
PURPOSE: The collected samples of work make clear to students and parents alike what students are learning and how well they are learning it.
You can present a summary of the students' achievements and then support it by showing actual samples of the students' work.
This provides as comprehensive and complete a report of student achievement as is possible.
The conference also provides for two-way communication that permits the student or parent to ask for clarification and to discuss ways to improve performance.
There is no better way of reporting student achievement.
Self-Assessment
process of students using specific criteria to evaluate and reflect on their own work.
Assigning Grades
Achievement:(i.e., how the student is performing in relation to expected grade-level goals)
Growth: (i.e., the amount of individual improvement over time)
Habits: (e.g., participation, behavior, effort, attendance)
Evaluating Performance
Rubrics: an objective set of guidelines that defines the criteria used to score or grade an assignment.
Portfolios: collection of artifacts, or individual work samples, that represent a student's performance over a period of time.
Self-assessment: process of students using specific criteria to evaluate and reflect on their own work.
Formal Assessments
(administered formally in the classroom)
large-scale assessments at the school, district, state, national, and international levels; standardized tests
Informal Assessments
(administered informally in the classroom)
Observational measures; teacher conducted assessments; assessment support materials; and other achievement, aptitude, interest, and personality measures used in and for education.
Mandated Core Components of 'Standards-Based Reform'
a) content and performance standards set for all students;
b) development of tools to measure the progress of all students toward the standards;
c) accountability systems that require continuous improvement of student achievement.
Individuals with Disabilities Education Act (IDEA)
requires states to include children with disabilities in general state and district-wide assessment programs, with appropriate accommodations where necessary, and to report annually on the participation rates, performance, and progress of students with disabilities.
When students with disabilities cannot participate in testing, even with accommodations, states are required to include students using alternate assessments.
Estimates of the prevalence of severe disabilities indicate that only 1-2% of all students will need to take alternate assessments.
Federal laws such as ESEA and IDEA can be seen as legislated attempts to 'raise the bar.'
Multiple Stakeholders want Assessments to Meet a Variety of Needs
educators want test results to inform instruction;
taxpayers want to know that the money they spend translates into student learning;
governors want assurances that their students are achieving at a level similar to or better than students in other states
Negative Consequences of Large-Scale Assessments
1. Use of a single test score in making promotion/retention decisions.
2. Use of a single test score in graduation decisions.
3. Use of test performance as a basis for systems level rewards and sanctions.
4. Impact on mainstream education.
Product Assessment
Result of a performance assessment that becomes the focus of the assessment.
Classroom Achievement Tests
The tests designed for classroom use provide a systematic procedure for determining the extent to which intended learning outcomes have been achieved
Validity
adequacy and appropriateness of the interpretations and uses of assessment results
Concerned with the appropriateness of the interpretations made from the results.
Considerations of content, construct, assessment-criterion relationships, and consequences all can contribute to the meaning of a set of results
(validity is strongest when evidence to support all four of these considerations is present)
An absence of bias and procedural fairness is essential for an assessment to have a high level of validity in measuring the knowledge, skills, and understandings that it is intended to measure.
Reliability
consistency of measurement, that is, how consistent test scores or other assessment results are from one measurement to another
Concerned with the consistency of the results.
-provides the consistency that makes validity possible
-indicates the degree to which various kinds of generalizations are justifiable.
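One common way to gauge this consistency is to correlate scores from two administrations of the same test (test-retest reliability). A minimal sketch, with made-up scores and Python 3.10's statistics.correlation:

```python
from statistics import correlation  # requires Python 3.10+

# Made-up scores for the same ten students on two administrations of a test.
first  = [72, 85, 60, 90, 78, 66, 81, 74, 88, 70]
second = [70, 83, 63, 92, 75, 68, 80, 76, 85, 72]

# A high positive correlation means the test ranks students consistently
# from one measurement to the next.
print(round(correlation(first, second), 2))  # about 0.97 for these data
```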
Bias in Tests and Testing
refers to construct-irrelevant components that result in systematically lower or higher scores for identifiable groups of examinees.
the presence of some characteristic of an item and/or test that results in two individuals of the same ability but from different subgroups performing differently on the item and/or test.
Therefore, it is most important that there are no ambiguities in the test items (questions and responses), passages, prompts, stimulus materials, artwork, graphs, charts, and test-related ancillaries.
Construct Irrelevance
Extent to which test scores are influenced by factors (e.g., mode of presentation or response) that are irrelevant (not related) to the construct that the test is intended to measure.
Different Types of Validity
Face validity: Do the assessment items appear to be appropriate?
Content validity: Does the assessment content cover what you want to assess?
Criterion-related validity: How well do the assessment results correspond to an external criterion, such as predicting later performance or agreeing with another measure of the same achievement?
Construct validity: Are you measuring what you think you're measuring?
Factors That Lower the Validity of Assessment Results
1. Tasks that provide an inadequate sample of the achievement to be assessed.
2. Tasks that do not function as intended, due to use of improper types of tasks, lack of relevance, ambiguity, clues, bias, inappropriate difficulty, or similar factors.
3. Improper arrangement of tasks and unclear directions.
4. Too few tasks for the types of interpretation to be made (e.g., interpretation by objective based on a few test items).
5. Improper administration—such as inadequate time allowed and poorly controlled conditions.
6. Judgmental scoring that uses inadequate scoring guides, or objective scoring that contains computational errors.
Factors That Lower the Reliability of Test Scores
1. Test scores are based on too few items.
(Remedy: Use longer tests or accumulate scores from several short tests.)
2. Range of scores is too limited.
(Remedy: Adjust item difficulty to obtain larger spread of scores.)
3. Testing conditions are inadequate.
(Remedy: Arrange opportune time for administration and eliminate interruptions, noise, and other disrupting factors.)
4. Scoring is subjective.
(Remedy: Prepare scoring keys and follow carefully when scoring essay answers.)
Arranging Items on a Test
1. For instructional purposes, it is usually desirable to group together items that measure the same outcome.
-Each group of items can then be identified by an appropriate heading (e.g., knowledge, understanding, application).
-The inclusion of the headings helps to identify the areas where students are having difficulty and to plan for remedial action.
2. Where possible, all items of the same type should be grouped together.
-This arrangement makes it possible to provide only one set of directions for each item type. It also simplifies the scoring and the analysis of the results.
3. The items should be arranged in terms of increasing difficulty.
-This arrangement has motivational effects on students and will prevent them from getting "bogged down" by difficult items early in the test.
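Where the three guidelines pull in different directions, one workable compromise is a hierarchical sort: group by measured outcome, then by item type, then easiest first within each group. A minimal sketch with a hypothetical item bank (the field names and difficulty values are illustrative):

```python
# Hypothetical item bank; "difficulty" is the proportion of students
# expected to miss the item, so smaller values mean easier items.
items = [
    {"id": 1, "outcome": "knowledge",   "type": "true-false",      "difficulty": 0.3},
    {"id": 2, "outcome": "application", "type": "multiple-choice", "difficulty": 0.7},
    {"id": 3, "outcome": "knowledge",   "type": "multiple-choice", "difficulty": 0.4},
    {"id": 4, "outcome": "application", "type": "multiple-choice", "difficulty": 0.5},
]

# Group items by measured outcome, then by type, then easiest first.
arranged = sorted(items, key=lambda i: (i["outcome"], i["type"], i["difficulty"]))
for item in arranged:
    print(item["id"], item["outcome"], item["type"], item["difficulty"])
```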
Selection Item: True-False
typically used to measure the ability to identify whether statements of fact are correct. The basic format is simply a declarative statement that the student must judge as true or false.
Selection Item: Matching
simply a variation of the multiple-choice form. A good practice is to switch to the matching format only when it becomes apparent that the same alternatives are being repeated in several multiple-choice items.
Selection Item: Interpretive Exercise
Complex learning outcomes can frequently be more effectively measured by basing a series of test items on a common selection of introductory material.
This may be a paragraph, a table, a chart, a graph, a map, or a picture. The test items that follow the introductory material may be designed to call forth any type of intellectual ability or skill that can be measured objectively.
-both multiple-choice items and alternative-response items are widely used to measure interpretation of the introductory material.
Selection Item: Multiple Choice
the most widely used and highly regarded of the selection-type items. They can be designed to measure a variety of learning outcomes, from simple to complex, and can provide the highest quality items.
-consists of a stem, which presents a problem situation. The stem may be a question or an incomplete statement.
-several alternatives (options or choices), which provide possible solutions to the problem. The alternatives include the correct answer and several plausible wrong answers called distracters.
Supply Item: Short Answer
(or completion) item requires the examinee to supply the appropriate words, numbers, or symbols to answer a question or complete a statement.
-Short-answer items are limited largely to measuring simple learning outcomes, and scoring can be complicated by answers of varying degrees of correctness. Because of these weaknesses, they should be reserved for those special situations where supplying the answer is a necessary part of the learning outcome to be measured—for example, where the intent is to have students recall the information, where computational problems are used, or where a selection-type item would make the answer obvious. In these situations, the use of the short-answer item can be defended despite its shortcomings.
Supply Item: Essay Question
The most notable characteristic is the freedom of response it provides.
-Students are free to decide how to approach the problem, what factual information to use, how to organize the answer, and what degree of emphasis to give each aspect of the response. Thus, the essay question is especially useful for measuring the ability to organize, integrate, and express ideas. These are the types of performance for which selection-type items and short-answer items are so inadequate.
Performance Assessments may focus on...
a procedure
(e.g., giving a speech, reading aloud, physical skills,
musical performance)
a product
(e.g., a theme, a written essay, a graph, a map, a painting, a poster, a model, a woodworking project, and a laboratory report, drawing, or insect display)
or both
(e.g., using tools properly in building a bookcase).
Restricted Performance Tasks
highly structured and limited in scope
Example: Construct a graph
Extended Performance Tasks
typically less well structured and broad in scope
Example: Design and conduct an experiment
Analytic Scoring
The assignment of scores to individual components of a performance or product. Provides diagnostic information useful for improving performance. *grades specific criteria.
Rating Scale
provides an opportunity to mark the degree to which an element is present.
The scale for rating is typically based on one of the following:
frequency with which an action is performed
e.g., always, sometimes, never
the general quality of a performance
e.g., outstanding, above average, average, below average
set of descriptive phrases that indicates degrees of acceptable performance
e.g., completes task quickly, slow in completing task, cannot complete task without help.
Holistic Scoring
The assignment of a score based on an overall impression of a performance or product rather than a consideration of individual elements. The overall judgment is typically guided by descriptions of the various levels of performance, that is, scoring rubrics that clarify what each level of quality is like. Holistic scoring rubrics and product scales are especially useful where global judgments are being made, as in an evaluation of the student's final level of performance.
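The contrast between analytic and holistic scoring can be made concrete with a small sketch. The component names, point values, and level labels below are hypothetical, not taken from any particular rubric:

```python
# Analytic scoring: each criterion is scored separately (0-4 here), then totaled.
essay_scores = {"organization": 3, "support": 4, "mechanics": 2, "voice": 3}
analytic_total = sum(essay_scores.values())                  # 12 of a possible 16
weak_areas = {k: v for k, v in essay_scores.items() if v <= 2}
print(analytic_total, weak_areas)  # diagnostic info: mechanics needs work

# Holistic scoring: one overall impression matched to a level description.
levels = {4: "exemplary", 3: "proficient", 2: "developing", 1: "beginning"}
holistic_score = 3  # the rater's single overall judgment of the same essay
print(levels[holistic_score])
```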
The Knowledge Dimension
A. Factual Knowledge—The basic elements students must know to be acquainted with a discipline or solve problems in it.
B. Conceptual Knowledge—The interrelationships among the basic elements within a larger structure that enable them to function together
C. Procedural Knowledge—How to do something, methods of inquiry, and criteria for using skills, algorithms, techniques, and methods
D. Metacognitive Knowledge—Knowledge of cognition in general as well as awareness and knowledge of one's own cognition
Role of Instructional Objectives
objectives provide a description of the intended learning outcomes in performance terms—that is, in terms of the types of performance students can demonstrate to show that they have achieved the knowledge, understanding, or skill described by the objective.
By describing the performance that we are willing to accept as evidence of achievement, we provide a focus for instruction, student learning, and assessment (objectives keep all three in close harmony)
Stating Instructional Objectives
We are simply describing the student performance to be demonstrated at the end of the learning experience as evidence of learning.
Percentile Rank
indicates relative position in a group in terms of the percentage of group members scoring at or below a given score.
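The computation is simple enough to sketch directly. Note that some testing programs instead count only half of the scores tied at the given score; the version below follows the "at or below" definition above, with made-up class data:

```python
def percentile_rank(score, group_scores):
    """Percentage of group members scoring at or below the given score."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)

# Made-up class of 20: a raw score of 42 lands at the 55th percentile.
scores = [28, 31, 33, 35, 36, 38, 39, 40, 41, 42,
          42, 43, 45, 46, 47, 48, 50, 52, 54, 55]
print(percentile_rank(42, scores))  # 55.0
```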
Grade Equivalents
indicates relative test performance in terms of the grade level at which the student's raw score matches the average score earned by the norm group.
Absolute Grading
A common type is the use of letter grades defined by a 100-point system. In the case of an individual test, this 100-point system might represent the percentage of items correct or the total number of points earned on the test. When used as a final grade, it typically represents a combining of scores from various tests and other assessment results.
Strengths
1. Grades can be described directly in terms of student performance, without reference to the performance of others.
2. All students can obtain high grades if mastery outcomes are stressed and instruction is effective.
Limitations
1. Performance standards are set in an arbitrary manner and are difficult to specify and justify.
2. Performance standards tend to vary unintentionally due to variations in test difficulty, assignments, student ability, and instructional effectiveness.
3. Grades can be assigned without clear reference to what has been achieved (but, of course, they should not be).
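A minimal sketch of absolute grading on a 100-point system. The cutoffs are hypothetical; as limitation 1 notes, real cutoffs are set somewhat arbitrarily by the teacher or school:

```python
# Hypothetical letter-grade cutoffs on a 100-point (percentage) system.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

def absolute_grade(percent):
    """The grade depends only on the student's own score, not on classmates."""
    for floor, letter in CUTOFFS:
        if percent >= floor:
            return letter

print([absolute_grade(p) for p in (95, 84, 72, 58)])  # ['A', 'B', 'C', 'F']
```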
Relative Grading
the students are typically ranked in order of performance (based on a set of test scores or combined assessment results), and the students ranking highest receive a letter grade of A, the next highest receive a B, and so on.
Strengths
1. Grades can be easily described and interpreted in terms of rank in a group.
2. Grades distinguish among levels of student performance that are useful in making prediction and selection decisions
Limitations
1. The percent of students receiving each grade is arbitrarily set.
2. The meaning of a grade varies with the ability of the student group.
3. Grades can be assigned without clear reference to what has been achieved (but, of course, they should not be).
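A minimal sketch of relative grading. The grade quotas are hypothetical, which is exactly limitation 1 above: the percentage of students receiving each grade is set arbitrarily:

```python
# Hypothetical quotas: top 20% A, next 30% B, next 30% C, next 15% D, last 5% F.
QUOTAS = [("A", 0.20), ("B", 0.30), ("C", 0.30), ("D", 0.15), ("F", 0.05)]

def relative_grades(scores):
    """Rank students, then assign grades by relative position in the group."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    grades, i = {}, 0
    for letter, share in QUOTAS:
        take = round(share * len(ranked))
        for name, _ in ranked[i:i + take]:
            grades[name] = letter
        i += take
    for name, _ in ranked[i:]:        # anyone left over by rounding
        grades[name] = QUOTAS[-1][0]
    return grades

scores = {"Ana": 91, "Ben": 85, "Cam": 78, "Dee": 74, "Eli": 70,
          "Fay": 66, "Gus": 63, "Hal": 59, "Ivy": 55, "Jo": 48}
print(relative_grades(scores))  # the same scores in an abler group would grade lower
```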
Guidelines for Effective and Fair Grading
Inform students at the beginning of instruction what grading procedures will be used.
Base grades on student achievement, and achievement only.
Base grades on a wide variety of valid assessment data.
When combining scores for grading, use a proper weighting technique (see the sketch after this list).
Select an appropriate frame of reference for grading.
Review borderline cases by reexamining all achievement evidence.
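A minimal sketch of one proper weighting technique: a weighted average of component scores already expressed on a common 0-100 scale. The components and weights here are hypothetical; note that weights act as intended only when the components have comparable score spreads, so scores with very different variability should first be converted to a common scale (e.g., standard scores):

```python
# Hypothetical components and weights, announced to students in advance
# (guideline 1 above). Weights must sum to 1.
WEIGHTS = {"tests": 0.50, "project": 0.30, "homework": 0.20}

def composite(percent_scores, weights=WEIGHTS):
    """Weighted average of component scores on a common 0-100 scale."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(percent_scores[k] * w for k, w in weights.items())

student = {"tests": 82, "project": 90, "homework": 95}
print(composite(student))  # 82*0.5 + 90*0.3 + 95*0.2 = 87.0
```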
Year-end assessments are primarily designed to:
Estimate growth in knowledge and skills from one year to the next. Identify academically at-risk students. Evaluate students' progress against national norms.
Progress monitoring is designed to:
Estimate rates of improvement for each student. Identify students who are not making adequate progress and who need additional or alternative instruction. Evaluate the effectiveness of instruction so that teachers can create better instructional programs
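One simple way to estimate a rate of improvement is the least-squares slope of a student's periodic probe scores over time. The weekly scores below are made up for illustration:

```python
# Made-up weekly probe scores (e.g., words read correctly per minute).
weeks  = [1, 2, 3, 4, 5, 6]
scores = [35, 38, 42, 41, 47, 50]

n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(scores) / n
# Least-squares slope: covariance of (week, score) over variance of week.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, scores))
         / sum((x - mean_x) ** 2 for x in weeks))
print(round(slope, 2), "points gained per week")  # about 2.89
```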
Using Assessment Results
All learning goals should be assessed using more than one assessment activity.
A variety of assessment strategies should be used.
Assessment results will reveal information about student learning and performance which should be analyzed to assist with improvement of teaching and learning.
Student performance patterns and changes over time can be recorded and analyzed to provide information about student growth.
Some unexpected results or surprises may emerge.
The data will raise questions that you can use for your own and your students' growth.
Results can be compared to those of other teachers with similar classrooms and units of study to see if school-wide patterns emerge.
Monitoring Student Progress
A recordkeeping system that uses a grid format can help you monitor the progress of individual students and the whole class at the same time.
Since your tests are designed to measure student learning of specific objectives, this kind of display will allow you to see which students have mastered specific objectives and which students need additional help such as second-chance testing (which simply involves letting a student retake a test after additional instruction), remedial instruction, or accommodations
The same grid used for monitoring individual progress can also give you a picture of the entire class.
By tracking how students do on specific objectives, you can see which students (and objectives) require additional instruction
A grid that charts performance for multiple assessments can help you see patterns of strengths and weaknesses.
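A minimal sketch of such a grid as a nested mapping of students to objectives; the names, objectives, and mastery marks are hypothetical:

```python
# Rows are students, columns are objectives, entries mark mastery.
grid = {
    "Ana": {"obj1": True,  "obj2": True,  "obj3": False},
    "Ben": {"obj1": True,  "obj2": False, "obj3": False},
    "Cam": {"obj1": False, "obj2": False, "obj3": True},
}

# Individual view: who needs additional help on a given objective?
needs_help = [name for name, row in grid.items() if not row["obj2"]]
print("Reteach obj2 to:", needs_help)  # ['Ben', 'Cam']

# Whole-class view: which objectives require additional instruction?
for obj in ("obj1", "obj2", "obj3"):
    mastered = sum(row[obj] for row in grid.values())
    print(obj, f"{mastered}/{len(grid)} mastered")
```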
Looking at Assessment Data
The steps are as follows:
1. review the data from each assessment you conduct
2. review the overall results of all the assessment strategies you have used
3. consider outside factors that might have affected student learning
4. develop an action plan.
The Revised Taxonomy of Educational Objectives
provides a useful framework for (1) identifying a wide array of intended learning outcomes; (2) planning instructional activities; (3) planning assessment methods; and (4) checking on the alignment among objectives, instruction, and assessment.
Reauthorization of the Elementary and Secondary Education Act (ESEA) as the Every Student Succeeds Act (ESSA), 2015
Holds all students to high academic standards, prepares all students for success in college and career, provides more kids access to high-quality preschool, guarantees steps are taken to help students and their schools improve, reduces the burden of testing while maintaining annual information for parents and students, and promotes local innovation and invests in what works.