New AI system would probably do better than you on the math SAT

Written By: Chuck Bednar

Published Date: September 23, 2015 Last Edited: September 23, 2015

Christopher Pilny

A team of researchers from the Allen Institute for Artificial Intelligence (AI2) and the University of Washington has developed an AI system that could theoretically pass the SAT, and that can complete geometry problems every bit as well as the average US 11th-grade student.

The system is known as GeoS, and as the inventors explained in research presented recently at the 2015 Conference on Empirical Methods in Natural Language Processing in Lisbon, Portugal, it was able to interpret diagrams and process language well enough to correctly solve 49 percent of SAT geometry questions.

GeoS had to solve unaltered test questions which it had never previously encountered, and which required it to have an understanding of implicit relationships, ambiguous references and the links between diagrams and natural-language text, the researchers explained in a statement Monday.

Had the results been extrapolated across the entire Math SAT, the AI system would have scored 500 out of 800, the average test score for 2015, according to AI2 CEO Oren Etzioni, UW assistant professor of computer science and engineering Ali Farhadi, and their co-authors.

Developers looking to expand its areas of expertise

GeoS is said to be the first complete system capable of solving SAT plane geometry problems. It first interprets a question, using both the diagram and the text to come up with the best possible logical expression of the problem. It then sends that expression to a geometric problem solver to come up with a solution, and compares the result to the multiple-choice options presented in the test.

On questions it was confident enough to answer, the AI technology had an accuracy rate of 96 percent. Currently, GeoS is only able to solve plane geometry questions, but AI2 officials report that they are attempting to expand its knowledge base so that GeoS will be able to solve the full set of SAT math questions within the next three years.
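The flow described above (interpret the question, solve it, match the result against the multiple-choice options, and answer only when confident) can be illustrated with a toy sketch. This is not the GeoS implementation: the `Interpretation` class, the `choose_answer` function, and the 0.8 confidence threshold are all hypothetical names and values chosen for illustration, and the "solver" here is just a precomputed number.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Interpretation:
    """Hypothetical stand-in for the logical expression of a question."""
    solved_value: float  # value a geometric solver would derive from it
    confidence: float    # how sure the interpreter is, in [0, 1]


def choose_answer(
    interp: Interpretation,
    options: Sequence[float],
    min_confidence: float = 0.8,  # assumed threshold, not from the research
    tolerance: float = 1e-6,
) -> Optional[int]:
    """Return the index of the matching option, or None to abstain."""
    # Skip the question when the interpretation is not trusted enough.
    if interp.confidence < min_confidence:
        return None
    # Compare the solved value against each multiple-choice option.
    for i, option in enumerate(options):
        if abs(option - interp.solved_value) <= tolerance:
            return i
    # No option matches the solved value, so abstain as well.
    return None


# Toy question: a right triangle has legs 3 and 4; find the hypotenuse.
interp = Interpretation(solved_value=(3**2 + 4**2) ** 0.5, confidence=0.95)
print(choose_answer(interp, [4.0, 5.0, 6.0, 7.0]))
```

Abstaining on low-confidence questions is what lets a system of this kind be highly accurate on the questions it does answer while still answering only about half of the full set.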

“Unlike the Turing Test,” Etzioni said, “standardized tests such as the SAT provide us today with a way to measure a machine’s ability to reason and to compare its abilities with that of a human. Much of what we understand from text and graphics is not explicitly stated, and requires far more knowledge than we appreciate. Creating a system to be able to successfully take these tests is challenging, and we are proud to achieve these unprecedented results.”

Farhadi added that the research team was “excited about GeoS’s performance on real-world tasks. Our biggest challenge was converting the question to a computer-understandable language. One needs to go beyond standard pattern-matching approaches for problems like solving geometry questions that require in-depth understanding of text, diagram and reasoning.”

