Hitchhiker’s Guide to a Brief History of AI

Aug 25, 2019


In 1950, a fascinating academic paper, "Computing Machinery and Intelligence," appeared in Mind, an academic journal published by Oxford University Press. It piqued the interest of the intellectual community for many reasons: it was written by Alan Turing of Enigma fame, and, most important of all, it posed a very intriguing question: "Can machines think?" And what does it mean for a machine to think? Turing's groundbreaking and revolutionary idea was pivotal in shaping the vision around the feasibility of creating intelligent systems.

The Imitation Game

Alan Turing believed that intelligent machines were inevitable, but how do we define the intelligence of a machine? The very definition of intelligence is so broad and abstract that he wanted specific criteria that would allow us to gauge the intelligence of machines. Turing devised a game called "the imitation game." The idea was simple: three players play a game of questions and answers in which one player asks the questions and the other two respond. If the human interrogator could not tell within five minutes whether she was talking to a computer or a person, then the computer would pass the test. Today it is known as the "Turing Test." Many believe that the intention of the Turing Test was not to define intelligence but to squash the objections and skepticism around the possibility of building intelligent systems. Turing very methodically addressed and argued against the major objections to artificial intelligence, including the mathematical and the theological.

The Dartmouth Summer Research Project on Artificial Intelligence

Soon after Turing’s paper was published, John McCarthy, along with other scientists like Marvin Minsky, Nathaniel Rochester, and Claude Shannon, proposed the Dartmouth Summer Research Project on Artificial Intelligence in 1955; the workshop, held in the summer of 1956, is considered the germinal event for the field. It was here that the term "Artificial Intelligence" was first coined to define the study in this field. Many prominent scientists, including Minsky, McCarthy, Allen Newell, and Herbert Simon, set up dedicated laboratories to explore the possibility of AI. They used different approaches and focused on different methodologies for building intelligent systems. McCarthy started with mathematical logic, while Newell and Simon focused on modeling human thinking. Minsky’s approach was a little harder to characterize. Minsky and Dean Edmonds built the first artificial neural network machine, called SNARC (Stochastic Neural Analog Reinforcement Calculator). Minsky believed that no single approach would deliver a full understanding of intelligence. In his paper "Steps Toward Artificial Intelligence," he divided the problems of heuristic programming into five major areas: search, pattern recognition, learning, planning, and induction. Minsky and Seymour Papert’s book Perceptrons became pivotal in the analysis of artificial neural networks. Along with the tremendous promise of neural networks, however, Minsky also understood their limitations, which he highlighted in the book; this created major controversy and is often blamed for discouraging neural network research in the mid-70s, contributing to the AI winter.

The First Wave of AI

The sixties to mid-seventies saw broader implementations of artificial intelligence. In 1961 James Slagle, one of Minsky’s students, developed a program called SAINT (Symbolic Automatic INTegrator), which solved elementary symbolic integration problems at approximately the level of a college freshman. It was one of the first expert systems. At the core of this program was a concept called problem reduction: break a big problem into smaller, simpler problems, and keep breaking those down until they reach a level where they can be solved easily. This groundbreaking idea was quickly adopted by others to solve various kinds of problems. By the 1970s there was much progress around programs that understood drawings, learned from examples, knew how to build structures, and even answered questions like today’s Siri or Alexa. It was an exciting and promising time for AI research.
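To make problem reduction concrete, here is a minimal sketch in Python (an illustration of the idea, not SAINT's actual code): an integral of a sum is reduced to two smaller integration problems, and the recursion bottoms out at base cases that can be solved directly. The tiny expression language (`"sum"`, `"const"`, `"pow"`) is invented for this example.

```python
def integrate(expr):
    """Integrate a tiny expression language with respect to x.

    expr is one of:
      ("sum", e1, e2)  -- reduce to two smaller integration problems
      ("const", c)     -- base case: the integral of c is c*x
      ("pow", n)       -- base case: power rule, x^(n+1)/(n+1)
    Results are returned as strings for readability.
    """
    kind = expr[0]
    if kind == "sum":                 # the problem-reduction step
        return f"({integrate(expr[1])} + {integrate(expr[2])})"
    if kind == "const":               # directly solvable subproblem
        return f"{expr[1]}*x"
    if kind == "pow":                 # directly solvable subproblem
        n = expr[1]
        return f"x^{n + 1}/{n + 1}"
    raise ValueError(f"unknown expression: {expr!r}")

# The integral of (3 + x^2) dx reduces to two easy subproblems:
print(integrate(("sum", ("const", 3), ("pow", 2))))  # -> (3*x + x^3/3)
```

The point is the shape of the computation: the hard problem never gets solved directly; it only gets decomposed until every piece is trivial.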

The Second Wave of AI

In 1972 a Stanford student, Edward Shortliffe, successfully developed a system called MYCIN for diagnosing a class of infectious diseases. This was a rule-based "expert system." The compelling results shown by MYCIN generated a lot of interest in rule-based systems. Rule-based systems work from clearly defined logical rules: the system examines various parameters of the problem area and reaches a conclusion based on rules devised by human experts.
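A toy sketch in Python shows the basic shape of such a system. The rules here are hypothetical placeholders, not MYCIN's actual medical knowledge base: each rule maps a set of required findings to a conclusion, and the system fires every rule whose conditions are all present.

```python
# Hypothetical rules for illustration only: (required findings, conclusion)
RULES = [
    ({"fever", "stiff_neck"}, "suspect meningitis"),
    ({"fever", "cough"}, "suspect respiratory infection"),
    ({"rash"}, "suspect allergic reaction"),
]

def diagnose(findings):
    """Return the conclusion of every rule whose conditions all hold."""
    findings = set(findings)
    return [conclusion for conditions, conclusion in RULES
            if conditions <= findings]   # <= is the subset test

print(diagnose({"fever", "cough", "fatigue"}))
# -> ['suspect respiratory infection']
```

The knowledge lives entirely in the rule table, which human experts author and maintain; the inference engine itself is trivial. That separation is exactly what made systems like MYCIN and XCON practical, and also what made them brittle outside the domain their rules covered.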

In 1978 John P. McDermott at Carnegie Mellon developed R1 (aka XCON, for eXpert CONfigurer), a rule-based expert system that automatically selected configurations for DEC’s VAX computers based on customer requirements at ordering time.

AI Winter

The excitement and promises of AI from the 1950s to the 1970s started to fade in the mid-1980s, as the assertive projections were not reflected in results. It was a great topic of debate at the 1984 annual meeting of the American Association for Artificial Intelligence. Minsky warned the business community that the over-enthusiasm around AI was bound to be followed by disappointment if realistic expectations were not set. This led to pessimism in the press and ultimately to a collapse in the perception of AI by governments and private investors, resulting in significant cutbacks in funding for the field. Research in the area nevertheless continued, with limited funding and resources.

The Third Wave of AI

In 2010–2011, the release of IBM Watson and Apple’s Siri revived interest in AI in the press and the business community and brought in a new wave of AI systems backed by industry giants like IBM, Amazon, Google, and Facebook. This wave introduced systems that focused on solving problems using statistical models rather than the precise rules of the earlier rule-based systems. These systems succeeded on many problems because they could learn from sample training data and improve their own precision and efficiency. The success of these machine learning systems was also fueled by the availability of much greater computing power and of huge volumes of data (or Big Data) to train and test on. As a result, machine-learning-powered systems showed impressive results and made many things possible that early AI systems could not do.

Systems based on deep neural networks won numerous contests in pattern recognition and self-learning; Google’s image-captioning program, for example, was built on a neural net. In 2012 Jeff Dean and Andrew Ng built a neural network running on 16,000 computer processors, with one billion connections, and fed it 10 million unlabeled images randomly taken from YouTube videos; to their amazement, the program began recognizing pictures of cats using a "deep learning" algorithm. This was unsupervised learning: no labeled training data identifying cats was provided. The excitement around this result generated a lot of interest in neural nets. The idea of neural nets, however, is not new; Minsky pioneered the concept in the 1950s.

In 1974, Paul Werbos, in his PhD thesis at Harvard, first proposed training neural nets through backpropagation, which in mathematical terms is a form of gradient descent. But for the next several decades very few people showed interest in it, until Geoffrey Hinton, Alex Krizhevsky, and Ilya Sutskever used it to show impressive results in image classification.
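The optimization idea at the heart of backpropagation can be sketched in a few lines. This toy example (my illustration, not Werbos's formulation) fits a single weight w so that w*x approximates y, repeatedly stepping against the gradient of the squared error. Backpropagation generalizes exactly this: it uses the chain rule to compute the same kind of gradient for every weight in a multi-layer network.

```python
def fit(xs, ys, lr=0.01, steps=200):
    """Fit weight w in the model y = w*x by gradient descent."""
    w = 0.0
    for _ in range(steps):
        # Gradient of the loss sum((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad          # step downhill against the gradient
    return w

# Data generated by y = 3x, so the fitted weight should approach 3.
w = fit([1, 2, 3], [3, 6, 9])
print(round(w, 3))  # -> 3.0
```

Every term here scales to deep networks: the loss stays a sum of squared (or similar) errors, and the chain rule propagates the gradient backward through the layers, which is where the name backpropagation comes from.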

In 2015, AlphaGo, a program developed at Google’s DeepMind lab in London to play the board game Go, defeated a professional human player, and in 2017 it defeated the world’s top-ranked player. It used a blend of deep neural nets with Monte Carlo tree search and reinforcement learning, as well as carefully designed board features, with extensive training. All of these concepts had been around for a long time; integrating them together was the key to AlphaGo’s success.

The Next Wave

Coupled with massive computing power and enormous volumes of data, today’s AI systems are showing impressive results in perception, recognition, and learning. However, how these algorithms derive their conclusions, especially in the case of deep neural nets, is not yet well understood. For example, if a program recognizes a cat in an image, it is unclear how it perceives this: does it focus on identifying distinct cat features like ears and whiskers, or on something else entirely in the image’s pixel data? If we don’t know how these results were derived, how much can we trust them? This is the question that will drive the future wave of AI systems.

One of the areas that will see much traction is self-aware or self-explanatory systems capable of contextual adaptation. For example, if a system recognizes an image of a cat, it should be able to tell why it thinks it is a cat: e.g., does it have ears, whiskers, etc., or is it something else? "Self-aware" is a very broad term, or a suitcase term in Minsky’s language, with a variety of meanings packed into it. But to build credibility and trust, people will expect future AI systems to explain how they arrive at a result, much like a reasoning capability. This, again, is not a new concept; it was at the center of Turing’s and Minsky’s very first papers. We can think of reasoning as a special case of perceiving and telling stories. The ability to create, tell, and understand stories is what sets humans apart from other species. The systems of the future will focus on storytelling capabilities as building blocks for creating self-explanatory, self-aware systems.