Components of a pattern recognition system

Pattern recognition, which is closely related to both artificial intelligence and machine learning, is an automated process of recognizing regularities in data and classifying that data into categories using computer algorithms. It has many practical applications, such as distinguishing patients with cancer from those without, separating different species of plants, and telling forged signatures from genuine ones.

To better understand the process, it helps to separate it into components, which are described below.

  • The first component, often called the input, is the data being processed. This data is usually gathered by sensors, cameras, and similar tools, or obtained from a database. It arrives in raw form, which can either be used directly or converted to suit the later steps. Most often the data is represented as a vector of values.
  • The second component is preprocessing: manipulating the data to produce a dataset to which a specific algorithm can be applied. For example, when preprocessing images, you can resize them so that every image in the dataset has the same number of pixels, and apply filters such as a low-pass filter, which suppresses high-frequency noise, or a high-pass filter, which removes slowly varying background. Preprocessing is done to increase the overall accuracy of the recognition and to reduce storage requirements.
  • Dimensionality reduction reduces the number of variables in the dataset by obtaining a set of principal variables. The two main approaches are feature selection and feature extraction. Feature selection keeps a subset of the original variables, while feature extraction derives a new, informative, and non-redundant set of variables from the initial dataset. In image processing, feature extraction is used to detect and isolate shapes in an image, and similar principles can be used to separate individual letters from one another.
  • Prediction is the next component of a pattern recognition system. In this step, candidate machine learning models are chosen and applied to the data. Many models are available, and two of the largest groups are supervised and unsupervised learning. Supervised learning solves classification and regression problems using labeled data. Unsupervised learning detects structure within an unlabeled dataset, using clustering and dimensionality reduction algorithms.
  • Applying several algorithms to the dataset yields several candidate models. The next step, model selection, chooses the best model for the data. Models can be validated in several ways, and the goal is to minimize the prediction error on data the model has not seen. Common methods include holdout, cross-validation, bootstrap, and splitting the data into training, validation, and testing subsets.
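
The early components can be illustrated with a small sketch. The example below is only one possible reading of the preprocessing and dimensionality reduction steps: it assumes the input is already a list of numeric feature vectors, uses a simple variance threshold as the feature selection rule, and standardizes the remaining columns. The data, the threshold value, and the function names are hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev, pvariance

def select_by_variance(rows, threshold):
    """Feature selection: keep only the columns whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    return keep, [[row[i] for i in keep] for row in rows]

def standardize(rows):
    """Preprocessing: rescale every column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        # A constant column has zero spread, so map it to 0.0 instead of dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical raw input: one feature vector (three measurements) per sample.
raw = [
    [5.1, 1.0, 200.0],
    [4.9, 1.0, 180.0],
    [6.3, 1.0, 310.0],
    [5.8, 1.0, 260.0],
]

# The second column is constant, so it carries no information and is dropped.
kept, reduced = select_by_variance(raw, threshold=0.01)

# Selection runs before standardization, because standardizing first
# would give every non-constant column the same variance.
scaled = standardize(reduced)
```

Here `kept` holds the indices of the surviving columns and `scaled` is the dataset that would be passed on to the prediction step.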

When the best model is chosen, it is ready to be deployed in a system and serve its practical purpose.
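
As a rough illustration of model selection and of putting the chosen model to use, the sketch below compares two simple supervised classifiers with k-fold cross-validation and then predicts with the one that has the lower estimated error. Everything in it is an assumption made for the example: the synthetic two-class data, the two candidate models (nearest centroid and 1-nearest neighbor), and the function names.

```python
import random

def nearest_centroid(train, query):
    """Predict the label whose class mean is closest to the query vector."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - q) ** 2 for s, q in zip(sums[label], query))

    return min(sums, key=squared_distance)

def one_nearest_neighbor(train, query):
    """Predict the label of the single closest training sample."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    return closest[1]

def cross_val_error(model, data, k=5, seed=0):
    """Estimate the misclassification rate as the average over k held-out folds."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    fold_errors = []
    for i, held_out in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        wrong = sum(model(train, x) != label for x, label in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k

# Synthetic labeled data (an assumption for this example): two 2-D clusters.
rng = random.Random(1)
data = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), "a") for _ in range(20)]
data += [((rng.gauss(4, 0.5), rng.gauss(4, 0.5)), "b") for _ in range(20)]

models = {"nearest centroid": nearest_centroid, "1-nearest neighbor": one_nearest_neighbor}
errors = {name: cross_val_error(fn, data) for name, fn in models.items()}

# Model selection: keep the candidate with the lowest cross-validated error.
best_name = min(errors, key=errors.get)

# The selected model is then used on new, unseen input.
prediction = models[best_name](data, (0.0, 0.0))
```

Cross-validation is used here rather than a single holdout split because every sample is used for testing exactly once, which gives a steadier error estimate on a dataset this small.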