Machine learning is a subset of artificial intelligence that focuses on recognising patterns in data. It is an interdisciplinary field at the crossroads of computer science, mathematics, probability, statistics and optimisation theory. Artificial neural networks (ANNs) and deep learning have popularised machine learning through their ability to identify complex patterns, with applications across almost every field.
The significance of data in machine learning
What is referred to as data in machine learning? In its cleanest form, after several rounds of preprocessing, data is a set of structured records relating to a certain topic, organised in rows and columns.
Every column represents a feature, that is, a particular property of the data, and every row is a data instance representing a single observation in the dataset. This is the format typically expected in machine learning for any meaningful model building. Machine learning scientists therefore represent this data organisation as vectors and matrices, using linear algebra.
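As a minimal sketch of the row-and-column layout described above, the snippet below stores a tiny dataset as a list of rows. The feature names and all values are invented purely for illustration; real projects would usually rely on a numerical library rather than plain lists.

```python
# Hypothetical toy dataset: each row is one observation (a house),
# each column is one feature. All names and values are invented.
feature_names = ["area_m2", "bedrooms", "distance_to_centre_km"]
X = [
    [70.0, 2.0, 5.0],
    [120.0, 4.0, 12.0],
    [45.0, 1.0, 1.5],
]

n_rows = len(X)      # number of observations
n_cols = len(X[0])   # number of features

# A single row is the feature vector of one observation.
first_observation = X[0]

# A single column collects one feature across all observations.
area_column = [row[0] for row in X]

print(n_rows, n_cols)
print(dict(zip(feature_names, first_observation)))
print(area_column)
```

Seen this way, the whole dataset is a matrix with one row per observation, and each row or column taken alone is a vector.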
Families of ML algorithms
Machine learning algorithms can be divided into three main learning philosophies or paradigms:
Supervised learning
Supervised learning refers to the set of algorithms that aim to identify the underlying mapping or relationship between a set of input (independent) features and a given target feature.
The dataset contains data pairs (Y, x), called “labelled data”, on which the mapping model (i.e. f(.), such that Y ≈ f(x)) is built. Depending on the nature of the target feature or variable (continuous or categorical), two subfamilies of supervised learning can be distinguished: regression and classification.
Regression algorithms aim to find the relationship between the input features and a real-valued target feature, whose possible output values form a continuous, infinite range. The typical example of such a problem is house price estimation, which aims to model the price of a house from features describing the property itself, its location and its proximity to important amenities.
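To make the regression idea concrete, here is a minimal sketch that fits a straight line to a single feature using ordinary least squares. The function name `fit_line` and the house data are assumptions made up for this example; a realistic price model would use many features and a dedicated library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: floor area in square metres and price in thousands.
# The prices were generated as exactly 3 * area to keep the result easy to check.
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90.0 + intercept  # estimate for an unseen 90 m2 house
print(slope, intercept, predicted_price)
```

The fitted line plays the role of the mapping f(.): once the slope and intercept are learned from labelled examples, the model can output a continuous value for a house it has never seen.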
Classification algorithms aim to find the relationship between the input features and a categorical target, whose possible output values are finite. A typical classification problem is diagnosing whether or not a patient has a certain disease, based on medical properties in the patient's profile.
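As an illustration only, the sketch below classifies a patient with a one-nearest-neighbour rule: the new patient receives the label of the most similar labelled patient. The choice of nearest neighbour, the function name `predict_nearest` and the measurements are all assumptions for this example and carry no medical meaning.

```python
def predict_nearest(train_features, train_labels, query):
    """Return the label of the training point closest to `query`."""
    best_label = None
    best_distance = float("inf")
    for features, label in zip(train_features, train_labels):
        # Squared Euclidean distance between the two feature vectors.
        distance = sum((a - b) ** 2 for a, b in zip(features, query))
        if distance < best_distance:
            best_distance = distance
            best_label = label
    return best_label


# Invented labelled data: [body temperature in C, resting heart rate in bpm].
patients = [
    [36.6, 65.0],
    [36.9, 70.0],
    [38.5, 95.0],
    [39.1, 100.0],
]
labels = ["healthy", "healthy", "ill", "ill"]

print(predict_nearest(patients, labels, [38.8, 98.0]))
print(predict_nearest(patients, labels, [36.7, 68.0]))
```

Unlike regression, the output here can only be one of a finite set of categories, in this case "healthy" or "ill".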
Unsupervised learning
Unsupervised learning refers to a set of algorithms that aim to find patterns in unlabelled data, which contains no target output variable. The algorithms must identify hidden patterns in the data based on the inherent structure of the features alone.
Several subproblems exist in unsupervised learning, including clustering, dimensionality reduction and anomaly detection. A clustering problem, for instance, aims to identify the groups present in the data based on the similarity of data points, as dictated by their feature values.
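Clustering can be sketched with k-means, one common clustering algorithm, chosen here as an assumption since the text does not name a specific method. The function name `kmeans`, the simplistic initialisation (the first k points) and the data points are all invented for illustration.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and averaging."""
    # Simplistic initialisation: use the first k points as starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            assignments[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Invented, unlabelled 2D points forming two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

No target variable is supplied at any point: the grouping emerges only from how close the points are to one another in feature space.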
Reinforcement learning
Reinforcement learning is a different learning methodology, based on a reward system. It aims to find the best policy for solving a given strategic problem through an iterative trial-and-error scheme. It is the approach commonly used in self-learning systems and games that improve their performance over time. To understand the philosophy, we only have to look at how children learn to walk, or how they learn to stop touching candles and hot appliances: feedback from their previous attempts imparts new knowledge to them over time.
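The trial-and-error loop can be sketched with tabular Q-learning, one well-known reinforcement learning algorithm, used here as an assumption because the text does not name one. The environment is a made-up corridor of five positions in which the agent starts on the left and is rewarded only on reaching the rightmost position; the hyperparameter values are arbitrary.

```python
import random

N_STATES = 5                    # positions 0..4 on a line; position 4 is the goal
MOVES = [-1, +1]                # action 0 moves left, action 1 moves right
ACTION_NAMES = ["left", "right"]


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[state][action] estimates the long-term reward of taking that action.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Trial: sometimes explore at random, otherwise act greedily
            # (ties between equally good actions are broken at random).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state = min(max(state + MOVES[action], 0), N_STATES - 1)
            reward = 1.0 if next_state == N_STATES - 1 else 0.0
            # Feedback: nudge the estimate towards reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# The learned policy picks the highest-valued action in each non-goal position.
policy = [ACTION_NAMES[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct; it discovers this from the rewards its own attempts produce, much like the children in the analogy above.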
Applications of Machine learning
Machine learning finds applications anywhere data can be harvested and there is a need for insight acquisition and automated problem-solving; its use cases are therefore practically endless. Common areas of application include the following fields, systems and domains, in no particular order of importance:
- Computer Vision and Robotics
- Medical Expert Systems
- Product Recommender Systems
- Transportation and Logistics
- Natural Language Processing
- Time Series Forecasting
- Gaming
Machine learning is used in computer vision for object recognition and detection: modern cameras are embedded with face detection systems, and surveillance platforms include ML-based video analysis systems that can detect and identify moving objects. Robotics integrated with computer vision makes use of machine learning for object recognition as well as for path planning, to navigate obstacles and perform dexterous manoeuvres.
Medical expert systems make use of machine learning algorithms to diagnose diseases based on patient profile data. These systems support medical doctors in identifying pathologies and recommending effective treatments. Recommender systems are pervasive on e-commerce platforms, where they suggest products to customers based on their previous selections and preferences, with the aim of increasing revenue and customer loyalty. Platforms like Amazon and Netflix make use of these systems on a regular basis.
Machine learning can also be used in the transportation industry to determine shortest-route policies for dispatching products, scheduling methods for flights, and discounting programmes to maintain customer loyalty. It is also used extensively in natural language processing for the design of chatbots, language translation, speech recognition, topic modelling and sentiment analysis, and in applications involving time series forecasting such as algorithmic trading, weather forecasting and financial time series analysis. Other applications of machine learning and AI are in gaming, with the design of self-learning AI and multi-agent games.
Prerequisites for machine learning mastery
Gaining proficiency in machine learning requires a moderate skill level in several topics related to computer science and applied mathematics, typically in the following order of importance:
- Programming Skills
- Basic to Advanced Linear Algebra
- Basic to Advanced Multivariate Calculus
- Basic to Advanced Probability and Statistics
- Database Skills
Proficiency in computer programming is paramount in machine learning, both to navigate data through and through and to customise pre-established machine learning libraries and algorithms. The most commonly used machine learning programming languages are Python, R and MATLAB, although frameworks exist that minimise the need for coding. Platforms such as Power BI and WEKA can be used instead, although a good knowledge of computer programming remains essential.
Practical use of machine learning does not require a profound knowledge of applied mathematics. However, a basic understanding of linear algebra benefits ML practitioners when they use ML libraries that expect data in linear-algebra formats (i.e. matrices and vectors), and a basic understanding of calculus (i.e. differential calculus) helps them follow the training process. Good knowledge of statistics remains essential, as most real-world case studies involve some uncertainty that can be modelled using statistics. The stronger an ML user's mathematical skills, the more easily they can move from applying machine learning algorithms to designing and optimising them.
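The link between training and differential calculus can be illustrated with gradient descent, a widely used optimisation technique that the text does not name explicitly, so it is introduced here as an assumption. The sketch fits a single weight w in the model y = w * x by repeatedly stepping against the derivative of the mean squared error; the data, learning rate and function name `mse_gradient` are invented for the example.

```python
# Invented data generated from y = 2 * x, so the best weight is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse_gradient(w, xs, ys):
    """Derivative with respect to w of mean((w * x - y) ** 2)."""
    n = len(xs)
    return (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))


w = 0.0               # initial guess
learning_rate = 0.01  # step size, chosen arbitrarily for this sketch
for _ in range(200):
    # Move the weight a small step in the direction that reduces the error.
    w -= learning_rate * mse_gradient(w, xs, ys)

print(round(w, 4))
```

Libraries hide this loop behind a single training call, but knowing that it exists makes settings such as the learning rate and the number of iterations far less mysterious.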
Although not a paramount skill, the ability to work with relational databases can be relevant, especially for data scientists who use them to extract the datasets relevant to the problem at hand. In a typical data science team, such a task is assigned to a data engineer, who extracts and prepares the relevant data. However, a data scientist with database skills is more versatile, so knowledge of SQL, the language for manipulating relational databases and extracting data, is advised.
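As a small sketch of extracting a dataset with SQL, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `houses` table, its columns and its rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# An in-memory database stands in for a real relational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE houses ("
    "id INTEGER PRIMARY KEY, area_m2 REAL, city TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO houses (area_m2, city, price) VALUES (?, ?, ?)",
    [
        (80.0, "CityA", 240.0),
        (120.0, "CityB", 360.0),
        (50.0, "CityA", 150.0),
    ],
)

# Extract only the rows and columns relevant to the problem at hand.
rows = conn.execute(
    "SELECT area_m2, price FROM houses WHERE city = ? ORDER BY area_m2",
    ("CityA",),
).fetchall()
conn.close()

print(rows)
```

Each returned tuple is one observation, so the query result can be handed straight to the modelling step as a set of rows and columns.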
Conclusion
Machine learning is a revolutionary field that embeds human-like learning and pattern recognition in computer systems and hardware, with the potential to drastically improve productivity, insight acquisition and problem solving. Its applications are found wherever data can be harvested, from computer vision, robotics and transportation to business and public governance. Proficiency in this field is becoming increasingly important in the digital era.
