Today my teacher in the deep learning course asked us to write briefly about linear regression and optimization, especially if these topics are not yet familiar. I have already studied some deep learning at my home university, but I thought it would be a good idea to refresh my understanding of the course topic and these two fundamental concepts.
What is deep learning
Let’s start with the term deep learning, which is a subset of machine learning. It uses neural networks, algorithmic structures inspired by the brain, that learn patterns from data instead of following explicitly programmed rules. A simple example is recognizing handwritten digits. Imagine we want a computer to tell whether an image contains the number 0, 1 or 2. Writing fixed rules for this is difficult because every person writes numbers differently. Instead we collect many labeled images of handwritten digits and feed them into a neural network. During training the model learns which visual patterns, such as curves, edges and shapes, are associated with each number. After training the network can classify new, unseen digit images based on the patterns it has learned. This illustrates the core idea of deep learning: rather than defining rules by hand, the model learns them, in this case the shapes, directly from data.
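To make this concrete for myself, here is a minimal sketch of that digit example. I’m assuming scikit-learn here (the course may well use a different library), and the small 8x8 digits dataset that ships with it rather than real handwriting data:

```python
# A minimal sketch of the digit-recognition idea, assuming scikit-learn.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Load labeled images of handwritten digits (8x8 grayscale, flattened to 64 values).
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# A small neural network learns which pixel patterns belong to each digit.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
model.fit(X_train, y_train)

# After training it can classify new, unseen digit images.
print("accuracy on unseen digits:", model.score(X_test, y_test))
```

No rules about curves or edges are written anywhere in that code; the network picks them up from the labeled examples during training.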
Linear regression
Linear regression is a technique used to find a relationship between given variables. It works well when the relationship in the dataset is roughly linear, but it is also a fundamental building block of deep learning models. Here’s a simple example of predicting house prices. The price is the output variable and the size of the house is the input variable (notice that the actual price we get from the data is not the same thing as the predicted output price). If we plot house sizes on the x-axis and prices on the y-axis, each house appears as a point on the graph. Linear regression finds a straight line that fits these points by minimizing the prediction error.
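Here is a tiny sketch of that line-fitting step in Python. The house sizes and prices below are made up purely for illustration:

```python
import numpy as np

# Hypothetical house data: sizes in square meters (input) and prices (output).
sizes  = np.array([50, 70, 90, 110, 130], dtype=float)
prices = np.array([150_000, 200_000, 240_000, 290_000, 330_000], dtype=float)

# np.polyfit with degree 1 finds the slope and intercept that minimize
# the squared prediction error, i.e. the best-fitting straight line.
slope, intercept = np.polyfit(sizes, prices, deg=1)

# A predicted price is a point on the fitted line, not the actual data value.
predicted = slope * 100 + intercept
print(f"price = {slope:.0f} * size + {intercept:.0f}")
print(f"predicted price for a 100 m^2 house: {predicted:.0f}")
```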
In neural networks a similar idea appears inside each neuron. A neuron takes multiple input variables, multiplies them by weights, adds them together and produces an output. These simple linear operations, combined across many neurons and layers, form the foundation of deep learning. While linear regression alone cannot solve complex tasks like digit recognition, it serves as a crucial building block within deep learning models.
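As a quick illustration, this is what that weighted-sum computation of a single neuron looks like in code. All the numbers are invented just to show the mechanics:

```python
import numpy as np

# A single neuron: multiply inputs by weights, sum them up, add a bias.
# Inputs could be e.g. size, number of rooms and age of a house (made up here).
inputs  = np.array([90.0, 3.0, 15.0])
weights = np.array([2500.0, 10_000.0, -1500.0])
bias    = 20_000.0

# The same kind of linear operation that linear regression performs.
output = np.dot(inputs, weights) + bias
print(output)
```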
Optimization
Optimization is an essential part of training a machine learning model. In the simple house price example we use only one input variable, size. If we plot the data points on a graph, it is possible to draw a fairly good straight line by eye. However, optimization allows the computer to find the mathematically best line by minimizing the prediction error, meaning the difference between predicted and actual house prices, across all data points.
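To show how a computer can do that minimization, here is a sketch using gradient descent, one common optimization algorithm (my choice of algorithm, and the learning rate and step count are arbitrary). It fits the same made-up house data as before:

```python
import numpy as np

# Made-up house data again: sizes (input) and prices (output).
sizes  = np.array([50, 70, 90, 110, 130], dtype=float)
prices = np.array([150_000, 200_000, 240_000, 290_000, 330_000], dtype=float)

# Scale the inputs so a single learning rate works for both parameters.
x = (sizes - sizes.mean()) / sizes.std()
w, b = 0.0, 0.0   # start with an arbitrary line
lr = 0.1          # learning rate: how big each adjustment step is

for step in range(1000):
    pred = w * x + b
    error = pred - prices              # difference between predicted and actual
    w -= lr * 2 * np.mean(error * x)   # gradient of the mean squared error w.r.t. w
    b -= lr * 2 * np.mean(error)       # gradient w.r.t. b

print(f"mean squared error after training: {np.mean((w * x + b - prices) ** 2):.0f}")
```

Each step nudges the slope and intercept a little in the direction that reduces the prediction error, until the line settles on the best fit.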
When we add more input variables such as location, age and number of rooms, the straight line becomes a multi-dimensional surface called a hyperplane, which we can no longer visualize. Optimization still works in the same way, though: the algorithm adjusts the model’s parameters step by step to find the best possible fit to the data.
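Here is the same idea with several input variables, where the fit becomes a hyperplane with one weight per feature plus an intercept. The data is invented, and instead of stepping gradually I solve the least-squares problem directly with NumPy, but the quantity being minimized is the same prediction error:

```python
import numpy as np

# Hypothetical features per house: size, number of rooms, age.
X = np.array([[50, 2, 30],
              [70, 3, 20],
              [90, 3, 15],
              [110, 4, 10],
              [130, 5, 5]], dtype=float)
y = np.array([150_000, 200_000, 240_000, 290_000, 330_000], dtype=float)

# Add a column of ones so the model also learns an intercept.
X1 = np.hstack([X, np.ones((len(X), 1))])

# Least squares finds the hyperplane minimizing the squared prediction error.
params, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("weights:", params[:-1], "intercept:", params[-1])
```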