How the 'dumb' cameras in our smartphones suddenly became 'smart'.
For the past few years, I have been puzzled by the question of how modern flagships, built around camera modules manufactured with old technology, manage to 'stretch' the picture to the quality of an inexpensive DSLR. There is no doubt that this is happening: the Internet is full of beautiful pictures that people take with their mobile phones. Yet the camera module itself has hardly changed over the past ten years. Yes, the resolution of the photodiode array has grown, but a more significant parameter – the physical size of the sensor matrix (and hence the area of each photodiode and its light-gathering ability) – has stayed at roughly the same level. So what exactly does the 'smart camera' in today's flagships do? And why would it need a dedicated 'AI' unit in the chipset? To answer these questions, I decided to do a little research armed with steady hands, a clear head, and free software.
Content
- Photographing and setting the problem
- Available technologies
- Practical guesswork
- Result and conclusions
Photographing and setting the problem
Recently, International Women's Day, March 8, swept across the country. This was a good opportunity to photograph the holiday table with different smartphones and study the question by the most effective method – comparison. An Honor 10 was borrowed from one of the guests, and the second photo was taken with a veteran, a Sony Xperia Z1 Compact. The shots were taken from the window (the shaded side of the house) towards the center of the room – fairly harsh conditions in any case. Here is what came out:
The Intelligent Shooting mode of the Honor 10 is impressive: the cucumbers look lifelike, and the tomatoes have the proper shade of a store-ripened vegetable. The background is handled well too – carpets of similar shades do not merge into a mess, and the pattern on them stays clear. The drop in resolution from 24 MP (the main camera) to 4.9 MP during processing is not upsetting either. Reducing the frame size often accompanies automatic processing: almost all algorithms, when they do not know what else to do with the noise, 'shrink' the image until the artifacts are gone. According to its creators, the Honor 10 camera has artificial intelligence. Time to compare it with a 'dumb' camera. The same composition, photographed on the Z1 Compact:
The lack of light produced a whitish veil and twilight-zone colors throughout the frame. The Z1C does not support the Google Camera API at the hardware level and lacks modern ways of enhancing images on the fly, which makes it ideal for our experiment.
It's time to state the problem and consider options for solving it. We have a good shot from the smart camera of the Honor 10 and a disgustingly flawed shot from the Z1C.
Objective: to find out the essence of the 'smart camera' algorithm.
Possible solutions:
- Perform a series of simple operations that turn the bad Z1C snapshot into something resembling the Honor 10 snapshot.
- Reverse the process on the Honor 10 snapshot, turning it into a hideous snapshot like the Z1C's.
If the proposed algorithm works in both directions, its principle of operation can be considered proven – at least in this particular case.
Available technologies
Before starting, we need to establish what the camera of a modern smartphone can actually do with an image without any 'smart' functions. The camera can:
- Determine the focal distance, using various methods of measuring the distance to the subject.
- Apply HDR, artificially expanding the brightness range.
- Produce bokeh, by detecting the silhouette of the nearest object and blurring the background around it (a code sketch of this step follows below).
This is the baseline built into the drivers of most modern cameras, and it is what you would rely on if you were an artificial intelligence that needed to 'stretch' a picture.
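To make the bokeh item concrete, here is a minimal sketch in Python with Pillow. The subject mask is assumed to already exist (in a real camera it would come from a depth or subject-detection step; here it is just a hand-made file, and the file names are invented for illustration):

```python
# Software "bokeh": blur the whole frame, then restore the sharp
# subject through a mask (white = subject, black = background).
from PIL import Image, ImageFilter

frame = Image.open("frame.bmp")
mask = Image.open("subject_mask.bmp").convert("L")

blurred = frame.filter(ImageFilter.GaussianBlur(radius=12))
bokeh = Image.composite(frame, blurred, mask)  # take frame where mask is white
bokeh.save("bokeh.bmp")
```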
Practical guesswork
In our particular case, changing the focal distance accomplishes nothing. If you look closely at the pictures, both cameras focused correctly on the holiday table, offering similar sharpness around the edge of the salad plate (six layers).
The look of the table clearly indicates that HDR was used while taking the Honor 10 picture: every object is bright, and the colors are very rich. Achieving this on any photo is easy – just open Google Photos and apply the Palm filter, which is essentially quite aggressive HDR. Here's what it did to the original Z1C shot:
The colors and brightness of the objects on the table have clearly become identical to what we see in the Honor 10 photo. At the same time, real color chaos reigns around the table, with red dominating everything. This is where the first guess creeps in: the 'artificial intelligence' algorithm uses a standard feature – bokeh. Perhaps the object in focus was first cut out of the frame, the two resulting images were processed separately (the one in focus more aggressively), and then merged back together.
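For reference, an aggressive 'HDR' look of this kind is easy to approximate in code. This is only my assumption about what such a filter does (a histogram stretch plus a saturation boost), not Google Photos' actual implementation:

```python
# A crude stand-in for an aggressive "HDR" filter such as Palm:
# stretch the brightness histogram, then pump up the saturation.
from PIL import Image, ImageOps, ImageEnhance

img = Image.open("z1c_original.bmp")
stretched = ImageOps.autocontrast(img, cutoff=2)    # expand brightness range
vivid = ImageEnhance.Color(stretched).enhance(1.6)  # boost color saturation
vivid.save("z1c_hdr_like.bmp")
```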
To cut the holiday table out of the frame, I used the free Paint.NET application. If you have absolutely nothing to do and want to repeat the whole procedure (it is laborious), you can try Photoshop or anything else you know how to use. To avoid creating new compression artifacts along the way, I strongly recommend carrying out all manipulations on images in the uncompressed BMP format rather than JPEG. Here's what came out:
Next, expand the brightness range (HDR) of the two images with different degrees of aggressiveness and merge them back into a single image:
The Z1C photo after separate processing of the objects
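The whole manual experiment condenses into a short sketch, again assuming Pillow and a hand-made mask of the table. The enhancement factors are guesses tuned by eye, not values any real camera is known to use:

```python
# Separate processing: aggressive enhancement for the subject,
# gentle enhancement for the background, merged through the mask.
from PIL import Image, ImageEnhance

frame = Image.open("z1c_original.bmp")
mask = Image.open("table_mask.bmp").convert("L")  # white = the table

def enhance(img, contrast, color):
    """Apply a simple contrast + saturation boost."""
    img = ImageEnhance.Contrast(img).enhance(contrast)
    return ImageEnhance.Color(img).enhance(color)

foreground = enhance(frame, contrast=1.6, color=1.5)  # aggressive
background = enhance(frame, contrast=1.2, color=1.1)  # soft
result = Image.composite(foreground, background, mask)
result.save("z1c_separate_processing.bmp")
```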
You could keep playing with the settings and different filters, but our task is to get the general idea and arrive at a result similar to a 'smart camera', and I think that goal has been achieved. Now let's go in the opposite direction with the Honor 10 photo:
The Honor 10 photo after narrowing the HDR
I believe these manual image-enhancement methods are exactly the ones used by the 'artificial intelligence' in the cameras of modern smartphones.
Result and conclusions
The result of this exercise can only be a description of the algorithm by which a smartphone camera – one whose makers declare some built-in cleverness – processes a photo. In my opinion, that description might look like this.
The smartphone's camera takes several shots, including a 'bokeh' shot and a 'reverse bokeh' shot, in which the background is removed instead of blurred. Each image is processed with different filters – aggressive HDR for the central subject and a softer one for the background. The 'bokeh' and 'anti-bokeh' are then stitched into one photo and presented to the user. This is the particular case of daytime photography, but for night photography too, the key to success may well be a well-tuned 'bokeh' technology with the ability to separate parts of the image.
The validity of the method is confirmed by the simplest experiment: fix the camera and take two pictures, one focused at the center of the frame and the second focused on the background. While shooting, the camera will change not only the focal distance but also the exposure (if there is not enough light), making distant objects brighter. After running the central subject through an HDR filter and combining the two photos in bokeh/anti-bokeh fashion, you get a photo worthy of a modern flagship. Unfortunately, I did not think of this right away, so I had to make do with modifying a single photo.
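In code, the two-shot version of the experiment would look roughly like this – a sketch under the assumption that the camera was fixed, so the frames are already aligned; file names and enhancement factors are invented:

```python
# Two shots from a fixed camera: one focused/metered on the subject,
# one on the background (which the camera exposed more brightly).
# Take the boosted subject from the first, the background from the second.
from PIL import Image, ImageEnhance

shot_subject = Image.open("focus_center.bmp")       # metered on the table
shot_background = Image.open("focus_far.bmp")       # brighter background
mask = Image.open("subject_mask.bmp").convert("L")  # white = subject

boosted = ImageEnhance.Contrast(shot_subject).enhance(1.5)  # "HDR" the subject
combined = Image.composite(boosted, shot_background, mask)
combined.save("flagship_style.bmp")
```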
At first glance, all these functions are very simple and should not require the additional computing power that is now implemented as separate AI blocks in the chipset. But remember that for such processing, the camera needs to focus not once but as many times as possible on different areas of the frame, and to separate and stitch the photos instantly. That may be what motivates the specialized computing units. So once again, no actual 'artificial intelligence' was found in the smartphone's camera, which is not surprising.
Each manufacturer guards its algorithm like the apple of its eye and uses various methods to fight intellectual-property thieves. Sony uses a hidden TA partition and the DRM keys stored in it, the loss of which instantly turns pictures into a blurry 'soap'. Others use additional security chips that prevent unlocking the bootloader and accessing the content. For these reasons, we are only ever shown the result of such an algorithm and never the sequence of manipulations it performs. In any case, progress does not stand still, and the camera-phone market is waiting for mobile technology that can reconstruct lost parts of an image, which should arrive very soon. But even then, we will be talking about faster calculations and more elaborate versions of already existing comparison technologies – implemented, by the way, in the familiar optical mouse. Or in the visual apparatus of a bee, which has no mechanism for stereoscopic vision, yet manages to extract, from a comparison of two images updated in parallel through two channels (its eyes), information about its position in space and the distance to objects ahead.
Dear readers, do you agree with this result, or can you offer your own vision of the smart camera?