Learning Isn't Exactly Perfect

I've seen a number of discussions about machine learning as it applies to the Z9 autofocus subject detection. No doubt Nikon has used some form of machine learning in developing that system.

However, I see a number of mistaken thoughts about the machine learning trend in electronic devices and software that need a bit of straightening out. 

First, let's be clear: your Z9 doesn't learn as you photograph (more on that in a bit). The subject detection algorithms Nikon has created are hard coded into the camera's firmware. I'm pretty sure they can be adjusted and updated by Nikon, but only Nikon's engineering department has the ability to teach the system new things.

So let's talk about that teaching. 

The natural assumption is that you just keep throwing more and more data at a machine learning engine and it picks up all of it and just keeps getting better until it's 100% perfect. That's not the case. A little bit of learning nets you quite a bit of the benefit, then as you keep throwing things at the engine you get more modest improvements. Eventually, you can give the system too much data and its accuracy starts to go down again. I believe that's particularly true of multi-subject recognition systems when you start to add more types of subjects to recognize; you start to get false recognitions. 
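This saturation-plus-confusion behavior can be sketched with a toy Python model. Everything here is invented for illustration (the curve shape, the capacity and confusion constants, the function names); it is not Nikon's actual training behavior, just the general pattern developers describe:

```python
import math

def accuracy(n_samples, n_classes, capacity=1000.0):
    """Toy model: accuracy saturates as training data grows, while each
    added subject class consumes shared model capacity and adds a small
    chance of false recognitions. Purely illustrative numbers."""
    learning = 1.0 - math.exp(-n_samples / capacity)   # diminishing returns
    confusion = 0.02 * max(0, n_classes - 1)           # inter-class false hits
    return max(0.0, learning - confusion)

# Doubling the data early helps a lot; doubling it late helps very little.
early_gain = accuracy(2000, 5) - accuracy(1000, 5)
late_gain = accuracy(16000, 5) - accuracy(8000, 5)
```

With this toy curve, `early_gain` is large and `late_gain` is nearly zero, and holding data fixed while quadrupling the number of subject classes actually lowers the modeled accuracy, which is the "too much to recognize" effect described above.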

For instance:

In this Z9 image the camera claims to have recognized the "eye" of an animal that clearly wasn't part of the training. Indeed, as I was photographing this warthog (I was expecting him to start running so I could get motion blur; note the slow shutter speed) I noticed that the subject detection seemed confused. It would snap between recognizing the face and something that wasn't the eye, then back, over and over.

The issue is this: if you train the system to recognize the unusual warthog head/eye configuration, you might upset the overall animal face/eye algorithm by making the camera ponder subtle distinctions in situations that should be obvious. With animals in particular, there can be patterns on the head that mask its shape; indeed, some of that is evolutionary camouflage. Developers I've talked to who work with these types of learning routines tell me that getting to 80-90% accuracy is the easy part; further progress runs into all these special-case issues, and progress on a specialty case can detract from progress on something that's more regularly encountered.

I believe that this is one reason why Nikon used a hierarchical approach to the Z9's subject recognition: first see if you can recognize overall what kind of subject it is (body), then see if you can recognize its key element (face/front), then dive in for the detail (eye/headlight). The Z9 clearly recognizes animals that I'm pretty sure weren't part of the learning equation (though it also fails to recognize a few others). It then tries to apply the key element/detail algorithms and isn't always successful.
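The coarse-to-fine idea can be sketched in a few lines of Python. The detector functions and the region representation here are hypothetical stand-ins (nothing is known about Nikon's internals); the point is the fallback structure, where the camera settles for the coarsest region it could actually confirm:

```python
# Hypothetical stub detectors: each returns a region (a dict here) or None.
def find_body(frame):
    return frame.get("body")

def find_face(body):
    return body.get("face") if body else None

def find_eye(face):
    return face.get("eye") if face else None

def detect(frame):
    """Coarse-to-fine cascade: body first, then face, then eye.
    Fall back to the coarsest region that was actually found."""
    body = find_body(frame)
    if body is None:
        return None                      # nothing to track at all
    face = find_face(body)
    if face is None:
        return ("body", body)            # settle for the body box
    eye = find_eye(face)
    return ("eye", eye) if eye is not None else ("face", face)

# A warthog-like case: face confirmed, eye ambiguous -> face tracking.
warthog = {"body": {"face": {"eye": None}}}
```

This structure explains the observed behavior: an untrained animal can still get body or face tracking even when the eye-level detail fails.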

In human terms, it's similar to what happens over time as you get exposed to more things and learn more: you can't always pull up the relevant neuron firing to remember something completely or accurately, because you've been storing so much random information in your wetware.

Which brings me to this: I'm not sure that we want our cameras trying to learn in the field, as some have suggested. Take my typical use pattern: at the moment I'm mostly photographing sports and wildlife, two very different things, and within both categories some very different sub-categories. If my Z9 were trying to learn from what I photograph in real time, it would keep going down rabbit holes of specific subjects, and then I'd throw it for a loop each time I changed subjects. In sports, for instance, we also have a wide range of helmets and uniforms that enter the equation. Heck, we have sports where people are on animals.

The question I can't answer, of course, is exactly what it is that Nikon is using as their engine and how they're training it. Nikon has provided absolutely no clues that could be used to decipher how they're managing the machine learning process, as they once did with the somewhat similar sample-based evaluation system that trained their matrix metering to be the best in the business. 

The key takeaway should be this: Nikon can make the Z9's subject detection better, but it will take a lot of effort and concentrated work in Japan to do so. The system is already really good, arguably the best at multi-subject recognition. But, as always, the photographer attempting to elicit every ounce of performance from a Z9 always has to be prepared to take control away and force the camera into a different approach at times. This is where the previous Z bodies had a huge liability (no AF-ON+AF-Area mode button assignments, for example). The Z9 gives that back, so it is possible to wrest control from the machine learning and make the camera do what you need it to. That just takes learning and practice on your part.

Bottom line: machine learning is now present in several ways in most of our cameras, with some, like the Z9, getting quite sophisticated algorithms. These systems are welcome, for sure, but as always, you need to be prepared to take control of them when they're not achieving what you want them to. 

I'm tempted to say that this is a "trust, but verify" type of thing, but that's not quite right. It's really "partake, but (sometimes) override." 


text and images © 2022 Thom Hogan — All Rights Reserved