Machine Learning and Image Recognition with iOS 11


During summer 2015, I had the opportunity to visit the Earth Resources Observation and Science (EROS) Center, a research center of the U.S. Geological Survey. My conversations with the Director and Chief Architect of EROS left me deeply intrigued by the remote sensing and data management methodologies used there. This was my foray into image processing and computer vision.

The concept of interpreting and reconstructing a 3D scene from a 2D image completely blew me away. I rolled up my sleeves and started working on a sample program that would recognize my handwriting. My goal was to run it on my new Samsung Note 4 (by the way, I am currently an iPhone user, so don’t hold this against me). To my surprise, I found myself limited by the technology. The closest I could get was writing the program in MATLAB and exporting the results to the device. The barrier at the time was the lack of APIs that would let my phone talk to a platform capable of image processing and heavy computation.

Google Cloud Vision

On February 18, 2016, Google broke this barrier by releasing its Cloud Vision API, which gave a jump start to developers and researchers across the world. You could submit an image to the API as a request and get back results such as optical character recognition or sentiment analysis. This was a major step for web applications because the heavy computational lifting was performed on the server; the mobile device only rendered the results.

Google’s Cloud Vision API

The downsides of the Google Cloud Vision API:

The flip side of the Cloud Vision API was that images had to be uploaded to a server, which raised privacy concerns, cloud and data transfer costs, and ruled out offline use. Video analysis was also impractical: you would end up sending each frame to the server and waiting for a response. Imagine developing an iPhone app that analyzes video; on an iPhone 6 that would mean 30 frames per second.

Apple Core ML and Vision Framework

At WWDC, held June 5–9, 2017, Apple dropped some hotness around this topic, introducing its new Vision and Core ML APIs.

In my opinion, Apple has broken down the barriers in this field. The Vision framework enables image processing on our devices and delivers results in real time, while the Core ML framework provides native acceleration for custom machine learning models. Because the computation happens on the user’s device, concerns about privacy, offline response time, data usage charges, and so on disappear.
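To show how little glue code on-device inference needs, the sketch below prepares an image picked in a UIKit app for Vision, producing the `ciImage` and `orientation` values that the request handler in Step 1 expects. The `CGImagePropertyOrientation` mapping and the `prepare` helper are my own illustrative additions, not part of Apple’s sample.

```swift
import UIKit
import ImageIO

// Hypothetical helper: map UIKit's image orientation to the EXIF-style
// orientation that VNImageRequestHandler expects. This mapping is an
// assumption for illustration, not code from the original sample.
extension CGImagePropertyOrientation {
    init(_ uiOrientation: UIImage.Orientation) {
        switch uiOrientation {
        case .up:            self = .up
        case .down:          self = .down
        case .left:          self = .left
        case .right:         self = .right
        case .upMirrored:    self = .upMirrored
        case .downMirrored:  self = .downMirrored
        case .leftMirrored:  self = .leftMirrored
        case .rightMirrored: self = .rightMirrored
        @unknown default:    self = .up
        }
    }
}

// Hypothetical prep step: convert a UIImage into the inputs used in Step 1.
func prepare(_ uiImage: UIImage) -> (ciImage: CIImage, orientation: CGImagePropertyOrientation)? {
    guard let ciImage = CIImage(image: uiImage) else { return nil }
    return (ciImage, CGImagePropertyOrientation(uiImage.imageOrientation))
}
```

With this in place, the `orientation.rawValue` passed to the request handler below is the EXIF raw value that Vision expects.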

The combination of machine learning and image processing opens an array of possibilities for innovation. For example, take a picture of a car and you could classify it as a sports car vs. a hatchback vs. a sedan.

Sample Program

Finally, two years later, I realized my dream of writing an image recognition program that runs machine learning algorithms on my own device.


Step 1:

// Create the request handler and perform the classification request off the main queue.
let handler = VNImageRequestHandler(ciImage: ciImage, orientation: Int32(orientation.rawValue))
DispatchQueue.global(qos: .userInteractive).async {
    do {
        try handler.perform([self.classificationRequest])
    } catch {
        print(error)
    }
}

Step 2:

// Load the ML model through its generated class and create a Vision request for it.
lazy var classificationRequest: VNCoreMLRequest = {
    do {
        let model = try VNCoreMLModel(for: VGG16().model)
        return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
    } catch {
        fatalError("Can't load Vision ML model: \(error)")
    }
}()

Step 3:

// Completion handler: read the top classification and update the UI on the main queue.
func handleClassification(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNClassificationObservation] else {
        fatalError("Unexpected result type from VNCoreMLRequest")
    }
    guard let best = observations.first else {
        fatalError("Can't get best result")
    }
    DispatchQueue.main.async {
        self.classificationLabel.text = "Classification: \"\(best.identifier)\" Confidence: \(best.confidence)"
    }
}
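Step 3 reports only the single best label. If you also want to show the runners-up, a small variation (my own sketch, not part of Apple’s sample) can format the top few observations; Vision returns them sorted by confidence, so taking a prefix is enough.

```swift
import Vision

// Hypothetical variation on the Step 3 handler: summarize the top-n
// classifications instead of only the best one. Observations arrive
// already sorted by descending confidence.
func summarize(_ observations: [VNClassificationObservation], top n: Int = 5) -> String {
    return observations
        .prefix(n)
        .map { String(format: "%@ (%.0f%%)", $0.identifier, $0.confidence * 100) }
        .joined(separator: ", ")
}
```

You could call `summarize(observations)` inside the `DispatchQueue.main.async` block in place of the single-label string.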

Demo

https://www.youtube.com/watch?v=Pcd6LoBduqw&feature=youtu.be

Use Cases

Apple’s Vision and Core ML frameworks open the door to an array of application use cases. I am excited to see how application developers adopt these frameworks in designing apps.

It could be an educational app that recognizes types of flowers, or an app that identifies the color of a building and lets you order that paint color. The sky is the limit with this combination of hardware and software, so start thinking now.

Resources:

Vision Framework: Building on Core ML https://developer.apple.com/videos/play/wwdc2017/506/


Naveen Rokkam

Naveen Rokkam is a technology leader with a strong passion for strategy and management. In his role as Vice President of Strategic Accounts, he leads large scale digital transformation projects. He is passionate about exploring new ways to create remarkable solutions by leveraging the latest technologies. He is a regular speaker at SAP events including SAP TechED, SAPPHIRE, and ASUG events.
