Pictofit / iOS SDK / 2.6.2 / Capturing Avatars

Capturing Avatars

The SDK currently provides a capturing module which allows users to create a virtual avatar of themselves for use with our 2D features. Additional capturing methods for 2D and 3D will be added in the near future. In any case, a Pictofit Content Service account is required to generate the avatar since the processing happens in the cloud on our servers. Please contact our sales team to get your free trial account.

Head Capturing

This feature allows users to create an avatar of themselves by essentially capturing a short selfie video. During the process, the user is guided and receives feedback to make sure that we get all the data we need. The captured data is then uploaded to our cloud for processing. Once the processing has finished, the resulting avatar can be downloaded and used with our mobile SDKs. Since users only capture their face, the body is purely virtual and can be adjusted. More information on this can be found in the respective section. The resulting avatar can be used with our mix & match functionality in 2D. The following steps outline the process of creating an avatar this way:

  • Instruct the user on the capturing process
  • Perform the capturing and guide the user
  • Upload the data to our Content Service
  • Download the resulting avatar
  • Allow the user to adjust their avatar

The following short visual illustrates the capturing process.

Visualisation of the capturing process.

Capturing & User Guidance

The logic for capturing is provided by the RRTrueDepthCaptureView class. To perform a head avatar capturing, show an instance of RRTrueDepthCaptureView and call its startCapturingWithStorageDirectory method. To achieve a good user experience, you will also need to implement the RRTrueDepthCaptureViewDelegate protocol and provide it as the view’s delegate. This allows you to receive feedback from the capturing component and instruct the user accordingly.
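
As a minimal sketch, the setup could look as follows. The helper below only prepares a writable storage directory using Foundation; the commented lines show where the SDK calls would go. Whether startCapturingWithStorageDirectory takes a path string or a URL is an assumption here, so check the API reference.

```swift
import Foundation

// Prepares an empty, uniquely named directory that can be passed to
// RRTrueDepthCaptureView.startCapturingWithStorageDirectory so the SDK
// has a place to write the captured frames.
func makeCaptureStorageDirectory() throws -> URL {
    let base = FileManager.default.temporaryDirectory
    let dir = base.appendingPathComponent("head-capture-\(UUID().uuidString)",
                                          isDirectory: true)
    try FileManager.default.createDirectory(at: dir,
                                            withIntermediateDirectories: true)
    return dir
}

// Usage inside your view controller, assuming `captureView` is an
// RRTrueDepthCaptureView that has been added to the view hierarchy:
//
//   captureView.delegate = self  // adopts RRTrueDepthCaptureViewDelegate
//   let storageDir = try makeCaptureStorageDirectory()
//   captureView.startCapturingWithStorageDirectory(storageDir.path)
```
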

To make sure that the capturing succeeds and users know what to do, you should show instructions before starting the process. Best practice is to display a short video clip that explains the capturing process. Additionally, showing a bullet point list that mentions all the important points is advisable. The list should cover the following points:

  • Tie up your hair or put it in front of your shoulders
  • Sit upright and face the sun/light
  • Turn your phone volume up to hear all instructions
  • Remove glasses

The following image gives an example of what the instructions screen could look like:

Example of an instruction screen.

The following code snippet shows a simple example of how the updatedCaptureWarning and updatedCaptureInstruction delegate methods can be used to give the user feedback. You should relay the instructions and warnings the delegate delivers to guide your users. During capturing, users should not look at the screen, so audio is the best way to give feedback.

func trueDepthCaptureView(_ captureView: RRTrueDepthCaptureView, updatedCaptureWarning warning: RRTrueDepthCaptureViewCaptureWarning) {
  var text : String? = nil
  switch warning {
  case .noFaceDetected:
    text = "No face detected"
  default:
    text = nil
  }
  
  if text != nil && self.lastSpeechFeedbackText != text {
    let utterance = AVSpeechUtterance(string: text!)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US") // "en-EN" is not a valid language code
    self.speechSynthesizer.speak(utterance)
    self.lastSpeechFeedbackText = text
  }
  
  if self.mainView.headCapturingView?.isCapturing == false { // No overlay messages during capturing since the user should not be looking at the screen anyway
    self.showOverlayMessage(text)
  }
}

func trueDepthCaptureView(_ captureView: RRTrueDepthCaptureView, updatedCaptureInstruction instruction: RRTrueDepthCaptureViewCaptureInstruction) {
  var text : String? = nil
  switch instruction {
  case .centerToTheLeft, .centerToTheRight:
    text = "Center horizontally"
  case .moveDown:
    text = "Move down"
  case .moveUp:
    text = "Move up"
  default:
    text = nil
  }
  
  if text != nil && self.lastSpeechFeedbackText != text {
    let utterance = AVSpeechUtterance(string: text!)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US") // "en-EN" is not a valid language code
    self.speechSynthesizer.speak(utterance)
    self.lastSpeechFeedbackText = text
  }
}

As soon as the capturing has finished, the RRTrueDepthCaptureViewDelegate method trueDepthCaptureViewFinishedCapturing will be triggered. Use this to signal to the user that the capturing is done. Ideally, this is again done using audio feedback or vibration since the user should not look at the screen until the process is done. The trueDepthCaptureViewFinishedCapturing callback also provides a capturing quality summary that includes several boolean properties indicating possible quality warnings about the finished session. Have a look at the API docs for the possible warning types. If there are warnings, they should be presented to the user, and you should allow the user to restart the capturing if they want to give it another try.
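
Assuming the quality summary's boolean warnings have been collected into a dictionary, a small helper could turn them into user-facing messages. The flag names and message texts below are hypothetical placeholders; see the API docs for the real property names.

```swift
import Foundation

// Maps raised quality-warning flags to user-facing messages.
// The keys ("tooDark", "faceLostDuringCapture") are illustrative stand-ins
// for the boolean properties on the SDK's quality summary.
func qualityMessages(fromWarnings warnings: [String: Bool]) -> [String] {
    let messages: [String: String] = [
        "tooDark": "The lighting was too dark. Try facing a window.",
        "faceLostDuringCapture": "We lost track of your face. Please hold the phone steady.",
    ]
    return warnings
        .filter { $0.value }              // keep only raised flags
        .compactMap { messages[$0.key] }  // translate known flags to messages
        .sorted()                         // stable order for display
}
```

If the returned array is non-empty, present the messages to the user together with a button that restarts the capturing session.
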

Data Transfer

After you receive the delegate’s trueDepthCaptureViewFinishedCapturing callback, some processing will happen internally within the RRTrueDepthCaptureView class. After the processing has finished, the delegate’s trueDepthCaptureViewDataReady method will be triggered. This callback tells you that the captured data is ready for upload. From now on you can access the captured data and transfer it to our Pictofit Content Service. The data consists of several captured frames, where each frame is represented by an instance of the RRTrueDepthKeyframe class.

The following code snippet shows how you can access the captured data. The file names used in this sample already follow the naming convention required when uploading the files to our Pictofit Content Service so that we can properly process them.

func uploadData(headCapturingView: RRTrueDepthCaptureView) {
  var fileNames: [String] = []
  var fileData: [Data] = []

  let numKeyframes = headCapturingView.capturedFramesCount
  
  for keyframeId in 0..<numKeyframes {
    let keyframe = headCapturingView.getCapturedKeyframe(forFrameID: keyframeId)!
    fileNames.append("color_\(keyframeId)")
    fileData.append(keyframe.colorData)
                     
    fileNames.append("depth_\(keyframeId)")
    fileData.append(keyframe.depthData)
    
    fileNames.append("metadata_\(keyframeId)")
    fileData.append(keyframe.metadataToJSON())
  }
  
  uploadFiles(fileNames, fileData)
}

The following list summarises what you need to upload so that we can generate an avatar for you:

  • One color image file per captured keyframe named color_<keyframe index> with Content Service file type ADDITIONAL_VIEW
  • One depth image file per captured keyframe named depth_<keyframe index> with Content Service file type DEPTH
  • One metadata file per captured keyframe named metadata_<keyframe index> with Content Service file type MATRIX
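
The naming scheme above can be sketched as a small helper so that the upload code and the convention cannot drift apart. The helper and its type names are illustrative, not part of the SDK:

```swift
import Foundation

// The Content Service expects three files per captured keyframe,
// each named <kind>_<keyframe index>.
enum CaptureFileKind: String, CaseIterable {
    case color      // Content Service file type ADDITIONAL_VIEW
    case depth      // Content Service file type DEPTH
    case metadata   // Content Service file type MATRIX
}

// Builds the full list of expected upload file names for a capture
// session with the given number of keyframes.
func uploadFileNames(forKeyframeCount count: Int) -> [String] {
    (0..<count).flatMap { index in
        CaptureFileKind.allCases.map { "\($0.rawValue)_\(index)" }
    }
}
```

For two keyframes this yields color_0, depth_0, metadata_0, color_1, depth_1, metadata_1, matching the naming used in the snippet above.
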

Additionally, it is also required to upload a metadata JSON dictionary to the Content Service. This metadata must be uploaded as the product entity’s metadata. To create this JSON metadata, use the RRHeadAvatar3DProductMetadata class that is provided by the Pictofit iOS SDK. The following code snippet shows how to use it:

func getProductMetadata(avatarName: String, gender: RRGender, bodyHeight: Int) -> String {
  let metadata = RRHeadAvatar3DProductMetadata()
  metadata.avatarName = avatarName
  metadata.gender = gender
  metadata.bodyHeight = bodyHeight
  
  return metadata.getJsonString()
}

Here’s a detailed description of the JSON values and their format:

  • avatarName: An arbitrary display name for the captured avatar as a string
  • gender: The avatar gender as a string, which will define the template body model that will be used. Currently supported values: "female" and "male"
  • bodyHeight: The full body height of the captured user in centimeters. Data type must be integer
  • platform: The platform of the capturing device as a string
  • deviceName: The device identifier of the capturing device as a string
  • osVersion: The OS version number of the capturing device as a string
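
For illustration, a metadata dictionary with these fields might serialise as follows. In practice, use RRHeadAvatar3DProductMetadata as shown above rather than building the JSON by hand; the example values here are placeholders:

```swift
import Foundation

// Builds the metadata dictionary described above by hand and serialises
// it to JSON, only to illustrate the expected shape of the payload.
let metadata: [String: Any] = [
    "avatarName": "My Avatar",   // arbitrary display name
    "gender": "female",          // "female" or "male"
    "bodyHeight": 170,           // centimeters, integer
    "platform": "iOS",
    "deviceName": "iPhone12,3",
    "osVersion": "14.2",
]
let json = try! JSONSerialization.data(withJSONObject: metadata,
                                       options: [.sortedKeys])
print(String(data: json, encoding: .utf8)!)
```
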

Once the processing has finished, you can download the resulting avatar and present the Avatar Configurator to the user as the next step.

© 2014-2020 Reactive Reality AG