Image for post
Image for post

Introduction

You are familiar with PyTorch/XLA. You have tested some example code, it works, encouraged by the quick win you set out to train your own model. Your model trains but you find that it is slower than expected. You want to find out why. This article attempts to share some of the tools to help you find out why and derive actionable steps to improve the training performance.

Pre-Reading

In the remainder of this article we will walk through a performance debug case study. A sound understanding of inner workings of XLA Tensor can make the following content more accessible. We encourage the reader to review this talk from PyTorch Developers Day 2020 and this talk from Google Cloud Next for a quick primer. You may also find this article helpful if you are new to PyTorch/XLA. This article also assumes that the reader is familiar with Google Cloud Platform SDK and has a project with permissions to create resources such as virtual machines and Cloud TPU instances. …


Image for post
Image for post

PyTorch/XLA on TPU

You have read PyTorch/XLA on TPU is GA now. Perhaps you have run through some example Colabs as well. But you want to understand how you can run your own PyTorch model on TPUs. You are in the right place. This article deconstructs a simple example in terms of key concepts. The goal is to equip you with necessary fundamentals to update your own training code to train on TPUs.

Basic Concepts

PyTorch/XLA introduces a new tensor abstraction called XLA Tensor. Any operation performed on XLA tensor is traced into an IR (Intermediate Representation) graph. In the simplest scenario the tracing continues until it encounters a `xm.mark_step()` call. At this stage the IR graph is converted into XLA HLO (High Level Optimization) format and sent to TPU. The HLO format is compiled by the TPU runtime and further optimized into a LLO (Low Level Optimized) format. …

About

Vaibhav Singh

Machine Learning Engineer @Google, Hardware Design Staff Engineer in past life.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store