# Suggested Notation for Machine Learning

## Introduction

This document suggests a mathematical notation protocol for machine learning.

The field of machine learning has been evolving rapidly in recent years, and communication between different researchers and research groups has become increasingly important. A key challenge for communication is inconsistent notation across papers. This proposal suggests a standard for commonly used mathematical notation in machine learning. This first version covers only part of the notation; more remains to be added. The proposal will be updated regularly as the field progresses, and we welcome suggestions for improving it in future versions.

## Dataset

Dataset $S=\{\mathbf{z}_i\}_{i=1}^{n}=\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^{n}$ is sampled from a distribution $\mathcal{D}$ over a domain $\mathcal{Z}=\mathcal{X}\times\mathcal{Y}$.

- $\mathcal{X}$ is the instance domain (a set)
- $\mathcal{Y}$ is the label domain (a set)
- $\mathcal{Z}=\mathcal{X}\times\mathcal{Y}$ is the example domain (a set)

Usually, $\mathcal{X}$ is a subset of $\mathbb{R}^{d}$ and $\mathcal{Y}$ is a subset of $\mathbb{R}^{d_{\text{o}}}$, where $d$ is the input dimension and $d_{\text{o}}$ is the output dimension.

$n=\#S$ is the number of samples. Without further specification, $S$ and $n$ refer to the training set.
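As a toy illustration of the definitions above (the distribution and labeling rule here are hypothetical, chosen only for the example), a dataset $S=\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^{n}$ can be sampled as follows:

```python
import random

random.seed(0)

# Toy sketch (not part of the proposal): sample S = {(x_i, y_i)}_{i=1}^n
# from a hypothetical distribution D over Z = X x Y, with X = [-1, 1]
# and Y = {-1, +1}.
def sample_example():
    x = random.uniform(-1.0, 1.0)   # instance drawn from X
    y = 1.0 if x >= 0.0 else -1.0   # label in Y
    return (x, y)

n = 5                               # number of samples, n = #S
S = [sample_example() for _ in range(n)]
print(len(S))                       # n = #S = 5
```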

## Function

A hypothesis space is denoted by $\mathcal{H}$. A hypothesis function is denoted by $f_{\mathbf{\theta}}(\mathbf{x})\in\mathcal{H}$ or $f(\mathbf{x};\mathbf{\theta})$ with $f_{\mathbf{\theta}}:\mathcal{X}\to\mathcal{Y}$.

$\mathbf{\theta}$ denotes the set of parameters of $f_{\mathbf{\theta}}$.

If there exists a target function, it is denoted by $f^{*}$ or $f^{*}:\mathcal{X}\to\mathcal{Y}$ satisfying $\mathbf{y}_i=f^{*}(\mathbf{x}_i)$ for $i=1,\dots,n$.
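A minimal sketch of a hypothesis function, under assumptions not in the proposal (here $\mathcal{H}$ is taken to be the affine functions on $\mathbb{R}$, with parameters $\mathbf{\theta}=(w,b)$):

```python
# Toy hypothesis function f_theta: X -> Y from a hypothetical hypothesis
# space H of affine maps, parameterized by theta = (w, b).
def f(x, theta):
    w, b = theta
    return w * x + b

theta = (2.0, -1.0)      # one particular parameter choice
print(f(3.0, theta))     # f_theta(3.0) = 2.0 * 3.0 - 1.0 = 5.0
```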

## Loss function

A loss function, denoted by $\ell:\mathcal{H}\times\mathcal{Z}\to\mathbb{R}_{+}:=[0,+\infty)$, measures the difference between a predicted label and a true label.

For a neural network, the layer-wise notation is summarized below, where $\sigma$ denotes the activation function:

| notation | meaning |
| --- | --- |
| $f_{\mathbf{\theta}}^{l}(\mathbf{x})=\sigma(\mathbf{W}^{l-1} f_{\mathbf{\theta}}^{l-1}(\mathbf{x}) + \mathbf{b}^{l-1})$ | $l$-th layer output |
| $f_{\mathbf{\theta}}(\mathbf{x})=f_{\mathbf{\theta}}^{L}(\mathbf{x})=\mathbf{W}^{L-1} f_{\mathbf{\theta}}^{L-1}(\mathbf{x}) + \mathbf{b}^{L-1}$ | $L$-layer NN |
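The layer recursion and the loss can be sketched in code. This is an illustrative assumption-laden example, not part of the proposal: the activation $\sigma$ is taken to be ReLU, the loss to be the L2 loss $\ell(f_{\mathbf{\theta}},\mathbf{z})=\tfrac{1}{2}\|f_{\mathbf{\theta}}(\mathbf{x})-\mathbf{y}\|^{2}$ (both common but not mandated choices), and matrix operations are written in pure Python for clarity.

```python
# Sketch of an L-layer network following the recursion in the table above:
# f^l(x) = sigma(W^{l-1} f^{l-1}(x) + b^{l-1}), with the final layer affine
# (no activation), as in f_theta(x) = W^{L-1} f^{L-1}(x) + b^{L-1}.
def matvec(W, v):
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    # One common choice of activation sigma; the proposal does not fix sigma.
    return [max(0.0, a) for a in v]

def forward(x, Ws, bs):
    h = x                                # f^0(x) = x
    for l in range(len(Ws) - 1):         # hidden layers apply sigma
        h = relu(add(matvec(Ws[l], h), bs[l]))
    return add(matvec(Ws[-1], h), bs[-1])  # last layer is affine only

def l2_loss(pred, y):
    # L2 loss: ell(f_theta, z) = 1/2 * ||f_theta(x) - y||^2 (one common choice)
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, y))

# A tiny 2-layer example mapping R^2 -> R^2 -> R^1.
Ws = [[[1.0, -1.0], [0.5, 0.5]],   # W^0
      [[1.0, 1.0]]]                # W^1
bs = [[0.0, 0.0], [0.0]]           # b^0, b^1

pred = forward([2.0, 1.0], Ws, bs)
print(pred)                        # [2.5]
print(l2_loss(pred, [2.0]))        # 0.5 * (2.5 - 2.0)^2 = 0.125
```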

## Acknowledgements

Chenglong Bao (Tsinghua), Zhengdao Chen (NYU), Bin Dong (Peking), Weinan E (Princeton), Quanquan Gu (UCLA), Kaizhu Huang (XJTLU), Shi Jin (SJTU), Jian Li (Tsinghua), Lei Li (SJTU), Tiejun Li (Peking), Zhenguo Li (Huawei), Zhemin Li (NUDT), Shaobo Lin (XJTU), Ziqi Liu (CSRC), Zichao Long (Peking), Chao Ma (Princeton), Chao Ma (SJTU), Yuheng Ma (WHU), Dengyu Meng (XJTU), Wang Miao (Peking), Pingbing Ming (CAS), Zuoqiang Shi (Tsinghua), Jihong Wang (CSRC), Liwei Wang (Peking), Bican Xia (Peking), Zhouwang Yang (USTC), Haijun Yu (CAS), Yang Yuan (Tsinghua), Cheng Zhang (Peking), Lulu Zhang (SJTU), Jiwei Zhang (WHU), Pingwen Zhang (Peking), Xiaoqun Zhang (SJTU), Chengchao Zhao (CSRC), Zhanxing Zhu (Peking), Chuan Zhou (CAS), Xiang Zhou (CityU).