# Training

Modern experimental and computational methods, such as high-throughput experimentation and data mining, can rapidly generate large datasets. However, most organic chemists are not trained to analyze such datasets quantitatively. C-CAS trains a new generation of "data chemists" who want careers applying computational and data science to synthesis. Through co-mentoring and workshops for center participants, C-CAS bridges the gap between chemistry and data science in both academia and industry.

## Training Resources

#### A short course by the Sigman Lab

Short Course in Multivariate Linear Regression Models

1-0 Introduction to the Short Course

1-1 What Are Linear Free Energy Relationships (LFERs)?

2-0 Why Is Conformational Searching Important?

2-1 Conformational Searches Using Molecular Mechanics

2-2 Conducting a Conformational Search in MacroModel

2-4 Submitting a QM Calculation through Utah's CHPC

3-1 Using Python to Parameterize Molecules

4-0 Intro to Statistical Modeling Strategy

4-1 Interpreting Statistical Models
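The course centers on multivariate linear regression, which extends single-descriptor LFERs such as the Hammett equation, log(k_X/k_H) = ρσ, to several molecular descriptors at once. The sketch below is not the Sigman lab's workflow. It is a minimal ordinary-least-squares fit in NumPy on synthetic data, and the descriptors, coefficients, and helper names (`fit_mlr`, `standardize`) are all hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale each descriptor to zero mean and unit variance so coefficients are comparable."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def fit_mlr(X, y):
    """Ordinary least-squares fit of y ~ b0 + X @ b.

    X: (n_samples, n_descriptors) descriptor matrix
    y: (n_samples,) response, e.g. a selectivity expressed as ddG in kcal/mol
    Returns (intercept, coefficients, R^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is fitted alongside the slopes
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_pred = A @ beta
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return beta[0], beta[1:], 1.0 - ss_res / ss_tot

# Synthetic illustration only: 20 "catalysts" described by 3 made-up descriptors
rng = np.random.default_rng(0)
X = standardize(rng.normal(size=(20, 3)))
true_coefs = np.array([0.8, -0.5, 0.0])
y = 1.2 + X @ true_coefs + rng.normal(scale=0.05, size=20)

intercept, coefs, r2 = fit_mlr(X, y)
print(intercept, coefs, r2)
```

Standardizing the descriptors first makes the coefficient magnitudes directly comparable, which is what makes such models interpretable; a real study would also validate on held-out data rather than report only the training R².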

Training videos on the C-CAS YouTube channel

#### Introduction to Bayesian Optimization

Part 1: Introduction to Bayesian Optimization

Part 2: Applications to "over-the-arrow" optimization

In these videos, Ben Shields from the Doyle group explains the basics of Bayesian optimization and its application to finding optimal reaction conditions. The work described in these videos is published in a recent Nature paper by the Doyle group.
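The loop behind Bayesian optimization can be sketched in a few dozen lines: fit a surrogate model to the experiments run so far, score every untried candidate with an acquisition function, run the best-scoring one, and repeat. This is not the Doyle group's code. It is a minimal sketch assuming a Gaussian-process surrogate with an RBF kernel, expected improvement as the acquisition function, a fixed pool of numerically encoded candidate conditions, and a hypothetical `toy_yield` function standing in for real experiments.

```python
import math
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential (RBF) kernel between the rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)

def gp_posterior(X_obs, y_obs, X_new, length_scale=1.0, noise=1e-4):
    """Mean and std of a zero-mean, unit-variance GP conditioned on (X_obs, y_obs)."""
    K = rbf_kernel(X_obs, X_obs, length_scale) + noise * np.eye(len(X_obs))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    K_s = rbf_kernel(X_obs, X_new, length_scale)
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

_erf = np.vectorize(math.erf)

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best observation so far (maximization)."""
    imp = mu - best - xi
    z = imp / sigma
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf

def bayes_opt(objective, X_cand, n_init=3, n_iter=12, seed=0):
    """Choose experiments one at a time from a fixed pool of candidate conditions."""
    rng = np.random.default_rng(seed)
    tried = [int(i) for i in rng.choice(len(X_cand), size=n_init, replace=False)]
    results = [objective(X_cand[i]) for i in tried]
    for _ in range(n_iter):
        y = np.array(results)
        y_scaled = (y - y.mean()) / (y.std() + 1e-9)  # GP prior assumes roughly unit scale
        mu, sigma = gp_posterior(X_cand[tried], y_scaled, X_cand)
        ei = expected_improvement(mu, sigma, y_scaled.max())
        ei[tried] = -np.inf  # never repeat an experiment already run
        nxt = int(np.argmax(ei))
        tried.append(nxt)
        results.append(objective(X_cand[nxt]))
    best = int(np.argmax(results))
    return X_cand[tried[best]], results[best]

# Hypothetical "reaction": yield (%) peaks at a scaled temperature of 6.3
def toy_yield(x):
    return 90.0 * math.exp(-0.5 * ((x[0] - 6.3) / 1.5) ** 2)

X_cand = np.linspace(0.0, 10.0, 41)[:, None]
best_x, best_y = bayes_opt(toy_yield, X_cand)
print(best_x, best_y)
```

Here only 15 of the 41 candidate conditions are ever "run". In practice, categorical choices such as ligand, base, or solvent have to be encoded as numeric vectors first (for example one-hot or descriptor-based encodings), and the kernel length scale would normally be fitted to the data rather than fixed.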

#### Conformational Searching

Part 1: Introduction to Conformational Searching

Part 2: Conformational Searching in MacroModel

In these videos, Liliana Gallegos and Guillian Luchini from the Paton group, together with Jessica Wahlers and Kevin Koh from the Wiest group, explain different approaches to the conformational searching of small molecules.
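One approach, the systematic search, enumerates every combination of rotatable dihedral angles on a grid and keeps the low-energy minima. The toy sketch below only illustrates that enumeration and filtering step: it uses a made-up three-term Fourier torsional potential with illustrative parameters (not taken from any force field), treats torsions as independent, and does no geometry optimization, unlike real tools such as MacroModel.

```python
import itertools
import math

def torsion_energy(phi_deg, v1=1.4, v2=-0.2, v3=3.0):
    """Three-term Fourier torsional potential in kcal/mol; parameters are illustrative only."""
    p = math.radians(phi_deg)
    return (v1 / 2 * (1 + math.cos(p))
            + v2 / 2 * (1 - math.cos(2 * p))
            + v3 / 2 * (1 + math.cos(3 * p)))

def conformer_energy(dihedrals):
    # Toy assumption: torsions do not interact, so their energies simply add
    return sum(torsion_energy(phi) for phi in dihedrals)

def systematic_search(n_torsions, step=10, window=3.0):
    """Grid-search all dihedral combinations (step must divide 360) and keep local minima.

    Returns (relative energy in kcal/mol, dihedral tuple) pairs within `window` of the lowest.
    """
    grid = range(0, 360, step)
    # The number of grid points grows as (360 / step) ** n_torsions
    energies = {c: conformer_energy(c) for c in itertools.product(grid, repeat=n_torsions)}
    minima = []
    for combo, e in energies.items():
        is_min = True
        for i in range(n_torsions):
            for delta in (-step, step):
                neighbour = list(combo)
                neighbour[i] = (neighbour[i] + delta) % 360  # dihedrals are periodic
                if energies[tuple(neighbour)] < e:
                    is_min = False
        if is_min:
            minima.append((e, combo))
    e_min = min(e for e, _ in minima)
    return sorted((round(e - e_min, 2), c) for e, c in minima if e - e_min <= window)

print(systematic_search(1))
```

With one torsion this finds one anti-like and two gauche-like minima; the exhaustive grid grows as 36^n at a 10° step, which is why larger molecules usually call for stochastic or molecular-dynamics-based searches instead.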

#### Generating Potential Energy Surfaces in Python

Liliana Gallegos from the Paton group explains how information on potential energy surfaces can be extracted from Gaussian output files using a set of Python scripts.
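A basic ingredient of such scripts is pulling energies out of Gaussian log files and converting them to relative energies. The sketch below is not the Paton group's code. It assumes each structure has its own output file, that the last `SCF Done` line in each file is the energy of interest (true for HF/DFT single points and optimizations, not for post-HF methods), and the helper names `scf_energies` and `energy_profile` are hypothetical.

```python
import re

HARTREE_TO_KCAL = 627.509474  # 1 Hartree in kcal/mol

# Matches lines such as " SCF Done:  E(RB3LYP) =  -232.123456789     A.U. after   10 cycles"
SCF_RE = re.compile(r"SCF Done:\s+E\(\S+\)\s*=\s*(-?\d+\.\d+)")

def scf_energies(log_text):
    """All SCF energies (Hartree) printed in one Gaussian output, in order."""
    return [float(value) for value in SCF_RE.findall(log_text)]

def energy_profile(log_texts):
    """Relative energies (kcal/mol) from the last SCF energy in each output."""
    finals = []
    for text in log_texts:
        energies = scf_energies(text)
        if not energies:
            raise ValueError("no 'SCF Done' line found in output")
        finals.append(energies[-1])
    e_min = min(finals)
    return [(e - e_min) * HARTREE_TO_KCAL for e in finals]
```

Typical usage would read the files in order along the reaction coordinate, e.g. `energy_profile([p.read_text() for p in sorted(pathlib.Path("scan").glob("*.log"))])`, where the `scan` directory is hypothetical. Free-energy profiles would additionally require thermochemical corrections from frequency calculations.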

#### Graph Neural Networks: Basics and Applications

Part 1: Representing molecules as Graph Neural Networks (GNN)

Part 3: Heterogeneous Knowledge Graphs

Part 4: Property Prediction using GNNs

Mandana Saebi, Zhichun Guo and Chuxu Zhang from the Chawla group explain what graph neural networks are and how they can be used to represent and predict chemical properties and reactions.
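The core idea is that a molecule becomes a graph: atoms are nodes carrying feature vectors, bonds are edges, and each network layer updates an atom's features from those of its bonded neighbours. The NumPy sketch below shows one untrained message-passing layer followed by sum pooling and a linear output. It is not the Chawla group's model; the element vocabulary, the random weights, and the function names are hypothetical.

```python
import numpy as np

ELEMENTS = ["C", "N", "O"]  # toy element vocabulary for one-hot node features

def molecule_graph(atoms, bonds):
    """Node feature matrix X (n_atoms x n_elements) and symmetric adjacency matrix A."""
    X = np.zeros((len(atoms), len(ELEMENTS)))
    for i, element in enumerate(atoms):
        X[i, ELEMENTS.index(element)] = 1.0
    A = np.zeros((len(atoms), len(atoms)))
    for i, j in bonds:
        A[i, j] = A[j, i] = 1.0
    return X, A

def message_passing(X, A, W_self, W_nbr):
    """One layer: combine each atom's own features with the sum of its neighbours' (ReLU)."""
    return np.maximum(0.0, X @ W_self + A @ X @ W_nbr)

def readout(H):
    """Sum pooling gives a molecule-level vector independent of atom ordering."""
    return H.sum(axis=0)

def predict(atoms, bonds, params):
    X, A = molecule_graph(atoms, bonds)
    H = message_passing(X, A, params["W_self"], params["W_nbr"])
    return float(readout(H) @ params["w_out"])

rng = np.random.default_rng(0)
params = {
    "W_self": rng.normal(size=(3, 8)),
    "W_nbr": rng.normal(size=(3, 8)),
    "w_out": rng.normal(size=8),
}
# Ethanol heavy atoms (C-C-O) listed in two different atom orders
p1 = predict(["C", "C", "O"], [(0, 1), (1, 2)], params)
p2 = predict(["O", "C", "C"], [(0, 1), (1, 2)], params)
print(p1, p2)
```

Both atom orderings give the same prediction, which is the key property that lets GNNs work on molecules without a canonical atom order; in a real property-prediction model the weights would be trained against measured or computed labels, and several layers would be stacked.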

#### Data Scrubbing

Bozhao Nan from the Wiest group explains workflows to prepare real-world datasets for application in machine learning.
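Typical scrubbing steps include stripping stray whitespace, discarding records with missing or non-numeric labels, removing physically impossible values, and merging replicate measurements. The standard-library sketch below applies those steps to a hypothetical list of `{"smiles": ..., "yield": ...}` records; it is not the Wiest group's workflow, and it does not canonicalize SMILES (which would need a cheminformatics toolkit), so the same molecule written two different ways is not merged.

```python
import math
from collections import defaultdict
from statistics import mean

def scrub(records, y_min=0.0, y_max=100.0):
    """Clean hypothetical {'smiles': str, 'yield': value} records into {smiles: mean yield}."""
    grouped = defaultdict(list)
    for rec in records:
        smiles = (rec.get("smiles") or "").strip()
        try:
            y = float(rec.get("yield"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric label
        if not smiles or math.isnan(y) or not (y_min <= y <= y_max):
            continue  # empty identifier or physically impossible yield
        grouped[smiles].append(y)
    # Collapse replicate measurements of the same SMILES string into their mean
    return {smiles: mean(ys) for smiles, ys in grouped.items()}
```

Averaging replicates is only one choice; depending on the downstream model, it may be better to keep replicates and their spread as a measure of experimental noise.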

#### Synthesis Planning using Synthia

Melissa Hardy and Brandon Wright from the Sarpong group explain the concepts and application of computer-aided synthesis planning using Synthia®.

#### Modern Steric Parameters

Guillian Luchini from the Paton group demonstrates the use of Python scripts to generate a series of modern steric parameters for featurizing molecules.
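Sterimol parameters are one widely used family of steric descriptors: L is the length of a substituent along the axis of its attachment bond, B1 its minimum width, and B5 its maximum width perpendicular to that axis. The sketch below is a simplified geometric version, not the Paton group's scripts: it treats atoms as hard spheres with user-supplied van der Waals radii, finds B1 by scanning directions in the perpendicular plane, and omits the conventions and corrections that published implementations apply. The function name and arguments are hypothetical.

```python
import numpy as np

def sterimol(anchor, sub_coords, sub_radii, n_angles=360):
    """Simplified Sterimol L, B1, B5 (in the units of the coordinates, e.g. Å).

    anchor: xyz of atom A, to which the substituent is attached
    sub_coords: xyz of the substituent atoms; sub_coords[0] is atom B, bonded to A
    sub_radii: van der Waals radius of each substituent atom
    """
    coords = np.asarray(sub_coords, dtype=float) - np.asarray(anchor, dtype=float)
    radii = np.asarray(sub_radii, dtype=float)
    axis = coords[0] / np.linalg.norm(coords[0])  # unit vector along the A -> B bond
    along = coords @ axis
    perp = coords - np.outer(along, axis)  # components perpendicular to the axis
    L = np.max(along + radii)
    B5 = np.max(np.linalg.norm(perp, axis=1) + radii)
    # Orthonormal basis (u, v) for the plane perpendicular to the axis
    ref = np.array([1.0, 0.0, 0.0])
    if abs(axis @ ref) > 0.9:
        ref = np.array([0.0, 1.0, 0.0])
    u = np.cross(axis, ref)
    u /= np.linalg.norm(u)
    v = np.cross(axis, u)
    # B1: for each in-plane direction take the furthest atom surface; the smallest such extent is B1
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    dirs = np.outer(np.cos(thetas), u) + np.outer(np.sin(thetas), v)  # (n_angles, 3)
    extents = (dirs @ perp.T + radii).max(axis=1)
    B1 = extents.min()
    return float(L), float(B1), float(B5)
```

The angular scan resolves B1 only to the chosen `n_angles` grid; real implementations also differ in which radii they use and in how they handle the attachment atom, so values from this sketch should not be compared directly with published Sterimol numbers.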

The Center is building up a curated resource library of videos and publications that members find useful:

- Andrew Ng’s online lectures on machine learning are often described as a “rite of passage” for anyone interested in the topic.