Tuesday, June 23, 2026
8:15 AM – 4:30 PM
Partnership 3 – Room 233 (in-person only)
The Office of Research Cyberinfrastructure is hosting a one-day “Research Computing Advanced Bootcamp” for users interested in specialized topics in research computing such as strategies for leveraging multi-GPU architectures in parallel workflows, GPU profiling, limitations of pandas for large DataFrames, other high-performance tools for DataFrames, querying large language models via Python APIs, reproducibility practices, and automated plotting techniques. The workshop will include three sessions featuring hands-on exercises, followed by an open discussion and Q&A.
Due to limited seating capacity, preference will be given to graduate students.
Session 1: Distributed GPU architecture for LLMs
This session provides an introduction to GPU computing and multi-GPU strategies for machine learning workloads. It covers GPU fundamentals, memory considerations, and what data is stored during model training, followed by an overview of multi-GPU approaches such as Model Parallelism, Distributed Data Parallel (DDP), and Fully Sharded Data Parallel (FSDP). Participants will explore the architecture, trade-offs, and practical implementation of each method through code examples, GPU profiling, and hands-on exercises.
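For a rough sense of the memory considerations mentioned above, the sketch below estimates per-GPU memory for model state under plain fp32 training with the Adam optimizer. It is an illustration only, not session material: the function name `model_state_gib` and the 7-billion-parameter example are hypothetical, activations and framework overhead are ignored, and the sharded case assumes parameters, gradients, and optimizer state are split evenly across GPUs.

```python
def model_state_gib(n_params: int, n_gpus: int = 1, sharded: bool = False) -> float:
    """Estimate per-GPU memory (GiB) for model state in fp32 training with Adam.

    Counts 4 bytes for each parameter, 4 for its gradient, and 8 for Adam's
    two moment estimates. Activations and temporary buffers are NOT included.
    """
    bytes_per_param = 4 + 4 + 8
    total_bytes = n_params * bytes_per_param
    # A replicated (DDP-style) setup keeps the full state on every GPU;
    # a fully sharded (FSDP-style) setup splits it across GPUs.
    per_gpu_bytes = total_bytes / n_gpus if sharded else total_bytes
    return per_gpu_bytes / 2**30


n = 7_000_000_000  # hypothetical 7B-parameter model
print(f"Replicated, 4 GPUs: {model_state_gib(n, 4):.1f} GiB per GPU")
print(f"Sharded, 4 GPUs:    {model_state_gib(n, 4, sharded=True):.1f} GiB per GPU")
```

The point of the comparison is that replication does not reduce per-GPU model state no matter how many GPUs are added, while sharding does, at the cost of extra communication.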
Session 2: Handling Large DataFrames in Python
Pandas is the default tool for working with tabular data across many research domains, but as datasets grow into the multi-gigabyte range, it begins to break down, first in memory usage and then in performance. This session walks through that transition end-to-end using a real-world dataset. It examines where pandas struggles at scale (in both memory and CPU) and evaluates how far pandas can be pushed before alternative solutions become necessary.
The session then introduces a newer-generation DataFrame library (e.g., Polars) that is multithreaded by default, uses a columnar memory layout for improved performance on analytical workloads, and can reorganize and optimize sequences of operations prior to execution. It concludes by highlighting how storage format choices and the ability to process data in chunks (rather than loading everything into memory at once) determine whether code remains effective as datasets grow by an order of magnitude, along with a brief discussion of how these concepts extend to multi-machine (e.g., Dask) or GPU-based (e.g., cuDF) workloads.
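As a small illustration of chunked processing, the sketch below computes a grouped sum with pandas while holding only one chunk in memory at a time. The column names `site` and `reading`, the helper `chunked_group_sum`, and the tiny in-memory CSV are all hypothetical stand-ins for a file too large to load at once; the session itself may use a different dataset and approach.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file too large to load at once.
csv_text = "site,reading\n" + "\n".join(
    f"{'A' if i % 2 == 0 else 'B'},{i}" for i in range(10)
)


def chunked_group_sum(source, chunksize: int) -> pd.Series:
    """Sum `reading` per `site` while holding only one chunk in memory."""
    partials = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Reduce each chunk to a small per-site partial sum before moving on.
        partials.append(chunk.groupby("site")["reading"].sum())
    # Combine the per-chunk partial sums into one total per site.
    return pd.concat(partials).groupby(level=0).sum()


totals = chunked_group_sum(io.StringIO(csv_text), chunksize=3)
print(totals)
```

This pattern works because a sum can be rebuilt from partial sums. Statistics that cannot be combined this way, such as an exact median, need a different strategy.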
Session 3: Python and DataFrames for Sensible Experiment Management
Summary: What is “effective” computational research? At any given time, you may have multiple directions you want to investigate. What kind of tooling can you construct during the exploratory phase that will aid you when it’s time to publish and distribute your findings?
In this session, you’ll build a benchmarking framework that runs inference against a shared LLM server to replicate the results of a paper on Zero-shot Chain-of-Thought prompting. Along the way, we will discuss how and when it is appropriate to rearrange and organize our research code.
Topics covered: Querying LLMs via the API in Python, Pandas aggregation, reproducibility, and automated plotting.
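To give a flavor of the pandas aggregation topic, the sketch below summarizes a log of benchmark runs into accuracy per prompt style. Everything in it is assumed for illustration: the column names `prompt` and `correct`, the prompt labels, and the six made-up rows, which are not results from the paper or from the session.

```python
import pandas as pd

# Hypothetical run log with made-up values: one row per question answered
# under each prompt style, recording whether the answer was correct.
runs = pd.DataFrame(
    {
        "prompt": [
            "baseline", "baseline", "baseline",
            "zero_shot_cot", "zero_shot_cot", "zero_shot_cot",
        ],
        "correct": [True, False, False, True, True, False],
    }
)

# The mean of a boolean column is the fraction of True values, i.e. accuracy.
# "count" records how many questions each accuracy figure is based on.
summary = runs.groupby("prompt")["correct"].agg(accuracy="mean", n="count")
print(summary)
```

Keeping raw per-question rows and deriving summaries from them, rather than storing only the final numbers, makes it easier to recompute or re-slice results later.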
**Please note: All sessions have a hands-on component. To participate in the hands-on exercises, you will need to bring your own computer equipped with a web browser. Prior experience in Python is helpful.**
Tentative Agenda for Tuesday, June 23, 2026
Please note that this agenda is subject to updates.
| Event | Start | End | Speaker |
|---|---|---|---|
| Sign-in and Breakfast | 08:15 AM | – | – |
| Introduction to Office of Research Cyberinfrastructure | 08:30 AM | 08:40 AM | TBD |
| Distributed GPU architecture for LLMs | 08:45 AM | 10:30 AM | Kei Long |
| Coffee Break | 10:30 AM | 10:45 AM | – |
| Handling Large DataFrames in Python | 10:45 AM | 11:45 AM | Fahad Khan |
| Lunch Break | 11:45 AM | 12:45 PM | – |
| Handling Large DataFrames in Python (cont.) | 12:45 PM | 01:45 PM | Fahad Khan |
| Coffee Break | 01:45 PM | 02:00 PM | – |
| Python and DataFrames for Sensible Experiment Management | 02:00 PM | 04:00 PM | Benjamin Keene |
| Q & A and Discussion | 04:00 PM | 04:30 PM | – |