LLM ENGINEERING

Part 1: Minimal LLM Engineering Setup (Busy Builders Edition)

A minimal but flexible LLM environment to start building GenAI workflows quickly — without over-complicating your setup.

Prasanna Arjunan • Jan 17, 2025 • 6:30 AM SGT

Welcome to Part 1 — Minimal LLM Engineering Setup

Welcome back to the LLM Engineering for Busy Builders series. You can find the full series on the series homepage here.

In Part 0, we talked about why we're doing this: building practical, real-world GenAI workflows while learning — with a strong focus on business use cases, customer experience (CX), and contact center scenarios.

In Part 1, we're getting our environment ready. My personal goal for this stack is simple:

Light enough to get started quickly.
Flexible enough to evolve as we build more advanced agents and workflows.

Let's not over-engineer just yet. This is your Busy Builder starter setup.

Keeping it simple (for now)

We're starting light and simple — just enough to get us building quickly, while leaving room to grow as things get more advanced.

This approach allows us to:

Start experimenting sooner.
Focus on concepts before getting too deep into tooling.
Add complexity only when we actually need it.

As we explore more advanced use cases, we'll naturally evolve the setup. But for now, the priority is to get building with minimal friction.

Create your project workspace

Let's start simple. Open your terminal and create a project folder where we'll build all our experiments:

mkdir llm-busybuilders
cd llm-busybuilders

All files we build throughout this series will live inside this folder. This keeps everything organized as we evolve the stack.

Tip: This works on macOS, Linux, and Windows (using WSL or Git Bash). Use any terminal you're comfortable with.

Choosing an environment manager

There are multiple ways to manage Python environments. In most production systems you'll see:

venv — lightweight, built into Python, simple for most use cases.
poetry — good for reproducibility and dependency resolution.
conda — better when you have ML packages with system-level dependencies.
pyenv / asdf — useful for managing multiple Python versions across projects (optional, not required for this series).

For this Busy Builders series, I'll use conda — mainly because some ML and LLM packages play better with it out of the box, especially early on.

Note: You can absolutely follow this series with venv too. I'll include a separate requirements.txt file later for those who prefer a pure pip setup.

The environment file

We'll manage our Python dependencies using environment files. These help us:

Define exactly which packages and versions we're using.
Reproduce the same environment on any machine (yours or anyone following this series).
Quickly install or update dependencies as we expand our project.

In the public repo, I maintain both:

environment.yml — for Conda users (like this post).
requirements.txt — for pip/venv users who prefer a more lightweight setup.

You can either copy the file below or simply clone the full repo and use it directly.

Here's the environment.yml file that includes most of what we'll need as we build:

name: llms-busybuilders
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - pip
  - openai
  - transformers
  - langchain
  - chromadb
  - faiss-cpu
  - sentence-transformers
  - jupyterlab
  - pandas
  - numpy
  - matplotlib
  - scikit-learn
  - requests
  - python-dotenv
  - ipywidgets
  - pyarrow
  - pip:
    - gradio
    - tiktoken
    - langchain-openai
    - langchain-core
    - langchain-community

You can also clone or download the full environment files directly from the public repo here:

llm-engineering-busybuilders (GitHub)

Install Anaconda (if you don't have it)

Since we're using conda to manage environments, make sure you have Anaconda or Miniconda installed on your system.

If you don't have it yet, you can download and install it here:

Anaconda (full version) — includes many data science packages by default
Miniconda (lighter version) — minimal install, we'll add only what we need

Either one works for this series. Personally, I use Anaconda on my local machine.

Tip: After installing Anaconda or Miniconda, restart your terminal to make sure conda is recognized. If conda --version doesn’t work, it's likely your terminal hasn’t picked up the new PATH yet.

Create & activate the environment

Now we create the environment based on the file we just built. In your terminal (from the project directory), run:

conda env create -f environment.yml


conda activate llms-busybuilders

That's it — Conda will install everything in one go.

First-time installs: This step may take a few minutes depending on your machine and internet speed. Conda resolves system dependencies, downloads packages, and sets everything up cleanly.

Windows users: I highly recommend running this inside Anaconda Prompt or Windows Subsystem for Linux (WSL) for better compatibility with ML packages.
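Once the environment is active, a quick sanity check is to confirm that the main packages can actually be found. The snippet below is a minimal sketch, not part of the official repo: the `PACKAGES` mapping and the `check_imports` helper are names I'm introducing for illustration. The mapping matters because a few packages are imported under a different name than the one you install (for example, scikit-learn is imported as sklearn).

```python
import importlib.util
import sys

# Install name (from environment.yml) -> Python import name.
# This mapping is an illustrative subset; extend it as needed.
PACKAGES = {
    "openai": "openai",
    "transformers": "transformers",
    "langchain": "langchain",
    "chromadb": "chromadb",
    "faiss-cpu": "faiss",
    "sentence-transformers": "sentence_transformers",
    "scikit-learn": "sklearn",
    "python-dotenv": "dotenv",
    "tiktoken": "tiktoken",
    "gradio": "gradio",
    "langchain-openai": "langchain_openai",
}


def check_imports(packages):
    """Return {install_name: True/False} depending on whether the module can be located."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


report = check_imports(PACKAGES)
missing = sorted(name for name, found in report.items() if not found)

print("Python:", sys.version.split()[0])
print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be located, without importing it, so it runs quickly. If something shows up as missing, make sure the llms-busybuilders environment is the one that is active before re-running the install.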

Launch JupyterLab

With your environment ready, we can now launch JupyterLab — this is where most of our LLM experimentation will happen.

jupyter lab

This will open JupyterLab directly in your browser. We'll write, test, and iterate all our code here as we build LLM agents and workflows.

Note: If you see any Jupyter-related package conflicts, you can update with conda update jupyterlab.
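A common source of confusion is a notebook that runs on a different Python than the environment you just created. As a rough check, you can run something like the following in the first cell. The `kernel_info` helper is a hypothetical name for this sketch. It relies on the `CONDA_DEFAULT_ENV` variable, which conda sets when an environment is activated.

```python
import os
import sys


def kernel_info():
    """Collect basic facts about the interpreter running this code."""
    return {
        "python_version": sys.version.split()[0],
        "executable": sys.executable,
        # Set by `conda activate`; absent if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


for key, value in kernel_info().items():
    print(f"{key}: {value}")
```

If you launched JupyterLab from the activated environment, conda_env should read llms-busybuilders and the Python version should start with 3.11, matching the environment file.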

New to JupyterLab? Check out the JupyterLab Cheatsheet — a quick guide to notebooks, cells, shortcuts, and more.

Set up your OpenAI account & API key

For this series, we'll start with OpenAI as our model provider — but the general environment and workflows we build will work with other platforms too (like Anthropic, Gemini, Azure OpenAI, and others). As we progress, we'll cover how to adapt to different providers.

Quick reality check: local models vs API models

There are many excellent free models you can run locally using tools like Ollama, for example Llama 3, Mistral, and Phi. These are great for experimentation, testing, and smaller projects. But the models that are practical to run locally are relatively small, typically in the 3B to 70B parameter range.

When you call APIs like OpenAI's GPT models, Anthropic's Claude models, or Gemini models, you're tapping into much larger models. Their exact sizes are not published, but they are rumored to have up to trillions of parameters. These models often produce stronger reasoning, planning, context handling, and language generation, which is why we'll start there for our initial experiments.
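To get a feel for why parameter count limits what you can run locally, here is a back-of-the-envelope estimate of the memory needed just to hold a model's weights. This is a simplified sketch: it assumes 2 bytes per parameter for 16-bit weights and 0.5 bytes for 4-bit quantized weights, and it ignores everything else a running model needs (activations, context cache, runtime overhead). The function name and the example sizes are illustrative.

```python
def weight_memory_gb(params_billions, bytes_per_param):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9


for size in (3, 7, 70):
    fp16 = weight_memory_gb(size, 2)    # 16-bit weights: 2 bytes per parameter
    q4 = weight_memory_gb(size, 0.5)    # 4-bit quantized: half a byte per parameter
    print(f"{size}B parameters: ~{fp16:.0f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Under these assumptions, a 70B model needs roughly 140 GB for 16-bit weights alone, which is well beyond a typical laptop. That is the practical reason hosted APIs are the easier starting point for the largest models.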

We'll explore both worlds later — but for now, we'll connect to OpenAI's API as our first production-grade model provider.

Since we're starting with OpenAI, you'll need an API key to connect your code to their models.

Go to the OpenAI Platform and create an account if you don't already have one.
Once signed in, click your user profile → Manage Account → Billing.
You'll be asked to add a small initial credit (typically $5 USD minimum for pay-as-you-go access).
After adding credit, go to API Keys in your dashboard and click Create new secret key.
Copy this key and store it somewhere safe — you won't be able to see it again once you leave the page.

Why the $5? OpenAI requires this initial credit for API access. You'll only be charged based on usage — most small experiments cost just a few cents.

Store your key securely

We don't want to hardcode API keys into our notebooks. Instead, we'll store the key in a .env file, which is a common way to safely load secrets while coding.

Mac / Linux:

Inside your project folder, create a new file:

nano .env

Add the following line (replace with your real key):

OPENAI_API_KEY=sk-xxxxxxx

Save and exit (Ctrl+O → Enter → Ctrl+X).

Windows:

Open Notepad.
Paste the same content:

OPENAI_API_KEY=sk-xxxxxxx

Go to File → Save As.
Set Save as type: All Files.
Save the file as .env inside your project folder.

Watch out: The filename must be exactly .env — not .env.txt or anything else. Windows sometimes tries to sneak in an extra file extension.
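With the .env file in place, loading the key in a notebook could look roughly like this. It's a minimal sketch that assumes the python-dotenv package from the environment file. The helper names (`find_env_file`, `mask_key`, `load_openai_key`) are my own for illustration and don't come from any library. The first helper also catches the .env.txt naming slip mentioned above.

```python
import os
from pathlib import Path


def find_env_file(folder="."):
    """Return the path to .env in `folder`, with a hint if it was saved as .env.txt."""
    folder = Path(folder)
    env_path = folder / ".env"
    if env_path.is_file():
        return env_path
    if (folder / ".env.txt").is_file():
        raise FileNotFoundError("Found .env.txt; rename it to exactly .env")
    raise FileNotFoundError(f"No .env file found in {folder.resolve()}")


def mask_key(key):
    """Show only the start and end of a key so it is safe to print."""
    return f"{key[:3]}...{key[-4:]}" if len(key) > 8 else "***"


def load_openai_key(folder="."):
    """Load OPENAI_API_KEY from the .env file and return it."""
    # Imported here so the helpers above work even without python-dotenv installed.
    from dotenv import load_dotenv

    # By default, load_dotenv does not overwrite variables already set in the environment.
    load_dotenv(find_env_file(folder))
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is missing or empty in .env")
    return key
```

In a notebook cell, calling `print(mask_key(load_openai_key()))` confirms the key was picked up without printing it in full, which helps if you ever share a notebook or a screenshot.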

That's it for Part 1

You now have:

A fully working LLM development environment
JupyterLab running for all our experiments
A minimal, easy-to-evolve stack that won't slow us down as we build

In Part 2, we'll go hands-on and start working with prompts, models, and our first real LLM workflows.

As always, this stack may evolve as we go deeper — but you now have your Busy Builders playground ready.

Repo note: I'll post the repo link with all environment files once we publish the next parts so you can clone and follow along easily.