In partnership with Levanta

The Future of Shopping? AI + Actual Humans.

AI has changed how consumers shop by speeding up research. But one thing hasn’t changed: shoppers still trust people more than AI.

Levanta’s new Affiliate 3.0 Consumer Report reveals a major shift in how shoppers blend AI tools with human influence. Consumers use AI to explore options, but when it comes time to buy, they still turn to creators, communities, and real experiences to validate their decisions.

The data shows:

  • Only 10% of shoppers buy through AI-recommended links

  • 87% discover products through creators, blogs, or communities they trust

  • Human sources like reviews and creators rank higher in trust than AI recommendations

The most effective brands are combining AI discovery with authentic human influence to drive measurable conversions.

Affiliate marketing isn’t being replaced by AI; it’s being amplified by it.

ML Jargon Guide Part 2 | ResearchAudio.io

The Complete ML Jargon Guide

Part 2: Training, Architectures, and Common Terms

Welcome to Part 2 of our ML jargon guide. Yesterday we covered the foundations: core concepts, how models learn, neural network components, and language models.

Today we complete the picture with training approaches, model architectures, multimodal systems, and the buzzwords you hear constantly in AI discussions.

Part 2 Contents

5. Training Approaches — Different ways to teach AI
6. Architectures — Blueprints for building AI systems
7. Multimodal Systems — AI that sees, reads, and hears
8. Common Terms — Buzzwords decoded
SECTION 5

Training Approaches

Understanding these methods explains how models go from raw text prediction to helpful assistants.

How Modern LLMs Are Built

Pre-training (learn from the internet) → SFT (learn from examples) → RLHF (learn from feedback)

Each stage builds on the previous: pre-training creates raw capability, SFT teaches format, and RLHF aligns the model with human preferences.

Pre-training

The initial phase where a model learns from massive amounts of general data, often a large fraction of the public internet's text. This creates a "foundation model" with broad knowledge. Extremely expensive, costing millions of dollars in compute. The result can predict text but is not yet helpful.

Analogy: General education from kindergarten through 12th grade. Broad foundation before specialization.

Fine-tuning

Takes a pre-trained model and trains it further on specific data to specialize it. Much cheaper than pre-training because you build on existing knowledge. Companies fine-tune on internal documents to create custom assistants.

Analogy: A trained chef taking a two-week Japanese cuisine course. They specialize existing skills.

RL and RLHF

Reinforcement Learning (RL): Learning through trial and error with rewards. RLHF (RL from Human Feedback): Humans rate outputs as good or bad; the model learns to produce higher-rated responses. This is how ChatGPT became helpful.

Analogy: Training a dog with treats. Good behavior gets rewards. RLHF is treats for AI.

SFT (Supervised Fine-Tuning)

Training with human-written examples of ideal responses. Provide pairs of prompts and perfect answers; the model learns to mimic that style. SFT typically happens before RLHF—teach format with examples first, then refine with feedback.

Analogy: A writing class where the teacher shows A+ essays. Learn by studying excellent examples.
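To make the idea concrete, here is a toy Python sketch of what an SFT dataset looks like. The field names and examples are illustrative, not any provider's actual schema:

```python
# What an SFT dataset looks like: human-written prompt/response pairs.
# Field names below are illustrative, not a real provider's format.
sft_examples = [
    {"prompt": "Summarize: The meeting is moved to Friday at 3pm.",
     "response": "The meeting was rescheduled to Friday at 3pm."},
    {"prompt": "Translate to French: Good morning.",
     "response": "Bonjour."},
]

# Training minimizes loss only on the response tokens, so the model
# learns to produce the ideal answer given the prompt.
for ex in sft_examples:
    print(f"<prompt> {ex['prompt']}\n<response> {ex['response']}\n")
```

The key point is that every training signal here comes from a human-authored "perfect answer," which is what distinguishes SFT from the feedback-driven RLHF stage that follows.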

LoRA (Low-Rank Adaptation)

Efficient fine-tuning technique. Instead of updating all billions of parameters, LoRA adds small "adapter" layers and trains only those. Original model stays frozen. Dramatically reduces memory and compute while achieving similar results.

Analogy: Instead of renovating your entire house, you add a new room. Main structure stays intact.
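A minimal NumPy sketch of the idea, with made-up dimensions: the pre-trained weight W stays frozen, and only the small matrices A and B (rank 8 here) would be trained. B starts at zero, so the adapted model initially behaves exactly like the base model:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 512, 8  # full dimension vs. adapter rank (made-up sizes)

W = rng.standard_normal((d, d))         # frozen pre-trained weight
A = rng.standard_normal((r, d)) * 0.01  # small trainable matrix
B = np.zeros((d, r))                    # starts at zero: no change at init

def lora_forward(x):
    # Effective weight is W + B @ A, applied without materializing it
    return x @ W.T + x @ A.T @ B.T

x = rng.standard_normal((2, d))
assert np.allclose(lora_forward(x), x @ W.T)  # identical to base model at init

# Trainable parameters: 2*d*r for the adapter vs. d*d for full fine-tuning
print(d * d, 2 * d * r)  # 262144 vs 8192
```

Here the adapter trains about 3% as many parameters as full fine-tuning of this one layer, which is where the memory and compute savings come from.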

SECTION 6

Architectures

Different blueprints for building AI systems. The transformer now dominates, but knowing the alternatives provides useful context.

Attention: What the Model Focuses On

"The cat sat on the mat because it was tired"

When processing "it," attention determines that "cat" is most relevant. This is how models resolve references.

Transformer

The architecture behind GPT, Claude, Llama, and most modern AI. Introduced in the 2017 paper "Attention Is All You Need." Processes all tokens simultaneously using attention mechanisms. Unlike older sequential models, transformers can be massively parallelized.

Analogy: Older models read like a typewriter, one letter at a time. Transformers read like you—seeing the whole page at once.

Attention Mechanism

Allows the model to focus on relevant parts of input when processing each position. For every token, attention computes how much to "attend to" every other token. Self-attention, multi-head attention, and cross-attention are variations.

Analogy: At a crowded party, you focus on one conversation while background noise fades away.
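A minimal NumPy sketch of scaled dot-product attention, the core computation. The inputs are toy random matrices; real models add learned projections and multiple heads on top of this:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Scores: how strongly each query position attends to each key position
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)         # (3, 4)
print(w.sum(axis=1))     # each row of attention weights sums to 1
```

Each output row is a weighted mix of the value vectors, with the weights playing the role of "how much to attend to" each other token in the "cat sat on the mat" example above.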

CNN (Convolutional Neural Network)

Designed for processing images. Uses "filters" that slide across the image, detecting patterns at different scales—first edges, then shapes, then objects. Dominated computer vision before transformers. Still efficient for many image tasks.

Analogy: Examining a photograph through a magnifying glass, one patch at a time. First edges, then shapes, then "that is a cat."
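A minimal NumPy sketch of the sliding-filter idea: a hand-made vertical-edge filter applied to a toy "image" whose right half is bright. The filter responds only where brightness jumps:

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image ("valid" convolution, no padding)
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 5x6 image: dark left half, bright right half
image = np.zeros((5, 6))
image[:, 3:] = 1.0
edge_filter = np.array([[-1.0, 1.0]])  # fires where brightness jumps

response = conv2d(image, edge_filter)
print(response)  # nonzero only in the column where the edge sits
```

Real CNNs learn the filter values during training and stack many such layers, which is how edges compose into shapes and then objects.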

MoE (Mixture of Experts)

Contains multiple "expert" sub-networks. A router decides which experts handle each input. MoE models have more total parameters but activate only a fraction per input, which makes them efficient. Mixtral uses MoE, and GPT-4 is widely reported to as well.

Analogy: A hospital with specialists. Triage routes you to the right expert—cardiologist, orthopedist, etc.
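A toy NumPy sketch of the routing idea, with made-up sizes and simple top-1 routing (real systems typically route each token to the top 2 experts and train the router jointly with the experts):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_experts = 8, 4  # made-up sizes

# Each "expert" is a small linear layer; the router is another linear layer
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))

def moe_forward(x):
    # The router scores one logit per expert; pick the top-1 expert
    logits = x @ router.T
    chosen = int(np.argmax(logits))
    return x @ experts[chosen].T, chosen

x = rng.standard_normal(d)
y, expert_id = moe_forward(x)
print(expert_id)  # only one of the 4 experts ran for this input
```

Although all four experts' parameters exist, only one expert's weights were multiplied for this input, which is the source of the efficiency claim.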

SECTION 7

Multimodal Systems

Modern AI increasingly handles multiple input types—text, images, audio, video—simultaneously.

Multimodal: Multiple Input Types

📝 Text · 🖼 Images · 🔊 Audio · 🎬 Video → Unified Understanding

Multimodal

AI that processes multiple input types: text, images, audio, video. GPT-4V, Claude with vision, and Gemini are multimodal. You can show them images and ask questions—much more powerful than describing things in words.

Analogy: Humans are naturally multimodal—we see, hear, and read simultaneously. AI is catching up.

Vision Encoder

Converts images into numerical representations (embeddings) the language model can understand. Takes raw pixels and outputs numbers in the same format as text embeddings, allowing unified processing.

Analogy: Your eyes and visual cortex. Light hits retina (pixels), converts to neural signals (embeddings).
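A toy sketch of the idea: flatten the pixels and project them into the same vector space as (hypothetical) text embeddings. Real vision encoders are deep networks such as ViTs, not a single random matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy vision encoder: flatten an 8x8 grayscale "image" and project it
# into a made-up 16-d space where text embeddings would also live.
W = rng.standard_normal((16, 64)) * 0.1

def encode_image(pixels):
    return W @ pixels.reshape(-1)  # 64 pixels -> one 16-d embedding

image = rng.random((8, 8))
emb = encode_image(image)
print(emb.shape)  # (16,)
```

Once the image is a vector in the same space as text, the language model can process both with the same machinery.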

CLIP

OpenAI model connecting images and text in a shared embedding space. Trained on hundreds of millions of image-caption pairs. Given an image, it finds matching text; given text, it finds matching images. Powers many search and generation systems.

Analogy: A translator who speaks both "image" and "text" languages fluently.
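A toy sketch of retrieval in a shared embedding space, using hand-made 3-d vectors in place of CLIP's real embeddings (which are 512-d and produced by trained encoders):

```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-made stand-ins for CLIP embeddings, for illustration only
image_embeddings = {
    "photo_of_cat.jpg": np.array([0.9, 0.1, 0.0]),
    "photo_of_car.jpg": np.array([0.0, 0.2, 0.9]),
}
text_embedding = np.array([1.0, 0.0, 0.1])  # pretend embedding of "a cat"

best = max(image_embeddings,
           key=lambda k: cosine_sim(image_embeddings[k], text_embedding))
print(best)  # photo_of_cat.jpg
```

Because both modalities live in the same space, "find the matching image" reduces to a nearest-neighbor search by cosine similarity.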

SECTION 8

Common Terms

These terms appear constantly in AI discussions. Understanding them helps you follow conversations and evaluate claims.

Hallucination

When AI confidently generates false information. Invents citations, states incorrect facts, makes up events—all while sounding confident. A fundamental limitation because LLMs predict plausible text, not necessarily true text.

Analogy: A friend who always has a story but sometimes fills gaps with plausible-sounding made-up details.

RAG (Retrieval-Augmented Generation)

Instead of relying solely on training knowledge, RAG first searches a database for relevant information, then feeds that context to generate a response. Grounds output in specific sources and reduces hallucination.

Analogy: Open-book versus closed-book exam. RAG looks up information before answering.
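A minimal sketch of the retrieve-then-generate flow, scoring documents by naive word overlap (real systems use vector embeddings) and stopping at the grounded prompt that would be sent to the model:

```python
# Toy document store; real RAG systems index thousands of chunks
docs = [
    "The Eiffel Tower is 330 metres tall.",
    "Python was created by Guido van Rossum.",
    "Transformers were introduced in 2017.",
]

def retrieve(query, documents):
    # Score each document by how many query words it shares (toy metric;
    # production systems compare embedding vectors instead)
    q_words = set(query.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

query = "How tall is the Eiffel Tower?"
context = retrieve(query, docs)
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."
print(prompt)
```

The generation step then answers from the retrieved context rather than from memorized training data, which is what grounds the output.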

Prompt Engineering

Crafting instructions to get better AI outputs. Small wording changes can dramatically affect quality. Techniques include being specific, providing examples, asking for step-by-step reasoning, and specifying output format.

Analogy: Interview technique. "Tell me about yourself" rambles. "What is your biggest achievement?" gets focused responses.

Chain of Thought

Asking the model to show reasoning step by step before the final answer. Adding "Let's think step by step" can significantly improve accuracy on math, logic, and reasoning problems.

Analogy: "Show your work" on a math test. Writing steps helps catch errors along the way.
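A small sketch contrasting the plain prompt with the step-by-step version, plus the arithmetic the model would ideally write out before answering:

```python
# The same math question, plain vs. with a chain-of-thought cue appended
plain = "Q: A shirt costs $20 and is 25% off. What is the final price?\nA:"
cot = plain + " Let's think step by step."

# The intermediate reasoning we hope the cue elicits:
discount = 20 * 0.25   # 25% of $20 is $5
price = 20 - discount  # $20 - $5 = $15
print(price)  # 15.0
```

Writing the discount before the final price is exactly the "show your work" step that catches errors a single-jump answer would miss.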

Zero-shot and Few-shot

Zero-shot: No examples, just task description. Few-shot: A few examples provided first. Few-shot often works better because examples demonstrate expected format and style.

Analogy: Asking someone to write a poem (zero-shot) vs showing them three poems you like first (few-shot).
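A small sketch of the two prompt styles for a sentiment task; the reviews and labels are made up:

```python
# Zero-shot: only a task description, no examples
zero_shot = "Classify the sentiment of: 'The battery died after an hour.'"

# Few-shot: demonstrate the format first, then ask the real question
examples = [
    ("Loved every minute of it.", "positive"),
    ("Total waste of money.", "negative"),
]
few_shot = "\n".join(f"Review: {text}\nSentiment: {label}"
                     for text, label in examples)
few_shot += "\nReview: The battery died after an hour.\nSentiment:"

print(few_shot)
```

The few-shot prompt ends mid-pattern, so the model's most natural continuation is a label in exactly the demonstrated format.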

AGI (Artificial General Intelligence)

Hypothetical AI that can perform any intellectual task a human can. Current AI is "narrow"—excellent at specific tasks but not truly general-purpose. AGI would learn any skill, reason across domains, and adapt to novel situations. Timelines are hotly debated.

Analogy: Current AI is a chess grandmaster who cannot make toast. AGI would learn any skill you throw at it.

Benchmark

Standardized test for comparing AI models. MMLU tests general knowledge, HumanEval tests coding, HellaSwag tests common sense. Enables apples-to-apples comparisons, though models can be tuned to benchmarks without matching real-world performance.

Analogy: Standardized tests like the SAT. Everyone takes the same test for fair comparison.

Complete Quick Reference

Fundamentals

Model — Learned patterns from data

Training — Learning phase

Inference — Using phase

Parameters — Adjustable values

Learning

Loss — Error measurement

Gradient — Direction to improve

Epoch — One complete data pass

Backprop — Tracing errors backward

Language Models

Token — Text unit, about 4 chars

Embedding — Words as numbers

Context — How much model sees

Temperature — Creativity level

Key Terms

Transformer — Dominant architecture

Attention — Focus mechanism

RAG — Retrieval + generation

Hallucination — Confident errors

Combined with Part 1, you now have 45 essential terms to navigate any AI conversation. Save both parts as your reference guide.

ResearchAudio.io

AI research explained clearly.
