Dive into the groundbreaking world of DeepSeek LLM, an open-source language model that's challenging the dominance of closed-source AI.
This episode unpacks the secrets behind DeepSeek's impressive capabilities, exploring the Mixture-of-Experts (MoE) architecture that activates only a fraction of the model's parameters for each token, keeping inference efficient without sacrificing capability.
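For listeners who want a mechanical picture of what "Mixture-of-Experts" means, here is a minimal top-k routing sketch in Python. It illustrates the general technique only, not DeepSeek's actual implementation; the dimensions, expert count, and router design are arbitrary placeholder choices.

```python
# Generic top-k Mixture-of-Experts layer, for illustration only.
# DeepSeek's real MoE (shared experts, fine-grained routing, load
# balancing) is more involved; all sizes below are arbitrary.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim: int = 512, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Each expert is a small feed-forward network; only top_k run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(dim, n_experts)  # learned per-token expert scores
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, dim)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)        # normalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e            # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out  # compute scales with top_k, not with n_experts
```

The key property: every token touches only `top_k` of the `n_experts` feed-forward blocks, so total parameter count can grow without a matching growth in per-token compute.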
We'll delve into its multi-stage training process, from massive pre-training to supervised fine-tuning and reinforcement learning, revealing how DeepSeek learns through trial and error, even developing human-like self-verification and reflection.
Discover how DeepSeek excels across diverse domains, from complex math and coding challenges to general reasoning, often outperforming established models. We'll also explore specialized variants like DeepSeek Coder and DeepSeek Math, and look at how knowledge distillation lets smaller models inherit the larger model's advanced reasoning abilities, making powerful AI more accessible to all.
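As a rough picture of what knowledge distillation involves, here is the classic soft-label distillation loss. This is the textbook (Hinton-style) recipe, not necessarily DeepSeek's exact procedure, and the temperature value is a conventional placeholder.

```python
# Classic soft-label knowledge distillation, for illustration only.
# DeepSeek's distilled models are reportedly fine-tuned on
# teacher-generated data, a related but different recipe.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # Softening with temperature exposes the teacher's relative preferences
    # among all answers, not just its single top choice.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
```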
Join us as we explore the potential impact of DeepSeek, both for the scientific community and for everyday applications, and discuss the ethical considerations that come with these advanced AI tools.
--------
15:42
Titans: Learning to Memorize at Test Time
Are current AI models hitting a memory wall? Join us as we delve into the fascinating research behind "Titans: Learning to Memorize at Test Time," an innovative approach to AI learning.
The podcast covers key concepts from the paper, including:
The challenges of long-term memory in AI: models like Transformers capture relationships well within their context window but struggle to retain information beyond it.
How the Titan model addresses these limitations by equipping AI with both short-term and long-term memory.
The concept of "learning to memorize at test time", where the model figures out what is important to remember as it encounters new information.
The use of a surprise-based approach, where the model prioritizes information that is most surprising or unexpected.
The combination of surprise-based long-term memory with a more traditional short-term memory.
How long-term memory is stored: within the weights of a deep neural network itself, rather than as an explicit cache of past tokens.
The use of an update rule similar to gradient descent with momentum for efficient memory formation (sketched in code after this list).
The model's built-in forgetting mechanism to manage memory capacity and prioritize important information.
The use of attention to guide the search for relevant information in long-term memory.
The ability of Titans to handle longer sequences of information by using long-term memory to free up short-term memory.
The advantages of Titans in real-world applications such as language modeling, commonsense reasoning, and the needle-in-a-haystack retrieval task.
The three variants of the Titans architecture: Memory as a Context (MAC), Memory as a Gate (MAG), and Memory as a Layer (MAL), each of which integrates long-term memory differently.
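To make the surprise, momentum, and forgetting mechanics above concrete, here is a minimal sketch of that style of test-time memory update. It is a simplification under stated assumptions: a single linear map stands in for the paper's deep memory network, and the learning rate, momentum, and forgetting rate are fixed constants rather than the input-dependent gates the paper learns.

```python
# Simplified surprise-driven memory, updated at test time as data streams in.
# Assumptions: linear memory instead of a deep MLP; fixed coefficients.
import torch

class NeuralMemory:
    def __init__(self, dim: int, lr: float = 0.1, momentum: float = 0.9, forget: float = 0.01):
        self.M = torch.zeros(dim, dim)   # memory lives in parameters (here: one linear map)
        self.S = torch.zeros(dim, dim)   # running surprise (momentum over gradients)
        self.lr, self.momentum, self.forget = lr, momentum, forget

    def update(self, key: torch.Tensor, value: torch.Tensor) -> None:
        """Memorize one (key, value) pair; both are 1-D tensors of size dim."""
        error = key @ self.M - value              # big error = surprising = worth storing
        grad = torch.outer(key, error)            # grad of 0.5*||key @ M - value||^2 w.r.t. M
        self.S = self.momentum * self.S - self.lr * grad  # momentum carries surprise forward
        self.M = (1 - self.forget) * self.M + self.S      # forgetting gate bounds capacity

    def recall(self, key: torch.Tensor) -> torch.Tensor:
        return key @ self.M                       # read out what the key is associated with
```

Surprising inputs produce large gradients and therefore large memory updates, while the forgetting term steadily decays old content so capacity stays bounded.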
--------
17:36
Memory Layers at Scale: Revolutionizing AI Efficiency and Factuality
Join us for an in-depth exploration of the groundbreaking research paper, "Memory Layers at Scale." Discover how trainable key-value lookup mechanisms are transforming the landscape of AI by making large-scale models more efficient, accurate, and capable of continuous learning.
We'll unpack the innovations behind memory layers, including product-key lookup and parallel memory techniques, and discuss their implications for democratizing AI development.
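To give a concrete feel for product-key lookup, here is a minimal sketch of the idea: factor a table of n² keys into two tables of n sub-keys, so each query scores only 2n candidates instead of n². The sizes, single head, and missing normalization are simplifications; this illustrates the general technique rather than the paper's exact implementation.

```python
# Minimal product-key memory: search n*n value slots by scoring only
# 2n sub-keys. Illustrative simplification of the technique.
import torch
import torch.nn as nn

class ProductKeyMemory(nn.Module):
    def __init__(self, dim: int = 512, n: int = 128, top_k: int = 4):
        super().__init__()
        half = dim // 2
        self.k1 = nn.Parameter(torch.randn(n, half))            # first sub-key table
        self.k2 = nn.Parameter(torch.randn(n, half))            # second sub-key table
        self.values = nn.EmbeddingBag(n * n, dim, mode="sum")   # n^2 trainable value slots
        self.n, self.top_k = n, top_k

    def forward(self, q: torch.Tensor) -> torch.Tensor:         # q: (batch, dim)
        q1, q2 = q.chunk(2, dim=-1)                              # split the query in half
        s1, i1 = (q1 @ self.k1.T).topk(self.top_k, dim=-1)       # best sub-keys, table 1
        s2, i2 = (q2 @ self.k2.T).topk(self.top_k, dim=-1)       # best sub-keys, table 2
        # Cartesian product: k*k candidate full keys, scored by summed halves.
        scores = (s1[:, :, None] + s2[:, None, :]).flatten(1)    # (batch, k*k)
        idx = (i1[:, :, None] * self.n + i2[:, None, :]).flatten(1)
        best, pos = scores.topk(self.top_k, dim=-1)              # final top-k of candidates
        chosen = idx.gather(1, pos)                              # indices into the value table
        w = best.softmax(dim=-1)                                 # weights for retrieved values
        return self.values(chosen, per_sample_weights=w)         # weighted sum of k values
```

Because only a handful of value slots are touched per query, the memory can hold millions of parameters while adding very little compute per token, which is the efficiency argument the episode explores.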
Learn how these advancements are paving the way for smarter, more adaptable AI systems while addressing challenges like computational efficiency, scalability, and ethical considerations.
Whether you're an AI enthusiast, a researcher, or just curious about the future of intelligent systems, this episode offers insights into a paradigm shift in AI development.
--------
19:10
LOCOMO: Unlocking Long-Term Memory in Conversational AI
How well can AI remember and use information in long conversations?
This episode explores the groundbreaking LOCOMO dataset, a unique resource designed to evaluate long-term conversational memory in Large Language Models (LLMs).
We delve into the challenges of current AI in maintaining coherent, empathetic conversations over multiple sessions. Discover how the LOCOMO dataset, generated through a human-machine pipeline with unique personas, temporal event graphs, and multimodal dialogue capabilities, is pushing the boundaries of conversational AI.
We discuss key findings from experiments using base models, long-context LLMs, and Retrieval Augmented Generation (RAG) techniques, revealing limitations and promising approaches for improving long-term memory. We'll also examine the ethical considerations of creating realistic conversational agents that can remember our past interactions.
Learn why structured information, such as recorded observations about each speaker, and retrieval-based methods are key to building truly conversational AI.
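For a concrete picture of the retrieval-based approach discussed here, below is a bare-bones RAG loop over conversation history. The embedding function is a stand-in placeholder (any real sentence-embedding model would replace it), and the prompt format is illustrative, not LOCOMO's evaluation setup.

```python
# Bare-bones retrieval over multi-session conversation history.
# The embed() function is a placeholder, not a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: swap in a real sentence-embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def retrieve(history: list[str], question: str, k: int = 5) -> list[str]:
    """Return the k past turns most similar to the question."""
    q = embed(question)
    return sorted(history, key=lambda turn: float(embed(turn) @ q), reverse=True)[:k]

def build_prompt(history: list[str], question: str) -> str:
    """Prepend only the retrieved turns, so long histories fit in context."""
    context = "\n".join(retrieve(history, question))
    return f"Relevant past conversation:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The design point the episode highlights: instead of stuffing the entire multi-session history into the context window, the model sees only the handful of past turns (or structured observations) most relevant to the current question.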
--------
8:23
Beyond the Short Chat: Exploring Long-Term Memory in AI
Ready for a deep dive into the fascinating world of large language models?
In this episode, we push AI chatbots to their conversational limits—spanning hundreds of turns, multiple sessions, and even images—to find out how well they remember and understand context over time.
We delve into a groundbreaking dataset called LOCOMO that evaluates an AI's ability to recall events, summarize complex stories, and navigate tricky, adversarial questions.
We also discuss how giving these models structured notes (or “observations”) can dramatically improve their performance—and why they still struggle with understanding time, cause and effect, and cleverly worded “gotcha” questions.
Finally, we look ahead at emerging possibilities when AI gains access to richer, multimodal inputs like audio and video.
Join us for a thought-provoking conversation on what it takes to give AI a more human-like sense of memory, context, and experience—and why it matters for the future of technology and society.
Learn and level up your Generative AI expertise. Everyone can listen and learn GenAI anytime, anywhere.
Whether you're just starting or looking to dive deep, this series covers everything from Level 1 to 10 – from foundational concepts like neural networks to advanced topics like multimodal models and ethical AI. Each level is packed with expert insights, actionable takeaways, and engaging discussions that make learning AI accessible and inspiring.
🔊 Stay tuned as we launch this transformative learning adventure – one podcast at a time. Let’s level up together! 💡✨
#learn #generative #ai