{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Welcome to the start of your adventure in Agentic AI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", "

Are you ready for action??

\n", " Have you completed all the setup steps in the setup folder?
\n", " Have you read the README? Many common questions are answered here!
\n", " Have you checked out the guides in the guides folder?
\n", " Well in that case, you're ready!!\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", "

This code is a live resource - keep an eye out for my updates

\n", " I push updates regularly. As people ask questions or have problems, I add more examples and improve explanations. As a result, the code below might not be identical to the videos, as I've added more steps and better comments. Consider this like an interactive book that accompanies the lectures.

\n", " I try to send emails regularly with important updates related to the course. You can find these in the 'Announcements' section of Udemy in the left sidebar. You can also choose to receive my emails via your Notification Settings in Udemy. I'm respectful of your inbox and always try to add value with my emails!\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### And please do remember to contact me if I can help\n", "\n", "And I love to connect: https://www.linkedin.com/in/eddonner/\n", "\n", "\n", "### New to Notebooks like this one? Head over to the guides folder!\n", "\n", "Just to check you've already added the Python and Jupyter extensions to Cursor, if not already installed:\n", "- Open extensions (View >> extensions)\n", "- Search for python, and when the results show, click on the ms-python one, and Install it if not already installed\n", "- Search for jupyter, and when the results show, click on the Microsoft one, and Install it if not already installed \n", "Then View >> Explorer to bring back the File Explorer.\n", "\n", "And then:\n", "1. Click where it says \"Select Kernel\" near the top right, and select the option called `.venv (Python 3.12.9)` or similar, which should be the first choice or the most prominent choice. You may need to choose \"Python Environments\" first.\n", "2. Click in each \"cell\" below, starting with the cell immediately below this text, and press Shift+Enter to run\n", "3. Enjoy!\n", "\n", "After you click \"Select Kernel\", if there is no option like `.venv (Python 3.12.9)` then please do the following: \n", "1. On Mac: From the Cursor menu, choose Settings >> VS Code Settings (NOTE: be sure to select `VSCode Settings` not `Cursor Settings`); \n", "On Windows PC: From the File menu, choose Preferences >> VS Code Settings (NOTE: be sure to select `VSCode Settings` not `Cursor Settings`) \n", "2. In the Settings search bar, type \"venv\" \n", "3. In the field \"Path to folder with a list of Virtual Environments\" put the path to the project root, like C:\\Users\\username\\projects\\agents (on a Windows PC) or /Users/username/projects/agents (on Mac or Linux). \n", "And then try again.\n", "\n", "Having problems with missing Python versions in that list? Have you ever used Anaconda before? It might be interfering. 
Quit Cursor, bring up a new command line, and make sure that your Anaconda environment is deactivated: \n", "`conda deactivate` \n", "And if you still have any problems with conda and python versions, it's possible that you will need to run this too: \n", "`conda config --set auto_activate_base false` \n", "and then from within the Agents directory, you should be able to run `uv python list` and see the Python 3.12 version." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "# First let's do an import. If you get an Import Error, double check that your Kernel is correct.\n", "\n", "from dotenv import load_dotenv\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Next it's time to load the API keys into environment variables\n", "# If this returns False, see the next cell!\n", "\n", "load_dotenv(override=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wait, did that just output `False`??\n", "\n", "If so, the most common reason is that you didn't save your `.env` file after adding the key! Be sure to have saved.\n", "\n", "Also, make sure the `.env` file is named precisely `.env` and is in the project root directory (`agents`).\n", "\n", "By the way, your `.env` file should have a stop symbol next to it in Cursor on the left, and that's actually a good thing: that's Cursor saying to you, \"hey, I realize this is a file filled with secret information, and I'm not going to send it to an external AI to suggest changes, because your keys should not be shown to anyone else.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", "

Final reminders

\n", " 1. If you're not confident about Environment Variables or Web Endpoints / APIs, please read Topics 3 and 5 in this technical foundations guide.
\n", " 2. If you want to use AIs other than OpenAI, like Gemini, DeepSeek or Ollama (free), please see the first section in this AI APIs guide.
\n", " 3. If you ever get a NameError in Python, you can always fix it immediately; see the last section of this Python Foundations guide and follow both tutorials and exercises.
\n", "
\n", "
" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OpenAI API Key exists and begins sk-proj-\n", "DeepSeek API Key exists and begins sk-18925\n" ] } ], "source": [ "# Check the key - if you're not using OpenAI, check whichever key you're using! Ollama doesn't need a key.\n", "\n", "import os\n", "openai_api_key = os.getenv('OPENAI_API_KEY')\n", "deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')\n", "\n", "if openai_api_key:\n", " print(f\"OpenAI API Key exists and begins {openai_api_key[:8]}\")\n", "else:\n", " print(\"OpenAI API Key not set - please head to the troubleshooting guide in the setup folder\")\n", "\n", "if deepseek_api_key:\n", " print(f\"DeepSeek API Key exists and begins {deepseek_api_key[:8]}\")\n", "else:\n", " print(\"DeepSeek API Key not set - please head to the troubleshooting guide in the setup folder\")\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# And now - the all-important import statement\n", "# If you get an import error - head over to troubleshooting in the Setup folder\n", "# Even for other LLM providers like Gemini, you still use this OpenAI import - see Guide 9 for why\n", "\n", "from openai import OpenAI" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# And now we'll create an instance of the OpenAI class\n", "# If you're not sure what it means to create an instance of a class - head over to the guides folder (guide 6)!\n", "# If you get a NameError - head over to the guides folder (guide 6) to learn about NameErrors - always instantly fixable\n", "# If you're not using OpenAI, you just need to slightly modify this - precise instructions are in the AI APIs guide (guide 9)\n", "\n", "openai = OpenAI()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Create a list of messages in the familiar OpenAI 
format\n", "\n", "messages = [{\"role\": \"user\", \"content\": \"What is 2+2?\"}]" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4. In base-10 arithmetic, 2 + 2 = 4. If you’re curious about other bases, the result can look different (e.g., in base 3 it’s 11).\n" ] } ], "source": [ "# And now call it! Any problems, head to the troubleshooting guide\n", "# This uses GPT-5 nano, the incredibly cheap model\n", "# The APIs guide (guide 9) has exact instructions for using even cheaper or free alternatives to OpenAI\n", "# If you get a NameError, head to the guides folder (guide 6) to learn about NameErrors - always instantly fixable\n", "\n", "response = openai.chat.completions.create(\n", " model=\"gpt-5-nano\",\n", " messages=messages\n", ")\n", "\n", "print(response.choices[0].message.content)\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# And now - let's ask for a question:\n", "\n", "question = \"Please propose a hard, challenging question to assess someone's IQ. Respond only with the question.\"\n", "messages = [{\"role\": \"user\", \"content\": question}]\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "You have 12 visually identical coins, but one coin is counterfeit and has a different weight (you do not know whether it is heavier or lighter). 
Using only a balance scale and at most three weighings, identify exactly which coin is counterfeit and state whether it is heavier or lighter — what weighing strategy guarantees this?\n" ] } ], "source": [ "# ask it - this uses GPT-5 mini, still cheap but more powerful than nano\n", "\n", "response = openai.chat.completions.create(\n", " model=\"gpt-5-mini\",\n", " messages=messages\n", ")\n", "\n", "question = response.choices[0].message.content\n", "\n", "print(question)\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# form a new messages list\n", "messages = [{\"role\": \"user\", \"content\": question}]\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "This is the classical 12-coin puzzle. There are 12 coins, one is counterfeit and is either heavier or lighter; you may use a balance (three-way result: left heavy / right heavy / balance) and are allowed at most 3 weighings. Here is a guaranteed strategy and how to interpret every outcome.\n", "\n", "Label the coins 1,2,...,12.\n", "\n", "Weighing 1\n", "- Put coins 1,2,3,4 on the left pan and 5,6,7,8 on the right pan.\n", "\n", "There are two broad cases.\n", "\n", "A. Weighing 1 balances.\n", "- Then coins 1–8 are genuine. The counterfeit is one of 9,10,11,12.\n", "\n", "Weighing 2 (case A)\n", "- Put 9,10,11 on the left and 1,2,3 (known good) on the right.\n", "\n", "Interpretation after Weighing 2:\n", "- If it balances → counterfeit is coin 12.\n", " - Weighing 3: weigh 12 against any known-good coin (say 1). If 12 = 1 → impossible (shouldn't happen), but if 12 heavier → 12 is heavy, if 12 lighter → 12 is light. Done.\n", "- If left side (9,10,11) is heavier than right → counterfeit is one of {9,10,11} and is heavier.\n", " - Weighing 3: weigh 9 vs 10. If they balance → 11 is heavy. If left side heavier → 9 is heavy. 
If right side heavier → 10 is heavy.\n", "- If left side is lighter than right → counterfeit is one of {9,10,11} and is lighter.\n", " - Weighing 3: weigh 9 vs 10. If they balance → 11 is light. If left side lighter → 9 is light. If right side lighter → 10 is light.\n", "\n", "B. Weighing 1 does not balance.\n", "- WLOG suppose left pan (1–4) is heavier than right pan (5–8). (If instead the right pan is heavier, rename coins so the heavier side becomes 1–4 and the lighter becomes 5–8 and proceed identically; this symmetry covers both possibilities.)\n", "- From the first weighing, the counterfeit is one of {1,2,3,4} (and is heavy) or one of {5,6,7,8} (and is light).\n", "\n", "Weighing 2 (case B, after relabeling so 1–4 were heavy and 5–8 light)\n", "- Put coins 1,2,5 on the left and 3,6,9 on the right (coin 9 we know was not involved in W1 so treat it as a known genuine coin for this branch).\n", "\n", "Interpretation after Weighing 2 (given W1 left was heavier):\n", "- If W2 balances → none of 1,2,3,5,6 is counterfeit; so the counterfeit must be either 4 (heavy) or 7 or 8 (light).\n", " - Weighing 3: weigh 7 vs 8. If they balance → coin 4 is heavy. If they do not balance, the lighter of 7 and 8 is the counterfeit and is light.\n", "- If W2 is left heavier → possibilities are: 1 heavy, 2 heavy, or 6 light.\n", " - Weighing 3: weigh 1 vs 2. If they balance → coin 6 is light. If they do not balance, the heavier one is the heavy counterfeit (so if 1 side is down, 1 is heavy; if 2 side is down, 2 is heavy).\n", "- If W2 is left lighter (i.e. right heavier) → possibilities are: 3 heavy or 5 light.\n", " - Weighing 3: weigh 5 vs 9 (9 known good). If they balance → coin 3 is heavy. If 5 is lighter than 9 → coin 5 is light.\n", "\n", "That covers every possible outcome of the three weighings. 
Each branch identifies the single counterfeit coin and whether it is heavier or lighter.\n" ] } ], "source": [ "# Ask it again\n", "\n", "response = openai.chat.completions.create(\n", " model=\"gpt-5-mini\",\n", " messages=messages\n", ")\n", "\n", "answer = response.choices[0].message.content\n", "print(answer)\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "This is the classical 12-coin puzzle. There are 12 coins, one is counterfeit and is either heavier or lighter; you may use a balance (three-way result: left heavy / right heavy / balance) and are allowed at most 3 weighings. Here is a guaranteed strategy and how to interpret every outcome.\n", "\n", "Label the coins 1,2,...,12.\n", "\n", "Weighing 1\n", "- Put coins 1,2,3,4 on the left pan and 5,6,7,8 on the right pan.\n", "\n", "There are two broad cases.\n", "\n", "A. Weighing 1 balances.\n", "- Then coins 1–8 are genuine. The counterfeit is one of 9,10,11,12.\n", "\n", "Weighing 2 (case A)\n", "- Put 9,10,11 on the left and 1,2,3 (known good) on the right.\n", "\n", "Interpretation after Weighing 2:\n", "- If it balances → counterfeit is coin 12.\n", " - Weighing 3: weigh 12 against any known-good coin (say 1). If 12 = 1 → impossible (shouldn't happen), but if 12 heavier → 12 is heavy, if 12 lighter → 12 is light. Done.\n", "- If left side (9,10,11) is heavier than right → counterfeit is one of {9,10,11} and is heavier.\n", " - Weighing 3: weigh 9 vs 10. If they balance → 11 is heavy. If left side heavier → 9 is heavy. If right side heavier → 10 is heavy.\n", "- If left side is lighter than right → counterfeit is one of {9,10,11} and is lighter.\n", " - Weighing 3: weigh 9 vs 10. If they balance → 11 is light. If left side lighter → 9 is light. If right side lighter → 10 is light.\n", "\n", "B. Weighing 1 does not balance.\n", "- WLOG suppose left pan (1–4) is heavier than right pan (5–8). 
(If instead the right pan is heavier, rename coins so the heavier side becomes 1–4 and the lighter becomes 5–8 and proceed identically; this symmetry covers both possibilities.)\n", "- From the first weighing, the counterfeit is one of {1,2,3,4} (and is heavy) or one of {5,6,7,8} (and is light).\n", "\n", "Weighing 2 (case B, after relabeling so 1–4 were heavy and 5–8 light)\n", "- Put coins 1,2,5 on the left and 3,6,9 on the right (coin 9 we know was not involved in W1 so treat it as a known genuine coin for this branch).\n", "\n", "Interpretation after Weighing 2 (given W1 left was heavier):\n", "- If W2 balances → none of 1,2,3,5,6 is counterfeit; so the counterfeit must be either 4 (heavy) or 7 or 8 (light).\n", " - Weighing 3: weigh 7 vs 8. If they balance → coin 4 is heavy. If they do not balance, the lighter of 7 and 8 is the counterfeit and is light.\n", "- If W2 is left heavier → possibilities are: 1 heavy, 2 heavy, or 6 light.\n", " - Weighing 3: weigh 1 vs 2. If they balance → coin 6 is light. If they do not balance, the heavier one is the heavy counterfeit (so if 1 side is down, 1 is heavy; if 2 side is down, 2 is heavy).\n", "- If W2 is left lighter (i.e. right heavier) → possibilities are: 3 heavy or 5 light.\n", " - Weighing 3: weigh 5 vs 9 (9 known good). If they balance → coin 3 is heavy. If 5 is lighter than 9 → coin 5 is light.\n", "\n", "That covers every possible outcome of the three weighings. Each branch identifies the single counterfeit coin and whether it is heavier or lighter." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import Markdown, display\n", "\n", "display(Markdown(answer))\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Congratulations!\n", "\n", "That was a small, simple step in the direction of Agentic AI, with your new environment!\n", "\n", "Next time things get more interesting..." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", "

Exercise

\n", " Now try this commercial application:
\n", " First ask the LLM to pick a business area that might be worth exploring for an Agentic AI opportunity.
\n", " Then ask the LLM to present a pain-point in that industry - something challenging that might be ripe for an Agentic solution.
\n", " Finally have a third LLM call propose the Agentic AI solution.
\n", " We will cover this in upcoming labs, so don't worry if you're unsure... just give it a try!\n", "
\n", "
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# First create the messages:\n", "\n", "messages = [{\"role\": \"user\", \"content\": \"Pick a business area that might be worth exploring for an Agentic AI opportunity.\"}]\n", "\n", "# Then make the first call:\n", "\n", "response = openai.chat.completions.create(\n", " model=\"gpt-5-mini\",\n", " messages=messages\n", ")\n", "\n", "# Then read the business idea:\n", "\n", "business_idea = response.choices[0].message.content\n", "\n", "# And repeat! In the next message, include the business idea within the message" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# print(business_idea)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Suggested business area: Autonomous Supply‑Chain & Procurement Orchestration for mid‑market manufacturers and distributors.\n", "\n", "Why this is promising\n", "- Large, fragmented problem space with clear economic ROI: procurement and logistics are 8–15% of COGS on average; small percentage improvements rapidly translate to large dollar savings.\n", "- High-frequency, decision‑rich workflows (sourcing, reordering, exception handling, expedited shipments) that are well suited to agentic AI — the agent can monitor, decide, interact with vendors, update ERPs, and escalate only when needed.\n", "- Many SMEs are digitally under‑automated (legacy ERPs, email/phone vendor interactions), so productization and integration can unlock outsized value.\n", "- Competitive moat: integrations across ERPs, supplier portals, freight platforms, and proprietary negotiation/optimization logic.\n", "\n", "Concrete agentic use cases\n", "1. 
Dynamic Reorder Agent\n", " - Monitors inventory, forecasts demand, and places optimal replenishment orders across dozens of SKUs, accounting for lead times, lot sizes, and cash constraints.\n", " - Outcome: fewer stockouts, lower safety stock, reduced working capital.\n", "\n", "2. Sourcing & Negotiation Agent\n", " - Identifies alternate suppliers, requests quotes, compares TCO (price, lead time, quality risk), and conducts scripted negotiation conversations via email/API to improve margins.\n", " - Outcome: lower unit costs and faster supplier qualification.\n", "\n", "3. Exception & Expedited Logistics Manager\n", " - Detects shipment delays via carrier APIs, auto‑rebooks faster options if cost/penalty thresholds are met, generates and routes claims, and communicates status updates to sales/ops.\n", " - Outcome: fewer late deliveries, lower penalty costs, better customer satisfaction.\n", "\n", "4. Compliance & Contract Agent\n", " - Monitors contract expirations, enforces approved supplier lists, audits purchase orders, and flags contract breaches for remediation.\n", " - Outcome: lower compliance risk, fewer maverick purchases.\n", "\n", "Value props & measurable KPIs\n", "- Reduce procurement spend by 3–8% (negotiation + sourcing).\n", "- Cut working capital tied to inventory by 10–30% (improved forecasts + dynamic reorder).\n", "- Reduce labor hours spent on vendor coordination/PO exceptions by 40–70%.\n", "- Lower expedited freight spend and late shipments by 20–50%.\n", "\n", "Technical requirements & architecture\n", "- Agent core: multi‑step planning, tool use, and memory (stateful dialogues, long‑term supplier profiles).\n", "- Integrations: ERP (SAP/Oracle/Netsuite/QuickBooks), WMS, TMS/carrier APIs, supplier portals/email, BI/forecasting models.\n", "- Data: SKU master, historical demand, lead times, contracts, PO history, invoice/payment data.\n", "- Safety & governance: action approval workflows, audit logs, RBAC, explainability for decisions, 
human-in-the-loop escalation thresholds.\n", "- Security: enterprise SSO, least privilege API keys, encryption, SOC2 compliance.\n", "\n", "Regulatory/operational risks & mitigations\n", "- Risk: Agent makes erroneous orders or unauthorized payments.\n", " - Mitigation: start with read‑only and recommendation mode; phased escalation to auto‑execution with spend limits and dual signoffs.\n", "- Risk: Bad supplier selection causing quality/recall exposure.\n", " - Mitigation: enforce supplier scorecards, require human signoff for unvetted suppliers.\n", "- Risk: Integration fragility across heterogeneous ERPs.\n", " - Mitigation: build a robust integration layer, start with adapters for top ERPs + an email/scrape fallback.\n", "\n", "Go‑to‑market / commercial model\n", "- Target customers: mid‑market manufacturers/distributors with 50–1,000 employees, 10k+ SKUs, or >$20M procurement spend.\n", "- Sales motions:\n", " - Start with pilot projects sold as 3–6 month outcomes (e.g., reduce PO exceptions by X, lower PTP cycle time by Y).\n", " - Offer monthly SaaS + success fee (share of realized savings) to align incentives.\n", "- Pricing: base SaaS + per‑transaction/automation fee + % of savings for negotiation wins.\n", "\n", "MVP suggestion (fast, low‑risk)\n", "- Focus single agent on PO Exception Resolution and Expedited Logistics:\n", " - Integrate with one popular ERP (e.g., NetSuite) + one carrier API + email parsing.\n", " - Agent identifies delayed shipments, proposes options (reroute, expedite, partial ship), drafts communications to vendors/customers, logs recommended cost vs penalty.\n", " - Start in recommendation mode (no automatic purchases) and measure time saved and reduction in late shipments.\n", "\n", "Key metrics to validate with a pilot\n", "- Time saved per week for procurement team.\n", "- Reduction in late shipments and expedited freight spend.\n", "- % of vendor negotiations where the agent found a lower cost option.\n", "- Net dollar savings 
attributed to agent actions.\n", "\n", "Next steps (30/60/90)\n", "- 0–30 days: Customer discovery interviews with 10 target companies (procurement/manufacturing ops) to validate pain points and willingness to pilot; secure 1 pilot.\n", "- 30–60 days: Build MVP agent for PO exception handling and NetSuite integration; run internal tests with synthetic data.\n", "- 60–90 days: Deploy pilot with one customer, run in recommendation mode, collect KPIs and iterate UX/decision thresholds; plan phased auto‑execution rollout.\n", "\n", "Why now\n", "- Many companies are investing in digitizing supply chains post‑pandemic and are open to AI-driven efficiency.\n", "- Agentic AI has matured enough to handle multi‑step tasks, orchestrate tools, and maintain memory across interactions — ideal for procurement workflows.\n", "\n", "If you’d like, I can:\n", "- Sketch a prioritized feature roadmap for an MVP.\n", "- Draft a short pilot plan and data/metrics collection template.\n", "- Identify top 5 supplier/ERP integrations to prioritize. Which would you prefer?" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import Markdown, display\n", "\n", "display(Markdown(business_idea))\n", "\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "messages = [{\"role\": \"user\", \"content\": \"Present a pain-point in that industry - something challenging that might be ripe for an Agentic solution.\"}]\n", "\n", "response = openai.chat.completions.create(\n", " model=\"gpt-5-mini\",\n", " messages=messages\n", ")\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Which industry do you mean? 
I can tailor the answer precisely — but here’s one high-impact, broadly relevant pain-point and why it’s ripe for an agentic solution.\n", "\n", "Pain-point: Real-time supply-chain disruption resolution (manufacturing / retail / logistics)\n", "\n", "What’s painful now\n", "- Global supply chains are fragile: late shipments, port congestion, customs delays, factory outages, and sudden demand spikes occur frequently.\n", "- Response is manual and siloed: operations, procurement, carriers, warehouse teams, and sales coordinate by email/phone/meetings. Decision-making is slow and inconsistent.\n", "- Lack of actionable, unified visibility: data is scattered across ERPs, TMS, WMS, carrier portals, and supplier emails, so it’s hard to detect disruptions early or weigh alternate remediation paths.\n", "- Economic impact is large: missed sales, expedited freight costs, excess safety stock, and production downtime.\n", "\n", "Why this is ripe for an agentic solution\n", "- Problem requires continuous monitoring, cross-system integration, multi-step coordination, and autonomous decision-making with human-in-the-loop escalation — all strengths of agentic systems.\n", "- There’s abundant machine-readable data (APIs, EDI, IoT trackers) to feed agents.\n", "- High ROI potential from cost reduction (expedited freight, stockouts), improved fill rates, and faster recovery.\n", "\n", "What an agentic solution would do (capabilities)\n", "- Continuously monitor signals: carrier ETAs, GPS trackers, customs status, production schedules, inventory levels, sales forecasts, and news (weather, strikes).\n", "- Detect anomalies and predict downstream impacts (e.g., projected stockout in X days).\n", "- Generate and evaluate remediation plans automatically (e.g., reroute shipment, switch supplier, adjust production schedule, expedite partial shipments).\n", "- Execute routine actions autonomously via APIs: rebook carriers, place emergency POs, update WMS/ERP, notify customers through 
CRM.\n", "- Negotiate with partners: use API/negotiation flows to solicit alternate quotes, accept best cost-time tradeoff.\n", "- Escalate complex or high-cost decisions to human operators with concise rationale and ranked options.\n", "- Maintain an audit trail and continuously learn from outcomes to improve future decisions.\n", "\n", "Architecture / integrations (high level)\n", "- Data layer: ingest from ERP, TMS, WMS, supplier portals, carrier APIs, IoT trackers, external data (weather, customs feeds).\n", "- Decisioning layer: predictive models (ETA, disruption impact), constraint solver, multi-criteria optimizer.\n", "- Agentic orchestration: autonomous agents that monitor, propose, and autonomously execute actions; policy engine for risk thresholds and escalation rules.\n", "- Human interface: dashboards, notifications, one-click approvals, natural-language summaries.\n", "- Security & governance: role-based access, transaction approval workflows, explainability logs.\n", "\n", "Metrics to track\n", "- Mean time to remediation for disruptions\n", "- Stockout incidents and days-of-supply lost\n", "- Expedited freight spend reduction\n", "- On-time fulfillment rate\n", "- Number and % of incidents resolved autonomously vs. escalated\n", "- Prediction accuracy for disruption impact\n", "\n", "Risks and mitigations\n", "- Risk: agents take wrong high-cost actions. Mitigation: strict cost/impact thresholds, required approvals above set limits, staged rollout with human-in-the-loop.\n", "- Risk: poor data quality → incorrect decisions. Mitigation: data validation, confidence scoring, fallback manual review.\n", "- Risk: partner resistance to automated negotiation. Mitigation: phased integration with pilot partners; human override; standardized APIs.\n", "- Risk: regulatory/compliance issues. 
Mitigation: audit logs, approval workflows, legal review.\n", "\n", "Why this is compelling\n", "- It converts slow, error-prone human coordination into scalable autonomous processes, saving significant time and money.\n", "- Improves resilience: faster detection and remediation reduces cascade effects across the network.\n", "- Provides measurable ROI and learning value: each incident improves future responses.\n", "\n", "If you tell me the specific industry you had in mind, I’ll present a tailored pain-point and a short agentic solution blueprint for that sector." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import Markdown, display\n", "\n", "pain_point = response.choices[0].message.content\n", "\n", "display(Markdown(pain_point))\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "messages = [{\"role\": \"user\", \"content\": \"propose the Agentic AI solution.\"}]\n", "\n", "response = openai.chat.completions.create(\n", " model=\"gpt-5-mini\",\n", " messages=messages\n", ")\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "I can — but first: do you mean an “agentic AI” product (an autonomous multi-step agent that performs tasks end-to-end), or a technical proposal for building one? And what’s the target domain (enterprise knowledge work, customer support, software engineering, robotics, finance, R&D, etc.), scale, and constraints (cloud vs on-prem, allowed models, latency, regulatory/compliance requirements)?\n", "\n", "Below is a general, ready-to-adapt proposal for designing and delivering an agentic AI solution. 
You can tell me your target domain and constraints and I’ll customize it.\n", "\n", "Executive summary\n", "- Build an agentic AI platform that accepts high-level goals, decomposes them into verifiable steps, uses specialized tools and memory, iterates autonomously, and escalates to humans when needed.\n", "- Focus on reliability, traceability, and safety: every action must be auditable and constrained by policies and human oversight.\n", "- Deliver in phases: prototype (MVP), pilot, production, and continuous improvement.\n", "\n", "Goals & success metrics\n", "- Functional: complete X% of defined tasks end-to-end without human intervention; average task latency < Y minutes; human overrides < Z%.\n", "- Safety & compliance: zero unauthorized data exfiltration incidents; 100% audit logs retained; policy compliance rate 100%.\n", "- Business: reduce operator time by N hours/month; increase throughput by M%.\n", "\n", "High-level architecture\n", "- Core agent orchestrator (planner + executor)\n", "- Language model(s) + tools layer\n", "- Perception and connectors (APIs, web, databases, code execution, robotics interfaces)\n", "- Memory & knowledge store (short-term, episodic, long-term)\n", "- Safety & policy enforcement (sandbox, action filters, approvals)\n", "- Observability, logging, and audit trail\n", "- Human-in-the-loop interfaces (review, approvals, explainability)\n", "- DevOps & deployment (CI/CD, model hosting, monitoring)\n", "\n", "Components & responsibilities\n", "1. Planner (high-level reasoning)\n", " - Inputs: user goal, context, memory.\n", " - Outputs: task decomposition, subtask sequence, success criteria.\n", " - Implementation: LLM prompts + symbolic planner or hierarchical RL for complex domains.\n", "\n", "2. 
Executor (tool invocation & step-level reasoning)\n", " - Executes subtasks using specialized tools (search, browser, database, code runner, APIs).\n", " - Validates step outputs against success criteria, retries, backtracks as needed.\n", "\n", "3. Tools & Connectors\n", " - Search/scraping, enterprise systems (CRM, ERP), code execution, cloud infra, sensors/robotics APIs.\n", " - Each tool interface includes a capability declaration and a risk profile.\n", "\n", "4. Memory & Knowledge\n", " - Short-term: task context and ephemeral state.\n", " - Long-term: user preferences, verified facts, action histories (vector DB + metadata).\n", " - Versioned knowledge with provenance tagging.\n", "\n", "5. Safety & Governance Layer\n", " - Action filters: policy rules that block dangerous operations (data exfiltration, financial transfers).\n", " - Sandboxing: run untrusted code in isolated environments with resource limits.\n", " - Human approval flows for sensitive actions.\n", " - Explainability module: generate human-readable rationale for each action.\n", "\n", "6. Observability & Auditing\n", " - Immutable, queryable logs for prompts, model outputs, tool calls, decisions, and user actions.\n", " - Monitoring metrics (success rate, hallucination rate, latency, overrides).\n", " - Alerts and canary deployments for new behaviors.\n", "\n", "7. 
Model Pipeline\n", " - Base LLM(s) selection (open-source or cloud provider), domain fine-tuning, RLHF/IRL for alignment.\n", " - Continuous evaluation dataset and automated regression tests.\n", "\n", "Security & compliance\n", "- Identity & access controls per tool; least-privilege keys for connectors.\n", "- Data classification and automated redaction.\n", "- Encrypted logs and storage; key rotation and vaulting.\n", "- Audit-ready records to satisfy regulators (GDPR/CCPA, SOC2, HIPAA where applicable).\n", "\n", "Safety & alignment measures\n", "- Conservative action policies: default to ask/notify for risky actions.\n", "- Red-team testing and adversarial prompts.\n", "- Human escalation thresholds and emergency kill switches.\n", "- Reward shaping and constrained optimization to avoid goal misgeneralization.\n", "\n", "Typical workflow (example: enterprise research assistant)\n", "1. User: “Identify potential acquisition targets in renewable energy between $50–200M revenue, list top 10 with risks and contact strategy.”\n", "2. Planner: decomposes into dataset collection, filtering, ranking, risk analysis, outreach plan.\n", "3. Executor: calls web-scraper, company databases, internal CRM; synthesizes findings; drafts emails.\n", "4. Safety module: blocks sending emails without human approval for external outreach.\n", "5. Human reviewer: approves outreach and final list.\n", "6. 
System logs everything, updates long-term memory about approved contacts and outcomes.\n", "\n", "Tech stack suggestions\n", "- LLMs: GPT-4o/GPT-4 (cloud) or Llama2/Mistral/GPT-J variants (on-prem) depending on constraints.\n", "- Orchestration: LangChain, LlamaIndex, or custom microservice controller.\n", "- Vector DB: Pinecone, Milvus, or Weaviate.\n", "- Data lake / DB: Postgres, Snowflake, or managed DB.\n", "- Container & infra: Kubernetes, Istio; serverless for lower ops.\n", "- Observability: Prometheus + Grafana, ELK/Opensearch for logs.\n", "- Secrets & keys: HashiCorp Vault, cloud KMS.\n", "\n", "Phased roadmap\n", "- Phase 0: Requirements & safety policy design (2–4 weeks)\n", "- Phase 1 (MVP): Core planner + 3 tools + human approval for actions (6–10 weeks)\n", "- Phase 2 (Pilot): Add memory, additional connectors, RLHF fine-tuning, monitoring (8–12 weeks)\n", "- Phase 3 (Production): Harden security, scale, full governance & compliance checks (12–20 weeks)\n", "- Ongoing: continuous improvement, red-team testing, model updates.\n", "\n", "Evaluation & KPIs\n", "- Functional tests: task completion, accuracy against ground truth, end-to-end latency.\n", "- Safety tests: policy violations, exploit attempts, unauthorized actions simulated.\n", "- Human factors: trust scores, override frequency, time saved.\n", "- Cost metrics: infrastructure cost per active agent-hour and per task.\n", "\n", "Risks and mitigations\n", "- Hallucination: validate outputs against authoritative sources, truth-checker modules.\n", "- Unintended actions: enforce strict action gating and canary deploy.\n", "- Data leakage: encryption, strict scope for connectors, data minimization.\n", "- Misaligned objectives: narrow reward function, human oversight, periodic audits.\n", "\n", "Deliverables\n", "- Requirements & architecture doc\n", "- Prototype agent with 3 connectors and human approval UI\n", "- Safety policy & automated filter set\n", "- Monitoring and audit dashboards\n", "- 
Training and ops playbooks\n", "- Pilot results & plan for production rollout\n", "\n", "Estimated initial cost ballpark (very approximate)\n", "- Engineering (6–8 people) + infra + model API: $500k–$1.5M for MVP → pilot\n", "- Ongoing production ops: $50k–$250k/month depending on scale and model costs\n", "\n", "Next steps\n", "- Confirm domain, data access constraints, regulatory requirements, and target outcomes.\n", "- Prioritize first 3-5 use cases for the MVP.\n", "- Decide model hosting preference (cloud vs on-prem) and access to production systems.\n", "\n", "If you tell me target domain and constraints (e.g., “enterprise legal assistant, must be on-prem, HIPAA applies”), I’ll produce a tailored architecture, an itemized component list, a 12-week sprint plan, and sample acceptance tests." ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "solution = response.choices[0].message.content\n", "\n", "display(Markdown(solution))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# DeepSeek" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "from openai import OpenAI\n", "\n", "# DeepSeek exposes an OpenAI-compatible endpoint, so we reuse the OpenAI client with a different base_url\n", "client = OpenAI(api_key=deepseek_api_key, base_url=\"https://api.deepseek.com\")\n" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "messages = [{\"role\": \"user\", \"content\": \"What is the success rate of AI agents in the industry?\"}]\n", "\n", "response = client.chat.completions.create(\n", " model=\"deepseek-chat\",\n", " messages=messages\n", ")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "answer = response.choices[0].message.content" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "Excellent question, but it's important to frame it correctly. 
There isn't a single, universal \"success rate\" for AI agents in the industry, as success is measured differently across applications and the technology is rapidly evolving. Instead, we can look at **adoption rates, measurable ROI, and key areas of success and challenge.**\n", "\n", "Here’s a breakdown of the current landscape:\n", "\n", "### 1. High Success & Rapid Adoption Areas (The \"Winners\")\n", "These are domains where AI agents are proving highly successful, often with clear ROI.\n", "* **Customer Service (Chatbots & Virtual Agents):** The most widespread use. Success is measured in deflection rate (calls/tickets avoided), customer satisfaction (CSAT), and 24/7 availability. Simple FAQ bots have high success for basic queries, while more advanced agents handling complex issues are improving but can still struggle.\n", "* **Process Automation (RPA + AI):** AI agents that automate repetitive, rule-based digital tasks (data entry, invoice processing, report generation) have a **very high success rate** in terms of efficiency gains and cost reduction. This is a mature and reliable application.\n", "* **Sales & Marketing (Copilots & Prospecting):** AI agents that qualify leads, personalize outreach, and schedule meetings are showing strong success in increasing lead volume and sales productivity. Success is measured in lead conversion rates and time saved for sales reps.\n", "* **Software Development (Coding Agents like GitHub Copilot):** Hugely successful in terms of developer adoption and productivity boosts. Studies show developers code up to 55% faster, making this one of the clearest success stories.\n", "* **Specialized Industry Agents:** In fields like finance (for fraud detection), logistics (for route optimization), and manufacturing (for predictive maintenance), AI agents analyzing real-time data are highly successful at specific, well-defined tasks.\n", "\n", "### 2. 
Areas with Mixed or Emerging Success\n", "These applications are promising but face hurdles in reliability, complexity, or integration.\n", "* **Autonomous Agents for Complex Workflows:** Agents that can plan, execute multi-step tasks across different apps (e.g., \"research a topic and create a presentation\") are in early stages. While demos are impressive, their **reliability rate in production** is lower. They can get \"stuck\" or make errors, requiring human oversight.\n", "* **Creative & Strategic Tasks:** Agents for content creation, strategy, or open-ended research show potential but often lack the nuanced understanding, consistency, and brand voice required for fully autonomous success. They are used more as powerful assistants than replacements.\n", "* **Physical World Robotics:** While not purely \"software agents,\" AI-powered robots in warehouses (like Amazon's) are highly successful. More general-purpose physical agents (e.g., home robots) have a much lower success rate for unstructured environments.\n", "\n", "### 3. Key Metrics of Success (How It's Measured)\n", "When companies report success, they look at:\n", "* **Efficiency Gains:** Time saved, throughput increased, cost reduction.\n", "* **Accuracy & Quality:** Reduction in errors, improvement in output quality (e.g., code, customer response).\n", "* **ROI (Return on Investment):** Direct financial return from the agent implementation.\n", "* **Adoption Rate:** How willingly and frequently employees or customers use the agent.\n", "* **Task Completion Rate:** For autonomous agents, the percentage of tasks fully completed without human intervention.\n", "\n", "### 4. 
Major Challenges Affecting Success Rates\n", "* **Hallucinations & Inaccuracy:** LLM-based agents can generate incorrect or fabricated information.\n", "* **Integration Complexity:** Connecting agents to legacy systems, data silos, and ensuring they have the right context.\n", "* **Security & Governance:** Managing data privacy, security risks, and ensuring agents operate within set boundaries.\n", "* **\"Liability\" for Errors:** In critical applications (legal, medical, financial), who is responsible for an agent's mistake? This limits full autonomy.\n", "\n", "### Overall Assessment & Trend\n", "\n", "* **Narrow, Well-Defined Tasks:** AI agents have a **very high success rate** (often >80-90% in terms of ROI and adoption). They excel as **super-powered tools and assistants**.\n", "* **Broad, Complex Autonomy:** The success rate for fully autonomous agents handling open-ended tasks is **lower and more variable**, often requiring a \"human-in-the-loop\" for now.\n", "* **Trajectory:** The success rate is **increasing rapidly**. As models get better, tool integration improves, and companies develop better implementation frameworks (like **agentic workflows**), failures are becoming less frequent.\n", "\n", "**In summary: Don't think of a single percentage. Think of a spectrum.** The success rate for automating a routine invoice is near 100%. The success rate for an agent autonomously running a full digital marketing campaign start-to-finish is much lower, but it's a powerful copilot that significantly increases the marketer's own success rate.\n", "\n", "The most accurate statement is: **AI agents are delivering significant value and high ROI across a wide range of specific, repetitive, and data-intensive tasks, leading to massive and growing adoption. 
Their ability to perform more complex, multi-step reasoning autonomously is the current frontier, with rapid progress but not yet universal reliability.**" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import Markdown, display\n", "\n", "display(Markdown(answer))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" } }, "nbformat": 4, "nbformat_minor": 2 }