How to Build an Autonomous Data Analyst using Dynamiq, E2B and Together.ai

Oleksii Babych
November 6, 2024

Data analyst agents are specialized AI-driven tools designed to automate and enhance data analysis tasks. They work by interpreting and processing structured and unstructured data to deliver insights, build predictive models, and automate decision-making processes. These agents leverage advanced algorithms, machine learning models, and language processing capabilities to understand user instructions and perform sophisticated analyses.

With recent advances in AI and large language models, building data analyst agents has become more accessible, especially through libraries like Dynamiq. Using tools like the ReAct agent together with LLMs such as Llama 3.2 or GPT-4o allows developers to create robust agents that can process, analyze, and respond to various data requests.

Key Capabilities of a Data Analyst Agent

A data analyst agent typically comes equipped with several key capabilities:

  1. Data Retrieval and Processing: The agent can retrieve data from various sources, including APIs, databases, and local files, and process it into a usable format for analysis.
  2. Execution of Code and Data Manipulation: By interpreting instructions and executing code, such agents can perform complex calculations and data transformations without human intervention.
  3. Statistical Analysis and Modeling: They can build models, conduct statistical tests, and apply techniques like linear regression, clustering, and classification.
  4. Predictive Analytics: Based on historical data, these agents can develop predictive models to forecast future trends, such as stock prices or customer churn.
  5. Generating Reports and Insights: After analysis, the agent can automatically generate summaries and reports in various formats (e.g., Markdown, HTML), making the insights easily accessible.
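
To make the report-generation capability concrete, here is a minimal sketch of how rows of results could be rendered as a Markdown table. The helper name `to_markdown_table` and the sample rows are illustrative assumptions, not part of Dynamiq; in practice the agent writes and runs code of this kind on its own.

```python
def to_markdown_table(headers: list[str], rows: list[list[object]]) -> str:
    """Render headers and rows as a Markdown table (hypothetical helper)."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row length must match header length")
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Illustrative values only.
report = to_markdown_table(["Day", "Price"], [["day 1", 10.5], ["day 2", 11.0]])
print(report)
```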

Example: Data Analyst Agent Built with Dynamiq

An effective data analyst agent can be implemented using Dynamiq’s library. In the following steps we’ll go through an example where we craft an agent that uses OpenAI’s language model and other tools for an end-to-end analysis.

Task 1: Apple and Microsoft Stock Price Prediction

In this scenario, we use the agent to predict stock prices by analyzing recent historical data. With specific instructions, we can prompt it to retrieve data, fit a linear regression model, and generate a prediction.
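
For intuition, the kind of linear extrapolation requested here can be sketched in plain Python. This is not the code the agent generates; the function names `fit_line` and `forecast` and the price values are assumptions made for illustration, and the sketch steps through calendar days rather than trading days.

```python
from datetime import date, timedelta


def fit_line(ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(ys: list[float], last_day: date, horizon: int = 10) -> list[tuple[date, float]]:
    """Extend the fitted line `horizon` calendar days past the last observation."""
    slope, intercept = fit_line(ys)
    n = len(ys)
    return [
        (last_day + timedelta(days=step), slope * (n - 1 + step) + intercept)
        for step in range(1, horizon + 1)
    ]


# Illustrative closing prices only (not real market data).
prices = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
predictions = forecast(prices, last_day=date(2024, 10, 30))
for day, price in predictions:
    print(f"| {day.isoformat()} | {price:.2f} |")
```

A straight line fitted to ten points simply continues the recent trend, so a forecast like this is best read as a demonstration of the workflow rather than as a trading signal.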

Here’s an example input and response:

Input:


{
   "input": "get the apple and microsoft prices for last 10 days, build linear model, try to predict prices for next 10 days, write summary with results in markdown"
}
    
  • Agent’s Output: The agent fetches Apple and Microsoft prices from a data source, processes the last 10 days of data, and applies a linear regression model to predict the next 10 days.

Predicted Prices:


# Apple and Microsoft Stock Price Prediction
## Introduction
This report provides a summary of the Apple and Microsoft stock prices for the last 10 days, along with a linear model prediction for the next 10 days.

## Last 10 Days' Prices
### Apple
| Day | Price |
| --- | --- |
| 2024-10-30 | 232.6100 |
| 2024-10-29 | 232.7400 |
| 2024-10-28 | 231.7400 |
...
### Microsoft
| Day | Price |
| --- | --- |
| 2024-10-30 | 437.4350 |
| 2024-10-29 | 437.7400 |
| 2024-10-28 | 436.7400 |
| 2024-10-27 | 435.7400 |
| 2024-10-26 | 434.7400 |
...
## Next 10 Days' Predicted Prices
### Apple
| Day | Predicted Price |
| --- | --- |
| 2024-11-01 | 223.966 |
| 2024-11-02 | 223.02763636 |
| 2024-11-03 | 222.08927273 |
| 2024-11-04 | 221.15090909 |
...
### Microsoft
| Day | Predicted Price |
| --- | --- |
| 2024-11-01 | 429.001 |
| 2024-11-02 | 428.07218182 |
| 2024-11-03 | 427.14336364 |
| 2024-11-04 | 426.21454545 |
...
    

Task 2: Long-term Stock Investment Recommendations

Another task is identifying promising stocks for long-term investment. Here, we instruct the agent to analyze market data and identify stocks with strong growth potential.

Input:


{
   "input": "find the most interesting stocks to invest in for the next 10 years, write results with explanation in markdown"
}

    
  • Agent’s Output: The agent processes industry trends and financial data to suggest investment options.

When considering stocks to invest in for the next 10 years, it's important to focus on companies with strong fundamentals, growth potential, and resilience to market fluctuations. Here are some sectors and companies that are often highlighted by experts as promising long-term investments:

Technology Sector:

Apple (AAPL): Known for its innovation and strong brand loyalty, Apple continues to expand its ecosystem with new products and services.
Microsoft (MSFT): With its strong presence in cloud computing and enterprise software, Microsoft is well-positioned for future growth.

Healthcare Sector:

Johnson & Johnson (JNJ): A leader in pharmaceuticals and consumer health products, J&J has a diverse portfolio and a strong research pipeline.
Pfizer (PFE): Known for its vaccine development and pharmaceutical innovations, Pfizer is a key player in the healthcare industry.

...
    

We can also inspect the trace to see exactly how the agent's reasoning was initiated and how it proceeded.
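
One way to inspect such traces offline is to serialize them to JSON. The sketch below mirrors the pattern used in the full example later in this post, where each run exposes `to_dict()`. The helper `traces_to_json` and the `FakeRun` stand-in are hypothetical names introduced only so the snippet runs without Dynamiq installed.

```python
import json
from typing import Any, Optional, Type


def traces_to_json(runs: dict[Any, Any], encoder: Optional[Type[json.JSONEncoder]] = None) -> str:
    """Serialize trace runs to pretty-printed JSON.

    Assumes each value in `runs` exposes a `to_dict()` method.
    """
    payload = {"runs": [run.to_dict() for run in runs.values()]}
    return json.dumps(payload, cls=encoder, indent=2)


class FakeRun:
    """Hypothetical stand-in for a trace run, used only to make the sketch runnable."""

    def __init__(self, name: str, status: str) -> None:
        self.name = name
        self.status = status

    def to_dict(self) -> dict[str, str]:
        return {"name": self.name, "status": self.status}


demo_runs = {"run-1": FakeRun("Agent", "succeeded")}
trace_json = traces_to_json(demo_runs)
print(trace_json)
```

With Dynamiq, the same helper would presumably be called as `traces_to_json(tracing.runs, JsonWorkflowEncoder)`, since the full example passes that encoder to `json.dumps` when checking that traces are serializable.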

Crafting More Complex Analyses

Using Dynamiq, we can enrich our agent’s capabilities further by providing it with specific datasets and allowing it to perform analyses on various topics like customer churn, house prices, and other areas of interest.

Example: House Price Analysis

If we provide the agent with a dataset on house prices, it can conduct a price trend analysis, regional comparisons, and potentially predict future trends based on historical data.
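
As a rough illustration of what such an analysis involves, the sketch below computes a regional comparison and a simple yearly trend with the standard library. The column names (`region`, `year`, `price`), the function names, and the sample rows are all assumptions made for this example and do not come from a real dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample data; the column names and values are illustrative only.
SAMPLE_CSV = """region,year,price
North,2022,200000
North,2023,220000
South,2022,150000
South,2023,165000
South,2023,175000
"""


def median_price_by_region(csv_text: str) -> dict[str, float]:
    """Regional comparison: median sale price per region."""
    prices = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        prices[row["region"]].append(float(row["price"]))
    return {region: median(values) for region, values in sorted(prices.items())}


def mean_price_by_year(csv_text: str, region: str) -> dict[int, float]:
    """Price trend: mean sale price per year within one region."""
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["region"] == region:
            by_year[int(row["year"])].append(float(row["price"]))
    return {year: mean(values) for year, values in sorted(by_year.items())}


print(median_price_by_region(SAMPLE_CSV))
print(mean_price_by_year(SAMPLE_CSV, "South"))
```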

Integrating Custom Code with Dynamiq

With Dynamiq’s library, it’s straightforward to set up an agent that performs complex analyses based on code execution. In the example below, the agent receives a Python code-execution tool backed by E2B, which lets it run custom functions or install and use specific libraries; here it is asked to summarize and evaluate a PDF essay.


import json
from pathlib import Path

from dynamiq import Workflow
from dynamiq.callbacks import TracingCallbackHandler
from dynamiq.connections import E2B
from dynamiq.flows import Flow
from dynamiq.nodes.agents.react import ReActAgent
from dynamiq.nodes.tools.e2b_sandbox import E2BInterpreterTool
from dynamiq.runnables import RunnableConfig
from dynamiq.utils import JsonWorkflowEncoder
from examples.llm_setup import setup_llm

INPUT_PROMPT = "Summarize the text and try to evaluate it"
FILE_PATH = ".data/sample-essay-1.pdf"


def run_workflow(
   agent: ReActAgent,
   input_prompt: str,
   input_files: list,
) -> tuple[str, dict]:
   """
   Run a single-agent workflow on the given prompt and files.

   Args:
       agent: The ReAct agent to execute.
       input_prompt: The instruction passed to the agent.
       input_files: Files made available to the agent.

   Returns:
       tuple[str, dict]: The content generated by the agent and the trace runs.
       On failure, the error is printed and ("", {}) is returned.
   """
   tracing = TracingCallbackHandler()
   wf = Workflow(flow=Flow(nodes=[agent]))

   try:
       result = wf.run(
           input_data={"input": input_prompt, "files": input_files},
           config=RunnableConfig(callbacks=[tracing]),
       )
       # Verify that traces can be serialized to JSON
       json.dumps(
           {"runs": [run.to_dict() for run in tracing.runs.values()]},
           cls=JsonWorkflowEncoder,
       )

       return result.output[agent.id]["output"]["content"], tracing.runs
   except Exception as e:
       print(f"An error occurred: {e}")
       return "", {}

# Read the input PDF as raw bytes; Path.read_bytes() closes the file for us.
pdf_bytes = Path(FILE_PATH).read_bytes()

# Tool that lets the agent execute code through the E2B connection.
python_tool = E2BInterpreterTool(connection=E2B())

llm = setup_llm()


# Each fragment ends with a space so the concatenated sentences stay separated.
AGENT_ROLE = (
   "A helpful and general-purpose AI assistant with strong language, Python, "
   "and Linux command-line skills. The goal is to provide concise answers to the user. "
   "Additionally, try to generate code to solve tasks, then run it accurately. "
   "Before answering, create a plan for solving the task. You can search for any API, "
   "and use any free, open-source API that doesn't require authorization. "
   "You can install any packages for loading PDFs, such as PyMuPDF, PyPDF2, or pdfplumber, "
   "and for other file extensions as well; if you need to open them, try to search and then install. "
   "Also, if you are working with binary files, try to understand the file format and then read the file."
)

agent = ReActAgent(
   name="Agent",
   id="Agent",
   llm=llm,
   role=AGENT_ROLE,
   tools=[python_tool],
)

output, traces = run_workflow(
   agent=agent,
   input_prompt=INPUT_PROMPT,
   input_files=[pdf_bytes],
)
print("Agent Output:", output)
    

Let's take a look at the agent’s results:

Summary: The essay emphasizes the importance of integrating information technology (IT) into education beyond traditional teaching methods. It argues that schools need a clear plan for implementing IT to enhance learning and avoid wasting resources. The essay highlights the necessity of training teachers to effectively use IT and the need for a flexible approach to its application in the classroom. It also warns against over-reliance on technology, stressing that it should complement traditional learning tools. The conclusion reiterates the significance of a well-planned IT strategy to prepare students for future technological advancements.

Evaluation:

The essay is well-structured, beginning with a clear introduction that outlines the importance of IT in education. The body paragraphs logically develop the argument, providing specific examples and citing relevant sources. The use of in-text citations adds credibility to the claims made. However, the essay could benefit from a more detailed discussion on potential challenges schools face when integrating IT and how to overcome them. Overall, it presents a compelling case for the thoughtful incorporation of technology in education.

Building Advanced Analytical Workflows

These examples show the foundation for constructing sophisticated flows where agents interact with various datasets, perform multi-step analyses, and generate insights across different domains. The goal is to build a flexible, intelligent agent that can handle diverse analytical tasks autonomously, providing actionable insights for decision-makers.

The setup with Dynamiq and other tools forms a foundation for crafting intelligent, context-aware agents capable of handling a wide range of tasks and data sources, paving the way for highly automated analytical workflows.
