{ "cells": [ { "cell_type": "markdown", "id": "66be0f9b-942f-421c-882b-5cf6a8d225ef", "metadata": {}, "source": [ "# Creating an AI Trading Bot using Machine Learning with help of AI" ] }, { "cell_type": "markdown", "id": "f70ecd95-f883-40e1-a73d-1ade318c8783", "metadata": {}, "source": [ "## Used tools" ] }, { "cell_type": "markdown", "id": "314ddf94-059a-452a-8098-22b1ec78d5b1", "metadata": {}, "source": [ "- https://aistudio.google.com\n", "- https://alpaca.markets/\n", "- https://jupyter.org/\n", "- https://superai.pl/courses.html" ] }, { "cell_type": "markdown", "id": "2b8e5201-a668-4b6a-9be0-5172b5e67b95", "metadata": {}, "source": [ "## Creating AI Trading Bot with AI" ] }, { "cell_type": "markdown", "id": "9d07ca5d-c625-496a-bd5c-5efa72b3e404", "metadata": {}, "source": [ "### 1st Prompt" ] }, { "cell_type": "markdown", "id": "bccd6bdd-6e98-4f76-bd50-eb0f13f5636c", "metadata": {}, "source": [ "1. \"I would like to create a trading bot. It should be using machine learning to trade BTCUSD in one minute intervals and Alpaca Markets. Can you help me with that?\"\n", "\n", "(Remember: At the moment Alpaca uses different symbols for executing orders such as “BTC/USD” and checking portfolio positions such as “BTCUSD”.)\n" ] }, { "cell_type": "markdown", "id": "6cf0ec94-cdb3-47e5-918e-5ba2cfd6f16e", "metadata": {}, "source": [ "#### Keys to Alpaca" ] }, { "cell_type": "code", "execution_count": 1, "id": "7d232c4b-7f4a-4f9d-b557-1abe46047672", "metadata": {}, "outputs": [], "source": [ "KEY_ID = \"your API KEY\" #replace it with your own KEY_ID from Alpaca: https://alpaca.markets/\n", "SECRET_KEY = \"your SECRET KEY\" #replace it with your own SECRET_KEY from Alpaca" ] }, { "cell_type": "markdown", "id": "ea5c66d3-6c54-49a0-800e-7dd663fec83c", "metadata": {}, "source": [ "### 1st Attempt" ] }, { "cell_type": "code", "execution_count": 2, "id": "56fce22c-c529-40c1-ae46-3576bcb75475", "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "id": "86801639-ce2c-4bac-bf23-94ab71cf70b7", "metadata": {}, "source": [ "Step 1: Alpaca Setup & Python Environment\n", "\n", " Sign up for an Alpaca account: https://alpaca.markets/\n", "\n", " Get your API Key ID and Secret Key from your paper trading account dashboard.\n", "\n", " Install necessary Python libraries:" ] }, { "cell_type": "code", "execution_count": 3, "id": "91850edf-7220-407e-a577-7a8cf4ee6ca9", "metadata": { "scrolled": true }, "outputs": [], "source": [ "pip install alpaca-trade-api pandas scikit-learn xgboost numpy pandas-ta" ] }, { "cell_type": "markdown", "id": "41adbca4-5621-4491-8541-3dd4f52eed2b", "metadata": {}, "source": [ "Set up API credentials (environment variables are best):" ] }, { "cell_type": "code", "execution_count": 4, "id": "bc5e6181-d205-476d-8c17-8cecda0c01ed", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Alpaca Paper Trading.\n" ] } ], "source": [ "import os\n", "import alpaca_trade_api as tradeapi\n", "import pandas as pd\n", "from datetime import datetime, timedelta\n", "import time\n", "\n", "# For Paper Trading\n", "os.environ['APCA_API_BASE_URL'] = 'https://paper-api.alpaca.markets'\n", "# Replace with your actual keys or set them as environment variables\n", "os.environ['APCA_API_KEY_ID'] = 'your API KEY'\n", "os.environ['APCA_API_SECRET_KEY'] = 'your SECRET KEY'\n", "\n", "API_KEY = os.getenv('APCA_API_KEY_ID')\n", "API_SECRET = os.getenv('APCA_API_SECRET_KEY')\n", "BASE_URL = 
{ "cell_type": "markdown", "id": "45682be7-b549-422b-8a7d-1d9dbd8f2672", "metadata": {}, "source": [ "Step 2: Data Acquisition" ] },
{ "cell_type": "code", "execution_count": null, "id": "923fea7c-910e-45cb-b19f-3b7a3002322c", "metadata": {}, "outputs": [], "source": [ "def fetch_data(symbol, timeframe, start_date_str, end_date_str=None):\n", "    \"\"\"Fetches historical crypto data from Alpaca.\"\"\"\n", "    if end_date_str is None:\n", "        end_date_str = datetime.now().strftime('%Y-%m-%d')\n", "\n",
"    # Alpaca API expects ISO 8601 format for start/end times\n", "    # And it has a limit on how many bars can be fetched per request (e.g., 10000 for crypto)\n", "    # So we may need to fetch in chunks if requesting a long period.\n", "\n",
"    all_bars = []\n", "    start_dt = pd.to_datetime(start_date_str, utc=True)\n", "    end_dt = pd.to_datetime(end_date_str, utc=True)\n", "\n",
"    # Fetch data in chunks to avoid hitting API limits for very long periods\n", "    # For 1-minute data, 10000 bars is about 7 days.\n", "    # Let's fetch data in smaller chunks, e.g., 5 days at a time.\n", "    current_start = start_dt\n", "    while current_start < end_dt:\n", "        chunk_end = min(current_start + timedelta(days=5), end_dt) # Adjust chunk size as needed\n", "        print(f\"Fetching data from {current_start.isoformat()} to {chunk_end.isoformat()}\")\n", "\n",
"        # Alpaca's get_crypto_bars expects start and end in ISO format\n", "        bars = api.get_crypto_bars(\n", "            symbol,\n", "            timeframe,\n", "            start=current_start.isoformat(),\n", "            end=chunk_end.isoformat(),\n", "            limit=10000 # Max limit per request\n", "        ).df\n", "\n",
"        if bars.empty:\n", "            print(f\"No data found for chunk starting {current_start.isoformat()}\")\n", "            if current_start + timedelta(days=5) > end_dt and not all_bars: # if first chunk and no data\n", "                break\n", "            current_start += timedelta(days=5) # Move to next chunk period\n", "            time.sleep(1) # Be nice to the API\n", "            continue\n", "\n",
"        all_bars.append(bars)\n", "        # Update current_start to the timestamp of the last bar fetched + 1 minute\n", "        # to avoid overlap and to ensure we move forward.\n", "        if not bars.index.empty:\n", "            current_start = bars.index[-1] + pd.Timedelta(minutes=1)\n", "        else: # Should not happen if bars is not empty, but as a safeguard\n", "            current_start += timedelta(days=5)\n", "\n",
"        time.sleep(1) # Respect API rate limits\n", "\n",
"    if not all_bars:\n", "        print(\"No data fetched. Check your date range or symbol.\")\n", "        return pd.DataFrame()\n", "\n",
"    df = pd.concat(all_bars)\n", "    df = df[~df.index.duplicated(keep='first')] # Remove potential duplicates from chunking\n", "    df = df.sort_index() # Ensure chronological order\n", "    df = df[df.index <= end_dt] # Ensure we don't go past the requested end_dt\n", "    return df\n", "\n",
"# Example usage:\n", "symbol = \"BTC/USD\" # Alpaca uses \"BTC/USD\" for crypto pairs\n", "timeframe = tradeapi.TimeFrame.Minute # or \"1Min\" for older SDK versions\n", "# Fetch last 30 days of data for example\n", "start_date = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')\n", "# end_date = datetime.now().strftime('%Y-%m-%d') # Fetches up to now\n", "\n",
"btc_data = fetch_data(symbol, timeframe, start_date)\n", "if not btc_data.empty:\n", "    print(f\"Fetched {len(btc_data)} rows of {symbol} data.\")\n", "    print(btc_data.head())\n", "    print(btc_data.tail())\n", "else:\n", "    print(\"Failed to fetch data.\")" ] },
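{ "cell_type": "markdown", "id": "cache-bars-note-0001", "metadata": {}, "source": [ "Fetching a month of 1-minute bars takes a while and counts against API rate limits, so it can be worth caching the result locally between sessions. A minimal sketch (the `btc_1min.csv` filename is just an example):" ] },
{ "cell_type": "code", "execution_count": null, "id": "cache-bars-0001", "metadata": {}, "outputs": [], "source": [ "# Optional: cache the fetched bars locally so re-runs don't re-hit the API.\n", "# The filename is arbitrary; adjust the path as needed.\n", "if not btc_data.empty:\n", "    btc_data.to_csv('btc_1min.csv')\n", "    print(f\"Saved {len(btc_data)} bars to btc_1min.csv\")\n", "    # Later, reload with:\n", "    # btc_data = pd.read_csv('btc_1min.csv', index_col=0, parse_dates=True)" ] },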
{ "cell_type": "markdown", "id": "dc7a0229-e758-4df6-a8b1-1dbe94c6f7a5", "metadata": {}, "source": [ "Step 3: Feature Engineering\n", "\n", "This is where you create signals for your model. pandas-ta is a great library for this."
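] },
{ "cell_type": "markdown", "id": "bars-gap-check-note-0001", "metadata": {}, "source": [ "Before computing indicators it can help to sanity-check the raw bars: 1-minute crypto data often has missing minutes, and large gaps distort indicators and lagged returns. A minimal sketch, assuming `btc_data` from Step 2:" ] },
{ "cell_type": "code", "execution_count": null, "id": "bars-gap-check-0001", "metadata": {}, "outputs": [], "source": [ "# Quick data-quality check on the 1-minute bars fetched in Step 2.\n", "if not btc_data.empty:\n", "    spacing = btc_data.index.to_series().diff().dropna()\n", "    print(f\"Bars: {len(btc_data)}\")\n", "    print(f\"Median spacing: {spacing.median()}, largest gap: {spacing.max()}\")\n", "    print(f\"Duplicate timestamps: {btc_data.index.duplicated().sum()}\")"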
] }, { "cell_type": "code", "execution_count": null, "id": "c3eac019-211a-4630-95c9-9fcc61e358ec", "metadata": {}, "outputs": [], "source": [ "import pandas_ta as ta\n", "\n", "def create_features(df):\n", " if df.empty or 'close' not in df.columns:\n", " print(\"DataFrame is empty or 'close' column is missing.\")\n", " return df\n", "\n", " df.ta.sma(length=10, append=True, col_names=('SMA_10'))\n", " df.ta.sma(length=30, append=True, col_names=('SMA_30'))\n", " df.ta.ema(length=10, append=True, col_names=('EMA_10'))\n", " df.ta.rsi(length=14, append=True, col_names=('RSI_14'))\n", " df.ta.macd(append=True, col_names=('MACD_12_26_9', 'MACDh_12_26_9', 'MACDs_12_26_9'))\n", " df.ta.bbands(length=20, append=True, col_names=('BBL_20_2.0', 'BBM_20_2.0', 'BBU_20_2.0', 'BBB_20_2.0', 'BBP_20_2.0'))\n", " df.ta.atr(length=14, append=True, col_names=('ATR_14'))\n", "\n", " # Lagged returns\n", " for lag in [1, 3, 5, 10]:\n", " df[f'return_{lag}m'] = df['close'].pct_change(periods=lag)\n", "\n", " # Add more features: volatility, momentum, volume-based if available, etc.\n", " # e.g., log returns, price relative to moving average, etc.\n", "\n", " df.dropna(inplace=True) # Remove rows with NaNs created by indicators\n", " return df\n", "\n", "if not btc_data.empty:\n", " featured_data = create_features(btc_data.copy()) # Use .copy() to avoid modifying original\n", " print(\"\\nData with features:\")\n", " print(featured_data.head())\n", "else:\n", " print(\"Cannot create features, btc_data is empty.\")\n", " featured_data = pd.DataFrame() # ensure it's a DataFrame" ] }, { "cell_type": "markdown", "id": "8d1536cf-2ee0-4a30-ac4e-9fa82c8936c7", "metadata": {}, "source": [ "Step 4: Model Training - Defining Target & Training\n", "\n", "Let's define a simple target: will the price be higher or lower in N minutes?" ] }, { "cell_type": "code", "execution_count": null, "id": "8e773d98-f353-4ab4-b2a1-5d82024865b0", "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split, TimeSeriesSplit\n", "from sklearn.ensemble import RandomForestClassifier # Example model\n", "from sklearn.metrics import accuracy_score, classification_report\n", "import xgboost as xgb\n", "\n", "def prepare_data_for_model(df, target_horizon=5, test_size=0.2):\n", " \"\"\"\n", " Prepares X (features) and y (target) for the ML model.\n", " Target: 1 if price increases by more than a small threshold after target_horizon periods, 0 otherwise.\n", " -1 if price decreases by more than a small threshold. 
(Optional: make it 3 classes)\n", " \"\"\"\n", " if df.empty or 'close' not in df.columns:\n", " print(\"DataFrame is empty or 'close' column is missing.\")\n", " return pd.DataFrame(), pd.Series(dtype='float64'), pd.DataFrame(), pd.Series(dtype='float64')\n", "\n", " # Define target: 1 if price goes up in `target_horizon` minutes, 0 otherwise\n", " # A small threshold can help avoid noise around 0% change\n", " # price_threshold = 0.0005 # e.g., 0.05% change\n", " # df['future_price'] = df['close'].shift(-target_horizon)\n", " # df['price_change'] = (df['future_price'] - df['close']) / df['close']\n", " # df['target'] = 0 # Hold\n", " # df.loc[df['price_change'] > price_threshold, 'target'] = 1 # Buy\n", " # df.loc[df['price_change'] < -price_threshold, 'target'] = -1 # Sell (for 3-class)\n", " # For 2-class (Up/Not Up):\n", " df['target'] = (df['close'].shift(-target_horizon) > df['close']).astype(int)\n", "\n", " df.dropna(inplace=True) # Remove rows with NaN target (due to shift)\n", "\n", " feature_columns = [col for col in df.columns if col not in ['open', 'high', 'low', 'close', 'volume', 'trade_count', 'vwap', 'target', 'future_price', 'price_change']]\n", " X = df[feature_columns]\n", " y = df['target']\n", "\n", " # Time series split is crucial: DO NOT shuffle time series data for training\n", " # For a simple split:\n", " split_index = int(len(X) * (1 - test_size))\n", " X_train, X_test = X[:split_index], X[split_index:]\n", " y_train, y_test = y[:split_index], y[split_index:]\n", "\n", " # For more robust cross-validation, use TimeSeriesSplit\n", " # tscv = TimeSeriesSplit(n_splits=5)\n", " # for train_index, test_index in tscv.split(X):\n", " # X_train, X_test = X.iloc[train_index], X.iloc[test_index]\n", " # y_train, y_test = y.iloc[train_index], y.iloc[test_index]\n", " # Train and evaluate your model here\n", "\n", " return X_train, X_test, y_train, y_test, feature_columns\n", "\n", "\n", "if not featured_data.empty:\n", " X_train, X_test, y_train, y_test, feature_cols = prepare_data_for_model(featured_data.copy(), target_horizon=5)\n", "\n", " if not X_train.empty:\n", " print(f\"\\nTraining data shape: X_train: {X_train.shape}, y_train: {y_train.shape}\")\n", " print(f\"Test data shape: X_test: {X_test.shape}, y_test: {y_test.shape}\")\n", " print(f\"Features used: {feature_cols}\")\n", "\n", " # Example Model: Random Forest\n", " # model = RandomForestClassifier(n_estimators=100, random_state=42, class_weight='balanced')\n", "\n", " # Example Model: XGBoost (often performs well)\n", " model = xgb.XGBClassifier(\n", " objective='binary:logistic', # or 'multi:softprob' for multi-class\n", " n_estimators=100,\n", " learning_rate=0.1,\n", " max_depth=3,\n", " use_label_encoder=False, # Suppress a warning\n", " eval_metric='logloss' # or 'mlogloss' for multi-class\n", " )\n", "\n", " model.fit(X_train, y_train)\n", "\n", " # Evaluate on test set\n", " y_pred = model.predict(X_test)\n", " print(\"\\nModel Evaluation on Test Set:\")\n", " print(f\"Accuracy: {accuracy_score(y_test, y_pred):.4f}\")\n", " print(classification_report(y_test, y_pred, zero_division=0))\n", "\n", " # Feature importance (for tree-based models)\n", " if hasattr(model, 'feature_importances_'):\n", " importances = pd.Series(model.feature_importances_, index=X_train.columns).sort_values(ascending=False)\n", " print(\"\\nFeature Importances:\")\n", " print(importances.head(10))\n", " else:\n", " print(\"Not enough data to create training/test sets after feature engineering and target creation.\")\n", 
"else:\n", " print(\"Cannot prepare data for model, featured_data is empty.\")\n", " model = None # Ensure model is defined even if training fails" ] }, { "cell_type": "markdown", "id": "d5279216-4774-4173-a268-a1ca92893ac8", "metadata": {}, "source": [ "Important Considerations for Modeling:\n", "\n", " Target Definition: This is critical. Predicting direction is hard. Predicting magnitude or using a threshold (e.g., price must move > 0.1% to be a \"1\") can be better.\n", "\n", " Class Imbalance: If \"up\" signals are rare, your model might be biased. Use techniques like class_weight='balanced' (for some models) or over/undersampling (e.g., SMOTE).\n", "\n", " Stationarity: Price series are generally non-stationary. Features like returns or indicators often help.\n", "\n", " Overfitting: Models can learn noise from historical data. Robust cross-validation (like TimeSeriesSplit) and regularization are key.\n", "\n", "Step 5: Backtesting (Simplified Vectorized Example)\n", "\n", "A proper backtest is event-driven and considers transaction costs, slippage, etc. This is a very simplified version." ] }, { "cell_type": "code", "execution_count": null, "id": "72a69e0a-484d-490f-8c95-195cccfe8e40", "metadata": {}, "outputs": [], "source": [ "def run_simple_backtest(df_with_predictions, initial_capital=10000, trade_size_usd=1000, transaction_cost_pct=0.003): # Alpaca crypto fee\n", " \"\"\"\n", " A very simplified vectorized backtest.\n", " Assumes df_with_predictions has a 'signal' column (1 for buy, 0 for hold/nothing, -1 for sell if implementing).\n", " For this example, we'll assume our model's prediction (0 or 1) is the signal.\n", " 1 = Go Long, 0 = Exit Long (or do nothing if not in position)\n", " \"\"\"\n", " if df_with_predictions.empty or 'predicted_signal' not in df_with_predictions.columns:\n", " print(\"DataFrame for backtest is empty or 'predicted_signal' column missing.\")\n", " return\n", "\n", " capital = initial_capital\n", " position_btc = 0 # Amount of BTC held\n", " portfolio_value = []\n", "\n", " # Assume 'predicted_signal' comes from your model (1 for predicted up, 0 for predicted down/neutral)\n", " # Let's assume a simple strategy: if signal is 1, buy. 
If signal is 0 and we have a position, sell.\n", "\n", " for i in range(len(df_with_predictions)):\n", " current_price = df_with_predictions['close'].iloc[i]\n", " signal = df_with_predictions['predicted_signal'].iloc[i]\n", "\n", " # Decision logic\n", " if signal == 1 and position_btc == 0: # Buy signal and no current position\n", " # Buy\n", " amount_to_buy_btc = trade_size_usd / current_price\n", " cost = amount_to_buy_btc * current_price * (1 + transaction_cost_pct)\n", " if capital >= cost:\n", " capital -= cost\n", " position_btc += amount_to_buy_btc\n", " # print(f\"{df_with_predictions.index[i]}: BUY {amount_to_buy_btc:.6f} BTC @ {current_price:.2f}\")\n", "\n", " elif signal == 0 and position_btc > 0: # Sell signal (or neutral) and have a position\n", " # Sell\n", " proceeds = position_btc * current_price * (1 - transaction_cost_pct)\n", " capital += proceeds\n", " # print(f\"{df_with_predictions.index[i]}: SELL {position_btc:.6f} BTC @ {current_price:.2f}\")\n", " position_btc = 0\n", "\n", " current_portfolio_value = capital + (position_btc * current_price)\n", " portfolio_value.append(current_portfolio_value)\n", "\n", " df_with_predictions['portfolio_value'] = portfolio_value\n", " print(\"\\nBacktest Results:\")\n", " print(f\"Initial Capital: ${initial_capital:.2f}\")\n", " print(f\"Final Portfolio Value: ${df_with_predictions['portfolio_value'].iloc[-1]:.2f}\")\n", " returns = (df_with_predictions['portfolio_value'].iloc[-1] / initial_capital - 1) * 100\n", " print(f\"Total Return: {returns:.2f}%\")\n", "\n", " # Plotting (optional)\n", " # import matplotlib.pyplot as plt\n", " # plt.figure(figsize=(12,6))\n", " # plt.plot(df_with_predictions.index, df_with_predictions['portfolio_value'])\n", " # plt.title('Portfolio Value Over Time')\n", " # plt.xlabel('Date')\n", " # plt.ylabel('Portfolio Value ($)')\n", " # plt.show()\n", "\n", "if model and not X_test.empty:\n", " # Use the model to predict on the entire test set for backtesting\n", " # For a more realistic backtest, you'd re-train periodically or use a walk-forward approach.\n", " # Here, we're just using the single model trained on X_train.\n", " all_featured_data_for_backtest = featured_data.loc[X_test.index].copy() # Get original data rows for X_test\n", " all_featured_data_for_backtest['predicted_signal'] = model.predict(X_test) # Use the trained model\n", "\n", " run_simple_backtest(all_featured_data_for_backtest)\n", "else:\n", " print(\"Skipping backtest as model or test data is not available.\")" ] }, { "cell_type": "markdown", "id": "8f950f19-ea56-4d27-bb0c-1aa029f6f909", "metadata": {}, "source": [ "Backtesting Libraries: For more serious backtesting, consider backtrader or zipline-reloaded. They handle many complexities.\n", "\n", "Step 6: Signal Generation & Order Execution (Live/Paper Trading)\n", "\n", "This is where you'd run the bot periodically (e.g., every minute)." ] }, { "cell_type": "code", "execution_count": null, "id": "9454be39-70d8-49ed-a814-8695d69e32b2", "metadata": {}, "outputs": [], "source": [ "SYMBOL = \"BTC/USD\"\n", "TRADE_QTY_USD = 100 # Amount in USD to trade per signal. Adjust based on risk tolerance.\n", "TARGET_HORIZON_MINUTES = 5 # Same as used in training\n", "\n", "# Global model and feature_cols (assuming they are trained and available)\n", "# model = ... (your trained model)\n", "# feature_cols = ... 
(list of feature column names used for training)\n", "\n",
"def get_latest_bar_features():\n", "    \"\"\"Fetches latest bars, calculates features for the most recent one.\"\"\"\n", "    # Fetch enough data to calculate all features (e.g., max lookback of your indicators)\n", "    # If SMA_30 is longest, need at least 30 + target_horizon previous bars\n", "    # Let's fetch more to be safe, e.g., 100 bars\n", "    now = datetime.now()\n", "    start_fetch_dt = (now - timedelta(minutes=150)).strftime('%Y-%m-%d %H:%M:%S') # fetch last 150 mins\n", "\n",
"    latest_bars_df = api.get_crypto_bars(\n", "        SYMBOL,\n", "        tradeapi.TimeFrame.Minute,\n", "        start=start_fetch_dt, # Alpaca needs ISO format with T\n", "        # end defaults to now\n", "        limit=150 # fetch a bit more than needed for features\n", "    ).df\n", "\n",
"    if latest_bars_df.empty or len(latest_bars_df) < 35: # Min needed for SMA_30 + some buffer\n", "        print(\"Not enough recent bars to calculate features.\")\n", "        return None\n", "\n",
"    featured_bars = create_features(latest_bars_df.copy())\n", "    if featured_bars.empty:\n", "        print(\"Failed to create features for latest bars.\")\n", "        return None\n", "    # Return only the features for the most recent complete bar\n", "    return featured_bars[feature_cols].iloc[-1:] # Return as DataFrame\n", "\n",
"def check_and_place_trade():\n", "    global model, feature_cols # Ensure these are accessible\n", "\n",
"    if model is None or feature_cols is None:\n", "        print(\"Model not trained or feature columns not defined. Skipping trade check.\")\n", "        return\n", "\n",
"    print(f\"\\n{datetime.now()}: Checking for trading signal...\")\n", "    current_features_df = get_latest_bar_features()\n", "\n",
"    if current_features_df is None or current_features_df.empty:\n", "        print(\"Could not get features for the latest bar.\")\n", "        return\n", "\n",
"    # Ensure columns are in the same order as during training\n", "    current_features_df = current_features_df[feature_cols]\n", "\n",
"    prediction = model.predict(current_features_df)\n", "    signal = prediction[0] # 0 for down/neutral, 1 for up\n", "\n",
"    print(f\"Raw features for prediction: {current_features_df.iloc[0].to_dict()}\")\n", "    print(f\"Model prediction: {signal}\")\n", "\n",
"    try:\n", "        positions = api.list_positions()\n", "        # Note: positions report the symbol as \"BTCUSD\" while orders use \"BTC/USD\" (see the note under \"1st Prompt\"),\n", "        # so compare the symbols with the slash stripped.\n", "        btc_position = next((p for p in positions if p.symbol.replace('/', '') == SYMBOL.replace('/', '')), None)\n", "        current_price_info = api.get_latest_crypto_quote(SYMBOL) # Use quote for more current price\n", "        current_price = (current_price_info.ap + current_price_info.bp) / 2 # Mid price\n", "\n",
"        if not current_price:\n", "            print(\"Could not get current price for BTC/USD.\")\n", "            return\n", "\n", "\n",
"        if signal == 1: # Predicted UP - Potential BUY\n", "            if btc_position is None or float(btc_position.qty) == 0:\n", "                qty_to_buy = TRADE_QTY_USD / current_price\n", "                print(f\"BUY signal. Attempting to buy {qty_to_buy:.6f} {SYMBOL} at ~${current_price:.2f}\")\n", "                api.submit_order(\n", "                    symbol=SYMBOL,\n", "                    qty=round(qty_to_buy, 6), # Alpaca crypto needs precision\n", "                    side='buy',\n", "                    type='market',\n", "                    time_in_force='gtc' # Good 'til canceled\n", "                )\n", "                print(\"BUY order submitted.\")\n", "            else:\n", "                print(f\"BUY signal, but already have a position of {btc_position.qty} {SYMBOL}. Holding.\")\n", "\n",
"        elif signal == 0: # Predicted DOWN/NEUTRAL - Potential SELL\n", "            if btc_position and float(btc_position.qty) > 0:\n", "                qty_to_sell = float(btc_position.qty) # Sell entire position\n", "                print(f\"SELL signal. 
Attempting to sell {qty_to_sell:.6f} {SYMBOL} at ~${current_price:.2f}\")\n", " api.submit_order(\n", " symbol=SYMBOL,\n", " qty=round(qty_to_sell, 6),\n", " side='sell',\n", " type='market',\n", " time_in_force='gtc'\n", " )\n", " print(\"SELL order submitted.\")\n", " else:\n", " print(\"SELL signal, but no open position to sell. Doing nothing.\")\n", " else:\n", " print(\"Neutral signal or unrecognized signal. Doing nothing.\")\n", "\n", " except Exception as e:\n", " print(f\"Error during trade execution: {e}\")\n", "\n", "# Main loop (very basic scheduler)\n", "# For a robust bot, use APScheduler or run it in a more managed environment (e.g., cloud server with cron)\n", "if __name__ == \"__main__\" and model is not None: # Ensure model is trained\n", " # This is a simplified loop. In a real bot, you'd schedule this.\n", " # For example, using APScheduler to run exactly at the start of each minute.\n", " print(\"Starting dummy trading loop (runs a few times for demo). Press Ctrl+C to stop.\")\n", " print(\"IMPORTANT: This is for PAPER TRADING ONLY.\")\n", " print(f\"Will use model: {type(model).__name__} and features: {feature_cols}\")\n", " try:\n", " # Initial run\n", " check_and_place_trade()\n", " for i in range(5): # Run for a few iterations for demo\n", " # Wait for the next minute (approximately)\n", " # A more precise scheduler (like APScheduler) is better for live trading\n", " time.sleep(60)\n", " check_and_place_trade()\n", " except KeyboardInterrupt:\n", " print(\"Trading loop stopped by user.\")\n", " except Exception as e:\n", " print(f\"An error occurred in the trading loop: {e}\")\n", "else:\n", " if model is None:\n", " print(\"Model is not trained. Cannot start trading loop.\")" ] }, { "cell_type": "markdown", "id": "8236c202-3ecf-4e5b-b1cf-f0e0ea222b80", "metadata": {}, "source": [ "Step 7: Risk Management (Conceptual)\n", "\n", " Position Sizing: Don't risk too much on a single trade (e.g., TRADE_QTY_USD should be a small % of your paper capital).\n", "\n", " Stop-Loss: Automatically sell if the price moves against you by a certain percentage or dollar amount after entering a trade. Alpaca supports stop-loss orders." ] }, { "cell_type": "code", "execution_count": null, "id": "aac9b668-35a3-4a53-ab57-c785cd7e4f42", "metadata": {}, "outputs": [], "source": [ "# Example of a market buy order with a trailing stop-loss\n", "# api.submit_order(\n", "# symbol=SYMBOL,\n", "# qty=qty_to_buy,\n", "# side='buy',\n", "# type='market',\n", "# time_in_force='day',\n", "# trail_percent='1.5' # Trail stop loss 1.5% below high water mark\n", "# )" ] }, { "cell_type": "markdown", "id": "327d8404-15c6-48fc-aeee-db45ece2df81", "metadata": {}, "source": [ " Take-Profit: Automatically sell if the price moves in your favor by a certain amount.\n", "\n", " Max Drawdown: If your total capital drops by X%, stop trading and re-evaluate.\n", "\n", "Step 8: Deployment & Monitoring\n", "\n", " Server: Run your bot on a reliable machine or cloud server (AWS EC2, Google Cloud, etc.) that's always on.\n", "\n", " Scheduling: Use cron (Linux) or APScheduler (Python library) to run your check_and_place_trade function every minute precisely.\n", "\n", " Logging: Log every decision, trade, error, and API response.\n", "\n", " Monitoring: Track P&L, number of trades, win rate, errors, API connectivity, etc. Set up alerts.\n", "\n", "Very Important Next Steps & Considerations:\n", "\n", " PAPER TRADE EXTENSIVELY. I cannot stress this enough.\n", "\n", " Improve Feature Engineering: This is key. 
Explore more advanced features, time-series properties, etc.\n", "\n", " Hyperparameter Tuning: Use GridSearchCV or RandomizedSearchCV (or Optuna/Hyperopt) to find optimal model parameters.\n", "\n", " Robust Backtesting: Implement a proper event-driven backtester that accounts for slippage and realistic transaction costs.\n", "\n", " Walk-Forward Optimization: Markets change. Periodically re-train your model on newer data and test on out-of-sample data that follows.\n", "\n", " Understand Alpaca's Crypto Trading Rules: Fees (currently 0.3% for taker/maker on basic accounts), minimum order sizes, precision for quantities.\n", "\n", " Error Handling & Resilience: Your bot needs to handle API errors, network issues, unexpected data, etc.\n", "\n", " The 1-Minute Problem:\n", "\n", " Noise: Extremely high noise-to-signal ratio.\n", "\n", " Latency: Your reaction time vs. market speed.\n", "\n", " Transaction Costs: Can quickly kill profitability on small moves. Your average win needs to be significantly larger than costs.\n", "\n", " Alternative ML Approaches:\n", "\n", " Reinforcement Learning: Can learn trading policies directly. Very complex.\n", "\n", " LSTMs/Transformers: For sequence modeling, but require significant data and expertise.\n", "\n", " Simplicity First: Start with a very simple model and strategy. Complexity doesn't always mean better results and makes debugging harder.\n", "\n", "This is a massive undertaking. Be patient, learn continuously, and manage your risk (even paper money risk, as it builds habits). Good luck!" ] }, { "cell_type": "markdown", "id": "0a7209fb-f7aa-4eb7-aacc-13a929b9e268", "metadata": {}, "source": [ "