octoparse detect opens the Octoparse extension browser, inspects the page, and generates a local task file. Use it when you want to create a task from a URL without the Octoparse desktop app. Three modes are available:
| Mode | When to use |
| --- | --- |
| --auto | You want the CLI to pick the best data region automatically |
| --manual | You need to log in, dismiss a paywall, or select the region yourself |
| AI agent (--agent) | An LLM or automation tool is driving the workflow |
detect requires a valid Octoparse account and credentials. Local Chrome is also required. detect does not support Linux arm64. See Installation for platform requirements.
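
A script that wraps detect can check for the unsupported platform up front. The sketch below is a hypothetical helper, not part of the Octoparse CLI. It assumes that Linux arm64 machines report `aarch64` or `arm64` from `uname -m`.

```shell
# Hypothetical pre-flight helper (not an octoparse command).
# Returns 1 on Linux arm64, where detect is unsupported, and 0 elsewhere.
# Assumption: uname -m reports "aarch64" or "arm64" on such machines.
detect_platform_ok() {
  os="$(uname -s)"
  arch="$(uname -m)"
  if [ "$os" = "Linux" ]; then
    case "$arch" in
      aarch64|arm64) return 1 ;;
    esac
  fi
  return 0
}
```

For example, `detect_platform_ok || echo "use Linux x64 or cloud extraction" >&2` fails fast before any browser is launched.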

Automatic mode

The CLI picks the best candidate data region and generates a task file:
octoparse detect <url> --auto --output task.json
Use --goal to pass a natural-language description of what you want to extract:
octoparse detect <url> --auto --goal "Extract product titles and prices" --output task.json
Use --query to search for a keyword before detecting, which is useful for search result pages:
octoparse detect <url> --auto --query "keyword" --goal "Extract search results" --output task.json
Use --json for a structured response:
octoparse detect <url> --auto --goal "..." --output task.json --json
If --output is omitted, a detected_<host>.json file is created automatically.
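
When several pages need tasks, automatic mode can be looped from a script. The following is a minimal sketch that uses only the flags shown above. The helper name `detect_all`, the URL list file, and the `task_N.json` naming are illustrative assumptions.

```shell
# Hypothetical helper: run automatic detection for every URL in a file
# (one URL per line), writing task_1.json, task_2.json, and so on.
detect_all() {
  url_file="$1"
  goal="$2"
  n=0
  while IFS= read -r url || [ -n "$url" ]; do
    [ -z "$url" ] && continue   # skip blank lines
    n=$((n + 1))
    # </dev/null keeps the CLI from consuming the rest of the URL list.
    octoparse detect "$url" --auto --goal "$goal" \
      --output "task_${n}.json" </dev/null || return 1
  done < "$url_file"
}
```

Passing an explicit --output for each URL keeps the generated files distinct instead of relying on the default detected_<host>.json name.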

Manual mode

Manual mode opens a browser overlay where you can complete login, dismiss popups, and select the data region yourself:
octoparse detect <url> --manual
octoparse detect <url> --manual --goal "Get article titles and links"
Use --save-session to store cookies for sites that require login, so future local runs can replay the session:
octoparse detect <url> --manual --save-session --session-name my-session --output task.json
Cookie sessions do not cover every site, especially pages that require localStorage, device binding, or fresh verification.

Validate the generated task

After generating a task file, validate it before running:
octoparse task validate <taskId> --task-file task.json
Then run a local sample:
octoparse run <taskId> --task-file task.json --max-rows 10 --headless
Export the sample results:
octoparse data export <taskId> --source local --format xlsx
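
The three commands above can be chained so that a failed validation stops the sample run and export. This is a sketch rather than an official workflow. The function name `check_task` is hypothetical, and it assumes each octoparse command exits non-zero on failure.

```shell
# Hypothetical wrapper: validate, run a 10-row local sample, then export.
# Assumption: each octoparse command exits non-zero when it fails,
# so the && chain stops at the first failing step.
check_task() {
  task_id="$1"
  task_file="$2"
  octoparse task validate "$task_id" --task-file "$task_file" &&
  octoparse run "$task_id" --task-file "$task_file" --max-rows 10 --headless &&
  octoparse data export "$task_id" --source local --format xlsx
}
```

Call it as `check_task <taskId> task.json`, using the same task ID you would pass to the individual commands.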

AI agent workflow

For LLM-driven or automated workflows, use the agent contract instead of --auto. Start by reading the capabilities:
octoparse capabilities --json
The response includes machineContract.recipes.createTaskFromUrlWithAgent, which is the recommended workflow for agents.

One-shot (fastest)

Use --agent with a trusted local runner that can read a context file and write a plan:
octoparse detect <url> \
  --agent \
  --agent-command "path/to/your/agent-runner" \
  --goal "Extract search results" \
  --output task.json \
  --yes \
  --run-sample 5 \
  --json
The response is a single JSON envelope containing the generated task, preview result, and sample run output.

Auditable step-by-step

For audit or repair scenarios, use the prepare / preview / apply sequence instead of generating the task in one step.
1. Prepare agent context

Export webpage context for agent planning.
octoparse detect <url> \
  --prepare-agent \
  --goal "Extract product titles and prices" \
  --output context.json \
  --json
This generates context.json with candidate data regions, field samples, visual screenshots, and a decisionSummary.
2. Write a plan

Create a plan.json based on context.json. Use the octopus.detect.agent-plan.v1 schema. Before choosing fields, open the annotated screenshot at the path given in context.visualArtifacts.annotatedScreenshotPath, and include visualReview evidence in the plan.
3. Preview the plan

Validate the plan before generating the final task file.
octoparse detect \
  --preview-agent-plan plan.json \
  --agent-context context.json \
  --json
If data.pass is false, revise the plan before applying it.
4. Apply the plan

Generate the final local task file.
octoparse detect \
  --apply-agent-plan plan.json \
  --agent-context context.json \
  --output task.json
The generated task.json can then be inspected, validated, or used in a local CLI run.
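
The preview and apply steps can be gated in a script so that a plan is applied only after its preview passes. The sketch below assumes `jq` is installed and relies only on the data.pass field described above. The function names are hypothetical, and the rest of the JSON envelope is not assumed.

```shell
# Succeeds only when the preview JSON on stdin has data.pass set to true.
# Assumption: jq is available on PATH.
preview_passed() {
  jq -e '.data.pass == true' >/dev/null
}

# Hypothetical wrapper: preview the plan, then apply it only on a pass.
apply_if_preview_passes() {
  plan="$1"
  context="$2"
  out="$3"
  if octoparse detect --preview-agent-plan "$plan" \
       --agent-context "$context" --json | preview_passed; then
    octoparse detect --apply-agent-plan "$plan" \
      --agent-context "$context" --output "$out"
  else
    echo "preview did not pass; revise $plan before applying" >&2
    return 1
  fi
}
```

For example, `apply_if_preview_passes plan.json context.json task.json` writes task.json only when the preview reports a pass.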

Troubleshooting detect

| Issue | What to check |
| --- | --- |
| Chrome fails to launch | Run octoparse doctor and check the chrome entry. Try --chrome-path /path/to/chrome |
| LINUX_ARM64_UNSUPPORTED | Switch to a Linux x64 environment or use cloud extraction |
| LOGIN_SESSION_REQUIRED | Use --manual to log in and --save-session to store the session |
| Plan preview returns pass: false | Revise candidateId or field selection in plan.json and re-run preview |
| Task produces empty or wrong results | Check context.resultValidationPolicy in the agent context; isolated missing fields in ads or heterogeneous rows are often normal |

What’s next

Run your first task

Run a generated task locally, check status, and export results.

Command cheatsheet

Full reference for detect, run, cloud, data, and auth commands.