Create tasks from a URL

`octoparse detect` opens the Octoparse extension browser, inspects the page, and generates a local task file. Use it when you want to create a task from a URL without the Octoparse desktop app. Three modes are available:

| Mode | When to use |
| --- | --- |
| `--auto` | You want the CLI to pick the best data region automatically |
| `--manual` | You need to log in, dismiss a paywall, or select the region yourself |
| AI agent (`--agent`) | An LLM or automation tool is driving the workflow |

`detect` requires a valid Octoparse account with credentials and a local Chrome installation. It does not support Linux arm64. See Installation for platform requirements.
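A wrapper script can fail fast on the unsupported platform before it launches anything. The sketch below is a preflight written for illustration, not part of the CLI: the function name `detect_platform_supported` is hypothetical, and it assumes `uname -m` reports `aarch64` or `arm64` on a Linux arm64 host. It covers only the Linux arm64 restriction, not the other requirements listed under Installation.

```shell
#!/usr/bin/env bash
# detect_platform_supported [os] [arch]   (hypothetical helper, not an octoparse command)
# Returns 1 on Linux arm64, the one platform ruled out above, and 0 otherwise.
# Arguments default to uname output; they exist so the check can be exercised anywhere.
detect_platform_supported() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/aarch64|Linux/arm64) return 1 ;;
    *) return 0 ;;
  esac
}

# Example guard at the top of a wrapper script:
# detect_platform_supported || { echo "detect does not support Linux arm64" >&2; exit 1; }
```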

Automatic mode

The CLI picks the best candidate data region and generates a task file:

octoparse detect <url> --auto --output task.json

Use --goal to pass a natural-language description of what you want to extract:

octoparse detect <url> --auto --goal "Extract product titles and prices" --output task.json

Use --query to search for a keyword before detection runs, which is useful for search result pages:

octoparse detect <url> --auto --query "keyword" --goal "Extract search results" --output task.json

Use --json for a structured response:

octoparse detect <url> --auto --goal "..." --output task.json --json

If --output is omitted, a detected_<host>.json file is created automatically.

Manual mode

Manual mode opens a browser overlay where you can complete login, dismiss popups, and select the data region yourself:

octoparse detect <url> --manual
octoparse detect <url> --manual --goal "Get article titles and links"

Use --save-session to store cookies for sites that require login, so future local runs can replay the session:

octoparse detect <url> --manual --save-session --session-name my-session --output task.json

Saved cookie sessions do not work on every site. Pages that depend on localStorage, device binding, or fresh verification are the most likely to fall outside what a cookie session covers.

Validate the generated task

After generating a task file, validate it before running:

octoparse task validate <taskId> --task-file task.json

Then run a local sample:

octoparse run <taskId> --task-file task.json --max-rows 10 --headless

Export the sample results:

octoparse data export <taskId> --source local --format xlsx
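The three commands above can be chained so that a failed validation stops the sample run. This is a minimal sketch under two assumptions: each command exits non-zero on failure, and the `OCTOPARSE` variable and the `validate_and_sample` function are hypothetical names introduced here. The task ID stays a placeholder that you supply.

```shell
#!/usr/bin/env bash
# OCTOPARSE is a hypothetical override; set OCTOPARSE=echo for a dry run
# that prints the commands instead of executing them.
OCTOPARSE="${OCTOPARSE:-octoparse}"

# validate_and_sample <taskId> <task-file>
# Validates the task file, runs a 10-row headless sample, then exports it as xlsx.
# Stops at the first step that returns a non-zero status (assumed failure signal).
validate_and_sample() {
  local task_id="$1" task_file="$2"
  "$OCTOPARSE" task validate "$task_id" --task-file "$task_file" || return 1
  "$OCTOPARSE" run "$task_id" --task-file "$task_file" --max-rows 10 --headless || return 1
  "$OCTOPARSE" data export "$task_id" --source local --format xlsx
}
```

Calling `validate_and_sample <taskId> task.json` with `OCTOPARSE=echo` prints the three command lines in order, which is a quick way to review them before a real run.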

AI agent workflow

For LLM-driven or automated workflows, use the agent contract instead of --auto. Start by reading the capabilities:

octoparse capabilities --json

The response includes machineContract.recipes.createTaskFromUrlWithAgent, which is the recommended workflow for agents.

One-shot (fastest)

Use --agent with a trusted local runner that can read a context file and write a plan:

octoparse detect <url> \
  --agent \
  --agent-command "path/to/your/agent-runner" \
  --goal "Extract search results" \
  --output task.json \
  --yes \
  --run-sample 5 \
  --json

The response is a single JSON envelope containing the generated task, preview result, and sample run output.

Auditable step-by-step

For audit or repair scenarios, use the prepare / preview / apply sequence instead of generating the task in one step.

Prepare agent context

Export webpage context for agent planning.

octoparse detect <url> \
  --prepare-agent \
  --goal "Extract product titles and prices" \
  --output context.json \
  --json

This generates context.json, which contains candidate data regions, field samples, screenshots, and a decisionSummary.

Write a plan

Create a plan.json based on context.json. Use the octopus.detect.agent-plan.v1 schema. Open the annotated screenshot path in context.visualArtifacts.annotatedScreenshotPath before choosing fields, and include visualReview evidence in the plan.

Preview the plan

Validate the plan before generating the final task file.

octoparse detect \
  --preview-agent-plan plan.json \
  --agent-context context.json \
  --json

If data.pass is false, revise the plan before applying it.
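In an automated loop, the `data.pass` check can gate the apply step. The sketch below assumes the preview response has been saved to a file and that `pass` sits under a top-level `data` object, as the field name suggests. The helper name `preview_passed` is hypothetical. It uses `jq` when installed and falls back to `python3` otherwise.

```shell
#!/usr/bin/env bash
# preview_passed <preview.json>   (hypothetical helper, not an octoparse command)
# Exit status 0 only when the saved preview response has data.pass == true.
preview_passed() {
  local file="$1"
  if command -v jq >/dev/null 2>&1; then
    # -e maps a false/null result to a non-zero exit status.
    jq -e '.data.pass == true' "$file" >/dev/null
  else
    python3 - "$file" <<'PY'
import json, sys
with open(sys.argv[1]) as fh:
    doc = json.load(fh)
data = doc.get("data")
sys.exit(0 if isinstance(data, dict) and data.get("pass") is True else 1)
PY
  fi
}

# Example use. Redirecting to preview.json assumes --json writes the response to stdout.
# octoparse detect --preview-agent-plan plan.json --agent-context context.json --json > preview.json
# preview_passed preview.json || echo "Revise plan.json and re-run the preview" >&2
```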

Apply the plan

Generate the final local task file.

octoparse detect \
  --apply-agent-plan plan.json \
  --agent-context context.json \
  --output task.json

The generated task.json can then be inspected, validated, or used in a local CLI run.

Troubleshooting detect

| Issue | What to check |
| --- | --- |
| Chrome fails to launch | Run `octoparse doctor` and check the `chrome` entry. Try `--chrome-path /path/to/chrome` |
| `LINUX_ARM64_UNSUPPORTED` | Switch to a Linux x64 environment or use cloud extraction |
| `LOGIN_SESSION_REQUIRED` | Use `--manual` to log in and `--save-session` to store the session |
| Plan preview returns `pass: false` | Revise `candidateId` or field selection in `plan.json` and re-run preview |
| Task produces empty or wrong results | Check `context.resultValidationPolicy` in the agent context; isolated missing fields in ads or heterogeneous rows are often normal |
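For scripted runs, the two error codes in the table can be mapped to their remedies in one place. This is an illustrative sketch only: the function name `detect_hint` is hypothetical, and where the code appears in the CLI output is not specified here, so the function takes the code as an argument rather than parsing any output.

```shell
#!/usr/bin/env bash
# detect_hint <error-code>   (hypothetical helper, not an octoparse command)
# Prints the remedy from the troubleshooting table for a known error code.
# Returns 1 for any code the table does not list.
detect_hint() {
  case "$1" in
    LINUX_ARM64_UNSUPPORTED)
      echo "Switch to a Linux x64 environment or use cloud extraction" ;;
    LOGIN_SESSION_REQUIRED)
      echo "Use --manual to log in and --save-session to store the session" ;;
    *)
      echo "No hint recorded for: $1" >&2
      return 1 ;;
  esac
}
```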

What’s next

Run your first task

Command cheatsheet