Files
ebiz-dify-ai/web/app/(commonLayout)/datasets/template/template.en.mdx
Charlie.Wei 9e7efa45d4 document segmentApi Add get&update&delete operate (#1285)
Co-authored-by: luowei <glpat-EjySCyNjWiLqAED-YmwM>
2023-10-10 13:27:06 +08:00

1090 lines
35 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
import { CodeGroup } from '@/app/components/develop/code.tsx'
import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from '@/app/components/develop/md.tsx'
# Dataset API
<br/>
<br/>
<Heading
url='/datasets'
method='POST'
title='Create an empty dataset'
name='#create_empty_dataset'
/>
<Row>
<Col>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Dataset name
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"name": "name"}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${apiBaseUrl}/v1/datasets' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "name"
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"id": "",
"name": "name",
"description": null,
"provider": "vendor",
"permission": "only_me",
"data_source_type": null,
"indexing_technique": null,
"app_count": 0,
"document_count": 0,
"word_count": 0,
"created_by": "",
"created_at": 1695636173,
"updated_by": "",
"updated_at": 1695636173,
"embedding_model": null,
"embedding_model_provider": null,
"embedding_available": null
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets'
method='GET'
title='Dataset list'
name='#dataset_list'
/>
<Row>
<Col>
### Query
<Properties>
<Property name='page' type='string' key='page'>
Page number
</Property>
<Property name='limit' type='string' key='limit'>
Number of items returned, default 20, range 1-100
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets?page=1&limit=20' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets?page=1&limit=20' \
--header 'Authorization: Bearer {api_key}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [
{
"id": "",
"name": "name",
"description": "desc",
"permission": "only_me",
"data_source_type": "upload_file",
"indexing_technique": "",
"app_count": 2,
"document_count": 10,
"word_count": 1200,
"created_by": "",
"created_at": "",
"updated_by": "",
"updated_at": ""
},
...
],
"has_more": true,
"limit": 20,
"total": 50,
"page": 1
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/document/create_by_text'
method='POST'
title='Create a document from text'
name='#create_by_text'
/>
<Row>
<Col>
This api is based on an existing dataset and creates a new document through text based on this dataset.
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Document name
</Property>
<Property name='text' type='string' key='text'>
Document content
</Property>
<Property name='indexing_technique' type='string' key='indexing_technique'>
Index mode
- <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
- <code>economy</code> Economy: Build using inverted index of Keyword Table Index
</Property>
<Property name='process_rule' type='object' key='process_rule'>
Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/document/create_by_text"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_text' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"name": "text","text": "text","indexing_technique": "high_quality","process_rule": {"mode": "automatic"}}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "text",
"text": "text",
"indexing_technique": "high_quality",
"process_rule": {
"mode": "automatic"
}
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "text.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695690280,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": ""
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/document/create_by_file'
method='POST'
title='Create documents from files'
name='#create_by_file'
/>
<Row>
<Col>
This api is based on an existing dataset and creates a new document through a file based on this dataset.
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='original_document_id' type='string' key='original_document_id'>
Source document ID (optional)
- Used to re-upload the document or modify the document cleaning and segmentation configuration. The missing information is copied from the source document
- The source document cannot be an archived document
- When original_document_id is passed in, the update operation is performed on behalf of the document. process_rule is a fillable item. If not filled in, the segmentation method of the source document will be used by defaul
- When original_document_id is not passed in, the new operation is performed on behalf of the document, and process_rule is required
</Property>
<Property name='file' type='multipart/form-data' key='file'>
Files that need to be uploaded.
</Property>
<Property name='indexing_technique' type='string' key='indexing_technique'>
Index mode
- <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
- <code>economy</code> Economy: Build using inverted index of Keyword Table Index
</Property>
<Property name='process_rule' type='object' key='process_rule'>
Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/document/create_by_file"
targetCode={`curl --location POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_file' \\\n--header 'Authorization: Bearer {api_key}' \\\n--form 'data="{"name":"Dify","indexing_technique":"high_quality","process_rule":{"rules":{"pre_processing_rules":[{"id":"remove_extra_spaces","enabled":true},{"id":"remove_urls_emails","enabled":true}],"segmentation":{"separator":"###","max_tokens":500}},"mode":"custom"}}";type=text/plain' \\\n--form 'file=@"/path/to/file"'`}
>
```bash {{ title: 'cURL' }}
curl --location POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{\"name\":\"Dify\",\"indexing_technique\":\"high_quality\",\"process_rule\":{\"rules\":{\"pre_processing_rules\":[{\"id\":\"remove_extra_spaces\",\"enabled\":true},{\"id\":\"remove_urls_emails\",\"enabled\":true}],\"segmentation\":{\"separator\":\"###\",\"max_tokens\":500}},\"mode\":\"custom\"}}";type=text/plain' \
--form 'file=@"/path/to/file"'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "Dify.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695308667,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": ""
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/update_by_text'
method='POST'
title='Update document via text'
name='#update_by_text'
/>
<Row>
<Col>
This api is based on an existing dataset and updates the document through text based on this dataset.
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Document name (optional)
</Property>
<Property name='text' type='string' key='text'>
Document content (optional)
</Property>
<Property name='process_rule' type='object' key='process_rule'>
Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/documents/{document_id}/update_by_text"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/update_by_text' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"name": "name","text": "text"}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/update_by_text' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "name",
"text": "text"
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "name.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695308667,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": ""
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/update_by_file'
method='POST'
title='Update a document from a file'
name='#update_by_file'
/>
<Row>
<Col>
This api is based on an existing dataset, and updates documents through files based on this dataset
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='name' type='string' key='name'>
Document name (optional)
</Property>
<Property name='file' type='multipart/form-data' key='file'>
Files to be uploaded
</Property>
<Property name='process_rule' type='object' key='process_rule'>
Processing rules
- <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
- <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
- <code>pre_processing_rules</code> (array[object]) Preprocessing rules
- <code>id</code> (string) Unique identifier for the preprocessing rule
- enumerate
- <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
- <code>remove_urls_emails</code> Delete URL, email address
- <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- <code>segmentation</code> (object) segmentation rules
- <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- <code>max_tokens</code> Maximum length (token) defaults to 1000
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/documents/{document_id}/update_by_file"
targetCode={`curl --location POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/{document_id}/create_by_file' \\\n--header 'Authorization: Bearer {api_key}' \\\n--form 'data="{"name":"Dify","indexing_technique":"high_quality","process_rule":{"rules":{"pre_processing_rules":[{"id":"remove_extra_spaces","enabled":true},{"id":"remove_urls_emails","enabled":true}],"segmentation":{"separator":"###","max_tokens":500}},"mode":"custom"}}";type=text/plain' \\\n--form 'file=@"/path/to/file"'`}
>
```bash {{ title: 'cURL' }}
curl --location POST '${props.apiBaseUrl}/datasets/{dataset_id}/document/{document_id}/create_by_file' \
--header 'Authorization: Bearer {api_key}' \
--form 'data="{\"name\":\"Dify\",\"indexing_technique\":\"high_quality\",\"process_rule\":{\"rules\":{\"pre_processing_rules\":[{\"id\":\"remove_extra_spaces\",\"enabled\":true},{\"id\":\"remove_urls_emails\",\"enabled\":true}],\"segmentation\":{\"separator\":\"###\",\"max_tokens\":500}},\"mode\":\"custom\"}}";type=text/plain' \
--form 'file=@"/path/to/file"'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"document": {
"id": "",
"position": 1,
"data_source_type": "upload_file",
"data_source_info": {
"upload_file_id": ""
},
"dataset_process_rule_id": "",
"name": "Dify.txt",
"created_from": "api",
"created_by": "",
"created_at": 1695308667,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false,
"display_status": "queuing",
"word_count": 0,
"hit_count": 0,
"doc_form": "text_model"
},
"batch": "20230921150427533684"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/batch/{batch}/indexing-status'
method='GET'
title='Get document embedding status (progress)'
name='#indexing_status'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='batch' type='string' key='batch'>
Batch number of uploaded documents
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="GET"
label="/datasets/{dataset_id}/batch/{batch}/indexing-status"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{batch}/indexing-status' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{batch}/indexing-status' \
--header 'Authorization: Bearer {api_key}' \
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data":[{
"id": "",
"indexing_status": "indexing",
"processing_started_at": 1681623462.0,
"parsing_completed_at": 1681623462.0,
"cleaning_completed_at": 1681623462.0,
"splitting_completed_at": 1681623462.0,
"completed_at": null,
"paused_at": null,
"error": null,
"stopped_at": null,
"completed_segments": 24,
"total_segments": 100
}]
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}'
method='DELETE'
title='Delete document'
name='#delete_document'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="DELETE"
label="/datasets/{dataset_id}/documents/{document_id}"
targetCode={`curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}' \
--header 'Authorization: Bearer {api_key}' \
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"result": "success"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents'
method='GET'
title='Dataset document list'
name='#dataset_document_list'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
</Properties>
### Query
<Properties>
<Property name='keyword' type='string' key='keyword'>
Search keywords, currently only search document names(optional)
</Property>
<Property name='page' type='string' key='page'>
Page number(optional)
</Property>
<Property name='limit' type='string' key='limit'>
Number of items returned, default 20, range 1-100(optional)
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="GET"
label="/datasets/{dataset_id}/documents"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents' \\\n--header 'Authorization: Bearer {api_key}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents' \
--header 'Authorization: Bearer {api_key}' \
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [
{
"id": "",
"position": 1,
"data_source_type": "file_upload",
"data_source_info": null,
"dataset_process_rule_id": null,
"name": "dify",
"created_from": "",
"created_by": "",
"created_at": 1681623639,
"tokens": 0,
"indexing_status": "waiting",
"error": null,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"archived": false
},
],
"has_more": false,
"limit": 20,
"total": 9,
"page": 1
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/segments'
method='POST'
title='Add segment'
name='#create_new_segment'
/>
<Row>
<Col>
### Params
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='segments' type='object list' key='segments'>
- <code>content</code> (text) Text content/question content, required
- <code>answer</code> (text) Answer content, if the mode of the data set is qa mode, pass the value(optional)
- <code>keywords</code> (list) Keywords(optional)
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/documents/{document_id}/segments"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json' \\\n--data-raw '{"segments": [{"content": "1","answer": "1","keywords": ["a"]}]}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json' \
--data-raw '{
"segments": [
{
"content": "1",
"answer": "1",
"keywords": ["a"]
}
]
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [{
"id": "",
"position": 1,
"document_id": "",
"content": "1",
"answer": "1",
"word_count": 25,
"tokens": 0,
"keywords": [
"a"
],
"index_node_id": "",
"index_node_hash": "",
"hit_count": 0,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"status": "completed",
"created_by": "",
"created_at": 1695312007,
"indexing_at": 1695312007,
"completed_at": 1695312007,
"error": null,
"stopped_at": null
}],
"doc_form": "text_model"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/documents/{document_id}/segments'
method='GET'
title='get documents segments'
name='#get_segment'
/>
<Row>
<Col>
### Path
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='document_id' type='string' key='document_id'>
Document ID
</Property>
</Properties>
### Query
<Properties>
<Property name='keyword' type='string' key='keyword'>
keywordchoosable
</Property>
<Property name='status' type='string' key='status'>
Search statuscompleted
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="GET"
label="/datasets/{dataset_id}/documents/{document_id}/segments"
targetCode={`curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json'`}
>
```bash {{ title: 'cURL' }}
curl --location --request GET '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [{
"id": "",
"position": 1,
"document_id": "",
"content": "1",
"answer": "1",
"word_count": 25,
"tokens": 0,
"keywords": [
"a"
],
"index_node_id": "",
"index_node_hash": "",
"hit_count": 0,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"status": "completed",
"created_by": "",
"created_at": 1695312007,
"indexing_at": 1695312007,
"completed_at": 1695312007,
"error": null,
"stopped_at": null
}],
"doc_form": "text_model"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/segments/{segment_id}'
method='DELETE'
title='delete document segment'
name='#delete_segment'
/>
<Row>
<Col>
### Path
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='segment_id' type='string' key='segment_id'>
Document Segment ID
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="DELETE"
label="/datasets/{dataset_id}/segments/{segment_id}"
targetCode={`curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/segments/{segment_id}' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json'`}
>
```bash {{ title: 'cURL' }}
curl --location --request DELETE '${props.apiBaseUrl}/datasets/{dataset_id}/segments/{segment_id}' \
--header 'Authorization: Bearer {api_key}' \
--header 'Content-Type: application/json'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"result": "success"
}
```
</CodeGroup>
</Col>
</Row>
---
<Heading
url='/datasets/{dataset_id}/segments/{segment_id}'
method='POST'
title='update document segment'
name='#update_segment'
/>
<Row>
<Col>
### POST
<Properties>
<Property name='dataset_id' type='string' key='dataset_id'>
Dataset ID
</Property>
<Property name='segment_id' type='string' key='segment_id'>
Document Segment ID
</Property>
</Properties>
### Request Body
<Properties>
<Property name='segments' type='object list' key='segments'>
- <code>content</code> (text) text content/question contentrequired
- <code>answer</code> (text) Answer content, not required, passed if the data set is in qa mode
- <code>keywords</code> (list) keyword, not required
- <code>enabled</code> (bool) false/true, not required
</Property>
</Properties>
</Col>
<Col sticky>
<CodeGroup
title="Request"
tag="POST"
label="/datasets/{dataset_id}/segments/{segment_id}"
targetCode={`curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}' \\\n--header 'Authorization: Bearer {api_key}' \\\n--header 'Content-Type: application/json'\\\n--data-raw '{\"segments\": {\"content\": \"1\",\"answer\": \"1\", \"keywords\": [\"a\"], \"enabled\": false}}'`}
>
```bash {{ title: 'cURL' }}
curl --location --request POST '${props.apiBaseUrl}/datasets/{dataset_id}/documents/{document_id}/segments/{segment_id}' \
--header 'Content-Type: application/json' \
--data-raw '{
"segments": {
"content": "1",
"answer": "1",
"keywords": ["a"],
"enabled": false
}
}'
```
</CodeGroup>
<CodeGroup title="Response">
```json {{ title: 'Response' }}
{
"data": [{
"id": "",
"position": 1,
"document_id": "",
"content": "1",
"answer": "1",
"word_count": 25,
"tokens": 0,
"keywords": [
"a"
],
"index_node_id": "",
"index_node_hash": "",
"hit_count": 0,
"enabled": true,
"disabled_at": null,
"disabled_by": null,
"status": "completed",
"created_by": "",
"created_at": 1695312007,
"indexing_at": 1695312007,
"completed_at": 1695312007,
"error": null,
"stopped_at": null
}],
"doc_form": "text_model"
}
```
</CodeGroup>
</Col>
</Row>
---
<Row>
<Col>
### Error message
<Properties>
<Property name='code' type='string' key='code'>
Error code
</Property>
</Properties>
<Properties>
<Property name='status' type='number' key='status'>
Error status
</Property>
</Properties>
<Properties>
<Property name='message' type='string' key='message'>
Error message
</Property>
</Properties>
</Col>
<Col>
<CodeGroup title="Example">
```json {{ title: 'Response' }}
{
"code": "no_file_uploaded",
"message": "Please upload your file.",
"status": 400
}
```
</CodeGroup>
</Col>
</Row>
<table className="max-w-auto border-collapse border border-slate-400" style={{ maxWidth: 'none', width: 'auto' }}>
<thead style={{ background: '#f9fafc' }}>
<tr>
<th class="p-2 border border-slate-300">code</th>
<th class="p-2 border border-slate-300">status</th>
<th class="p-2 border border-slate-300">message</th>
</tr>
</thead>
<tbody>
<tr>
<td class="p-2 border border-slate-300">no_file_uploaded</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Please upload your file.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">too_many_files</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Only one file is allowed.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">file_too_large</td>
<td class="p-2 border border-slate-300">413</td>
<td class="p-2 border border-slate-300">File size exceeded.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">unsupported_file_type</td>
<td class="p-2 border border-slate-300">415</td>
<td class="p-2 border border-slate-300">File type not allowed.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">high_quality_dataset_only</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Current operation only supports 'high-quality' datasets.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">dataset_not_initialized</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The dataset is still being initialized or indexing. Please wait a moment.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">archived_document_immutable</td>
<td class="p-2 border border-slate-300">403</td>
<td class="p-2 border border-slate-300">The archived document is not editable.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">dataset_name_duplicate</td>
<td class="p-2 border border-slate-300">409</td>
<td class="p-2 border border-slate-300">The dataset name already exists. Please modify your dataset name.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">invalid_action</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">Invalid action.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">document_already_finished</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The document has been processed. Please refresh the page or go to the document details.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">document_indexing</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The document is being processed and cannot be edited.</td>
</tr>
<tr>
<td class="p-2 border border-slate-300">invalid_metadata</td>
<td class="p-2 border border-slate-300">400</td>
<td class="p-2 border border-slate-300">The metadata content is incorrect. Please check and verify.</td>
</tr>
</tbody>
</table>
<div class="pb-4" />