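Entities in a collection are data records that share the same set of fields. The field values in each data record form an entity. This section describes how to insert entities into a collection.
Overview
In Milvus, an entity is a data record in a collection. All records in a collection share the same schema, and the field data in each row makes up one entity. Entities in the same collection therefore have the same attributes, such as field names, data types, and other constraints.
An insert succeeds only if each entity contains all the fields defined in the schema. Inserted entities go into the partition named _default in insertion order. If another partition already exists, you can insert entities into it by specifying its name in the insert request.
Milvus also supports dynamic fields to keep collections extensible. When the dynamic field is enabled, you can insert fields that are not defined in the schema. These fields and their values are stored as key-value pairs in a reserved field named $meta. For more information, see Dynamic Field.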
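To make the dynamic-field behavior concrete, here is a plain-Python sketch of the idea: fields that are not in the schema are collected into a separate key-value mapping. The helper name split_dynamic_fields is hypothetical. This is a conceptual model only, not how Milvus implements $meta internally.

```python
def split_dynamic_fields(entity, schema_fields):
    """Separate schema-defined fields from extra (dynamic) ones.

    Conceptual model only: mimics the idea that undefined fields end up
    as key-value pairs in a reserved field such as $meta.
    """
    defined = {k: v for k, v in entity.items() if k in schema_fields}
    meta = {k: v for k, v in entity.items() if k not in schema_fields}
    return defined, meta


entity = {"id": 0, "vector": [0.1, 0.2], "color": "pink_8682"}
defined, meta = split_dynamic_fields(entity, {"id", "vector"})
print(defined)  # {'id': 0, 'vector': [0.1, 0.2]}
print(meta)     # {'color': 'pink_8682'}
```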
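Insert entities into a collection
Before inserting data, organize it into a list of dictionaries according to the schema. Each dictionary represents one entity and must contain every field defined in the schema. If the collection has the dynamic field enabled, each dictionary may also contain fields that are not defined in the schema.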
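Because an insert fails when an entity lacks a schema-defined field, it can help to check the list locally before sending it. The sketch below assumes a schema with the fields id and vector, as in the examples in this document. The helper find_missing_fields is hypothetical and is not part of pymilvus.

```python
def find_missing_fields(entities, required_fields):
    """Return {index: [missing field names]} for entities lacking schema fields."""
    problems = {}
    for i, entity in enumerate(entities):
        missing = [f for f in required_fields if f not in entity]
        if missing:
            problems[i] = missing
    return problems


rows = [
    {"id": 0, "vector": [0.1, 0.2, 0.3, 0.4, 0.5], "color": "pink_8682"},
    {"id": 1, "color": "red_7025"},  # no "vector" field
]
print(find_missing_fields(rows, ["id", "vector"]))  # {1: ['vector']}
```

An empty result means every entity carries all required fields. Extra keys such as color are ignored here, since they are only acceptable when the dynamic field is enabled.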
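In this section, you insert entities into a collection created in the quick-setup manner. A collection created this way has only two fields, id and vector. The collection also has the dynamic field enabled, so the entities in the example code include a field that is not defined in the schema: color.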
```python
from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

data = [
    {"id": 0, "vector": [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592], "color": "pink_8682"},
    {"id": 1, "vector": [0.19886812562848388, 0.06023560599112088, 0.6976963061752597, 0.2614474506242501, 0.838729485096104], "color": "red_7025"},
    {"id": 2, "vector": [0.43742130801983836, -0.5597502546264526, 0.6457887650909682, 0.7894058910881185, 0.20785793220625592], "color": "orange_6781"},
    {"id": 3, "vector": [0.3172005263489739, 0.9719044792798428, -0.36981146090600725, -0.4860894583077995, 0.95791889146345], "color": "pink_9298"},
    {"id": 4, "vector": [0.4452349528804562, -0.8757026943054742, 0.8220779437047674, 0.46406290649483184, 0.30337481143159106], "color": "red_4794"},
    {"id": 5, "vector": [0.985825131989184, -0.8144651566660419, 0.6299267002202009, 0.1206906911183383, -0.1446277761879955], "color": "yellow_4222"},
    {"id": 6, "vector": [0.8371977790571115, -0.015764369584852833, -0.31062937026679327, -0.562666951622192, -0.8984947637863987], "color": "red_9392"},
    {"id": 7, "vector": [-0.33445148015177995, -0.2567135004164067, 0.8987539745369246, 0.9402995886420709, 0.5378064918413052], "color": "grey_8510"},
    {"id": 8, "vector": [0.39524717779832685, 0.4000257286739164, -0.5890507376891594, -0.8650502298996872, -0.6140360785406336], "color": "white_9381"},
    {"id": 9, "vector": [0.5718280481994695, 0.24070317428066512, -0.3737913482606834, -0.06726932177492717, -0.6980531615588608], "color": "purple_4976"}
]

res = client.insert(
    collection_name="quick_setup",
    data=data
)

print(res)

# Output
# {'insert_count': 10, 'ids': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
```
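Insert entities into a partition
You can also insert entities into a specified partition. The following code assumes the collection has a partition named partitionA.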
```python
data = [
    {"id": 10, "vector": [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592], "color": "pink_8682"},
    {"id": 11, "vector": [0.19886812562848388, 0.06023560599112088, 0.6976963061752597, 0.2614474506242501, 0.838729485096104], "color": "red_7025"},
    {"id": 12, "vector": [0.43742130801983836, -0.5597502546264526, 0.6457887650909682, 0.7894058910881185, 0.20785793220625592], "color": "orange_6781"},
    {"id": 13, "vector": [0.3172005263489739, 0.9719044792798428, -0.36981146090600725, -0.4860894583077995, 0.95791889146345], "color": "pink_9298"},
    {"id": 14, "vector": [0.4452349528804562, -0.8757026943054742, 0.8220779437047674, 0.46406290649483184, 0.30337481143159106], "color": "red_4794"},
    {"id": 15, "vector": [0.985825131989184, -0.8144651566660419, 0.6299267002202009, 0.1206906911183383, -0.1446277761879955], "color": "yellow_4222"},
    {"id": 16, "vector": [0.8371977790571115, -0.015764369584852833, -0.31062937026679327, -0.562666951622192, -0.8984947637863987], "color": "red_9392"},
    {"id": 17, "vector": [-0.33445148015177995, -0.2567135004164067, 0.8987539745369246, 0.9402995886420709, 0.5378064918413052], "color": "grey_8510"},
    {"id": 18, "vector": [0.39524717779832685, 0.4000257286739164, -0.5890507376891594, -0.8650502298996872, -0.6140360785406336], "color": "white_9381"},
    {"id": 19, "vector": [0.5718280481994695, 0.24070317428066512, -0.3737913482606834, -0.06726932177492717, -0.6980531615588608], "color": "purple_4976"}
]

res = client.insert(
    collection_name="quick_setup",
    partition_name="partitionA",
    data=data
)

print(res)

# Output
# {'insert_count': 10, 'ids': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]}
```
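The upsert operation combines updating and inserting data. Milvus decides whether to update or insert by checking whether the primary key exists. This section describes how to upsert entities and explains how the operation behaves in different scenarios.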
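Use upsert when you need to update entities in a collection or are unsure whether to update or insert. The upsert request must include each entity's primary key; otherwise an error occurs. When it receives an upsert request, Milvus checks whether each entity's primary key already exists in the collection: if it does, the entity is updated; if not, it is inserted.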
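The per-entity decision can be modeled in plain Python with a dictionary keyed by primary key. This is a conceptual sketch only and not Milvus's implementation. The function name, the store dictionary, and the choice to replace the stored record wholesale are all assumptions made for illustration.

```python
def upsert(store, entities, pk="id"):
    """Update entities whose primary key exists in `store`; insert the rest."""
    inserted, updated = 0, 0
    for entity in entities:
        if pk not in entity:
            # Mirrors the rule that an upsert request must carry the primary key.
            raise ValueError(f"entity is missing primary key '{pk}'")
        if entity[pk] in store:
            updated += 1
        else:
            inserted += 1
        # Assumption of this sketch: the stored record is replaced wholesale.
        store[entity[pk]] = dict(entity)
    return {"upsert_count": inserted + updated,
            "inserted": inserted,
            "updated": updated}


store = {0: {"id": 0, "color": "pink_8682"}}
result = upsert(store, [
    {"id": 0, "color": "black_9898"},   # primary key exists: update
    {"id": 10, "color": "black_3651"},  # primary key is new: insert
])
print(result)             # {'upsert_count': 2, 'inserted': 1, 'updated': 1}
print(store[0]["color"])  # black_9898
```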
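This section shows how to upsert entities into a collection created in the quick-setup manner. A collection created this way has only two fields, id and vector. The collection also has the dynamic field enabled, so the entities in the example code include a field named color that is not defined in the schema.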
```python
from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

data = [
    {"id": 0, "vector": [-0.619954382375778, 0.4479436794798608, -0.17493894838751745, -0.4248030059917294, -0.8648452746018911], "color": "black_9898"},
    {"id": 1, "vector": [0.4762662251462588, -0.6942502138717026, -0.4490002642657902, -0.628696575798281, 0.9660395877041965], "color": "red_7319"},
    {"id": 2, "vector": [-0.8864122635045097, 0.9260170474445351, 0.801326976181461, 0.6383943392381306, 0.7563037341572827], "color": "white_6465"},
    {"id": 3, "vector": [0.14594326235891586, -0.3775407299900644, -0.3765479013078812, 0.20612075380355122, 0.4902678929632145], "color": "orange_7580"},
    {"id": 4, "vector": [0.4548498669607359, -0.887610217681605, 0.5655081329910452, 0.19220509387904117, 0.016513983433433577], "color": "red_3314"},
    {"id": 5, "vector": [0.11755001847051827, -0.7295149788999611, 0.2608115847524266, -0.1719167007897875, 0.7417611743754855], "color": "black_9955"},
    {"id": 6, "vector": [0.9363032158314308, 0.030699901477745373, 0.8365910312319647, 0.7823840208444011, 0.2625222076909237], "color": "yellow_2461"},
    {"id": 7, "vector": [0.0754823906014721, -0.6390658668265143, 0.5610517334334937, -0.8986261118798251, 0.9372056764266794], "color": "white_5015"},
    {"id": 8, "vector": [-0.3038434006935904, 0.1279149203380523, 0.503958664270957, -0.2622661156746988, 0.7407627307791929], "color": "purple_6414"},
    {"id": 9, "vector": [-0.7125086947677588, -0.8050968321012257, -0.32608864121785786, 0.3255654958645424, 0.26227968923834233], "color": "brown_7231"}
]

res = client.upsert(
    collection_name="quick_setup",
    data=data
)

print(res)

# Output
# {'upsert_count': 10}
```
You can also upsert entities into a specified partition. The following code assumes the collection has a partition named partitionA.
```python
data = [
    {"id": 10, "vector": [0.06998888224297328, 0.8582816610326578, -0.9657938677934292, 0.6527905683627726, -0.8668460657158576], "color": "black_3651"},
    {"id": 11, "vector": [0.6060703043917468, -0.3765080534566074, -0.7710758854987239, 0.36993888322346136, 0.5507513364206531], "color": "grey_2049"},
    {"id": 12, "vector": [-0.9041813104515337, -0.9610546012461163, 0.20033003106083358, 0.11842506351635174, 0.8327356724591011], "color": "blue_6168"},
    {"id": 13, "vector": [0.3202914977909075, -0.7279137773695252, -0.04747830871620273, 0.8266053056909548, 0.8277957187455489], "color": "blue_1672"},
    {"id": 14, "vector": [0.2975811497890859, 0.2946936202691086, 0.5399463833894609, 0.8385334966677529, -0.4450543984655133], "color": "pink_1601"},
    {"id": 15, "vector": [-0.04697464305600074, -0.08509022265734134, 0.9067184632552001, -0.2281912685064822, -0.9747503428652762], "color": "yellow_9925"},
    {"id": 16, "vector": [-0.9363075919673911, -0.8153981031085669, 0.7943039120490902, -0.2093886809842529, 0.0771191335807897], "color": "orange_9872"},
    {"id": 17, "vector": [-0.050451522820639916, 0.18931572752321935, 0.7522886192190488, -0.9071793089474034, 0.6032647330692296], "color": "red_6450"},
    {"id": 18, "vector": [-0.9181544231141592, 0.6700755998126806, -0.014174674636136642, 0.6325780463623432, -0.49662222164032976], "color": "purple_7392"},
    {"id": 19, "vector": [0.11426945899602536, 0.6089190684002581, -0.5842735738352236, 0.057050610092692855, -0.035163433018196244], "color": "pink_4996"}
]

res = client.upsert(
    collection_name="quick_setup",
    data=data,
    partition_name="partitionA"
)

print(res)

# Output
# {'upsert_count': 10}
```
You can delete entities you no longer need, either by filter condition or by primary key.
Delete entities by filter
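To delete multiple entities that share an attribute in bulk, use a filter expression. The following example uses the in operator to delete all entities whose color field is red_3314 or purple_7392. You can build filter expressions with other operators as needed. For more about filter expressions, see Metadata Filtering.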
```python
from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

res = client.delete(
    collection_name="quick_setup",
    filter="color in ['red_3314', 'purple_7392']"
)

print(res)

# Output
# {'delete_count': 2}
```
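When the values to delete come from a list rather than being typed by hand, the in expression can be assembled programmatically. The helper below, build_in_filter, is a hypothetical convenience and not part of pymilvus. It relies on Python's repr for quoting, so it is only a sketch for simple strings and numbers; values that contain quote characters would need proper escaping.

```python
def build_in_filter(field, values):
    """Build a filter string such as: color in ['red_3314', 'purple_7392']"""
    quoted = ", ".join(repr(v) for v in values)
    return f"{field} in [{quoted}]"


print(build_in_filter("color", ["red_3314", "purple_7392"]))
# color in ['red_3314', 'purple_7392']
print(build_in_filter("id", [18, 19]))
# id in [18, 19]
```

The resulting string can be passed as the filter argument of client.delete, as in the example above.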
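Delete entities by primary key
In most cases, a primary key uniquely identifies an entity. You can delete entities by setting their primary keys in the delete request. The following example deletes the two entities whose primary keys are 18 and 19.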
```python
res = client.delete(
    collection_name="quick_setup",
    ids=[18, 19]
)

print(res)

# Output
# {'delete_count': 2}
```
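Delete entities from a partition
You can also delete entities stored in a specific partition. The following code assumes the collection has a partition named partitionA.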
```python
res = client.delete(
    collection_name="quick_setup",
    ids=[18, 19],
    partition_name="partitionA"
)

print(res)

# Output
# {'delete_count': 2}
```
Author: Dong
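Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC (Creative Commons Attribution-NonCommercial 4.0 International). You may freely repost and adapt this work for non-commercial purposes, provided you credit the source and link to the original author.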