# Corpus with example sentences
corpus = ['A man is eating food.',
          'A man is eating a piece of bread.',
          'The girl is carrying a baby.',
          'A man is riding a horse.',
          'A woman is playing violin.',
          'Two men pushed carts through the woods.',
          'A man is riding a white horse on an enclosed ground.',
          'A monkey is playing drums.',
          'A cheetah is running behind its prey.'
          ]
corpus_embeddings = embedder.encode(corpus, convert_to_tensor=True)

# Query sentences:
queries = ['A man is eating pasta.',
           'Someone in a gorilla costume is playing a set of drums.',
           'A cheetah chases prey on across a field.']

# Find the closest 5 sentences of the corpus for each query sentence based on cosine similarity
top_k = min(5, len(corpus))
for query in queries:
    query_embedding = embedder.encode(query, convert_to_tensor=True)

    # We use cosine-similarity and torch.topk to find the highest 5 scores
    cos_scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top_results = torch.topk(cos_scores, k=top_k)

    print("\n\n======================\n\n")
    print("Query:", query)
    print("\nTop 5 most similar sentences in corpus:")

    # top_results[0] holds the scores, top_results[1] the corpus indices
    for score, idx in zip(top_results[0], top_results[1]):
        print(corpus[idx], "(Score: {:.4f})".format(score))
Here is the output for the first query.
Query: A man is eating pasta.

Top 5 most similar sentences in corpus:
A man is eating food. (Score: 0.6734)
A man is eating a piece of bread. (Score: 0.4269)
A man is riding a horse. (Score: 0.2086)
A man is riding a white horse on an enclosed ground. (Score: 0.1020)
A cheetah is running behind its prey. (Score: 0.0566)
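The two top-ranked sentences are both about eating, so the model does seem to capture sentence meaning to some degree. If we treat "eating food" as a generalization of "eating pasta" (pasta is a kind of food), then "eating a piece of bread" is a sibling of the original query rather than a generalization of it, so it is reasonable for it to score somewhat lower.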
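
To make the scoring step more concrete, here is a minimal, dependency-free sketch of what the cosine-similarity and top-k step computes. It is not how `util.cos_sim` or `torch.topk` are implemented internally; the helper names `cosine_similarity` and `top_k_similar` and the 3-dimensional toy vectors are made up for illustration, and returning 0.0 for a zero vector is a convention chosen here.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec, corpus_vecs, k):
    """Return up to k (score, index) pairs, highest score first."""
    scored = [
        (cosine_similarity(query_vec, vec), idx)
        for idx, vec in enumerate(corpus_vecs)
    ]
    return heapq.nlargest(min(k, len(scored)), scored, key=lambda pair: pair[0])


# Toy "embeddings", invented for illustration only (real ones have hundreds of dimensions).
toy_corpus = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.1, 0.0]

for score, idx in top_k_similar(toy_query, toy_corpus, k=2):
    print(idx, "(Score: {:.4f})".format(score))
```

Because cosine similarity depends only on the angle between vectors and not on their length, the ranking reflects the direction of each embedding; this is why a score near 1 means "almost the same meaning" and a score near 0 means "unrelated".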