在实际的nlp实际任务中,你有一大堆的人工标注的关键词,来新的一句话,找出这句话中的关键词,以便你以后使用,那如何来做呢? 1)用到正则的 finditer()方法,返回你匹配的关键词的迭代对象,包含起始结束索引 2)增强list循环,提取数据 代码如下: import re s = 'dengyexun' idx = [i.start() for i in re.finditer('y', s)] 这里我只要开始索引,结果如下: 之后,你想怎么用都可以的
一行搞定-统计一句话中每个单词出现的个数 >>> s'i am a boy a bood boy a bad boy' 方式一:>>> dict([(i,s.split().count(i)) for i in s.split()]){'a': 3, 'boy': 3, 'i': 1, 'am': 1, 'bad': 1, 'bood': 1} >>> set([(i,s.split().count(i)) for i in s.split()])se
//在两个数成对出现的数组中找到一个单独的数.比如{1,2,3.3,1,4.2},即找出4 #include <stdio.h> int find(int arr[], int len) { int i = 0; int ret = 0; for (i = 0; i < len; i++) { ret = ret^arr[i]; } return ret; } int main() { int arr1[] = { 1, 2, 2, 3, 1, 5, 3 }; int arr2[] =
Description The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated. As an IBM researcher, you have been tas
Given a string s and a non-empty string p, find all the start indices of p's anagrams in s. Strings consists of lowercase English letters only and the length of both strings s and p will not be larger than 20,100. The order of output does not matter.