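Real-world data sets rarely contain just a handful of records, so this article tests with more than a hundred thousand records (about 140k). Setting up a database would be cumbersome, so the data is read from a text file instead. Download the data file here: data file. After downloading, put it in the Maven project's resources directory (src/main/resources).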
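Reading the file requires two extra dependencies: Lombok (for the entity class) and commons-io (for IOUtils). The Lucene modules used by the search code further down (core, the SmartChineseAnalyzer module, the classic query parser and the highlighter) must also be on the classpath; they are not listed here.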
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <!-- Required unless a parent POM (e.g. spring-boot-starter-parent) already manages the version -->
    <version>1.18.20</version>
    <optional>true</optional>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.8.0</version>
</dependency>
Create an entity class:
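import lombok.Data;

// @Data generates getters, setters, equals/hashCode and toString at compile time
@Data
public class Product {
    private int id;
    private String name;
    private String category;
    private float price;
    private String place;
    private String code;
}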
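Lombok's @Data generates the accessors that the rest of the code calls (setId, getName and so on), plus toString, equals and hashCode. If you would rather not add Lombok, a hand-written class along the following lines is a workable substitute. This is a sketch, not what Lombok emits verbatim: equals and hashCode are omitted for brevity, and toString only imitates Lombok's "Product(field=value, ...)" style.

```java
// Hand-written stand-in for the Lombok @Data Product class (sketch, no Lombok needed).
// equals/hashCode, which @Data would also generate, are left out for brevity.
public class Product {
    private int id;
    private String name;
    private String category;
    private float price;
    private String place;
    private String code;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getCategory() { return category; }
    public void setCategory(String category) { this.category = category; }

    public float getPrice() { return price; }
    public void setPrice(float price) { this.price = price; }

    public String getPlace() { return place; }
    public void setPlace(String place) { this.place = place; }

    public String getCode() { return code; }
    public void setCode(String code) { this.code = code; }

    // Imitates the "ClassName(field=value, ...)" shape of Lombok's generated toString
    @Override
    public String toString() {
        return "Product(id=" + id + ", name=" + name + ", category=" + category
                + ", price=" + price + ", place=" + place + ", code=" + code + ")";
    }
}
```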
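Create a utility class that reads the data file and converts each line into a Product object (the downloaded data file goes in the Maven project's resources directory):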
import org.apache.commons.io.IOUtils;

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ProductUtil {
    public static void main(String[] args) throws IOException {
        String fileName = "140k_products.txt";
        // Load the file from the classpath (it lives in src/main/resources)
        try (InputStream inputStream = ProductUtil.class.getClassLoader().getResourceAsStream(fileName)) {
            // Equivalent: ProductUtil.class.getResourceAsStream("/" + fileName)
            if (inputStream == null) {
                throw new IOException("Resource not found on the classpath: " + fileName);
            }
            List<Product> products = file2list(inputStream);
            System.out.println(products.size());
        }
    }

    // Read all lines as UTF-8 and convert each one into a Product
    public static List<Product> file2list(InputStream inputStream) throws IOException {
        List<String> lines = IOUtils.readLines(inputStream, StandardCharsets.UTF_8);
        List<Product> products = new ArrayList<>();
        for (String line : lines) {
            products.add(line2product(line));
        }
        return products;
    }

    // Each line has the form: id,name,category,price,place,code
    private static Product line2product(String line) {
        Product p = new Product();
        String[] fields = line.split(",");
        p.setId(Integer.parseInt(fields[0]));
        p.setName(fields[1]);
        p.setCategory(fields[2]);
        p.setPrice(Float.parseFloat(fields[3]));
        p.setPlace(fields[4]);
        p.setCode(fields[5]);
        return p;
    }
}
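file2list first loads every line into a List<String> and only then builds the products, and it needs commons-io just for IOUtils.readLines. A JDK-only variant can stream the file line by line with BufferedReader instead. The sketch below is hypothetical: ProductLineReader and its Row type are illustrative names, not part of the original code. It also skips blank lines (line2product would throw NumberFormatException on them) and uses split(",", -1), because plain split(",") drops trailing empty strings, so a line with an empty last field such as "1,a,b,2.0,c," would yield only 5 fields and fields[5] would throw ArrayIndexOutOfBoundsException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical JDK-only alternative to ProductUtil.file2list (no commons-io)
public class ProductLineReader {

    // Minimal stand-in for Product so this sketch compiles on its own
    public static class Row {
        public final int id;
        public final String name, category, place, code;
        public final float price;

        Row(int id, String name, String category, float price, String place, String code) {
            this.id = id;
            this.name = name;
            this.category = category;
            this.price = price;
            this.place = place;
            this.code = code;
        }
    }

    // Streams the input line by line instead of holding every raw line in memory first
    public static List<Row> read(InputStream in) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines instead of failing on them
                }
                rows.add(parse(line));
            }
        }
        return rows;
    }

    // Expected layout: id,name,category,price,place,code
    static Row parse(String line) {
        // limit -1 keeps trailing empty fields, so "1,a,b,2.0,c," still gives 6 fields
        String[] f = line.split(",", -1);
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields but got " + f.length + ": " + line);
        }
        return new Row(Integer.parseInt(f[0]), f[1], f[2], Float.parseFloat(f[3]), f[4], f[5]);
    }
}
```

Usage, assuming the same resource file: ProductLineReader.read(ProductUtil.class.getClassLoader().getResourceAsStream("140k_products.txt")) returns one Row per non-blank line, and a line with the wrong number of fields fails with a message that names the offending line.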
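Run the code above and check that the data in the file is read correctly (it should print the number of products). If it is, go on to write the search code: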
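import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;

public class TestLuceneFor14 {
    public static void main(String[] args) throws Exception {
        // 1. Prepare the Chinese word-segmenting analyzer
        Analyzer analyzer = new SmartChineseAnalyzer();
        // 2. Build the index
        Directory index = createIndex(analyzer);
        // 3. Read keywords from the console and build queries
        Scanner s = new Scanner(System.in);
        while (true) {
            System.out.print("Enter a search keyword: ");
            String keyword = s.nextLine();
            System.out.println("Current keyword: " + keyword);
            Query query;
            try {
                // Search the "name" field by default
                query = new QueryParser("name", analyzer).parse(keyword);
            } catch (ParseException e) {
                // Malformed input such as an unbalanced "(" cannot be parsed; ask again
                System.out.println("Cannot parse the keyword: " + e.getMessage());
                continue;
            }
            // 4. Search: open a reader and fetch the top results
            try (IndexReader reader = DirectoryReader.open(index)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                int numberPerPage = 10;
                ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
                // 5. Show the results
                showSearchResults(searcher, hits, query, analyzer);
            } // 6. try-with-resources closes the reader
        }
    }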
    private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, Analyzer analyzer) throws Exception {
        // hits holds at most numberPerPage (10) results, not the total number of matches
        System.out.println("Found " + hits.length + " hits.");
        // Highlight matched terms by wrapping them in a red HTML <span>
        SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
        Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
        System.out.println("No.\tScore\tResult");
        for (int i = 0; i < hits.length; ++i) {
            ScoreDoc scoreDoc = hits[i];
            int docId = scoreDoc.doc;
            Document d = searcher.doc(docId);
            List<IndexableField> fields = d.getFields();
            System.out.print((i + 1));
            System.out.print("\t" + scoreDoc.score);
            for (IndexableField f : fields) {
                String value = d.get(f.name());
                if ("name".equals(f.name())) {
                    // Re-analyze the stored name so the highlighter can locate the matched terms
                    TokenStream tokenStream = analyzer.tokenStream(f.name(), new StringReader(value));
                    String fieldContent = highlighter.getBestFragment(tokenStream, value);
                    // getBestFragment returns null if no query term is found; fall back to the plain value
                    System.out.print("\t" + (fieldContent != null ? fieldContent : value));
                } else {
                    System.out.print("\t" + value);
                }
            }
            System.out.println("<br>");
        }
    }
    private static Directory createIndex(Analyzer analyzer) throws IOException {
        // In-memory index. RAMDirectory was removed in Lucene 9.0 (ByteBuffersDirectory replaces it)
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(index, config);
        String fileName = "140k_products.txt";
        List<Product> products;
        try (InputStream inputStream = ProductUtil.class.getClassLoader().getResourceAsStream(fileName)) {
            // Equivalent: ProductUtil.class.getResourceAsStream("/" + fileName)
            if (inputStream == null) {
                throw new IOException("Resource not found on the classpath: " + fileName);
            }
            // file2list is defined in ProductUtil, so it must be called through that class
            products = ProductUtil.file2list(inputStream);
        }
        int total = products.size();
        int count = 0;
        int oldPer = 0;
        for (Product p : products) {
            addDoc(writer, p);
            count++;
            // Print progress only when the whole-number percentage changes
            int per = count * 100 / total;
            if (per != oldPer) {
                oldPer = per;
                System.out.printf("Indexing: %d records to add in total, current progress: %d%% %n", total, per);
            }
        }
        writer.close();
        return index;
    }

    // Index every product field as analyzed text and store the original value for display
    private static void addDoc(IndexWriter w, Product p) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("id", String.valueOf(p.getId()), Field.Store.YES));
        doc.add(new TextField("name", p.getName(), Field.Store.YES));
        doc.add(new TextField("category", p.getCategory(), Field.Store.YES));
        doc.add(new TextField("price", String.valueOf(p.getPrice()), Field.Store.YES));
        doc.add(new TextField("place", p.getPlace(), Field.Store.YES));
        doc.add(new TextField("code", p.getCode(), Field.Store.YES));
        w.addDocument(doc);
    }
}
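Conceptually, createIndex builds an inverted index: the analyzer splits each product name into tokens, and the index maps every token to the documents that contain it, so a keyword lookup does not have to scan all ~140k products. The toy sketch below illustrates only that idea in plain Java and is not how Lucene stores its index. ToyInvertedIndex is a hypothetical name, lowercase whitespace splitting stands in for SmartChineseAnalyzer (real Chinese text has no spaces and needs a segmenting analyzer), and there is no relevance scoring.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: token -> sorted set of document ids (illustration only, not Lucene)
public class ToyInvertedIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Stand-in "analyzer": lowercase the text and split it on whitespace
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Record every token of the document under its id
    public void add(int docId, String text) {
        for (String token : tokenize(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the ids of documents that contain ALL query tokens (intersection of postings)
    public SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String token : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(token, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

Lucene keeps much more per term (term frequencies, positions, stored fields in on-disk segments) and ranks hits by score. Note also that the classic QueryParser joins multiple terms with OR by default, whereas this toy searchAll requires every term to match.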
Then run the program and try a few keywords. Screenshot of a sample run: