MongoDB Notes - Performance and Java Code

Performance

All figures below were measured over a gigabit network.

Writes

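As the data volume grows, memory fills up: MongoDB loads as much data into memory as it can, and indexes also take up considerable space.
In unsafe (unacknowledged) mode, write speed differs by an order of magnitude depending on whether memory is full: roughly 1~2 MB/s when full, over 20 MB/s when not.
In safe (acknowledged) mode, speed also depends on whether memory is full, but fluctuates less. When memory is full it is less than half the unsafe-mode speed, about 1 MB/s; when it is not full it reaches 7~8 MB/s.
Batch writes and single-document writes run at about the same speed, since both are mainly limited by I/O. Once the round-trip time added by the driver is taken into account, however, batch writes are still recommended for large write volumes.
Sharded and standalone deployments perform about the same; in safe mode, sharding is even slightly slower.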
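Since the benefit of batching comes from fewer driver round trips rather than faster disk I/O, the usual pattern is to group documents into fixed-size batches. The sketch below shows only that grouping step in plain Java; the class name `BatchSplitter` and the batch size of 4 are illustrative, and with the 3.x+ driver each batch would be handed to one `MongoCollection.insertMany` call (left as a comment so the code runs without a server).

```java
import java.util.ArrayList;
import java.util.List;

class BatchSplitter {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // copy the subList view so each batch does not depend on the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            docs.add(i);
        }
        for (List<Integer> batch : partition(docs, 4)) {
            // with the 3.x+ driver: collection.insertMany(batch); one round trip per batch
            System.out.println(batch);
        }
    }
}
```

Run as-is, it prints three batches: `[0, 1, 2, 3]`, `[4, 5, 6, 7]` and `[8, 9]`.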

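Update 2019-07-19: In actual tests, MongoDB 4.0 batch writes stayed below roughly 10 MB/s, fluctuating between 7 and 10.x MB/s, on a gigabit NIC. In the same environment, migrating data with mongodump piped into mongorestore reached 60 MB/s.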
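To put those rates in perspective, a back-of-the-envelope calculation: moving 200 GB (the database-size ceiling recommended in the conclusions) takes about 5.7 hours at 10 MB/s but just under an hour (about 0.95 h) at 60 MB/s. The helper below only does that arithmetic; it assumes 1 GB = 1024 MB and a constant transfer rate, which a real migration will not sustain exactly.

```java
class MigrationTimeEstimate {

    /** Hours needed to move sizeGb gigabytes at a constant rate of rateMbPerSec MB/s. */
    static double hours(double sizeGb, double rateMbPerSec) {
        double sizeMb = sizeGb * 1024;            // assumes 1 GB = 1024 MB
        double seconds = sizeMb / rateMbPerSec;   // transfer time in seconds
        return seconds / 3600;
    }

    public static void main(String[] args) {
        double sizeGb = 200; // illustrative data size
        System.out.printf("batch insert at 10 MB/s: %.1f h%n", hours(sizeGb, 10));
        System.out.printf("mongodump | mongorestore at 60 MB/s: %.1f h%n", hours(sizeGb, 60));
    }
}
```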

Query: single index, no sort

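A standalone instance returns up to 80 MB/s.
Sharded performance is somewhat worse, roughly half: 40~50 MB/s.
Query performance is essentially independent of data volume.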

Query: two indexes, no sort

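Average performance is essentially the same as with a single index, but fluctuates more.
Sharded and standalone performance are essentially the same.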

Query: single index, with sort

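On a standalone instance, performance is about 80% of the unsorted case, around 60 MB/s, and drops to roughly half of that once the data exceeds 100 million records.
Sharded performance is poor, two orders of magnitude lower, presumably because of the extra step of merging and sorting results from the shards.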

Conclusions

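MongoDB's read performance is far higher than its write performance.
Concurrent writes improve write throughput; keep concurrency at 64 or below. Beyond that, use a queue: services with bursty write traffic should put a queue in front of MongoDB.
Keep a single database under 200 GB, and never let it exceed 300 GB. An oversized database severely hurts performance: as soon as writes occur, read performance drops quickly.
Avoid sorting where possible.
Unless the data volume is so large that sharding is unavoidable, avoid sharding and use a replica set only.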
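The advice to keep writers at 64 or fewer and queue the rest can be sketched with a fixed thread pool plus a semaphore that caps the backlog, so a burst blocks the producer instead of piling unbounded work into memory. This is an illustrative design, not something from the original tests: `writeToMongo` is a hypothetical placeholder for the real insert call, and `QUEUE_CAPACITY` is an arbitrary value.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedWriter {

    static final int MAX_WRITERS = 64;         // cap on concurrent writes
    static final int QUEUE_CAPACITY = 10_000;  // hypothetical backlog size for write bursts

    private final ExecutorService pool = Executors.newFixedThreadPool(MAX_WRITERS);
    // permits = running + queued tasks; once exhausted, submit() blocks the producer
    private final Semaphore backlog = new Semaphore(MAX_WRITERS + QUEUE_CAPACITY);

    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger maxInFlight = new AtomicInteger();
    final AtomicInteger written = new AtomicInteger();

    void submit(String doc) throws InterruptedException {
        backlog.acquire();
        try {
            pool.execute(() -> {
                int now = inFlight.incrementAndGet();
                maxInFlight.accumulateAndGet(now, Math::max);
                try {
                    writeToMongo(doc);
                } finally {
                    inFlight.decrementAndGet();
                    backlog.release();
                }
            });
        } catch (RejectedExecutionException e) {
            backlog.release(); // the task never ran, so give the permit back
            throw e;
        }
    }

    // Hypothetical placeholder for the real insert (e.g. insertOne / insertMany).
    private void writeToMongo(String doc) {
        written.incrementAndGet();
    }

    void shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedWriter writer = new BoundedWriter();
        for (int i = 0; i < 100_000; i++) {
            writer.submit("doc-" + i);
        }
        writer.shutdownAndWait();
        System.out.println("written=" + writer.written.get()
                + ", max concurrent=" + writer.maxInFlight.get());
    }
}
```

The fixed pool guarantees that no more than `MAX_WRITERS` writes run at once, while the semaphore turns a full backlog into back-pressure on the caller. For a production system an external queue (as the note suggests) would also survive process restarts, which this in-memory version does not.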

Backup and restore

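There are two pairs of tools: mongodump + mongorestore and mongoexport + mongoimport. Backup and restore normally use the former: its smallest granularity is a collection (one table), it is fast, and it is less likely to lose data.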

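For data that changes while the backup is running, MongoDB does not lock the database the way MySQL does. Instead, at the end of the backup it also saves the oplog generated during the backup (mongodump's --oplog option). This file records all data changes from the start to the end of the backup, so that once it is replayed during restore, the data snapshot reflects the moment the backup finished.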
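With the standard tools, the oplog capture is requested with `mongodump --oplog` and replayed with `mongorestore --oplogReplay`. `--oplog` only works for a full dump of a replica set member (it cannot be combined with `--db` or `--collection`), so it does not apply to per-collection backups. The sketch below only assembles the two command lines in Java; the host, port and paths are placeholders, and actually running them (commented out) requires the MongoDB database tools on the PATH.

```java
import java.util.Arrays;
import java.util.List;

class BackupCommands {

    /** Full dump of a replica set member, also capturing oplog entries written during the dump. */
    static List<String> dumpWithOplog(String host, int port, String outDir) {
        return Arrays.asList("mongodump",
                "--host", host,
                "--port", String.valueOf(port),
                "--oplog",          // saves the captured entries as oplog.bson in the output directory
                "--out", outDir);
    }

    /** Restore that replays the captured oplog, so the data ends at the point the dump finished. */
    static List<String> restoreWithOplogReplay(String host, int port, String dumpDir) {
        return Arrays.asList("mongorestore",
                "--host", host,
                "--port", String.valueOf(port),
                "--oplogReplay",
                dumpDir);
    }

    public static void main(String[] args) {
        List<String> dump = dumpWithOplog("localhost", 27017, "/backup/dump");
        System.out.println(String.join(" ", dump));
        // To execute it (needs mongodump on the PATH):
        // new ProcessBuilder(dump).inheritIO().start().waitFor();
    }
}
```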

Usage

Insert

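import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;
import java.io.IOException;

// Legacy (2.x/3.x) driver API. The class name is illustrative; GIFTParser and Question
// are the author's own classes for parsing questions from a GIFT file.
public class QuestionImporter {

    public static void main(String[] args) throws IOException {
        MongoClient mongoClient = new MongoClient("localhost", 27017);
        // set the write concern before any writes: JOURNALED waits for the journal commit
        mongoClient.setWriteConcern(WriteConcern.JOURNALED);
        DB db = mongoClient.getDB("mydb");
        DBCollection coll = db.getCollection("questionsCollection");
        try {
            GIFTParser p = new GIFTParser();
            for (Question q : p.parserGIFT("Data/questionsGIFT")) {
                BasicDBObject doc = new BasicDBObject("category", q.getCategory())
                        .append("question", q.getText())
                        .append("correctanswer", q.getCorrectAnswer())
                        .append("wrongAnswers", q.getWrongAnswers());
                coll.insert(doc);
            }

            // print every document now in the collection
            DBCursor cursor = coll.find();
            try {
                while (cursor.hasNext()) {
                    System.out.println(cursor.next());
                }
            } finally {
                cursor.close();
            }
        } finally {
            mongoClient.close(); // release the connection pool
        }
    }
}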

Write concern (acknowledgment level)

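Because MongoDB writes to memory by default, important business data needs an appropriate write concern (acknowledgment level) to make sure it has actually been persisted. There are two ways to set it. One is to pass it on each write: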

coll.insert(dbObj, WriteConcern.ACKNOWLEDGED);

The other is to set it when creating the MongoClient, so that every write operation uses it by default:

MongoClientOptions.Builder builder = new MongoClientOptions.Builder();
builder.writeConcern(WriteConcern.JOURNALED);
MongoClient mongoClient = new MongoClient(
        new ServerAddress("localhost"), builder.build());

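With a safe level set as above, whether ACKNOWLEDGED (the default for MongoClient) or another explicit level (MAJORITY, JOURNALED, W1, W2, W3), an insert that returns without throwing an exception can be considered successful. Note that ACKNOWLEDGED only confirms that the server accepted the write; JOURNALED additionally waits until the write is committed to the on-disk journal.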

The return value and the exception exist for different reasons. If you don't use a safe WriteConcern, the method never throws an exception, and you can use WriteResult.getLastError() to determine whether the write succeeded. Similarly, if you use WriteConcern.ACKNOWLEDGED and the write succeeds, the WriteResult carries useful information such as the number of records that were written (except for insert). That said, if you're using WriteConcern.ACKNOWLEDGED and no exception is thrown, the data was inserted.

In other words, without a safe WriteConcern, WriteResult.getLastError() is how you check whether a write succeeded.

getN() always returns 0 for inserts

Unlike update and remove, getN() in the result of an insert is 0. The official JIRA explains it this way: "'n' in this case represents the number of documents matched by the query (for update and remove), and there is no query for insert, so it's always 0".

The upsert parameter

collection.update() accepts an upsert parameter, which gives InsertIfNotExist + UpdateIfExist behavior in a single call.
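With the legacy driver used in the Insert example, this is `coll.update(query, update, true, false)`, where the third argument is `upsert` and the fourth is `multi`; the 3.x `MongoCollection` API expresses it as `updateOne(filter, update, new UpdateOptions().upsert(true))`. To make the semantics concrete without a server, the sketch below mimics an upsert keyed by `_id` on an in-memory map. `UpsertSketch` is a hypothetical stand-in, not the driver API, and it ignores details such as the query's equality fields being copied into a newly inserted document.

```java
import java.util.HashMap;
import java.util.Map;

class UpsertSketch {

    // In-memory stand-in for a collection: _id -> document fields (NOT the driver API)
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    /**
     * Roughly mimics update({_id: id}, {$set: fields}, upsert = true):
     * sets the fields on the existing document, or inserts a new one if none matches.
     * Returns true if an existing document was updated, false if a new one was inserted.
     */
    boolean upsert(String id, Map<String, Object> fields) {
        boolean existed = docs.containsKey(id);
        docs.computeIfAbsent(id, k -> new HashMap<>()).putAll(fields);
        return existed;
    }

    Map<String, Object> find(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        UpsertSketch coll = new UpsertSketch();

        Map<String, Object> name = new HashMap<>();
        name.put("name", "Alice");
        System.out.println(coll.upsert("u1", name)); // false: no match, document inserted

        Map<String, Object> age = new HashMap<>();
        age.put("age", 30);
        System.out.println(coll.upsert("u1", age));  // true: matched, document updated

        System.out.println(coll.find("u1"));          // contains both name and age
    }
}
```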

Working with MongoDB from Java

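There are two main approaches. One is the official mongo-java-driver. It is fairly low-level and requires converting POJOs to BasicDBObject yourself, so it is not recommended unless the project needs to work flexibly with multiple databases and collections. Projects usually take the other approach: Spring Data's MongoRepository.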

Reference: https://docs.spring.io/spring-data/mongodb/docs/1.2.0.RELEASE/reference/html/mongo.repositories.html

pom dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>

Configuration file (application.properties)

#Local MongoDB config
spring.data.mongodb.authentication-database=admin
spring.data.mongodb.username=root
spring.data.mongodb.password=root
spring.data.mongodb.database=user_db
spring.data.mongodb.port=27017
spring.data.mongodb.host=localhost
 
# App config
server.port=8102
spring.application.name=BootMongo
# Spring Boot 1.x key; in Spring Boot 2.x use server.servlet.context-path
server.context-path=/user

Create the repository

public interface CustomerRepository extends MongoRepository<Customer, String> {
 
    public Customer findByFirstName(String firstName);
    public List<Customer> findByLastName(String lastName);
 
}

Another example:

public interface PersonRepository extends MongoRepository<Person, String> {
  @Query(value="{ 'firstname' : ?0 }", fields="{ 'firstname' : 1, 'lastname' : 1}")
  List<Person> findByThePersonsFirstname(String firstname);
}

Another example:

public interface CustomRepository extends MongoRepository<PracticeQuestion, String> {
    @Query(value = "{ 'userId' : ?0, 'questions.questionID' : ?1 }", fields = "{ 'questions.questionID' : 1 }")
    List<PracticeQuestion> findByUserIdAndQuestionsQuestionID(int userId, int questionID);
}

Usage

@Autowired
private CustomerRepository repository;
 
...
 
repository.deleteAll();
 
// save a couple of customers
repository.save(new Customer("Alice", "Smith"));
repository.save(new Customer("Bob", "Smith"));
 
// fetch all customers
System.out.println("Customers found with findAll():");
System.out.println("-------------------------------");
for (Customer customer : repository.findAll()) {
	System.out.println(customer);
}
System.out.println();
 
// fetch an individual customer
System.out.println("Customer found with findByFirstName('Alice'):");
System.out.println("--------------------------------");
System.out.println(repository.findByFirstName("Alice"));
 
System.out.println("Customers found with findByLastName('Smith'):");
System.out.println("--------------------------------");
for (Customer customer : repository.findByLastName("Smith")) {
	System.out.println(customer);
}

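// Querydsl predicates: assumes a PersonRepository that also extends QueryDslPredicateExecutor<Person>
// (QuerydslPredicateExecutor in Spring Data 2.x), with QPerson generated by the Querydsl annotation processor
QPerson person = new QPerson("person");
List<Person> result = repository.findAll(person.address.zipCode.eq("C0123"));
// first page of 2 results sorted by lastname ascending (PageRequest.of(...) in Spring Data 2.x)
Page<Person> page = repository.findAll(person.lastname.contains("a"), new PageRequest(0, 2, Direction.ASC, "lastname"));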

Working with MongoDB from Java (2): data operations with MongoTemplate

Configuration file application.yml

...
spring:
  application:
    name: demo-commons
  data:
    mongodb:
      uri: @mongo.uri@
...

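Here mongo.uri is substituted by Maven resource filtering at build time. Its value has the form mongodb://ip:port/database, for example mongodb://192.168.1.11/demo_db (the port may be omitted, in which case it defaults to 27017).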
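For the single-host form above, the pieces of the URI can be pulled apart with `java.net.URI`, which also shows the port falling back to 27017 when omitted. This is only an illustration: the driver does its own parsing (`com.mongodb.ConnectionString`), and multi-host replica-set URIs such as `mongodb://h1,h2/db` or URIs with options are not handled by this approach. `MongoUriParts` is a made-up helper name.

```java
import java.net.URI;

class MongoUriParts {

    static final int DEFAULT_PORT = 27017;

    /** Returns {host, port, database} for a single-host mongodb://host[:port]/database URI. */
    static String[] parse(String mongoUri) {
        URI uri = URI.create(mongoUri);
        String host = uri.getHost();
        // getPort() returns -1 when the URI has no explicit port
        int port = uri.getPort() == -1 ? DEFAULT_PORT : uri.getPort();
        // getPath() is e.g. "/demo_db"; drop the leading slash to get the database name
        String path = uri.getPath();
        String database = (path == null || path.length() <= 1) ? "" : path.substring(1);
        return new String[] {host, String.valueOf(port), database};
    }

    public static void main(String[] args) {
        String[] parts = parse("mongodb://192.168.1.11/demo_db");
        System.out.println(String.join(" | ", parts)); // 192.168.1.11 | 27017 | demo_db
    }
}
```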

AppConfiguration.java

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.MongoDbFactory;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.convert.DbRefResolver;
import org.springframework.data.mongodb.core.convert.DefaultDbRefResolver;
import org.springframework.data.mongodb.core.convert.DefaultMongoTypeMapper;
import org.springframework.data.mongodb.core.convert.MappingMongoConverter;
import org.springframework.data.mongodb.core.mapping.MongoMappingContext;
 
@Configuration
public class AppConfiguration {
 
    @Bean
    public MongoTemplate mongoTemplate(MongoDbFactory mongoDbFactory, MongoMappingContext mongoMappingContext) {
        DbRefResolver dbRefResolver = new DefaultDbRefResolver(mongoDbFactory);
        MappingMongoConverter converter = new MappingMongoConverter(dbRefResolver, mongoMappingContext);
        converter.setTypeMapper(new DefaultMongoTypeMapper(null));
        return new MongoTemplate(mongoDbFactory, converter);
    }
}

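This configuration bean passes null as the type key to DefaultMongoTypeMapper, so MongoTemplate no longer writes the _class field into saved documents.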

ServiceImplApplication.java

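import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.EnableEurekaClient;

@EnableEurekaClient
@SpringBootApplication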
public class ServiceImplApplication {
 
    public static void main(String[] args) {
        SpringApplication.run(ServiceImplApplication.class, args);
    }
 
}

Calling mongoTemplate for database operations

    @Autowired
    private MongoTemplate mongoTemplate;
...
 
    @Override
    @RequestMapping(value = "/add", method = RequestMethod.POST)
    public SessionTraceDTO add(@RequestBody SessionTraceDTO dto) {
        // save returns the saved object in Spring Data MongoDB 2.x (it returns void in 1.x)
        return mongoTemplate.save(dto, "session_trace");
    }
...