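Large Language Models 4: Implementing Basic LLM Capabilities

InternLM (书生·浦语) Hands-on Camp Study Notes 2: Implementing Basic LLM Capabilities

This post covers Lesson 2 of the second session of the camp. I originally meant to use these notes to supplement the official tutorial, but the official tutorial turned out to be of high quality: following it step by step, I ran into no pitfalls. So these notes mainly study the details of the code in the official demos.

Of the two series named in the title, "Large Language Models" is the blog series I wrote while studying large language models, and "InternLM Hands-on Camp Study Notes" is the set of notes I took while attending the second session of the InternLM camp.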

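Implementing an LLM's chat capability: the official InternLM2-Chat-1.8B demo as an example

Let's look at the code of the official demo. It first imports the required libraries and stores the model location in a variable: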

import torch

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "/root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"

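It then uses AutoTokenizer and AutoModelForCausalLM. Anyone familiar with large models will recognize these Hugging Face classes, which automatically load a pretrained model and its matching tokenizer.

When loading the model, trust_remote_code=True allows the custom modeling code shipped with the checkpoint to be executed, torch_dtype=torch.bfloat16 loads the weights in bf16 to save memory (a lower-precision floating-point format rather than quantization in the strict sense), and device_map='cuda:0' places the model on the first GPU. model.eval() then switches the model to inference mode, which disables training-only behaviour such as dropout. It does not by itself turn off gradient tracking; that is what torch.no_grad() is for.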

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, device_map='cuda:0')

model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='cuda:0')

model = model.eval()
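
To make the memory saving concrete, here is a back-of-the-envelope estimate of the weight footprint at different precisions. It is only a sketch: the parameter count of 1.8 billion is assumed from the "1_8b" in the model name rather than measured, only the weights are counted (no activations or KV cache), and the helper name weight_gib is made up for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GiB (weights only)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    n = 1.8e9  # assumed parameter count, inferred from the model name
    for dtype in ("fp32", "bf16"):
        print(f"{dtype}: about {weight_gib(n, dtype):.1f} GiB")
```

Under these assumptions bf16 needs half the memory of fp32 for the weights, which is the saving the demo is after.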

Next comes the core logic: set system_prompt, read the user's input, and call model.stream_chat():

system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).

- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.

- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.

"""

messages = [(system_prompt, '')]

print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")

while True:

    input_text = input("\nUser  >>> ")

    input_text = input_text.replace(' ', '')  # remove spaces from the user's input

    if input_text == "exit":  # type exit to quit

        break

    length = 0

    # Iterate over model.stream_chat(), which returns a generator. Each iteration yields the response generated so far and the updated history (ignored here as _).

    for response, _ in model.stream_chat(tokenizer, input_text, messages):

        # If the response is not None, print the part from the last printed position `length` to the end and flush the output buffer.

        if response is not None:

            print(response[length:], flush=True, end="")

            # Update the last printed position so that the next print starts from the right place.

            length = len(response)

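So the core of the model's chat capability comes down to a call to model.stream_chat(). Running the script starts an interactive chat session in the terminal, with each reply printed incrementally as it is generated.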
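
The incremental printing in the loop above can be tried without a GPU. The sketch below replaces the model with a hypothetical generator, fake_stream_chat, that yields the reply generated so far, which is how the loop treats the real stream. The initial None is only there to exercise the `is not None` check; whether the real method ever yields None is not something I verified.

```python
from typing import Iterator, Optional, Tuple


def fake_stream_chat(reply: str) -> Iterator[Tuple[Optional[str], list]]:
    """Hypothetical stand-in for model.stream_chat(): yields the reply so far."""
    yield None, []  # nothing generated yet
    for end in range(1, len(reply) + 1):
        yield reply[:end], []


def stream_deltas(stream: Iterator[Tuple[Optional[str], list]]) -> Iterator[str]:
    """Turn cumulative responses into the newly added pieces."""
    length = 0
    for response, _history in stream:
        if response is not None:
            yield response[length:]  # only the part not printed yet
            length = len(response)


if __name__ == "__main__":
    for piece in stream_deltas(fake_stream_chat("Hello!")):
        print(piece, end="", flush=True)
    print()
```

Because each yielded response contains everything generated so far, slicing from `length` is what keeps text from being printed twice.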

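Implementing visual question answering with a multimodal model

Implementing visual question answering is not much different from implementing chat: the API called simply changes from model.stream_chat() to model.chat(). The code is analyzed below.

The model and tokenizer are initialized first, in the same way as before. The one addition is that the load precision is chosen from args.dtype. Half precision lowers the resource consumption of inference, so it is well worth setting.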

# init model and tokenizer

model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True).eval()

if args.dtype == 'fp16':

    model.half().cuda()

elif args.dtype == 'fp32':

    model.cuda()

if args.num_gpus > 1:

    from accelerate import dispatch_model

    device_map = auto_configure_device_map(args.num_gpus)

    model = dispatch_model(model, device_map=device_map)

tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-7b', trust_remote_code=True)
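
The snippet reads args.dtype and args.num_gpus, but the excerpt does not show where args comes from. A minimal sketch, assuming the script uses argparse: the flag names mirror the attribute names above, while the defaults, the help strings and the function name build_parser are my own guesses, not the official script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser providing the two options the snippet reads."""
    parser = argparse.ArgumentParser(description="Visual question answering demo")
    parser.add_argument("--dtype", choices=["fp16", "fp32"], default="fp16",
                        help="precision used when moving the model to the GPU")
    parser.add_argument("--num_gpus", type=int, default=1,
                        help="number of GPUs to spread the model across")
    return parser


if __name__ == "__main__":
    # Parse an explicit list so the example does not depend on the command line.
    args = build_parser().parse_args(["--dtype", "fp32", "--num_gpus", "2"])
    print(args.dtype, args.num_gpus)
```

Restricting --dtype with choices means a typo fails at startup instead of silently skipping both branches of the if/elif.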

Then specify the input text and image, and call the model.chat method:

text = '<ImageHere>Please describe this image in detail.'

image = 'examples/image1.webp'

with torch.cuda.amp.autocast():

    with torch.no_grad():

        # This is really the key line: a call to the packaged `model.chat` method

        response, _ = model.chat(tokenizer, query=text, image=image, history=[], do_sample=False)

print(response)

Running it then prints the result.

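Implementing image-text article generation with a multimodal model

Generating an article with interleaved text and images is actually a big job for today's models. Internally it is split into many steps, but the rough flow is:

1. Expand a short text into a long article
2. Generate captions suited to the images to be inserted
3. Generate images from the captions
4. Pick one image out of the 4 generated candidates
5. Merge the text and images

All of steps 1-5 require model inference, so the demo itself involves real engineering work. It demonstrates a workflow that uses an image-text generation model to produce an article with interleaved text and images.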
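
The flow above can be sketched as a small orchestration function. Everything here is hypothetical: the four callables are placeholders for the model calls, build_article is not a name from the demo, and the final merge is reduced to plain list assembly to keep the sketch short.

```python
from typing import Callable, Dict, List, Optional, Tuple


def build_article(
    prompt: str,
    write_article: Callable[[str], List[str]],
    propose_captions: Callable[[List[str]], Dict[int, str]],
    fetch_candidates: Callable[[str], List[str]],
    choose_image: Callable[[List[str], int, List[str]], int],
) -> List[Tuple[str, Optional[str]]]:
    """Chain the five steps; each callable stands in for a model call."""
    sections = write_article(prompt)                       # 1. short text -> long article
    captions = propose_captions(sections)                  # 2. captions for image slots
    result: List[Tuple[str, Optional[str]]] = [(text, None) for text in sections]
    for pos, caption in captions.items():
        candidates = fetch_candidates(caption)             # 3. caption -> candidate images
        picked = choose_image(sections, pos, candidates)   # 4. pick one candidate
        result[pos] = (sections[pos], candidates[picked])  # 5. merge text and image
    return result


if __name__ == "__main__":
    article = build_article(
        "a trip to the sea",
        write_article=lambda p: [f"Title: {p}", "Paragraph one.", "Paragraph two."],
        propose_captions=lambda secs: {1: "waves at sunrise"},
        fetch_candidates=lambda cap: [f"{cap}_{i}.jpg" for i in range(4)],
        choose_image=lambda secs, pos, cands: 0,
    )
    for text, image in article:
        print(text, image)
```

Passing the steps in as callables keeps the control flow visible without needing a GPU.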

Part of the demo code follows. Code unrelated to article generation has been removed to keep it easy to follow.

class ImageProcessor:

    """Preprocess images: resize and normalize."""

    def __init__(self, image_size=224):

        mean = (0.48145466, 0.4578275, 0.40821073)

        std = (0.26862954, 0.26130258, 0.27577711)

        self.normalize = transforms.Normalize(mean, std)

        self.transform = transforms.Compose([

            transforms.Resize((image_size, image_size),

                              interpolation=InterpolationMode.BICUBIC),

            transforms.ToTensor(),

            self.normalize,

        ])

    def __call__(self, item):

        if isinstance(item, str):

            item = Image.open(item).convert('RGB')

        return self.transform(item)

class Demo_UI:

    """UI for generating articles."""

    def __init__(self, code_path, num_gpus=1):

        self.code_path = code_path

        self.reset()

        tokenizer = AutoTokenizer.from_pretrained(code_path, trust_remote_code=True)

        self.model = AutoModelForCausalLM.from_pretrained(code_path, device_map='cuda', trust_remote_code=True).half().eval()

        self.model.tokenizer = tokenizer

        self.model.vit.resize_pos()

        self.vis_processor = ImageProcessor()

        stop_words_ids = [92397]

        #stop_words_ids = [92542]

        self.stopping_criteria = get_stopping_criteria(stop_words_ids)

        set_random_seed(1234)

        self.r2 = re.compile(r'<Seg[0-9]*>')

        self.withmeta = False

        self.database = Database()

    def text2instruction(self, text):

        """

        Convert text into an instruction. If withmeta is True, the meta instruction is added.

        Args:

            text: the text content.

        Returns:

            The instruction, e.g. f"[UNUSED_TOKEN_146]user\n{text}[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n"

        """

        if self.withmeta:

            return f"[UNUSED_TOKEN_146]system\n{meta_instruction}[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]user\n{text}[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n"

        else:

            return f"[UNUSED_TOKEN_146]user\n{text}[UNUSED_TOKEN_145]\n[UNUSED_TOKEN_146]assistant\n"

    def generate(self, text, random, beam, max_length, repetition):

        """Generate an article."""

        with torch.no_grad():

            with torch.cuda.amp.autocast():  # use mixed precision

                input_ids = self.model.tokenizer(text, return_tensors="pt")['input_ids']  # tokenize the input text

                len_input_tokens = len(input_ids[0])  # get the length of the input tokens

                generate = self.model.generate(input_ids.cuda(),

                                                do_sample=random,

                                                num_beams=beam,

                                                temperature=1.,

                                                repetition_penalty=float(repetition),

                                                stopping_criteria=self.stopping_criteria,

                                                max_new_tokens=max_length,

                                                top_p=0.8,

                                                top_k=40,

                                                length_penalty=1.0)

        response = generate[0].tolist()

        response = response[len_input_tokens:]

        response = self.model.tokenizer.decode(response, skip_special_tokens=True)  # decode the response

        response = response.replace('[UNUSED_TOKEN_145]', '')  # remove the special tokens

        response = response.replace('[UNUSED_TOKEN_146]', '')  # remove the special tokens

        return response

    def generate_with_emb(self, emb, random, beam, max_length, repetition, im_mask=None):

        with torch.no_grad():

            with torch.cuda.amp.autocast():

                generate = self.model.generate(inputs_embeds=emb,

                                                do_sample=random,

                                                num_beams=beam,

                                                temperature=1.,

                                                repetition_penalty=float(repetition),

                                                stopping_criteria=self.stopping_criteria,

                                                max_new_tokens=max_length,

                                                top_p=0.8,

                                                top_k=40,

                                                length_penalty=1.0,

                                                im_mask=im_mask)

        response = generate[0].tolist()

        response = self.model.tokenizer.decode(response, skip_special_tokens=True)

        response = response.replace('[UNUSED_TOKEN_145]', '')

        response = response.replace('[UNUSED_TOKEN_146]', '')

        return response

    def extract_imgfeat(self, img_paths):

        """Extract image features."""

        if len(img_paths) == 0:

            return None

        images = []

        for j in range(len(img_paths)):

            image = self.vis_processor(img_paths[j])  # preprocess the image with ImageProcessor

            images.append(image)

        images = torch.stack(images, dim=0)

        with torch.no_grad():

            with torch.cuda.amp.autocast():

                img_embeds = self.model.encode_img(images)  # extract image features; this method comes with the model

        return img_embeds

    def generate_loc(self, text_sections, upimages, image_num):

        """Generate the positions at which to insert images.

        Args:

            text_sections: the text content.

            upimages: the images.

            image_num: the number of images.

        Returns:

            The lines suited to image insertion and the insertion positions.

        """

        full_txt = ''.join(text_sections)

        input_text = '<image> ' * len(upimages) + f'给定文章"{full_txt}" 根据上述文章，选择适合插入图像的{image_num}行'

        instruction = self.text2instruction(input_text) + '适合插入图像的行是'

        print(instruction)  # print the instruction

        if len(upimages) > 0:

            img_embeds = self.extract_imgfeat(upimages)

            input_embeds, im_mask, _ = self.interleav_wrap(instruction, img_embeds)  # call interleav_wrap

            output_text = self.generate_with_emb(input_embeds, True, 1, 200, 1.005, im_mask=im_mask)

        else:

            # With no images, generate the lines suited to image insertion directly

            output_text = self.generate(instruction, True, 1, 200, 1.005)

        inject_text = '适合插入图像的行是' + output_text

        print(inject_text)

        locs = [int(m[4:-1]) for m in self.r2.findall(inject_text)]  # extract the insertion positions

        print(locs)

        return inject_text, locs

    def generate_cap(self, text_sections, pos, progress):

        """Generate image captions by prompting the model through self.generate.

        Args:

            text_sections: the text content.

            pos: the image positions.

        Returns:

            The image captions.

        """

        pasts = ''

        caps = {}

        for idx, po in progress.tqdm(enumerate(pos), desc="image captioning"):  # iterate over the image positions

            full_txt = ''.join(text_sections[:po + 2])

            if idx > 0:

                past = pasts[:-2] + '。'

            else:

                past = pasts

            #input_text = f' <|User|>: 给定文章"{full_txt}" {past}给出适合在<Seg{po}>后插入的图像对应的标题。' + ' \n<TOKENS_UNUSED_0> <|Bot|>: 标题是"'

            input_text = f'给定文章"{full_txt}" {past}给出适合在<Seg{po}>后插入的图像对应的标题。'

            instruction = self.text2instruction(input_text) + '标题是"'

            print(instruction)

            cap_text = self.generate(instruction, True, 1, 200, 1.005)  # generate the caption for the image

            cap_text = cap_text.split('"')[0].strip()

            print(cap_text)

            caps[po] = cap_text  # po is the image position, cap_text is the caption

            if idx == 0:

                pasts = f'现在<Seg{po}>后插入图像对应的标题是"{cap_text}"， '

            else:

                pasts += f'<Seg{po}>后插入图像对应的标题是"{cap_text}"， '

        print(caps)

        return caps

    def interleav_wrap(self, text, image, max_length=4096):

        """

        Interleave text and images.\n

        The tokenizer turns the text into tokens, whose embeddings are then looked up.

        The image embeddings and the text embeddings are then concatenated (torch.cat, dim=1).

        """

        device = image.device

        im_len = image.shape[1]

        image_nums = len(image)

        parts = text.split('<image>')

        wrap_embeds, wrap_im_mask = [], []

        temp_len = 0

        need_bos = True

        for idx, part in enumerate(parts):

            if len(part) > 0:

                # tokenize the text

                part_tokens = self.model.tokenizer(part,

                                                    return_tensors='pt',

                                                    padding='longest',

                                                    add_special_tokens=need_bos).to(device)

                if need_bos:

                    need_bos = False

                # get the embeddings of the tokens

                part_embeds = self.model.model.tok_embeddings(part_tokens.input_ids)

                wrap_embeds.append(part_embeds)

                wrap_im_mask.append(torch.zeros(part_embeds.shape[:2]))

                temp_len += part_embeds.shape[1]

            if idx < image_nums:

                wrap_embeds.append(image[idx].unsqueeze(0))

                wrap_im_mask.append(torch.ones(1, image[idx].shape[0]))

                temp_len += im_len

            if temp_len > max_length:

                break

        wrap_embeds = torch.cat(wrap_embeds, dim=1)

        wrap_im_mask = torch.cat(wrap_im_mask, dim=1)

        wrap_embeds = wrap_embeds[:, :max_length].to(device)

        wrap_im_mask = wrap_im_mask[:, :max_length].to(device).bool()

        return wrap_embeds, wrap_im_mask, temp_len

    def model_select_image(self, output_text, locs, images_paths, progress):

        """Let the model choose the image itself: self.model.generate produces the letter of the best candidate."""

        print('model_select_image')

        pre_text = ''

        pre_img = []

        pre_text_list = []

        ans2idx = {'A': 0, 'B': 1, 'C': 2, 'D': 3}

        selected = {k: 0 for k in locs}

        for i, text in enumerate(output_text):

            pre_text += text + '\n'

            if i in locs:

                images = copy.deepcopy(pre_img)

                for j in range(len(images_paths[i])):

                    image = self.vis_processor(images_paths[i][j])

                    images.append(image)

                images = torch.stack(images, dim=0)

                pre_text_list.append(pre_text)

                pre_text = ''

                images = images.cuda()

                text = '根据给定上下文和候选图像，选择合适的配图：' + '<image>'.join(pre_text_list) + '候选图像包括: ' + '\n'.join([chr(ord('A') + j) + '.<image>' for j in range(len(images_paths[i]))])

                input_text = self.text2instruction(text) + '最合适的图是'

                print(input_text)

                with torch.no_grad():

                    with torch.cuda.amp.autocast():

                        img_embeds = self.model.encode_img(images)

                        input_embeds, im_mask, len_input_tokens = self.interleav_wrap(input_text, img_embeds)

                with torch.no_grad():

                    outputs = self.model.generate(

                                            inputs_embeds=input_embeds,

                                            do_sample=True,

                                            temperature=1.,

                                            max_new_tokens=10,

                                            repetition_penalty=1.005,

                                            top_p=0.8,

                                            top_k=40,

                                            length_penalty=1.0,

                                            im_mask=im_mask

                                            )

                response = outputs[0][2:].tolist()   #<s>: C

                #print(response)

                out_text = self.model.tokenizer.decode(response, add_special_tokens=True)

                print(out_text)

                try:

                    # This is rather crude: only the first character is used, and if it is not one of the options the first image is chosen

                    answer = out_text.lstrip()[0]  # get the first character

                    pre_img.append(images[len(pre_img) + ans2idx[answer]].cpu())

                except:

                    print('Select fail, use first image')

                    answer = 'A'

                    pre_img.append(images[len(pre_img) + ans2idx[answer]].cpu())

                selected[i] = ans2idx[answer]

        return selected

    def model_select_imagebase(self, output_text, locs, imagebase, progress):

        """Let the model choose images from an image base via self.model.generate. Much the same as model_select_image."""

        print('model_select_imagebase')

        pre_text = ''

        pre_img = []

        pre_text_list = []

        selected = []

        images = []

        for j in range(len(imagebase)):

            image = self.vis_processor(imagebase[j])

            images.append(image)

        images = torch.stack(images, dim=0).cuda()

        with torch.no_grad():

            with torch.cuda.amp.autocast():

                img_embeds = self.model.encode_img(images)

        for i, text in enumerate(output_text):

            pre_text += text + '\n'

            if i in locs:

                pre_text_list.append(pre_text)

                pre_text = ''

                print(img_embeds.shape)

                cand_embeds = torch.stack([item for j, item in enumerate(img_embeds) if j not in selected], dim=0)

                ans2idx = {}

                count = 0

                for j in range(len(img_embeds)):

                    if j not in selected:

                        ans2idx[chr(ord('A') + count)] = j

                        count += 1

                if cand_embeds.shape[0] > 1:

                    text = '根据给定上下文和候选图像，选择合适的配图：' + '<image>'.join(pre_text_list) + '候选图像包括: ' + '\n'.join([chr(ord('A') + j) + '.<image>' for j in range(len(cand_embeds))])

                    input_text = self.text2instruction(text) + '最合适的图是'

                    print(input_text)

                    all_img = cand_embeds if len(pre_img) == 0 else torch.cat(pre_img + [cand_embeds], dim=0)

                    input_embeds, im_mask, len_input_tokens = self.interleav_wrap(input_text, all_img)

                    with torch.no_grad():

                        outputs = self.model.generate(

                                                inputs_embeds=input_embeds,

                                                do_sample=True,

                                                temperature=1.,

                                                max_new_tokens=10,

                                                repetition_penalty=1.005,

                                                top_p=0.8,

                                                top_k=40,

                                                length_penalty=1.0,

                                                im_mask=im_mask

                                                )

                    response = outputs[0][2:].tolist()   #<s>: C

                    #print(response)

                    out_text = self.model.tokenizer.decode(response, add_special_tokens=True)

                    print(out_text)

                    try:

                        answer = out_text.lstrip()[0]

                    except:

                        print('Select fail, use first image')

                        answer = 'A'

                else:

                    answer = 'A'

                pre_img.append(img_embeds[ans2idx[answer]].unsqueeze(0))

                selected.append(ans2idx[answer])

        selected = {loc: j for loc, j in zip(locs, selected)}

        print(selected)

        return selected

    def show_article(self, show_cap=False):

        """Display the article; this mostly manipulates UI components."""

        md_shows = []

        imgs_show = []

        edit_bts = []

        for i in range(len(self.texts_imgs)):

            text, img = self.texts_imgs[i]

            md_shows.append(gr.Markdown(visible=True, value=text))

            edit_bts.append(gr.Button(visible=True, interactive=True, ))

            imgs_show.append(gr.Image(visible=False) if img is None else gr.Image(visible=True, value=img.paths[img.pts]))

        print(f'show {len(md_shows)} text sections')

        for _ in range(max_section - len(self.texts_imgs)):

            md_shows.append(gr.Markdown(visible=False, value=''))

            edit_bts.append(gr.Button(visible=False))

            imgs_show.append(gr.Image(visible=False))

        return md_shows + edit_bts + imgs_show

    def generate_article(self, instruction, upimages, beam, repetition, max_length, random, seed):

        """Generate an article."""

        self.reset()

        set_random_seed(int(seed))

        self.hash_folder = hashlib.sha256(instruction.encode()).hexdigest()

        self.instruction = instruction

        if upimages is None:

            upimages = []

        else:

            upimages = [t.image.path for t in upimages.root]

        img_instruction = '<image> ' * len(upimages)

        instruction = img_instruction.strip() + instruction  # add the image instruction

        text = self.text2instruction(instruction)  # convert the text to instruction

        print('random generate:{}'.format(random))

        if article_stream_output:

            if len(upimages) == 0:

                input_ids = self.model.tokenizer(text, return_tensors="pt")['input_ids']

                input_embeds = self.model.model.tok_embeddings(input_ids.cuda())  # get the embeddings of the tokens

                im_mask = None

            else:

                images = []

                for j in range(len(upimages)):

                    image = self.vis_processor(upimages[j])  # preprocess the image with ImageProcessor

                    images.append(image)

                images = torch.stack(images, dim=0)

                with torch.no_grad():

                    with torch.cuda.amp.autocast():

                        img_embeds = self.model.encode_img(images)  # extract image features; this method comes with the model

                text = self.text2instruction(instruction)  # convert the text to instruction

                input_embeds, im_mask, len_input_tokens = self.interleav_wrap(text, img_embeds)  # call interleav_wrap to interleave text and images

            print(text)

            generate_params = dict(

                inputs_embeds=input_embeds,

                do_sample=random,

                stopping_criteria=self.stopping_criteria,

                repetition_penalty=float(repetition),

                max_new_tokens=max_length,

                top_p=0.8,

                top_k=40,

                length_penalty=1.0,

                im_mask=im_mask,

            )

            output_text = "▌"

            with self.generate_with_streaming(**generate_params) as generator:

                # Everything after this manipulates UI components, so the review stops here

                for output in generator:

                    decoded_output = self.model.tokenizer.decode(output[1:])

                    if output[-1] in [self.model.tokenizer.eos_token_id, 92542]:

                        break

                    output_text = decoded_output.replace('\n', '\n\n') + "▌"

                    yield (output_text,) + (gr.Markdown(visible=False),) * (max_section - 1) + (

                            gr.Button(visible=False),) * max_section + (gr.Image(visible=False),) * max_section

                    time.sleep(0.01)

            output_text = output_text[:-1]

            yield (output_text,) + (gr.Markdown(visible=False),) * (max_section - 1) + (

                            gr.Button(visible=False),) * max_section + (gr.Image(visible=False),) * max_section

        else:

            output_text = self.generate(text, random, beam, max_length, repetition)

        output_text = re.sub(r'(\n\s*)+', '\n', output_text.strip())

        print(output_text)

        output_text = output_text.split('\n')[:max_section]

        self.texts_imgs = [[t, None] for t in output_text]

        self.database.addtitle(text, self.hash_folder, params={'beam':beam, 'repetition':repetition, 'max_length':max_length, 'random':random, 'seed':seed})

        if article_stream_output:

            yield self.show_article()

        else:

            return self.show_article()
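
Two spots in the listing above turn free-form model output into indices: generate_loc pulls positions out of <SegN> markers, and model_select_image maps a letter to a candidate image. The sketch below reworks both as standalone functions with slightly more defensive parsing. The names parse_locs and pick_choice are mine, and the extra checks are a suggestion rather than what the demo does.

```python
import re

SEG_PATTERN = re.compile(r"<Seg[0-9]*>")  # same pattern as self.r2 in the demo


def parse_locs(inject_text: str) -> list:
    """Extract the integer N from every <SegN> marker, skipping empty ones."""
    locs = []
    for marker in SEG_PATTERN.findall(inject_text):
        digits = marker[4:-1]  # strip the leading "<Seg" and the trailing ">"
        if digits:  # "<Seg>" also matches the pattern but carries no number
            locs.append(int(digits))
    return locs


def pick_choice(out_text: str, num_candidates: int) -> int:
    """Map a reply such as ' C' to an index, falling back to 0 (image A)."""
    stripped = out_text.lstrip()
    if stripped:
        index = ord(stripped[0]) - ord("A")
        if 0 <= index < num_candidates:
            return index
    return 0


if __name__ == "__main__":
    print(parse_locs("适合插入图像的行是<Seg1>, <Seg4>"))
    print(pick_choice(" C", 4), pick_choice("no idea", 4))
```

The range check in pick_choice covers the same failure the demo handles with its try/except, without relying on an exception for control flow.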

In practice, the article-generation function is called from the UI, which produces the desired article.
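
The interleav_wrap method in the listing is the piece that makes mixed input possible, so here is a tensor-free sketch of the same idea. Whitespace splitting stands in for the tokenizer and plain strings stand in for embedding vectors; both are simplifications of mine, and the function name interleave is not from the demo.

```python
from typing import List, Tuple


def interleave(text: str, images: List[List[str]],
               max_length: int = 4096) -> Tuple[List[str], List[int]]:
    """Splice image features into the text wherever '<image>' appears."""
    sequence: List[str] = []
    im_mask: List[int] = []
    for idx, part in enumerate(text.split("<image>")):
        tokens = part.split()  # stand-in for tokenizing and embedding the text
        sequence.extend(tokens)
        im_mask.extend([0] * len(tokens))  # 0 marks a text position
        if idx < len(images):
            sequence.extend(images[idx])
            im_mask.extend([1] * len(images[idx]))  # 1 marks an image position
        if len(sequence) > max_length:
            break
    # Truncate both lists so that they stay aligned.
    return sequence[:max_length], im_mask[:max_length]


if __name__ == "__main__":
    seq, mask = interleave("<image> describe <image> both",
                           [["i0a", "i0b"], ["i1a", "i1b"]])
    print(seq)
    print(mask)
```

The mask is what lets the model tell image positions from text positions after everything has been concatenated into one sequence.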

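Hands-on practice

For the hands-on part, see the section "Chat demo: InternLM2-Chat-1.8B intelligent chat (using the InternLM2-Chat-1.8B model to generate a 300-character short story)" in the blog post on the Lesson 2 homework of the second InternLM camp, which is written in great detail. That section covers three steps: setting up the environment, downloading the model, and running inference. These are the three steps for trying out an LLM demo today. Although the tutorial is based on InternLM2, the steps are essentially the same for Qwen (通义千问) or ChatGLM; only minor details differ.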
