# load python
library(reticulate)
use_condaenv("my_ml")
# load packages
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import get_linear_schedule_with_warmup, AdamW
from torch.utils.data import TensorDataset, random_split, DataLoader, RandomSampler, SequentialSampler
import time, datetime, random, optuna, re, string
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import matplotlib.pyplot as plt
import seaborn as sns
from optuna.pruners import SuccessiveHalvingPruner
from optuna.samplers import TPESampler
from torch.cuda.amp import autocast, GradScaler
from sklearn.model_selection import train_test_split
from collections import Counter
from transformers import BertModel, BertTokenizer
SEED = 15
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
## <torch._C.Generator object at 0x000000002189E070>
torch.backends.cudnn.deterministic = True
torch.cuda.amp.autocast(enabled=True)
## <torch.cuda.amp.autocast_mode.autocast object at 0x00000000347738C8>
# tell pytorch to use cuda
device = torch.device("cuda")
In this guide, we prepare a BERT-CNN ensemble which takes the embeddings generated by the BERT base model and feeds them into a CNN. The general logic from this guide can be used to replace the CNN with any other NN of your choice. Future guides will explore other models like Bi-Directional LSTMs and the use of self-attention in embedding layer aggregation.
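To make the data flow concrete, here is a minimal sketch (with hypothetical random tensors, not the real model) of how the pieces fit together: BERT returns one hidden state per layer, we stack the last four, and the CNN head treats that stack as a 4-channel input.
# minimal sketch of the ensemble's tensor flow (hypothetical shapes, random data)
import torch
batch_size, seq_len, hidden_dim = 2, 512, 768
# BERT base with output_hidden_states=True returns 13 hidden states (embeddings + 12 layers)
hidden_states = [torch.randn(batch_size, seq_len, hidden_dim) for _ in range(13)]
stacked = torch.stack(hidden_states, dim=1)   # (2, 13, 512, 768)
last_four = stacked[:, -4:]                   # (2, 4, 512, 768) -- input to the CNN head
print(last_four.shape)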
Like other guides, this walk-through provides a complete treatment of the data preparation and training of the BERT-CNN ensemble in PyTorch.
We begin by loading and lightly editing our data prior to tokenization.
# prepare and load data
def prepare_df(pkl_location):
    # read pkl as pandas
    df = pd.read_pickle(pkl_location)
    # just keep us/kabul labels
    df = df.loc[(df['target'] == 'US') | (df['target'] == 'Kabul')]
    # mask DV to recode
    us = df['target'] == 'US'
    kabul = df['target'] == 'Kabul'
    # apply mask
    df.loc[us, 'target'] = 1
    df.loc[kabul, 'target'] = 0
    # reset index
    df = df.reset_index(drop=True)
    return df

df = prepare_df('C:\\Users\\Andrew\\Desktop\\df.pkl')
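Because the weighted sampler used later depends on the class balance, a quick hedged check of the recoded labels (counts will vary with your own pickle) is worth running here:
# quick look at the recoded label distribution (results depend on your own data)
print(df.shape)
print(df['target'].value_counts())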
# prepare data
def clean_df(df):
# strip dash but keep a space
'body'] = df['body'].str.replace('-', ' ')
df[# prepare keys for punctuation removal
= str.maketrans(dict.fromkeys(string.punctuation))
translator # lower case the data
'body'] = df['body'].apply(lambda x: x.lower())
df[# remove excess spaces near punctuation
'body'] = df['body'].apply(lambda x: re.sub(r'\s([?.!"](?:\s|$))', r'\1', x))
df[# remove punctuation -- f1 improves by .05 by disabling this
#df['body'] = df['body'].apply(lambda x: x.translate(translator))
# generate a word count
'word_count'] = df['body'].apply(lambda x: len(x.split()))
df[# remove excess white spaces
'body'] = df['body'].apply(lambda x: " ".join(x.split()))
df[
return df
= clean_df(df)
df
# lets remove rare words
def remove_rare_words(df):
# get counts of each word -- necessary for vocab
= Counter(" ".join(df['body'].values.tolist()).split(" "))
counts # remove low counts -- keep those above 2
= {key: value for key, value in counts.items() if value > 2}
counts
# remove rare words from corpus
def remove_rare(x):
return ' '.join(list(filter(lambda x: x in counts.keys(), x.split())))
# apply funx
'body'] = df['body'].apply(remove_rare)
df[return df
= remove_rare_words(df)
df
# remove transliterated words that GloVe can't find
no_matches = np.load('C:\\Users\\Andrew\\translit_no_match.npy')
no_matches = dict(zip(set(no_matches), range(len(set(no_matches)))))

# remove transliterated words from corpus
df['body'] = df['body'].apply(lambda x: ' '.join(list(filter(lambda x: x not in no_matches.keys(), x.split()))))
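As a rough, purely illustrative check, we can report how large the vocabulary is after dropping rare and unmatched transliterated words:
# rough post-cleaning vocabulary size (illustrative check)
remaining_vocab = set(" ".join(df['body'].values.tolist()).split())
print(len(remaining_vocab))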
Next, we instantiate the BERT tokenizer from transformers
and tokenize our entire corpus.
# instantiate BERT tokenizer with lower case vocab
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# a look at some of the BERT vocab
word_map = dict(zip(tokenizer.vocab.keys(), range(len(tokenizer))))
word_map.get('the')  # find index value
## 1996
list(tokenizer.vocab.keys())[2000:2010]
## ['to', 'was', 'he', 'is', 'as', 'for', 'on', 'with', 'that', 'it']
len(tokenizer)
## 30522
# tokenize corpus using BERT
def tokenize_corpus(df, tokenizer, max_len):
    # token ID storage
    input_ids = []
    # attention mask storage
    attention_masks = []
    # for every document:
    for doc in df:
        # `encode_plus` will:
        # (1) Tokenize the sentence.
        # (2) Prepend the `[CLS]` token to the start.
        # (3) Append the `[SEP]` token to the end.
        # (4) Map tokens to their IDs.
        # (5) Pad or truncate the sentence to `max_length`
        # (6) Create attention masks for [PAD] tokens.
        encoded_dict = tokenizer.encode_plus(
            doc,                          # document to encode
            add_special_tokens=True,      # add '[CLS]' and '[SEP]'
            max_length=max_len,           # set max length -- 512 is the BERT maximum
            truncation=True,              # truncate longer messages
            pad_to_max_length=True,       # add padding
            return_attention_mask=True,   # create attn. masks
            return_tensors='pt'           # return pytorch tensors
        )
        # add the tokenized sentence to the list
        input_ids.append(encoded_dict['input_ids'])
        # and its attention mask (differentiates padding from non-padding)
        attention_masks.append(encoded_dict['attention_mask'])

    return torch.cat(input_ids, dim=0), torch.cat(attention_masks, dim=0)

# create tokenized data
input_ids, attention_masks = tokenize_corpus(df['body'].values, tokenizer, 512)

# convert the labels into tensors
labels = torch.tensor(df['target'].values.astype(np.float32))
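Before building datasets, a small hedged sanity check is to decode one encoded document back to text; the exact output depends on your corpus.
# confirm shapes and that the tokenizer round-trips sensibly
print(input_ids.shape)                                 # (num_docs, 512)
print(tokenizer.decode(input_ids[0][:20].tolist()))    # first 20 token ids of document 0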
With the corpus tokenized, we now prepare our data for analysis in PyTorch. The code below creates a TensorDataset composed of our features, attention masks, and labels, and then splits the data into train, validation, and test sets.
Note that BERT itself carries no task-specific head here and never consumes the labels directly; the CNN is the head we place on top of the network, and it is the part trained against the labels.
# prepare tensor data sets
def prepare_dataset(padded_tokens, attention_masks, target):
    # prepare target into np array
    target = np.array(target.values, dtype=np.int64).reshape(-1, 1)
    # create tensor data sets
    tensor_df = TensorDataset(padded_tokens, attention_masks, torch.from_numpy(target))
    # 80% of df
    train_size = int(0.8 * len(df))
    # 20% of df
    val_size = len(df) - train_size
    # 50% of validation
    test_size = int(val_size - 0.5 * val_size)
    # divide the dataset by randomly selecting samples
    train_dataset, val_dataset = random_split(tensor_df, [train_size, val_size])
    # divide validation by randomly selecting samples
    val_dataset, test_dataset = random_split(val_dataset, [test_size, val_size - test_size])

    return train_dataset, val_dataset, test_dataset

# create tensor data sets
train_dataset, val_dataset, test_dataset = prepare_dataset(input_ids,
                                                           attention_masks,
                                                           df['target'])
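A quick illustrative check confirms the resulting 80/10/10 split sizes:
# confirm split sizes
print(len(train_dataset), len(val_dataset), len(test_dataset))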
Since the corpus is imbalanced, we create a weighted sampler to help balance the class distribution as batches are fed through the data loaders.
# helper function to count target distribution inside tensor data sets
def target_count(tensor_dataset):
# set empty count containers
= 0
count0 = 0
count1 # set total container to turn into torch tensor
= []
total # for every item in the tensor data set
for i in tensor_dataset:
# if the target is equal to 0
if i[2].item() == 0:
+= 1
count0 # if the target is equal to 1
elif i[2].item() == 1:
+= 1
count1
total.append(count0)
total.append(count1)return torch.tensor(total)
# prepare weighted sampling for imbalanced classification
def create_sampler(target_tensor, tensor_dataset):
# generate class distributions [x, y]
= target_count(tensor_dataset)
class_sample_count # weight
= 1. / class_sample_count.float()
weight # produce weights for each observation in the data set
= torch.tensor([weight[t[2]] for t in tensor_dataset])
samples_weight # prepare sampler
= torch.utils.data.WeightedRandomSampler(weights=samples_weight,
sampler =len(samples_weight),
num_samples=True)
replacementreturn sampler
# create samplers for just the training set
= create_sampler(target_count(train_dataset), train_dataset)
train_sampler
# time function
def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # round to the nearest second
    elapsed_rounded = int(round((elapsed)))
    # format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))
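For example, 3661 seconds formats as '1:01:01':
# example usage of format_time
print(format_time(3661))  # 1:01:01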
Now we instantiate the data loaders.
# create DataLoaders with samplers
train_dataloader = DataLoader(train_dataset,
                              batch_size=8,
                              sampler=train_sampler,
                              shuffle=False)

valid_dataloader = DataLoader(val_dataset,
                              batch_size=8,
                              shuffle=True)

test_dataloader = DataLoader(test_dataset,
                             batch_size=8,
                             shuffle=True)
Here, we modify the previously used CNN class. We strip out the nn.Embedding layer since we are no longer providing a look-up table for embedding vectors; instead, we inject the embedding vectors from BERT directly into the CNN.
# Build Kim Yoon CNN
class KimCNN(nn.Module):
    def __init__(self, config):
        super().__init__()
        output_channel = config.output_channel  # number of kernels
        num_classes = config.num_classes        # number of targets to predict
        dropout = config.dropout                # dropout value
        embedding_dim = config.embedding_dim    # length of embedding dim

        ks = 3  # three conv nets here

        # input_channel = word embeddings at a value of 1; 3 for RGB images
        # here we feed the last 4 BERT layers, so input_channel = 4 (for a single embedding it would be 1)
        input_channel = 4

        # [3, 4, 5] = window height
        # padding = padding to account for height of search window
        # 3 convolutional nets
        self.conv1 = nn.Conv2d(input_channel, output_channel, (3, embedding_dim), padding=(2, 0), groups=4)
        self.conv2 = nn.Conv2d(input_channel, output_channel, (4, embedding_dim), padding=(3, 0), groups=4)
        self.conv3 = nn.Conv2d(input_channel, output_channel, (5, embedding_dim), padding=(4, 0), groups=4)

        # apply dropout
        self.dropout = nn.Dropout(dropout)

        # fully connected layer for classification
        # 3x conv nets * output channel
        self.fc1 = nn.Linear(ks * output_channel, num_classes)

    def forward(self, x, **kwargs):
        #x = x.unsqueeze(1)  # get another dimension at first index pos
        # squeeze to get size; (batch, channel_output, ~=sent_len) * ks
        x = [F.relu(self.conv1(x)).squeeze(3), F.relu(self.conv2(x)).squeeze(3), F.relu(self.conv3(x)).squeeze(3)]
        # max-over-time pooling; (batch, channel_output) * ks
        x = [F.max_pool1d(i, i.size(2)).squeeze(2) for i in x]
        # concat results; (batch, channel_output * ks)
        x = torch.cat(x, 1)
        # add dropout
        x = self.dropout(x)
        # generate logits (batch, target_size)
        logit = self.fc1(x)
        return logit
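Before wiring the CNN to BERT, a quick smoke test with random tensors is useful; the `_TestConfig` below is a hypothetical stand-in that mirrors the config values used later (16 kernels, 768 dims, 2 classes).
# smoke test the CNN head on CPU with a fake "last four BERT layers" tensor
class _TestConfig:  # hypothetical config, for illustration only
    num_classes = 2
    output_channel = 16
    embedding_dim = 768
    dropout = 0.4

test_cnn = KimCNN(_TestConfig())
fake_layers = torch.randn(2, 4, 512, 768)  # (batch, channels, seq_len, embed_dim)
print(test_cnn(fake_layers).shape)          # torch.Size([2, 2]) -- one logit pair per document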
Now, we prepare functions to train, validate, and test our data.
def train(model, dataloader, optimizer):
    # capture time
    total_t0 = time.time()

    # perform one full pass over the training set
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch + 1, epochs))
    print('Training...')

    # reset total loss for epoch
    train_total_loss = 0
    total_train_f1 = 0

    # put both models into training mode
    model.train()
    kim_model.train()

    # for each batch of training data...
    for step, batch in enumerate(dataloader):
        # progress update every 40 batches
        if step % 40 == 0 and not step == 0:
            # report progress
            print(' Batch {:>5,} of {:>5,}.'.format(step, len(dataloader)))

        # Unpack this training batch from our dataloader:
        #
        # As we unpack the batch, we'll also copy each tensor to the GPU
        #
        # `batch` contains three pytorch tensors:
        # [0]: input ids
        # [1]: attention masks
        # [2]: labels
        b_input_ids = batch[0].cuda()
        b_input_mask = batch[1].cuda()
        b_labels = batch[2].cuda().long()

        # clear previously calculated gradients
        optimizer.zero_grad()

        # runs the forward pass with autocasting
        with autocast():
            # forward propagation (evaluate model on training batch)
            outputs = model(input_ids=b_input_ids, attention_mask=b_input_mask)

            # get hidden layers
            hidden_layers = outputs[2]

            # stack the layers
            hidden_layers = torch.stack(hidden_layers, dim=1)

            # get the last 4 layers
            hidden_layers = hidden_layers[:, -4:]

            logits = kim_model(hidden_layers)

            loss = criterion(logits.view(-1, 2), b_labels.view(-1))

            # sum the training loss over all batches for average loss at end
            # loss is a tensor containing a single value
            train_total_loss += loss.item()

        # Scales loss. Calls backward() on scaled loss to create scaled gradients.
        # Backward passes under autocast are not recommended.
        # Backward ops run in the same dtype autocast chose for corresponding forward ops.
        scaler.scale(loss).backward()

        # scaler.step() first unscales the gradients of the optimizer's assigned params.
        # If these gradients do not contain infs or NaNs, optimizer.step() is then called,
        # otherwise, optimizer.step() is skipped.
        scaler.step(optimizer)

        # updates the scale for next iteration
        scaler.update()

        # update the scheduler
        scheduler.step()

        # calculate preds
        _, predicted = torch.max(logits, 1)

        # move logits and labels to CPU
        predicted = predicted.detach().cpu().numpy()
        y_true = b_labels.detach().cpu().numpy()

        # calculate f1
        total_train_f1 += f1_score(predicted, y_true,
                                   average='weighted',
                                   labels=np.unique(predicted))

    # calculate the average loss over all of the batches
    avg_train_loss = train_total_loss / len(dataloader)

    # calculate the average f1 over all of the batches
    avg_train_f1 = total_train_f1 / len(dataloader)

    # training time end
    training_time = format_time(time.time() - total_t0)

    # record all statistics from this epoch
    training_stats.append(
        {
            'Train Loss': avg_train_loss,
            'Train F1': avg_train_f1,
            'Train Time': training_time
        }
    )

    # print result summaries
    print("")
    print("summary results")
    print("epoch | trn loss | trn f1 | trn time ")
    print(f"{epoch+1:5d} | {avg_train_loss:.5f} | {avg_train_f1:.5f} | {training_time:}")

    #torch.cuda.empty_cache()

    return None
def validating(model, dataloader):
    # capture validation time
    total_t0 = time.time()

    # After the completion of each training epoch, measure our performance on
    # our validation set.
    print("")
    print("Running Validation...")

    # put both models in evaluation mode
    model.eval()
    kim_model.eval()

    # track variables
    total_valid_accuracy = 0
    total_valid_loss = 0
    total_valid_f1 = 0
    total_valid_recall = 0
    total_valid_precision = 0
    total_bert_valid_loss = 0

    # evaluate data for one epoch
    for batch in dataloader:
        # Unpack this validation batch from our dataloader:
        # `batch` contains three pytorch tensors:
        # [0]: input ids
        # [1]: attention masks
        # [2]: labels
        b_input_ids = batch[0].cuda()
        b_input_mask = batch[1].cuda()
        b_labels = batch[2].cuda().long()

        # tell pytorch not to bother calculating gradients
        with torch.no_grad():
            # forward propagation (evaluate model on validation batch)
            outputs = model(input_ids=b_input_ids, attention_mask=b_input_mask)

            # get hidden layers
            hidden_layers = outputs[2]

            # stack the layers
            hidden_layers = torch.stack(hidden_layers, dim=1)

            # get the last 4 layers
            hidden_layers = hidden_layers[:, -4:]

            logits = kim_model(hidden_layers)

            loss = criterion(logits.view(-1, 2), b_labels.view(-1))

        # accumulate validation loss
        total_valid_loss += loss.item()

        # calculate preds
        _, predicted = torch.max(logits, 1)

        # move logits and labels to CPU
        predicted = predicted.detach().cpu().numpy()
        y_true = b_labels.detach().cpu().numpy()

        # calculate f1
        total_valid_f1 += f1_score(predicted, y_true,
                                   average='weighted',
                                   labels=np.unique(predicted))

        # calculate accuracy
        total_valid_accuracy += accuracy_score(predicted, y_true)

        # calculate precision
        total_valid_precision += precision_score(predicted, y_true,
                                                 average='weighted',
                                                 labels=np.unique(predicted))

        # calculate recall
        total_valid_recall += recall_score(predicted, y_true,
                                           average='weighted',
                                           labels=np.unique(predicted))

    # report final accuracy of validation run
    avg_accuracy = total_valid_accuracy / len(dataloader)

    # report final f1 of validation run
    global avg_val_f1
    avg_val_f1 = total_valid_f1 / len(dataloader)

    # report final precision of validation run
    avg_precision = total_valid_precision / len(dataloader)

    # report final recall of validation run
    avg_recall = total_valid_recall / len(dataloader)

    # calculate the average loss over all of the batches
    global avg_val_loss
    avg_val_loss = total_valid_loss / len(dataloader)

    # capture end validation time
    training_time = format_time(time.time() - total_t0)

    # record all statistics from this epoch
    valid_stats.append(
        {
            'Val Loss': avg_val_loss,
            'Val Accur.': avg_accuracy,
            'Val precision': avg_precision,
            'Val recall': avg_recall,
            'Val F1': avg_val_f1,
            'Val Time': training_time
        }
    )

    # print result summaries
    print("")
    print("summary results")
    print("epoch | val loss | val f1 | val time")
    print(f"{epoch+1:5d} | {avg_val_loss:.5f} | {avg_val_f1:.5f} | {training_time:}")

    return None
def testing(model, dataloader):
print("")
print("Running Testing...")
# capture test time
= time.time()
total_t0
# put both models in evaluation mode
eval()
model.eval()
kim_model.
# track variables
= 0
total_test_accuracy = 0
total_test_loss = 0
total_test_f1 = 0
total_test_recall = 0
total_test_precision
# evaluate data for one epoch
for batch in dataloader:
# Unpack this training batch from our dataloader:
# `batch` contains three pytorch tensors:
# [0]: input ids
# [1]: attention masks
# [2]: labels
= batch[0].cuda()
b_input_ids = batch[1].cuda()
b_input_mask = batch[2].cuda().long()
b_labels
# tell pytorch not to bother calculating gradients
with torch.no_grad():
# forward propagation (evaluate model on training batch)
= model(input_ids=b_input_ids, attention_mask=b_input_mask)
outputs
= outputs[2] # get hidden layers
hidden_layers
= torch.stack(hidden_layers, dim=1) # stack the layers
hidden_layers
= hidden_layers[:, -4:] # get the last 4 layers
hidden_layers
= kim_model(hidden_layers)
logits
= criterion(logits.view(-1, 2), b_labels.view(-1))
loss
# accumulate validation loss
+= loss.item()
total_test_loss
# calculate preds
= torch.max(logits, 1)
_, predicted
# move logits and labels to CPU
= predicted.detach().cpu().numpy()
predicted = b_labels.detach().cpu().numpy()
y_true
# calculate f1
+= f1_score(predicted, y_true,
total_test_f1 ='weighted',
average=np.unique(predicted))
labels
# calculate accuracy
+= accuracy_score(predicted, y_true)
total_test_accuracy
# calculate precision
+= precision_score(predicted, y_true,
total_test_precision ='weighted',
average=np.unique(predicted))
labels
# calculate recall
+= recall_score(predicted, y_true,
total_test_recall ='weighted',
average=np.unique(predicted))
labels
# report final accuracy of test run
= total_test_accuracy / len(dataloader)
avg_accuracy
# report final f1 of test run
= total_test_f1 / len(dataloader)
avg_test_f1
# report final f1 of test run
= total_test_precision / len(dataloader)
avg_precision
# report final f1 of test run
= total_test_recall / len(dataloader)
avg_recall
# calculate the average loss over all of the batches.
= total_test_loss / len(dataloader)
avg_test_loss
# capture end testing time
= format_time(time.time() - total_t0)
training_time
# Record all statistics from this epoch.
test_stats.append(
{'Test Loss': avg_test_loss,
'Test Accur.': avg_accuracy,
'Test precision': avg_precision,
'Test recall': avg_recall,
'Test F1': avg_test_f1,
'Test Time': training_time
}
)# print result summaries
print("")
print("summary results")
print("epoch | test loss | test f1 | test time")
print(f"{epoch+1:5d} | {avg_test_loss:.5f} | {avg_test_f1:.5f} | {training_time:}")
return None
Now we instantiate our models and attach them to the GPU. A few other preparatory objects are also created: the loss criterion, the number of epochs, the optimizer, and the learning-rate scheduler.
# instantiate BERT model with hidden states
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True).cuda()

# instantiate CNN config
class config:
def __init__(self):
        self.num_classes = 2        # binary
        self.output_channel = 16    # number of kernels
        self.embedding_dim = 768    # embed dimension
        self.dropout = 0.4          # dropout value
        return None

# create config
config1 = config()

# instantiate CNN
kim_model = KimCNN(config1).cuda()

# set loss
criterion = nn.CrossEntropyLoss()

# set number of epochs
epochs = 4

# only train the last 4 layers; saves ~600mb of GPU mem and 30s of compute
BERT_parameters = []
allowed_layers = [11, 10, 9, 8]

for name, param in model.named_parameters():
    for layer_num in allowed_layers:
        layer_num = str(layer_num)
        if ".{}.".format(layer_num) in name:
            BERT_parameters.append(param)

# set optimizer
optimizer = AdamW([{'params': BERT_parameters, 'lr': 2e-5}], weight_decay=1.0)

# set LR scheduler
total_steps = len(train_dataloader) * epochs
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=total_steps)

# create gradient scaler for mixed precision
scaler = GradScaler()
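As a rough check on the memory and compute claim above, we can compare how many BERT parameters were selected for training against the total (illustrative; exact counts depend on the checkpoint):
# compare trainable (last 4 encoder layers) vs. total BERT parameters
trainable = sum(p.numel() for p in BERT_parameters)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable:,} of {total:,} BERT parameters")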
Finally, we are ready to train. Two containers are created to store the results of each training and validation epoch, along with a tracker for the best validation loss.
# create training result storage
training_stats = []
valid_stats = []
best_valid_loss = float('inf')

# for each epoch
for epoch in range(epochs):
    # train
    train(model, train_dataloader, optimizer)
    # validate
    validating(model, valid_dataloader)
    # check validation loss
    if valid_stats[epoch]['Val Loss'] < best_valid_loss:
        best_valid_loss = valid_stats[epoch]['Val Loss']
        # save best model for use later
        torch.save(model.state_dict(), 'bert-cnn-model1.pt')  # torch save
        model_to_save = model.module if hasattr(model, 'module') else model
        model_to_save.save_pretrained('./model_save/bert-cnn/')  # transformers save
        tokenizer.save_pretrained('./model_save/bert-cnn/')  # transformers save
##
## ======== Epoch 1 / 4 ========
## Training...
## Batch 40 of 1,005.
## Batch 80 of 1,005.
## Batch 120 of 1,005.
## Batch 160 of 1,005.
## Batch 200 of 1,005.
## Batch 240 of 1,005.
## Batch 280 of 1,005.
## Batch 320 of 1,005.
## Batch 360 of 1,005.
## Batch 400 of 1,005.
## Batch 440 of 1,005.
## Batch 480 of 1,005.
## Batch 520 of 1,005.
## Batch 560 of 1,005.
## Batch 600 of 1,005.
## Batch 640 of 1,005.
## Batch 680 of 1,005.
## Batch 720 of 1,005.
## Batch 760 of 1,005.
## Batch 800 of 1,005.
## Batch 840 of 1,005.
## Batch 880 of 1,005.
## Batch 920 of 1,005.
## Batch 960 of 1,005.
## Batch 1,000 of 1,005.
##
## summary results
## epoch | trn loss | trn f1 | trn time
## 1 | 0.39960 | 0.83102 | 0:12:21
##
## Running Validation...
##
## summary results
## epoch | val loss | val f1 | val time
## 1 | 0.27511 | 0.84472 | 0:00:17
## ('./model_save/bert-cnn/vocab.txt', './model_save/bert-cnn/special_tokens_map.json', './model_save/bert-cnn/added_tokens.json')
##
## ======== Epoch 2 / 4 ========
## Training...
## Batch 40 of 1,005.
## Batch 80 of 1,005.
## Batch 120 of 1,005.
## Batch 160 of 1,005.
## Batch 200 of 1,005.
## Batch 240 of 1,005.
## Batch 280 of 1,005.
## Batch 320 of 1,005.
## Batch 360 of 1,005.
## Batch 400 of 1,005.
## Batch 440 of 1,005.
## Batch 480 of 1,005.
## Batch 520 of 1,005.
## Batch 560 of 1,005.
## Batch 600 of 1,005.
## Batch 640 of 1,005.
## Batch 680 of 1,005.
## Batch 720 of 1,005.
## Batch 760 of 1,005.
## Batch 800 of 1,005.
## Batch 840 of 1,005.
## Batch 880 of 1,005.
## Batch 920 of 1,005.
## Batch 960 of 1,005.
## Batch 1,000 of 1,005.
##
## summary results
## epoch | trn loss | trn f1 | trn time
## 2 | 0.29237 | 0.88316 | 0:12:24
##
## Running Validation...
##
## summary results
## epoch | val loss | val f1 | val time
## 2 | 0.30805 | 0.82737 | 0:00:17
##
## ======== Epoch 3 / 4 ========
## Training...
## Batch 40 of 1,005.
## Batch 80 of 1,005.
## Batch 120 of 1,005.
## Batch 160 of 1,005.
## Batch 200 of 1,005.
## Batch 240 of 1,005.
## Batch 280 of 1,005.
## Batch 320 of 1,005.
## Batch 360 of 1,005.
## Batch 400 of 1,005.
## Batch 440 of 1,005.
## Batch 480 of 1,005.
## Batch 520 of 1,005.
## Batch 560 of 1,005.
## Batch 600 of 1,005.
## Batch 640 of 1,005.
## Batch 680 of 1,005.
## Batch 720 of 1,005.
## Batch 760 of 1,005.
## Batch 800 of 1,005.
## Batch 840 of 1,005.
## Batch 880 of 1,005.
## Batch 920 of 1,005.
## Batch 960 of 1,005.
## Batch 1,000 of 1,005.
##
## summary results
## epoch | trn loss | trn f1 | trn time
## 3 | 0.27105 | 0.89014 | 0:12:11
##
## Running Validation...
##
## summary results
## epoch | val loss | val f1 | val time
## 3 | 0.30127 | 0.84481 | 0:00:17
##
## ======== Epoch 4 / 4 ========
## Training...
## Batch 40 of 1,005.
## Batch 80 of 1,005.
## Batch 120 of 1,005.
## Batch 160 of 1,005.
## Batch 200 of 1,005.
## Batch 240 of 1,005.
## Batch 280 of 1,005.
## Batch 320 of 1,005.
## Batch 360 of 1,005.
## Batch 400 of 1,005.
## Batch 440 of 1,005.
## Batch 480 of 1,005.
## Batch 520 of 1,005.
## Batch 560 of 1,005.
## Batch 600 of 1,005.
## Batch 640 of 1,005.
## Batch 680 of 1,005.
## Batch 720 of 1,005.
## Batch 760 of 1,005.
## Batch 800 of 1,005.
## Batch 840 of 1,005.
## Batch 880 of 1,005.
## Batch 920 of 1,005.
## Batch 960 of 1,005.
## Batch 1,000 of 1,005.
##
## summary results
## epoch | trn loss | trn f1 | trn time
## 4 | 0.25257 | 0.89834 | 0:11:23
##
## Running Validation...
##
## summary results
## epoch | val loss | val f1 | val time
## 4 | 0.27479 | 0.85058 | 0:00:16
## ('./model_save/bert-cnn/vocab.txt', './model_save/bert-cnn/special_tokens_map.json', './model_save/bert-cnn/added_tokens.json')
##
## C:\Users\Andrew\Anaconda3\envs\my_ml\lib\site-packages\sklearn\metrics\classification.py:1437: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
## 'precision', 'predicted', average, warn_for)
## C:\Users\Andrew\Anaconda3\envs\my_ml\lib\site-packages\sklearn\metrics\classification.py:1437: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
## 'precision', 'predicted', average, warn_for)
After training, we organize the results nicely in pandas.
# organize results
pd.set_option('precision', 3)
df_train_stats = pd.DataFrame(data=training_stats)
df_valid_stats = pd.DataFrame(data=valid_stats)
df_stats = pd.concat([df_train_stats, df_valid_stats], axis=1)
df_stats.insert(0, 'Epoch', range(1, len(df_stats)+1))
df_stats = df_stats.set_index('Epoch')
df_stats
## Train Loss Train F1 Train Time ... Val recall Val F1 Val Time
## Epoch ...
## 1 0.400 0.831 0:12:21 ... 0.861 0.845 0:00:17
## 2 0.292 0.883 0:12:24 ... 0.850 0.827 0:00:17
## 3 0.271 0.890 0:12:11 ... 0.862 0.845 0:00:17
## 4 0.253 0.898 0:11:23 ... 0.867 0.851 0:00:16
##
## [4 rows x 9 columns]
And lastly, we run our final test:
# test the model
test_stats = []
model.load_state_dict(torch.load('bert-cnn-model1.pt'))
## <All keys matched successfully>
testing(model, test_dataloader)
##
## Running Testing...
##
## summary results
## epoch | test loss | test f1 | test time
## 4 | 0.31259 | 0.83967 | 0:00:16
##
## C:\Users\Andrew\Anaconda3\envs\my_ml\lib\site-packages\sklearn\metrics\classification.py:1437: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
## 'precision', 'predicted', average, warn_for)
## C:\Users\Andrew\Anaconda3\envs\my_ml\lib\site-packages\sklearn\metrics\classification.py:1437: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
## 'precision', 'predicted', average, warn_for)
df_test_stats = pd.DataFrame(data=test_stats)
df_test_stats
## Test Loss Test Accur. Test precision Test recall Test F1 Test Time
## 0 0.313 0.854 0.862 0.854 0.84 0:00:16
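Since matplotlib and seaborn were imported at the top but not yet used, a short hedged sketch for visualizing the loss curves from df_stats might look like this (column names follow the stats dictionaries above):
# plot training vs. validation loss per epoch
plt.figure(figsize=(8, 5))
sns.lineplot(x=df_stats.index, y=df_stats['Train Loss'], label='train loss')
sns.lineplot(x=df_stats.index, y=df_stats['Val Loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('BERT-CNN loss by epoch')
plt.show()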
The results show a slight improvement over our standard BERT model at the cost of 4-5x the training time.
Kim, Yoon. “Convolutional neural networks for sentence classification.” arXiv preprint arXiv:1408.5882 (2014).
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “Bert: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).