Yunzhong from Concave Temple
QbitAI report | Public account QbitAI
In natural language processing, sentiment analysis generally refers to judging the emotional state expressed by a piece of text, where the text may be a sentence, a paragraph, or a document. The emotional state can be two categories, such as (positive, negative) or (happy, sad), or three categories, such as (positive, negative, neutral), and so on.
Sentiment analysis has a wide range of application scenarios. For example, reviews posted by users on shopping websites (Amazon, Tmall, Taobao, etc.), travel websites, and movie review websites can be divided into positive and negative comments to analyze users' overall experience with a product, or user reviews of a product can be collected and run through sentiment analysis, and so on.
Today is May 20th, and PaddlePaddle will teach you to use a sentiment analysis algorithm to understand what your goddess is thinking.
In the following, we take sentiment analysis as an example to introduce end-to-end short text classification with deep learning, and use PaddlePaddle to complete all related experiments.
Project address:
https://github.com/PaddlePaddle/book/blob/develop/06.understand_sentiment/README.cn.md
Application background
In natural language processing, sentiment analysis is a typical text classification problem: the text to be analyzed is assigned to a category. Text classification involves two issues: text representation and classification method.
Before deep learning methods appeared, the mainstream text representation methods were BOW (bag of words), topic models, and so on; classification methods included SVM (support vector machine), LR (logistic regression), and others.
With BOW, a piece of text is treated as nothing more than a set of words, ignoring word order, grammar, and syntax, so the BOW representation cannot fully capture the semantic information of the text.
For example, the sentences "This movie sucks" and "a bland, empty, and meaningless work" express very similar sentiment, but their BOW representations have a similarity of 0. As another example, "an empty work without substance" and "a work that is not empty and has substance" have a high BOW similarity, yet their meanings are very different.
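To make the first case concrete, here is a minimal sketch in plain Python (not part of the PaddlePaddle tutorial code) that computes the cosine similarity of the bag-of-words count vectors of two sentences with no words in common:

import math
from collections import Counter

def bow_cosine(sent_a, sent_b):
    # Build word-count (bag-of-words) vectors and compute their cosine similarity.
    a, b = Counter(sent_a.lower().split()), Counter(sent_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# The two sentences share no words, so their BOW similarity is 0,
# even though both express a negative opinion of the movie.
print(bow_cosine("this movie sucks", "a bland empty and meaningless work"))  # 0.0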
The deep learning model introduced in this tutorial overcomes the above shortcomings of BOW. It maps text to a low-dimensional semantic space while taking word order into account, and performs text representation and classification in an end-to-end fashion, with performance significantly better than traditional methods [1].
Model overview
The text representation models used in this tutorial are the convolutional neural network and the recurrent neural network with its extensions. We introduce these models in turn.
Text convolutional neural network (CNN)
In a text convolutional neural network, a convolution is first applied to the input sequence of word vectors to produce a feature map, and a max-pooling-over-time operation is then applied to the feature map to obtain the sentence-level feature corresponding to that convolution kernel. Finally, the features obtained from all convolution kernels are concatenated into a fixed-length vector representation of the text. For text classification, connecting this vector to a softmax layer completes the model.
In practice, we use multiple convolution kernels to process a sentence; kernels with the same window size are stacked into a matrix so the operation can be carried out more efficiently. We can also use kernels with different window sizes. Figure 1 shows the convolutional text classification model, where different colors represent convolution operations with kernels of different sizes.
For common short-text classification problems, the simple text convolutional network described above already achieves a very high accuracy [1]. To obtain more abstract, higher-level text feature representations, one can build a deep text convolutional neural network [2, 3].
Recurrent Neural Network (RNN)
The recurrent neural network is a powerful tool for accurately modeling sequence data; in fact, the theoretical computing power of RNNs is Turing-complete [4]. Natural language is a typical kind of sequence data (a word sequence). In recent years, recurrent neural networks and their variants (such as the long short-term memory network [5]) have performed well, and have even become the current best methods, on many natural language processing tasks such as language modeling, syntactic parsing, semantic role labeling (and sequence labeling in general), semantic representation, image captioning, dialogue, and machine translation.
The recurrent neural network unrolled over time is shown in Figure 2: at time t, the network reads the t-th input x_t (a vector) and the hidden-layer state of the previous time step h_{t-1} (a vector; h_0 is usually initialized to a zero vector), computes the hidden-layer state h_t at the current time step, and repeats this step until all inputs have been read. If the function represented by the recurrent neural network is denoted f, its formula can be expressed as:
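Written out with the parameters described next, this is the standard simple-RNN update:

h_t = f(x_t, h_{t-1}) = σ(W_{xh} x_t + W_{hh} h_{t-1} + b_h)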
where W_{xh} is the input-to-hidden weight matrix, W_{hh} is the hidden-to-hidden weight matrix, b_h is the bias vector of the hidden layer, and σ is the sigmoid function.
When processing natural language, each word (in one-hot representation) is usually first mapped to its word-vector representation, which then serves as the input x_t of the recurrent neural network at each time step. Additional layers can be connected on top of the hidden layer of the recurrent neural network as needed. For example, the output of one RNN's hidden layer can be fed into the input of the next RNN to build a deep or stacked RNN, or the hidden state at the last time step can be taken as a sentence representation and fed into a classification model.
Long Short-Term Memory (LSTM)
For longer sequences, gradients tend to vanish or explode during RNN training [6]. LSTM addresses this problem. Compared with the simple recurrent neural network, LSTM adds a memory cell c, an input gate i, a forget gate f, and an output gate o. The combination of these gates and the memory cell greatly improves the ability of a recurrent neural network to handle long sequences. If the function represented by the LSTM-based recurrent neural network is denoted F, its formula is:
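h_t = F(x_t, h_{t-1})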
F is composed of the following formulas [7]:
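Written out in the notation described below (the standard LSTM with peephole connections, following [7]), these are:

i_t = σ(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)
f_t = σ(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)
o_t = σ(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)
h_t = o_t ⊙ tanh(c_t)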
Here i_t, f_t, c_t, and o_t denote the vector values of the input gate, forget gate, memory cell, and output gate respectively; the W and b terms with subscripts are model parameters; tanh is the hyperbolic tangent function; and ⊙ denotes element-wise multiplication. The input gate controls how strongly new input enters the memory cell c, the forget gate controls how strongly the memory cell retains its value from the previous time step, and the output gate controls how strongly the memory cell is output. The three gates are computed in similar ways but with entirely separate parameters, and they control the memory cell c in different ways, as shown in Figure 3:
LSTM enhances the ability to handle long-range dependencies by adding memory cells and control gates to the simple recurrent neural network. A similarly motivated improvement is the Gated Recurrent Unit (GRU) [8], which has a more compact design. Although these improvements differ, at a high level they behave like the simple recurrent neural network (as in Figure 2): the hidden state changes according to the current input and the hidden state of the previous time step, and this process repeats until the input has been fully processed:
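In the notation used above:

h_t = Recurrent(x_t, h_{t-1})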
where Recurrent can denote a simple recurrent neural network, a GRU, or an LSTM.
Stacked Bidirectional LSTM
In an ordinary recurrent neural network running in sequence order, h_t contains information about the input before time t, i.e., the preceding context. To also capture the following context, we can use a recurrent neural network running in the reverse direction (processing the input in reverse order). Combining this with the technique of building deep recurrent neural networks (deeper networks often yield more abstract, higher-level feature representations), we can build a more powerful LSTM-based stacked bidirectional recurrent neural network [9] to model sequence data.
As shown in Figure 4 (with three layers as an example), the odd-numbered LSTM layers run forward and the even-numbered layers run backward; each higher LSTM layer takes the output of the layer below, and thus information from all previous layers, as input. Applying max pooling over the time dimension to the top LSTM layer's output sequence yields a fixed-length vector representation of the text (this representation fully integrates the text's contextual information and abstracts the text deeply). Finally, the text representation is connected to a softmax layer to build the classification model.
Hands-on practice with PaddlePaddle
Introduction to PaddlePaddle
PaddlePaddle (paddlepaddle.org) is a deep learning framework developed by Baidu. In addition to the core framework, PaddlePaddle provides a rich set of tool components. Baidu has open-sourced a number of industrial-grade application models covering natural language processing, computer vision, recommendation engines, and other fields, and has released several leading pre-trained Chinese models. At the Deep Learning Developer Summit on April 23, PaddlePaddle released a series of new features and application cases.
Dataset introduction
We use the IMDB sentiment analysis dataset as an example. The training and test sets of the IMDB dataset each contain 25,000 labeled movie reviews, where negative reviews have a score of 4 or less and positive reviews have a score of 7 or more, out of a full score of 10.
aclImdb
|- test
   |-- neg
   |-- pos
|- train
   |-- neg
   |-- pos
PaddlePaddle implements automatic downloading and reading of the IMDB dataset in dataset/imdb.py, and provides APIs for reading the dictionary, the training data, the test data, and so on.
Configuring the model
In this example, we implement two text classification algorithms: a text convolutional neural network and a stacked bidirectional LSTM. We first import the required libraries and define the global variables:
from __future__ import print_function
import paddle
import paddle.fluid as fluid
import numpy as np
import sys
import math

CLASS_DIM = 2        # number of sentiment categories
EMB_DIM = 128        # word vector dimension
HID_DIM = 512        # hidden layer dimension
STACKED_NUM = 3      # number of layers in the bidirectional LSTM stack
BATCH_SIZE = 128     # batch size
Text convolutional neural network
We build the neural network convolution_net; sample code is below. Note that fluid.nets.sequence_conv_pool combines the convolution and pooling operations.
# Text convolutional neural network
def convolution_net(data, input_dim, class_dim, emb_dim, hid_dim):
    emb = fluid.layers.embedding(
        input=data, size=[input_dim, emb_dim], is_sparse=True)
    conv_3 = fluid.nets.sequence_conv_pool(
        input=emb,
        num_filters=hid_dim,
        filter_size=3,
        act="tanh",
        pool_type="sqrt")
    conv_4 = fluid.nets.sequence_conv_pool(
        input=emb,
        num_filters=hid_dim,
        filter_size=4,
        act="tanh",
        pool_type="sqrt")
    prediction = fluid.layers.fc(
        input=[conv_3, conv_4], size=class_dim, act="softmax")
    return prediction
The network's input input_dim denotes the dictionary size, and class_dim denotes the number of categories. Here, the convolution and pooling operations are implemented with the sequence_conv_pool API.
Stacked Bidirectional LSTM
The code snippet for the stacked bidirectional network stacked_lstm_net is as follows:
# Stacked bidirectional LSTM
def stacked_lstm_net(data, input_dim, class_dim, emb_dim, hid_dim, stacked_num):
    assert stacked_num % 2 == 1

    # compute word embeddings
    emb = fluid.layers.embedding(
        input=data, size=[input_dim, emb_dim], is_sparse=True)

    # first stack: fully connected layer + LSTM layer
    fc1 = fluid.layers.fc(input=emb, size=hid_dim)
    lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=hid_dim)

    inputs = [fc1, lstm1]

    # all remaining stacks
    for i in range(2, stacked_num + 1):
        fc = fluid.layers.fc(input=inputs, size=hid_dim)
        lstm, cell = fluid.layers.dynamic_lstm(
            input=fc, size=hid_dim, is_reverse=(i % 2) == 0)
        inputs = [fc, lstm]

    # pooling layer
    fc_last = fluid.layers.sequence_pool(input=inputs[0], pool_type='max')
    lstm_last = fluid.layers.sequence_pool(input=inputs[1], pool_type='max')

    # fully connected layer with softmax for the prediction
    prediction = fluid.layers.fc(
        input=[fc_last, lstm_last], size=class_dim, act='softmax')
    return prediction
The stacked bidirectional LSTM above extracts high-level features and maps them to a vector whose size equals the number of classification categories. The softmax activation of the final fully connected layer computes the probability that the input belongs to each class.
To reiterate, either network structure, convolution_net or stacked_lstm_net, can be used for training. Below we take convolution_net as an example.
Next we define the inference program (inference_program). It uses convolution_net to make predictions on the input provided by fluid.layers.data.
def inference_program(word_dict):
    data = fluid.layers.data(
        name="words", shape=[1], dtype="int64", lod_level=1)
    dict_dim = len(word_dict)
    net = convolution_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM)
    # To use the stacked bidirectional LSTM instead, replace the line above with:
    # net = stacked_lstm_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM, STACKED_NUM)
    return net
We define train_program here; it uses the prediction returned by inference_program to compute the error. We also define the optimization function optimizer_func.
Because this is supervised learning, the training-set labels are also defined with fluid.layers.data. During training, cross-entropy (fluid.layers.cross_entropy) is used as the loss function.
During testing, the classifier computes the probability of each class. The first returned value is designated as the cost.
def train_program(prediction):
    label = fluid.layers.data(name="label", shape=[1], dtype="int64")
    cost = fluid.layers.cross_entropy(input=prediction, label=label)
    avg_cost = fluid.layers.mean(cost)
    accuracy = fluid.layers.accuracy(input=prediction, label=label)
    return [avg_cost, accuracy]   # returns the average cost and the accuracy

# optimization function
def optimizer_func():
    return fluid.optimizer.Adagrad(learning_rate=0.002)
Training the model
Define the training environment
Define whether training runs on the CPU or the GPU:

use_cuda = False   # train on the CPU
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
Define the data providers
The next step is to define the data providers for training and testing. Each provider reads a batch of size BATCH_SIZE. The training reader provides each batch after shuffling, where the shuffle cache size is buf_size.
Note: reading the IMDB data may take several minutes; please be patient.
print("Loading IMDB word dict....") word_dict = paddle.dataset.imdb.word_dict() print ("Reading training data....") train_reader = paddle.batch( paddle .reader.shuffle( paddle.dataset.imdb.train(word_dict), buf_size=25000), batch_size=BATCH_SIZE) print("Reading testing data....") test_reader = paddle.batch( paddle.dataset.imdb.test (word_dict), batch_size=BATCH_SIZE) feed_order = [words, label] pass_num = 1
word_dict is a dictionary mapping each word to its label (id); run the next line to see its contents:
word_dict
Each entry is a correspondence such as ('limited': 1726), indicating that the label corresponding to the word "limited" is 1726.
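As a quick check, here is a minimal sketch (assuming word_dict has already been loaded as above) that prints a few entries of the mapping:

# Print a few (word, id) pairs from the loaded IMDB dictionary.
for word, idx in list(word_dict.items())[:5]:
    print(word, idx)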
Constructing the trainer
The trainer needs a training program and an optimization function.
main_program = fluid.default_main_program()
star_program = fluid.default_startup_program()

prediction = inference_program(word_dict)
train_func_outputs = train_program(prediction)
avg_cost = train_func_outputs[0]
accuracy = train_func_outputs[1]   # keep a handle to the accuracy metric used in train_test below

test_program = main_program.clone(for_test=True)

sgd_optimizer = optimizer_func()
sgd_optimizer.minimize(avg_cost)
exe = fluid.Executor(place)
This function computes the results of the trained model on the test dataset.
def train_test(program, reader):
    count = 0
    feed_var_list = [
        program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder_test = fluid.DataFeeder(feed_list=feed_var_list, place=place)
    test_exe = fluid.Executor(place)
    accumulated = len([avg_cost, accuracy]) * [0]
    for test_data in reader():
        avg_cost_np = test_exe.run(
            program=program,
            feed=feeder_test.feed(test_data),
            fetch_list=[avg_cost, accuracy])
        accumulated = [
            x[0] + x[1][0] for x in zip(accumulated, avg_cost_np)
        ]
        count += 1
    return [x / count for x in accumulated]
Provide data and build the main training loop
feed_order defines the mapping between each column of the generated data and fluid.layers.data. For example, the first column produced by imdb.train corresponds to the words feature.
# directory in which to save the model parameters
params_dirname = "understand_sentiment_conv.inference.model"

feed_order = ['words', 'label']
pass_num = 1   # number of epochs in the training loop

# main training loop
def train_loop():
    feed_var_list_loop = [
        main_program.global_block().var(var_name) for var_name in feed_order
    ]
    feeder = fluid.DataFeeder(feed_list=feed_var_list_loop, place=place)
    exe.run(star_program)

    for epoch_id in range(pass_num):
        for step_id, data in enumerate(train_reader()):
            # run the trainer
            metrics = exe.run(
                main_program,
                feed=feeder.feed(data),
                fetch_list=[var.name for var in train_func_outputs])
            print("step: {0}, Metrics {1}".format(
                step_id, list(map(np.array, metrics))))
            # evaluate on the test set every 10 steps
            if (step_id + 1) % 10 == 0:
                avg_cost_test, acc_test = train_test(test_program, test_reader)
                print('Step {0}, Test Loss {1:0.2}, Acc {2:0.2}'.format(
                    step_id, avg_cost_test, acc_test))
                print("Step {0}, Epoch {1} Metrics {2}".format(
                    step_id, epoch_id, list(map(np.array, metrics))))
            if math.isnan(float(metrics[0])):
                sys.exit("got NaN loss, training failed.")
        if params_dirname is not None:
            # save the inference model
            fluid.io.save_inference_model(params_dirname, ["words"],
                                          prediction, exe)
Handling the training process
We print the output of every step in the main training loop so we can observe how training progresses.
Start training
Finally, we start the main training loop to begin training. Training takes a long time; if you want results sooner, you can trade some accuracy for a shorter training time by adjusting the loss threshold or the number of training steps.
train_loop()
Apply the model
Build the predictor
As in the training process, we need to create an inference process and use the trained model and parameters to make predictions. params_dirname is the directory where the parameters were saved during training.
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
inference_scope = fluid.core.Scope()
Generate test input data
To make predictions, we arbitrarily select three reviews (feel free to substitute your own). Each word in a review is mapped to its id in word_dict; words not in the dictionary are mapped to the unknown token. We then use create_lod_tensor to create the LoD (level-of-detail) tensor.
reviews_str = [
    'read the book forget the movie', 'this is a great movie',
    'this is very bad'
]
reviews = [c.split() for c in reviews_str]

UNK = word_dict['<unk>']
lod = []
for c in reviews:
    lod.append([word_dict.get(words, UNK) for words in c])

base_shape = [[len(c) for c in lod]]

tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
Apply the model and make predictions
Now we can predict whether each review is positive or negative.
with fluid.scope_guard(inference_scope):
    [inferencer, feed_target_names,
     fetch_targets] = fluid.io.load_inference_model(params_dirname, exe)

    reviews_str = [
        'read the book forget the movie', 'this is a great movie',
        'this is very bad'
    ]
    reviews = [c.split() for c in reviews_str]

    UNK = word_dict['<unk>']
    lod = []
    for c in reviews:
        lod.append([np.int64(word_dict.get(words, UNK)) for words in c])

    base_shape = [[len(c) for c in lod]]
    tensor_words = fluid.create_lod_tensor(lod, base_shape, place)

    assert feed_target_names[0] == "words"
    results = exe.run(
        inferencer,
        feed={feed_target_names[0]: tensor_words},
        fetch_list=fetch_targets,
        return_numpy=False)
    np_data = np.array(results[0])
    for i, r in enumerate(np_data):
        print("Predict probability of ", r[0], " to be positive and ", r[1],
              " to be negative for review '", reviews_str[i], "'")
Readers interested in PaddlePaddle can read other related documentation on the official PaddlePaddle website: http://www.paddlepaddle.org/
References:
- Kim Y. Convolutional neural networks for sentence classification[J]. arXiv preprint arXiv:1408.5882, 2014.
- Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences[J]. arXiv preprint arXiv:1404.2188, 2014.
- Dauphin Y N, et al. Language modeling with gated convolutional networks[J]. arXiv preprint arXiv:1612.08083, 2016.
- Siegelmann H T, Sontag E D. On the computational power of neural nets[C]//Proceedings of the fifth annual workshop on Computational learning theory. ACM, 1992: 440-449.
- Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.
- Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE transactions on neural networks, 1994, 5(2): 157-166.
- Graves A. Generating sequences with recurrent neural networks[J]. arXiv preprint arXiv:1308.0850, 2013.
- Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv:1406.1078, 2014.
- Zhou J, Xu W. End-to-end learning of semantic role labeling using recurrent neural networks[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.
—End—