520 Gift Package: Sentiment Analysis Algorithms, from Principles to Hands-On Practice with PaddlePaddle

Yunzhong from Concave Temple

Qubit Report | Public account QbitAI

In natural language processing, sentiment analysis generally refers to judging the emotional state expressed by a piece of text. The text can be a sentence, a paragraph, or a document. The emotional state can have two classes, such as (positive, negative) or (happy, sad), or three classes, such as (positive, negative, neutral), and so on.

Sentiment analysis has a wide range of applications. Examples include classifying the comments users post on shopping websites (Amazon, Tmall, Taobao, etc.), travel websites, and movie review websites as positive or negative, or crawling user reviews of a product and running sentiment analysis on them to gauge users' overall experience with it.

Today is May 20th, and PaddlePaddle will teach you how to use sentiment analysis algorithms to understand what your goddess is thinking.

In the following, we take sentiment analysis as an example to introduce end-to-end short text classification with deep learning, and use PaddlePaddle to complete all related experiments.

Project address:


Application background

In natural language processing, sentiment analysis is a typical text classification problem: the text to be analyzed is assigned to the category it belongs to. Text classification involves two issues: text representation and classification method.

Before deep learning methods appeared, the mainstream text representation methods were BOW (bag of words), topic models, and so on; the mainstream classification methods were SVM (support vector machine), LR (logistic regression), and so on.

For a piece of text, BOW ignores word order, grammar, and syntax, and treats the text simply as a collection of words. The BOW method therefore cannot fully represent the semantic information of the text.

For example, the sentences "This movie sucks" and "a bland, empty, and meaningless work" are semantically very similar for sentiment analysis, but the similarity of their BOW representations is 0. Conversely, the sentence "an empty work with no connotation" has a high BOW similarity with "a work with connotation that is not empty", yet their meanings are very different.
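The two BOW failure cases can be checked with a few lines of code. The following is a minimal sketch, assuming whitespace tokenization and cosine similarity over word counts; the example sentences are English stand-ins for the ones in the text, and the helper names `bow_vector` and `cosine_similarity` are illustrative only.

```python
from collections import Counter
import math


def bow_vector(text):
    """Count word occurrences, ignoring order (a bag of words)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Similar sentiment, no shared words: the similarity is zero.
s1 = bow_vector("this movie sucks")
s2 = bow_vector("a bland empty and meaningless work")
sim_same_sentiment = cosine_similarity(s1, s2)

# Opposite sentiment, many shared words: the similarity is high.
s3 = bow_vector("an empty work with no connotation")
s4 = bow_vector("a work with connotation that is not empty")
sim_opposite_sentiment = cosine_similarity(s3, s4)

print(sim_same_sentiment, sim_opposite_sentiment)
```

Because word order is discarded, the negation in the last sentence has no effect on its representation, which is exactly the weakness described above.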

The deep learning models introduced in this tutorial overcome these shortcomings of the BOW representation. They map text to a low-dimensional semantic space while taking word order into account, and perform text representation and classification in an end-to-end manner. Their performance is significantly better than that of traditional methods [1].

Model overview

The text representation models used in this tutorial are convolutional neural networks and recurrent neural networks, together with their extensions. They are introduced in turn below.

Introduction to Text Convolutional Neural Networks (CNN)

A convolutional neural network first applies convolution to the input sequence of word vectors to produce a feature map, then applies max pooling over time to the feature map to obtain the feature of the whole sentence for that convolution kernel. Finally, the features obtained from all convolution kernels are concatenated into a fixed-length vector representation of the text. For text classification, connecting this vector to a softmax layer yields a complete model.

In practice, we process a sentence with multiple convolution kernels. Kernels with the same window size are stacked into a matrix so that the operation can be carried out more efficiently. We can also use kernels with different window sizes. Figure 1 shows the convolutional neural network text classification model; different colors represent convolution kernels of different sizes.

[Figure 1: Convolutional neural network text classification model]

For general short text classification problems, the simple text convolutional network described above can already achieve very high accuracy [1]. To obtain more abstract, higher-level text features, you can build a deep text convolutional neural network [2, 3].
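The convolution and max-pooling-over-time steps can be sketched with NumPy. This is an illustrative sketch rather than the PaddlePaddle implementation: the shapes, the tanh activation, and the function name `conv_max_over_time` are assumptions, and the weights are random instead of trained.

```python
import numpy as np


def conv_max_over_time(embeddings, kernels):
    """embeddings: (seq_len, emb_dim); kernels: (num_filters, window, emb_dim).

    Returns one max-pooled feature per filter, shape (num_filters,).
    """
    seq_len, _ = embeddings.shape
    num_filters, window, _ = kernels.shape
    steps = seq_len - window + 1
    feature_map = np.empty((steps, num_filters))
    for t in range(steps):
        patch = embeddings[t:t + window]  # (window, emb_dim)
        # Each filter is a dot product with the patch, followed by tanh.
        feature_map[t] = np.tanh(
            np.tensordot(kernels, patch, axes=([1, 2], [0, 1])))
    # Max pooling over time removes the dependence on sentence length.
    return feature_map.max(axis=0)


rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 5))      # 7 words, 5-dimensional word vectors
kernels_3 = rng.normal(size=(4, 3, 5))  # 4 filters with window size 3
kernels_4 = rng.normal(size=(4, 4, 5))  # 4 filters with window size 4

# Concatenate the features of all kernels into one fixed-length vector.
features = np.concatenate([
    conv_max_over_time(sentence, kernels_3),
    conv_max_over_time(sentence, kernels_4),
])
print(features.shape)
```

Whatever the sentence length, the output has one entry per filter, which is what allows a fixed-size softmax classifier to be attached on top.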

Recurrent Neural Network (RNN)

The recurrent neural network is a powerful tool for modeling sequence data accurately. In theory, RNNs are Turing-complete [4]. Natural language is a typical kind of sequence data (a sequence of words). In recent years, recurrent neural networks and their variants (such as long short-term memory [5]) have performed well in many areas of natural language processing, including language modeling, syntactic parsing, semantic role labeling (or sequence labeling in general), semantic representation, image and text generation, dialogue, and machine translation, and in some of them have become the best method currently available.

[Figure 2: Recurrent neural network unrolled over time]

The recurrent neural network unrolled over time is shown in Figure 2. At time $t$, the network reads the $t$-th input $x_t$ (a vector) and the hidden state of the previous time step $h_{t-1}$ (a vector; $h_0$ is generally initialized to the zero vector), computes the hidden state $h_t$ of the current time step, and repeats this step until all inputs have been read. If the function represented by the recurrent neural network is denoted $f$, it can be written as:

$$h_t = f(x_t, h_{t-1}) = \sigma(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$

where $W_{xh}$ is the input-to-hidden weight matrix, $W_{hh}$ is the hidden-to-hidden weight matrix, $b_h$ is the bias vector of the hidden layer, and $\sigma$ is the sigmoid function.

When processing natural language, each word (in one-hot representation) is generally first mapped to its word vector, which is then used as the input $x_t$ of the recurrent neural network at each time step. Other layers can be connected on top of the hidden layer as needed. For example, you can feed the hidden layer output of one RNN into the next RNN to build a deep (stacked) RNN, or take the hidden state at the last time step as the sentence representation and feed it to a classification model.
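The recurrence described above can be sketched in NumPy as follows. This is a minimal sketch with random, untrained weights; the dimensions and the function name `rnn_forward` are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run h_t = sigmoid(W_xh x_t + W_hh h_{t-1} + b_h) over a sequence.

    inputs: (seq_len, input_dim). Returns all hidden states,
    shape (seq_len, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])  # h_0 is initialized to the zero vector
    states = []
    for x_t in inputs:
        h = sigmoid(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 3, 6
states = rnn_forward(
    rng.normal(size=(seq_len, input_dim)),     # word vectors x_1 .. x_6
    rng.normal(size=(hidden_dim, input_dim)),  # W_xh
    rng.normal(size=(hidden_dim, hidden_dim)), # W_hh
    np.zeros(hidden_dim),                      # b_h
)

# The last hidden state can serve as the sentence representation.
sentence_repr = states[-1]
print(states.shape, sentence_repr.shape)
```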

Long Short-Term Memory (LSTM)

For longer sequences, vanishing or exploding gradients easily occur while training an RNN [6]. LSTM can solve this problem. Compared with a simple recurrent neural network, LSTM adds a memory cell c, an input gate i, a forget gate f, and an output gate o. The combination of these gates and the memory cell greatly improves the ability of recurrent neural networks to process long sequences. If the function represented by the LSTM-based recurrent neural network is denoted F, its formula is:

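$$h_t = F(x_t, h_{t-1})$$

F is composed of the following formulas [7]:

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i)$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$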

Here $i_t$, $f_t$, $c_t$, and $o_t$ are the vector values of the input gate, forget gate, memory cell, and output gate respectively; the indexed W and b terms are model parameters; tanh is the hyperbolic tangent function; and ⊙ denotes elementwise multiplication. The input gate controls how strongly the new input enters the memory cell c, the forget gate controls how strongly the memory cell keeps its value from the previous time step, and the output gate controls how strongly the memory cell is output. The three gates are computed in a similar way, but they have entirely separate parameters and control the memory cell c in different ways, as shown in Figure 3:

LSTM strengthens the ability to handle long-range dependencies by adding a memory cell and control gates to the simple recurrent neural network. The Gated Recurrent Unit (GRU) [8] is an improvement based on a similar principle, with a more compact design. Although these improvements differ, their high-level description is the same as that of a simple recurrent neural network (as shown in Figure 2): the hidden state changes according to the current input and the hidden state of the previous time step, and this process repeats until the input has been fully processed:

$$h_t = \mathrm{Recurrent}(x_t, h_{t-1})$$

where Recurrent can stand for a simple recurrent neural network, a GRU, or an LSTM.
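A single LSTM step can be sketched in NumPy as follows. This sketch assumes the gate formulation of Graves [7], in which the gates also see the memory cell; the parameter names in the dictionary `p`, the use of full matrices for the cell-to-gate weights, and the helper `init_params` are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: input gate i, forget gate f, memory cell c, output gate o."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] @ c_prev + p["b_i"])
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["W_cf"] @ c_prev + p["b_f"])
    # The forget gate scales the old cell value, the input gate scales the new one.
    c_t = f_t * c_prev + i_t * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] @ c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def init_params(input_dim, hidden_dim, rng):
    """Random parameters for illustration only (hypothetical helper)."""
    p = {}
    for gate in "ifco":
        p["W_x" + gate] = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
        p["W_h" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        p["b_" + gate] = np.zeros(hidden_dim)
    for gate in "ifo":
        p["W_c" + gate] = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    return p


rng = np.random.default_rng(0)
params = init_params(input_dim=4, hidden_dim=3, rng=rng)
h, c = np.zeros(3), np.zeros(3)
# The same loop works for any Recurrent step function: h_t depends on x_t and h_{t-1}.
for x_t in rng.normal(size=(5, 4)):
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```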

Stacked Bidirectional LSTM

For a recurrent neural network that reads the sequence in its normal order, $h_t$ contains the input information before time t, that is, the preceding context. To obtain the following context as well, we can use a recurrent neural network running in the reverse direction (processing the input in reverse order). Combined with the method of building deep recurrent neural networks (deep networks can often produce more abstract, higher-level feature representations), we can build a more powerful LSTM-based stacked bidirectional recurrent neural network [9] to model sequence data.

As shown in Figure 4 (taking three layers as an example), the odd-numbered LSTM layers run forward and the even-numbered LSTM layers run in reverse. Each higher LSTM layer takes as input the information from the LSTM layer below it and from all previous layers. Applying max pooling over the time dimension to the top-level LSTM sequence yields a fixed-length vector representation of the text (this representation fully integrates the contextual information of the text and abstracts it deeply). Finally, we connect the text representation to softmax to build a classification model.

[Figure 4: Stacked bidirectional LSTM for text classification]
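The alternating directions and the final pooling step can be sketched in NumPy. To keep the example short, a simple tanh recurrent layer stands in for each LSTM layer; that substitution, the random weights, and the names `recurrent_layer` and `stacked_bidirectional` are assumptions made for illustration.

```python
import numpy as np


def recurrent_layer(inputs, W_x, W_h, reverse=False):
    """Simple tanh recurrent layer (a stand-in for an LSTM layer).

    With reverse=True the sequence is read from last to first, but the
    outputs are returned in the original time order.
    """
    seq = inputs[::-1] if reverse else inputs
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    states = np.stack(states)
    return states[::-1] if reverse else states


def stacked_bidirectional(inputs, hidden_dim, num_layers, rng):
    layer_input = inputs
    for layer in range(1, num_layers + 1):
        W_x = rng.normal(scale=0.1, size=(hidden_dim, layer_input.shape[1]))
        W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        # Odd layers run forward, even layers run in reverse.
        states = recurrent_layer(layer_input, W_x, W_h, reverse=(layer % 2 == 0))
        # The next layer sees this layer's input together with its output.
        layer_input = np.concatenate([layer_input, states], axis=1)
    # Max pooling over the time dimension gives a fixed-length vector.
    return states.max(axis=0)


rng = np.random.default_rng(0)
text_repr = stacked_bidirectional(
    rng.normal(size=(9, 4)), hidden_dim=6, num_layers=3, rng=rng)
print(text_repr.shape)
```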

Hands-on practice with PaddlePaddle

PaddlePaddle introduction


PaddlePaddle (paddlepaddle.org) is a deep learning framework developed by Baidu. In addition to the core framework, PaddlePaddle provides a rich set of tool components. It has officially open-sourced a number of industrial-grade application models covering natural language processing, computer vision, recommendation engines, and other fields, and has released a number of leading pre-trained Chinese models. At the Deep Learning Developer Summit on April 23, PaddlePaddle released a series of new features and application cases.

Dataset introduction

We use the IMDB sentiment analysis dataset as an example. The training and test sets of the IMDB dataset each contain 25,000 labeled movie reviews. Negative reviews have a score of 4 or less and positive reviews have a score of 7 or more, on a 10-point scale.

    aclImdb
    |- test
       |-- neg
       |-- pos
    |- train
       |-- neg
       |-- pos
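The labeling rule for the reviews can be written as a small function. This is only a sketch of the thresholds stated above; the function name `label_from_rating` is hypothetical and is not part of the dataset tooling.

```python
def label_from_rating(rating):
    """Map a 1-10 IMDB rating to a sentiment label, or None if excluded."""
    if rating <= 4:
        return "negative"
    if rating >= 7:
        return "positive"
    return None  # ratings 5 and 6 fall outside both classes


labels = [label_from_rating(r) for r in (1, 4, 5, 6, 7, 10)]
print(labels)
```

Leaving out the middle ratings keeps the two classes clearly separated, which is why the task is treated as binary classification with CLASS_DIM = 2.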

PaddlePaddle implements automatic download and reading of the IMDB dataset in dataset/imdb.py, and provides APIs for reading the dictionary, the training data, the test data, and so on.

Configuring the Model

In this example, we implement two text classification algorithms: a text convolutional neural network and a stacked bidirectional LSTM. We first import the libraries to be used and define the global variables:


    from __future__ import print_function
    import paddle
    import paddle.fluid as fluid
    import numpy as np
    import sys
    import math

    CLASS_DIM = 2      # number of categories for sentiment classification
    EMB_DIM = 128      # word vector dimension
    HID_DIM = 512      # hidden layer dimension
    STACKED_NUM = 3    # number of stacked bidirectional LSTM layers
    BATCH_SIZE = 128   # batch size

Text Convolutional Neural Network

We build the neural network convolution_net; the sample code is shown below. Note that fluid.nets.sequence_conv_pool combines the convolution and pooling layers in a single call.

    # Text convolutional neural network
    def convolution_net(data, input_dim, class_dim, emb_dim, hid_dim):
        emb = fluid.layers.embedding(
            input=data, size=[input_dim, emb_dim], is_sparse=True)
        conv_3 = fluid.nets.sequence_conv_pool(
            input=emb,
            num_filters=hid_dim,
            filter_size=3,
            act="tanh",
            pool_type="sqrt")
        conv_4 = fluid.nets.sequence_conv_pool(
            input=emb,
            num_filters=hid_dim,
            filter_size=4,
            act="tanh",
            pool_type="sqrt")
        prediction = fluid.layers.fc(
            input=[conv_3, conv_4], size=class_dim, act="softmax")
        return prediction

The network's input_dim argument is the size of the dictionary, and class_dim is the number of categories.

Stacked Bidirectional LSTM

The code for the stacked bidirectional network stacked_lstm_net is as follows:

    # Stacked bidirectional LSTM
    def stacked_lstm_net(data, input_dim, class_dim, emb_dim, hid_dim, stacked_num):
        assert stacked_num % 2 == 1

        # Compute the word vectors
        emb = fluid.layers.embedding(
            input=data, size=[input_dim, emb_dim], is_sparse=True)

        # First stack
        # Fully connected layer
        fc1 = fluid.layers.fc(input=emb, size=hid_dim)
        # LSTM layer
        lstm1, cell1 = fluid.layers.dynamic_lstm(input=fc1, size=hid_dim)

        inputs = [fc1, lstm1]

        # All remaining stacks
        for i in range(2, stacked_num + 1):
            fc = fluid.layers.fc(input=inputs, size=hid_dim)
            lstm, cell = fluid.layers.dynamic_lstm(
                input=fc, size=hid_dim, is_reverse=(i % 2) == 0)
            inputs = [fc, lstm]

        # Pooling layer
        fc_last = fluid.layers.sequence_pool(input=inputs[0], pool_type='max')
        lstm_last = fluid.layers.sequence_pool(input=inputs[1], pool_type='max')

        # Fully connected layer with softmax prediction
        prediction = fluid.layers.fc(
            input=[fc_last, lstm_last], size=class_dim, act='softmax')
        return prediction

The stacked bidirectional LSTM above abstracts high-level features and maps them to a vector whose size equals the number of classification categories. The softmax activation of the last fully connected layer computes the probability that the input belongs to each class.
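The softmax step can be illustrated on its own. The following NumPy sketch converts raw scores into class probabilities; the scores are made-up numbers, and the max-subtraction is a common numerical-stability trick rather than something stated in the text.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Raw scores for 2 reviews over CLASS_DIM = 2 classes (illustrative values).
logits = np.array([[2.0, 0.5],
                   [-1.0, 3.0]])
probs = softmax(logits)
print(probs)
```

Each row of `probs` sums to 1, so its entries can be read as the probabilities of the two sentiment classes.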

To reiterate, either network structure, convolution_net or stacked_lstm_net, can be used for training. We take convolution_net as an example.

Next we define the prediction program (inference_program). It uses convolution_net to make predictions on the input defined with fluid.layers.data.

    def inference_program(word_dict):
        data = fluid.layers.data(
            name="words", shape=[1], dtype="int64", lod_level=1)

        dict_dim = len(word_dict)
        net = convolution_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM)
        # To use the stacked bidirectional LSTM instead, replace the line above with:
        # net = stacked_lstm_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM, STACKED_NUM)
        return net

We define train_program here. It uses the result returned by inference_program to compute the error. We also define the optimization function optimizer_func.

Because this is supervised learning, the labels of the training set are also defined with fluid.layers.data. During training, cross-entropy, computed with fluid.layers.cross_entropy, is used as the loss function.

During testing, the classifier computes the probability of each output. The first value returned is the cost.

    def train_program(prediction):
        label = fluid.layers.data(name="label", shape=[1], dtype="int64")
        cost = fluid.layers.cross_entropy(input=prediction, label=label)
        avg_cost = fluid.layers.mean(cost)
        accuracy = fluid.layers.accuracy(input=prediction, label=label)
        return [avg_cost, accuracy]  # return the average cost and the accuracy

    # Optimization function
    def optimizer_func():
        return fluid.optimizer.Adagrad(learning_rate=0.002)
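The two quantities returned by the training program can be reproduced by hand on a toy batch. The sketch below uses NumPy with made-up probabilities and labels; it illustrates the definitions of cross-entropy and accuracy and is not the PaddlePaddle implementation.

```python
import numpy as np


def cross_entropy(probs, labels):
    """Per-sample cross-entropy: -log of the probability of the true class."""
    return -np.log(probs[np.arange(len(labels)), labels])


def accuracy(probs, labels):
    """Fraction of samples whose most probable class equals the label."""
    return float(np.mean(probs.argmax(axis=1) == labels))


# Predicted class probabilities for 3 reviews, and their true labels.
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])

avg_cost = float(cross_entropy(probs, labels).mean())
acc = accuracy(probs, labels)
print(avg_cost, acc)
```

The third review is misclassified, so it contributes the largest loss term and lowers the accuracy to two out of three.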

Training the model

Define the training environment

Define whether training runs on the CPU or the GPU:

    use_cuda = False  # train on the CPU
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

Define data providers

The next step is to define the data providers for training and testing. Each provider yields batches of BATCH_SIZE samples. paddle.dataset.imdb.train supplies the training samples, paddle.reader.shuffle shuffles them within a buffer of buf_size samples, and paddle.batch groups them into batches.

Note: reading the IMDB data may take several minutes, so please be patient.

    print("Loading IMDB word dict....")
    word_dict = paddle.dataset.imdb.word_dict()

    print("Reading training data....")
    train_reader = paddle.batch(
        paddle.reader.shuffle(
            paddle.dataset.imdb.train(word_dict), buf_size=25000),
        batch_size=BATCH_SIZE)

    print("Reading testing data....")
    test_reader = paddle.batch(
        paddle.dataset.imdb.test(word_dict), batch_size=BATCH_SIZE)

    feed_order = ['words', 'label']
    pass_num = 1
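The reader pattern used here, in which a reader is a function that returns a generator and decorators wrap one reader into another, can be sketched in plain Python. The functions below are simplified stand-ins written for illustration, not the PaddlePaddle implementations, and `toy_reader` is a made-up data source.

```python
import random


def shuffle(reader, buf_size, seed=0):
    """Wrap a reader so that samples are shuffled within buffers of buf_size."""
    def shuffled_reader():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        # Emit whatever is left in the last, partially filled buffer.
        rng.shuffle(buf)
        yield from buf
    return shuffled_reader


def batch(reader, batch_size):
    """Wrap a reader so that it yields lists of batch_size samples."""
    def batch_reader():
        current = []
        for sample in reader():
            current.append(sample)
            if len(current) == batch_size:
                yield current
                current = []
        if current:
            yield current
    return batch_reader


def toy_reader():
    """Yield (word_ids, label) pairs, like an IMDB sample."""
    for i in range(10):
        yield ([i, i + 1], i % 2)


train_reader = batch(shuffle(toy_reader, buf_size=4), batch_size=3)
batches = list(train_reader())
print([len(b) for b in batches])
```

A buffer as large as the dataset (buf_size=25000 for the 25,000 training reviews) shuffles the whole training set; a smaller buffer trades randomness for memory.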

word_dict is a dictionary that maps each word to its id. Printing word_dict shows its contents.


Each entry is a pair such as ('limited': 1726), which means that the id of the word limited is 1726.
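How such a word-to-id mapping can be built is sketched below. This is an illustrative sketch that assigns ids by descending frequency and reserves the last id for unknown words; the function name `build_word_dict`, the `cutoff` parameter, and the toy documents are assumptions, not the dataset's actual code.

```python
from collections import Counter


def build_word_dict(documents, cutoff=0):
    """Assign ids by descending frequency; the last id is reserved for '<unk>'."""
    freq = Counter(word for doc in documents for word in doc.lower().split())
    kept = [(w, n) for w, n in freq.items() if n > cutoff]
    # Sort by frequency first, then alphabetically to break ties.
    kept.sort(key=lambda item: (-item[1], item[0]))
    word_dict = {word: idx for idx, (word, _) in enumerate(kept)}
    word_dict["<unk>"] = len(word_dict)
    return word_dict


toy_dict = build_word_dict(["this is a great movie", "this is very bad"])

# Words missing from the dictionary fall back to the id of '<unk>'.
ids = [toy_dict.get(w, toy_dict["<unk>"]) for w in "this movie is awesome".split()]
print(toy_dict)
print(ids)
```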

Constructing the trainer

The trainer requires a training program and a training optimization function.

    main_program = fluid.default_main_program()
    star_program = fluid.default_startup_program()
    prediction = inference_program(word_dict)
    train_func_outputs = train_program(prediction)
    avg_cost = train_func_outputs[0]

    test_program = main_program.clone(for_test=True)

    sgd_optimizer = optimizer_func()
    sgd_optimizer.minimize(avg_cost)
    exe = fluid.Executor(place)

The following function computes the results of the trained model on the test dataset:

    def train_test(program, reader):
        count = 0
        feed_var_list = [
            program.global_block().var(var_name) for var_name in feed_order
        ]
        feeder_test = fluid.DataFeeder(feed_list=feed_var_list, place=place)
        test_exe = fluid.Executor(place)
        # train_func_outputs holds [avg_cost, accuracy] as returned by train_program
        accumulated = len(train_func_outputs) * [0]
        for test_data in reader():
            avg_cost_np = test_exe.run(
                program=program,
                feed=feeder_test.feed(test_data),
                fetch_list=train_func_outputs)
            accumulated = [
                x[0] + x[1][0] for x in zip(accumulated, avg_cost_np)
            ]
            count += 1
        # Average the cost and the accuracy over all test batches
        return [x / count for x in accumulated]

Provide data and build the main training loop

feed_order defines the mapping between each column of the generated data and the fluid.layers.data inputs. For example, the first column of the data generated by imdb.train corresponds to the words feature.

    # Specify the directory path to save the parameters
    params_dirname = "understand_sentiment_conv.inference.model"

    feed_order = ['words', 'label']
    pass_num = 1  # number of rounds of the training loop

    # Main loop of the program
    def train_loop():
        # Start the trainer built above
        feed_var_list_loop = [
            main_program.global_block().var(var_name) for var_name in feed_order
        ]
        feeder = fluid.DataFeeder(feed_list=feed_var_list_loop, place=place)
        exe.run(star_program)

        # Training loop
        for epoch_id in range(pass_num):
            for step_id, data in enumerate(train_reader()):
                # Run the trainer
                metrics = exe.run(
                    main_program,
                    feed=feeder.feed(data),
                    fetch_list=[var.name for var in train_func_outputs])

                # Test results
                print("step: {0}, Metrics {1}".format(
                    step_id, list(map(np.array, metrics))))
                if (step_id + 1) % 10 == 0:
                    avg_cost_test, acc_test = train_test(test_program, test_reader)
                    print('Step {0}, Test Loss {1:0.2}, Acc {2:0.2}'.format(
                        step_id, avg_cost_test, acc_test))

                    print("Step {0}, Epoch {1} Metrics {2}".format(
                        step_id, epoch_id, list(map(np.array, metrics))))

                if math.isnan(float(metrics[0])):
                    sys.exit("got NaN loss, training failed.")
        if params_dirname is not None:
            # Save the model
            fluid.io.save_inference_model(params_dirname, ["words"], prediction, exe)

    train_loop()

Handling the training process

We print the output of every step in the main training loop so that we can observe how training is going.

Start training

Finally, we run the main training loop to start training. Training takes a long time. If you want results sooner, you can shorten the training time at the cost of lower accuracy by adjusting the loss value range or the number of training steps.


Apply the Model

Build the Predictor


As with training, we need to create a prediction process and use the trained model and parameters to make predictions. params_dirname is the directory in which the parameters were saved during training.

    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    exe = fluid.Executor(place)
    inference_scope = fluid.core.Scope()

Generate test input data

To make predictions, we arbitrarily pick 3 reviews. Feel free to pick 3 of your own. We map each word in a review to its id in word_dict. If a word does not exist in the dictionary, it is mapped to the unknown token. We then use create_lod_tensor to create a LoD (level of detail) tensor.

    reviews_str = [
        'read the book forget the movie', 'this is a great movie', 'this is very bad'
    ]
    reviews = [c.split() for c in reviews_str]

    # Id of the unknown token, used for words that are not in the dictionary
    UNK = word_dict['<unk>']
    lod = []
    for c in reviews:
        lod.append([word_dict.get(words, UNK) for words in c])

    # One sequence length per review
    base_shape = [[len(c) for c in lod]]

    tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
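The idea behind a LoD tensor, which stores variable-length sequences as one flat array plus length information, can be sketched in plain Python. The helper `to_lod` below is hypothetical and only illustrates the bookkeeping; it is not how create_lod_tensor is implemented.

```python
def to_lod(sequences):
    """Flatten variable-length sequences into (flat_data, lengths, offsets)."""
    flat = [token for seq in sequences for token in seq]
    lengths = [len(seq) for seq in sequences]
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return flat, lengths, offsets


reviews = [
    "read the book forget the movie",
    "this is a great movie",
    "this is very bad",
]
flat, lengths, offsets = to_lod([r.split() for r in reviews])
print(lengths, offsets)

# The i-th review can be recovered as flat[offsets[i]:offsets[i + 1]].
second_review = flat[offsets[1]:offsets[2]]
```

Storing the lengths alongside the flat data avoids padding every review to the length of the longest one.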

Apply the model and make predictions

Now we can predict whether each review is positive or negative.

    with fluid.scope_guard(inference_scope):

        [inferencer, feed_target_names,
         fetch_targets] = fluid.io.load_inference_model(params_dirname, exe)

        reviews_str = [
            'read the book forget the movie', 'this is a great movie', 'this is very bad'
        ]
        reviews = [c.split() for c in reviews_str]

        # Id of the unknown token, used for words that are not in the dictionary
        UNK = word_dict['<unk>']
        lod = []
        for c in reviews:
            lod.append([np.int64(word_dict.get(words, UNK)) for words in c])

        base_shape = [[len(c) for c in lod]]

        tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
        assert feed_target_names[0] == "words"
        results = exe.run(
            inferencer,
            feed={feed_target_names[0]: tensor_words},
            fetch_list=fetch_targets,
            return_numpy=False)
        np_data = np.array(results[0])
        for i, r in enumerate(np_data):
            print("Predict probability of ", r[0], " to be positive and ", r[1],
                  " to be negative for review \'", reviews_str[i], "\'")

Readers interested in PaddlePaddle can find more documentation on the official PaddlePaddle website: http://www.paddlepaddle.org/


  1. Kim Y. Convolutional neural networks for sentence classification[J]. arXiv preprint arXiv:1408.5882, 2014.
  2. Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences[J]. arXiv preprint arXiv:1404.2188, 2014.
  3. Dauphin Y N, et al. Language modeling with gated convolutional networks[J]. arXiv preprint arXiv:1612.08083, 2016.
  4. Siegelmann H T, Sontag E D. On the computational power of neural nets[C]//Proceedings of the fifth annual workshop on Computational learning theory. ACM, 1992: 440-449.
  5. Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural computation, 1997, 9(8): 1735-1780.
  6. Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE transactions on neural networks, 1994, 5(2): 157-166.
  7. Graves A. Generating sequences with recurrent neural networks[J]. arXiv preprint arXiv:1308.0850, 2013.
  8. Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv:1406.1078, 2014.
  9. Zhou J, Xu W. End-to-end learning of semantic role labeling using recurrent neural networks[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.
