protobuf/src/google/protobuf/io/tokenizer.cc
1 // Protocol Buffers - Google's data interchange format
2 // Copyright 2008 Google Inc. All rights reserved.
3 // https://developers.google.com/protocol-buffers/
4 //
5 // Redistribution and use in source and binary forms, with or without
6 // modification, are permitted provided that the following conditions are
7 // met:
8 //
9 // * Redistributions of source code must retain the above copyright
10 // notice, this list of conditions and the following disclaimer.
11 // * Redistributions in binary form must reproduce the above
12 // copyright notice, this list of conditions and the following disclaimer
13 // in the documentation and/or other materials provided with the
14 // distribution.
15 // * Neither the name of Google Inc. nor the names of its
16 // contributors may be used to endorse or promote products derived from
17 // this software without specific prior written permission.
18 //
19 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
20 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
21 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
22 // A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
23 // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
24 // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
25 // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
26 // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
27 // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
28 // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 
31 // Author: kenton@google.com (Kenton Varda)
32 // Based on original Protocol Buffers design by
33 // Sanjay Ghemawat, Jeff Dean, and others.
34 //
35 // Here we have a hand-written lexer. At first you might ask yourself,
36 // "Hand-written text processing? Is Kenton crazy?!" Well, first of all,
37 // yes I am crazy, but that's beside the point. There are actually reasons
38 // why I ended up writing it this way.
39 //
40 // The traditional approach to lexing is to use lex to generate a lexer for
41 // you. Unfortunately, lex's output is ridiculously ugly and difficult to
42 // integrate cleanly with C++ code, especially abstract code or code meant
43 // as a library. Better parser-generators exist but would add dependencies
44 // which most users won't already have, which we'd like to avoid. (GNU flex
45 // has a C++ output option, but it's still ridiculously ugly, non-abstract,
46 // and not library-friendly.)
47 //
48 // The next approach that any good software engineer should look at is to
49 // use regular expressions. And, indeed, I did. I have code which
50 // implements this same class using regular expressions. It's about 200
51 // lines shorter. However:
52 // - Rather than error messages telling you "This string has an invalid
53 // escape sequence at line 5, column 45", you get error messages like
54 // "Parse error on line 5". Giving more precise errors requires adding
55 // a lot of code that ends up basically as complex as the hand-coded
56 // version anyway.
57 // - The regular expression to match a string literal looks like this:
58 // kString = new RE("(\"([^\"\\\\]|" // non-escaped
59 // "\\\\[abfnrtv?\"'\\\\0-7]|" // normal escape
60 // "\\\\x[0-9a-fA-F])*\"|" // hex escape
61 // "\'([^\'\\\\]|" // Also support single-quotes.
62 // "\\\\[abfnrtv?\"'\\\\0-7]|"
63 // "\\\\x[0-9a-fA-F])*\')");
64 // Verifying the correctness of this line noise is actually harder than
65 // verifying the correctness of ConsumeString(), defined below. I'm not
66 // even confident that the above is correct, after staring at it for some
67 // time.
68 // - PCRE is fast, but there's still more overhead involved than the code
69 // below.
70 // - Sadly, regular expressions are not part of the C standard library, so
71 // using them would require depending on some other library. For the
72 // open source release, this could be really annoying. Nobody likes
73 // downloading one piece of software just to find that they need to
74 // download something else to make it work, and in all likelihood
75 // people downloading Protocol Buffers will already be doing so just
76 // to make something else work. We could include a copy of PCRE with
77 // our code, but that obligates us to keep it up-to-date and just seems
78 // like a big waste just to save 200 lines of code.
79 //
80 // On a similar but unrelated note, I'm even scared to use ctype.h.
81 // Apparently functions like isalpha() are locale-dependent. So, if we used
82 // that, then if this code is being called from some program that doesn't
83 // have its locale set to "C", it would behave strangely. We can't just set
84 // the locale to "C" ourselves since we might break the calling program that
85 // way, particularly if it is multi-threaded. WTF? Someone please let me
86 // (Kenton) know if I'm missing something here...
87 //
88 // I'd love to hear about other alternatives, though, as this code isn't
89 // exactly pretty.
90 
91 #include <google/protobuf/io/tokenizer.h>
92 
93 #include <google/protobuf/stubs/common.h>
94 #include <google/protobuf/stubs/logging.h>
95 #include <google/protobuf/stubs/stringprintf.h>
96 #include <google/protobuf/stubs/strutil.h>
97 #include <google/protobuf/io/strtod.h>
98 #include <google/protobuf/io/zero_copy_stream.h>
99 #include <google/protobuf/stubs/stl_util.h>
100 
101 namespace google {
102 namespace protobuf {
103 namespace io {
104 namespace {
105 
106 // As mentioned above, I don't trust ctype.h due to the presence of "locales".
107 // So, I have written replacement functions here. Someone please smack me if
108 // this is a bad idea or if there is some way around this.
109 //
110 // These "character classes" are designed to be used in template methods.
111 // For instance, Tokenizer::ConsumeZeroOrMore<Whitespace>() will eat
112 // whitespace.
113 
114 // Note: No class is allowed to contain '\0', since this is used to mark end-
115 // of-input and is handled specially.
116 
117 #define CHARACTER_CLASS(NAME, EXPRESSION) \
118  class NAME { \
119  public: \
120  static inline bool InClass(char c) { return EXPRESSION; } \
121  }
122 
123 CHARACTER_CLASS(Whitespace, c == ' ' || c == '\n' || c == '\t' || c == '\r' ||
124  c == '\v' || c == '\f');
125 CHARACTER_CLASS(WhitespaceNoNewline,
126  c == ' ' || c == '\t' || c == '\r' || c == '\v' || c == '\f');
127 
128 CHARACTER_CLASS(Unprintable, c < ' ' && c > '\0');
129 
130 CHARACTER_CLASS(Digit, '0' <= c && c <= '9');
131 CHARACTER_CLASS(OctalDigit, '0' <= c && c <= '7');
132 CHARACTER_CLASS(HexDigit, ('0' <= c && c <= '9') || ('a' <= c && c <= 'f') ||
133  ('A' <= c && c <= 'F'));
134 
135 CHARACTER_CLASS(Letter,
136  ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || (c == '_'));
137 
138 CHARACTER_CLASS(Alphanumeric, ('a' <= c && c <= 'z') ||
139  ('A' <= c && c <= 'Z') ||
140  ('0' <= c && c <= '9') || (c == '_'));
141 
142 CHARACTER_CLASS(Escape, c == 'a' || c == 'b' || c == 'f' || c == 'n' ||
143  c == 'r' || c == 't' || c == 'v' || c == '\\' ||
144  c == '?' || c == '\'' || c == '\"');
145 
146 #undef CHARACTER_CLASS
147 
148 // Given a char, interpret it as a numeric digit and return its value.
149 // This supports any number base up to 36.
150 inline int DigitValue(char digit) {
151  if ('0' <= digit && digit <= '9') return digit - '0';
152  if ('a' <= digit && digit <= 'z') return digit - 'a' + 10;
153  if ('A' <= digit && digit <= 'Z') return digit - 'A' + 10;
154  return -1;
155 }
156 
157 // Inline because it's only used in one place.
158 inline char TranslateEscape(char c) {
159  switch (c) {
160  case 'a':
161  return '\a';
162  case 'b':
163  return '\b';
164  case 'f':
165  return '\f';
166  case 'n':
167  return '\n';
168  case 'r':
169  return '\r';
170  case 't':
171  return '\t';
172  case 'v':
173  return '\v';
174  case '\\':
175  return '\\';
176  case '?':
177  return '\?'; // Trigraphs = :(
178  case '\'':
179  return '\'';
180  case '"':
181  return '\"';
182 
183  // We expect escape sequences to have been validated separately.
184  default:
185  return '?';
186  }
187 }
188 
189 } // anonymous namespace
190 
191 ErrorCollector::~ErrorCollector() {}
192 
193 // ===================================================================
194 
195 Tokenizer::Tokenizer(ZeroCopyInputStream* input,
196  ErrorCollector* error_collector)
197  : input_(input),
198  error_collector_(error_collector),
199  buffer_(NULL),
200  buffer_size_(0),
201  buffer_pos_(0),
202  read_error_(false),
203  line_(0),
204  column_(0),
205  record_target_(NULL),
206  record_start_(-1),
207  allow_f_after_float_(false),
208  comment_style_(CPP_COMMENT_STYLE),
209  require_space_after_number_(true),
210  allow_multiline_strings_(false) {
211  current_.line = 0;
212  current_.column = 0;
213  current_.end_column = 0;
214  current_.type = TYPE_START;
215 
216  Refresh();
217 }
218 
219 Tokenizer::~Tokenizer() {
220  // If we had any buffer left unread, return it to the underlying stream
221  // so that someone else can read it.
222  if (buffer_size_ > buffer_pos_) {
223  input_->BackUp(buffer_size_ - buffer_pos_);
224  }
225 }
226 
227 bool Tokenizer::report_whitespace() const { return report_whitespace_; }
228 // Note: `set_report_whitespace(false)` implies `set_report_newlines(false)`.
229 void Tokenizer::set_report_whitespace(bool report) {
230  report_whitespace_ = report;
231  report_newlines_ &= report;
232 }
233 
234 // If true, newline tokens are reported by Next().
235 bool Tokenizer::report_newlines() const { return report_newlines_; }
236 // Note: `set_report_newlines(true)` implies `set_report_whitespace(true)`.
237 void Tokenizer::set_report_newlines(bool report) {
238  report_newlines_ = report;
239  report_whitespace_ |= report; // enable report_whitespace if necessary
240 }
241 
242 // -------------------------------------------------------------------
243 // Internal helpers.
244 
245 void Tokenizer::NextChar() {
246  // Update our line and column counters based on the character being
247  // consumed.
248  if (current_char_ == '\n') {
249  ++line_;
250  column_ = 0;
251  } else if (current_char_ == '\t') {
252  column_ += kTabWidth - column_ % kTabWidth;
253  } else {
254  ++column_;
255  }
256 
257  // Advance to the next character.
258  ++buffer_pos_;
259  if (buffer_pos_ < buffer_size_) {
260  current_char_ = buffer_[buffer_pos_];
261  } else {
262  Refresh();
263  }
264 }
265 
266 void Tokenizer::Refresh() {
267  if (read_error_) {
268  current_char_ = '\0';
269  return;
270  }
271 
272  // If we're in a token, append the rest of the buffer to it.
273  if (record_target_ != NULL && record_start_ < buffer_size_) {
274  record_target_->append(buffer_ + record_start_,
275  buffer_size_ - record_start_);
276  record_start_ = 0;
277  }
278 
279  const void* data = NULL;
280  buffer_ = NULL;
281  buffer_pos_ = 0;
282  do {
283  if (!input_->Next(&data, &buffer_size_)) {
284  // end of stream (or read error)
285  buffer_size_ = 0;
286  read_error_ = true;
287  current_char_ = '\0';
288  return;
289  }
290  } while (buffer_size_ == 0);
291 
292  buffer_ = static_cast<const char*>(data);
293 
294  current_char_ = buffer_[0];
295 }
296 
297 inline void Tokenizer::RecordTo(std::string* target) {
298  record_target_ = target;
299  record_start_ = buffer_pos_;
300 }
301 
302 inline void Tokenizer::StopRecording() {
303  // Note: The if() is necessary because some STL implementations crash when
304  // you call string::append(NULL, 0), presumably because they are trying to
305  // be helpful by detecting the NULL pointer, even though there's nothing
306  // wrong with reading zero bytes from NULL.
307  if (buffer_pos_ != record_start_) {
308  record_target_->append(buffer_ + record_start_,
309  buffer_pos_ - record_start_);
310  }
311  record_target_ = NULL;
312  record_start_ = -1;
313 }
314 
315 inline void Tokenizer::StartToken() {
316  current_.type = TYPE_START; // Just for the sake of initializing it.
317  current_.text.clear();
318  current_.line = line_;
319  current_.column = column_;
320  RecordTo(&current_.text);
321 }
322 
323 inline void Tokenizer::EndToken() {
324  StopRecording();
325  current_.end_column = column_;
326 }
327 
328 // -------------------------------------------------------------------
329 // Helper methods that consume characters.
330 
331 template <typename CharacterClass>
332 inline bool Tokenizer::LookingAt() {
333  return CharacterClass::InClass(current_char_);
334 }
335 
336 template <typename CharacterClass>
337 inline bool Tokenizer::TryConsumeOne() {
338  if (CharacterClass::InClass(current_char_)) {
339  NextChar();
340  return true;
341  } else {
342  return false;
343  }
344 }
345 
346 inline bool Tokenizer::TryConsume(char c) {
347  if (current_char_ == c) {
348  NextChar();
349  return true;
350  } else {
351  return false;
352  }
353 }
354 
355 template <typename CharacterClass>
356 inline void Tokenizer::ConsumeZeroOrMore() {
357  while (CharacterClass::InClass(current_char_)) {
358  NextChar();
359  }
360 }
361 
362 template <typename CharacterClass>
363 inline void Tokenizer::ConsumeOneOrMore(const char* error) {
364  if (!CharacterClass::InClass(current_char_)) {
365  AddError(error);
366  } else {
367  do {
368  NextChar();
369  } while (CharacterClass::InClass(current_char_));
370  }
371 }
372 
373 // -------------------------------------------------------------------
374 // Methods that read whole patterns matching certain kinds of tokens
375 // or comments.
376 
377 void Tokenizer::ConsumeString(char delimiter) {
378  while (true) {
379  switch (current_char_) {
380  case '\0':
381  AddError("Unexpected end of string.");
382  return;
383 
384  case '\n': {
385  if (!allow_multiline_strings_) {
386  AddError("String literals cannot cross line boundaries.");
387  return;
388  }
389  NextChar();
390  break;
391  }
392 
393  case '\\': {
394  // An escape sequence.
395  NextChar();
396  if (TryConsumeOne<Escape>()) {
397  // Valid escape sequence.
398  } else if (TryConsumeOne<OctalDigit>()) {
399  // Possibly followed by two more octal digits, but these will
400  // just be consumed by the main loop anyway so we don't need
401  // to do so explicitly here.
402  } else if (TryConsume('x')) {
403  if (!TryConsumeOne<HexDigit>()) {
404  AddError("Expected hex digits for escape sequence.");
405  }
406  // Possibly followed by another hex digit, but again we don't care.
407  } else if (TryConsume('u')) {
408  if (!TryConsumeOne<HexDigit>() || !TryConsumeOne<HexDigit>() ||
409  !TryConsumeOne<HexDigit>() || !TryConsumeOne<HexDigit>()) {
410  AddError("Expected four hex digits for \\u escape sequence.");
411  }
412  } else if (TryConsume('U')) {
413  // We expect 8 hex digits; but only the range up to 0x10ffff is
414  // legal.
415  if (!TryConsume('0') || !TryConsume('0') ||
416  !(TryConsume('0') || TryConsume('1')) ||
417  !TryConsumeOne<HexDigit>() || !TryConsumeOne<HexDigit>() ||
418  !TryConsumeOne<HexDigit>() || !TryConsumeOne<HexDigit>() ||
419  !TryConsumeOne<HexDigit>()) {
420  AddError(
421  "Expected eight hex digits up to 10ffff for \\U escape "
422  "sequence");
423  }
424  } else {
425  AddError("Invalid escape sequence in string literal.");
426  }
427  break;
428  }
429 
430  default: {
431  if (current_char_ == delimiter) {
432  NextChar();
433  return;
434  }
435  NextChar();
436  break;
437  }
438  }
439  }
440 }
441 
442 Tokenizer::TokenType Tokenizer::ConsumeNumber(bool started_with_zero,
443  bool started_with_dot) {
444  bool is_float = false;
445 
446  if (started_with_zero && (TryConsume('x') || TryConsume('X'))) {
447  // A hex number (started with "0x").
448  ConsumeOneOrMore<HexDigit>("\"0x\" must be followed by hex digits.");
449 
450  } else if (started_with_zero && LookingAt<Digit>()) {
451  // An octal number (had a leading zero).
452  ConsumeZeroOrMore<OctalDigit>();
453  if (LookingAt<Digit>()) {
454  AddError("Numbers starting with leading zero must be in octal.");
455  ConsumeZeroOrMore<Digit>();
456  }
457 
458  } else {
459  // A decimal number.
460  if (started_with_dot) {
461  is_float = true;
462  ConsumeZeroOrMore<Digit>();
463  } else {
464  ConsumeZeroOrMore<Digit>();
465 
466  if (TryConsume('.')) {
467  is_float = true;
468  ConsumeZeroOrMore<Digit>();
469  }
470  }
471 
472  if (TryConsume('e') || TryConsume('E')) {
473  is_float = true;
474  TryConsume('-') || TryConsume('+');
475  ConsumeOneOrMore<Digit>("\"e\" must be followed by exponent.");
476  }
477 
478  if (allow_f_after_float_ && (TryConsume('f') || TryConsume('F'))) {
479  is_float = true;
480  }
481  }
482 
483  if (LookingAt<Letter>() && require_space_after_number_) {
484  AddError("Need space between number and identifier.");
485  } else if (current_char_ == '.') {
486  if (is_float) {
487  AddError(
488  "Already saw decimal point or exponent; can't have another one.");
489  } else {
490  AddError("Hex and octal numbers must be integers.");
491  }
492  }
493 
494  return is_float ? TYPE_FLOAT : TYPE_INTEGER;
495 }
496 
497 void Tokenizer::ConsumeLineComment(std::string* content) {
498  if (content != NULL) RecordTo(content);
499 
500  while (current_char_ != '\0' && current_char_ != '\n') {
501  NextChar();
502  }
503  TryConsume('\n');
504 
505  if (content != NULL) StopRecording();
506 }
507 
508 void Tokenizer::ConsumeBlockComment(std::string* content) {
509  int start_line = line_;
510  int start_column = column_ - 2;
511 
512  if (content != NULL) RecordTo(content);
513 
514  while (true) {
515  while (current_char_ != '\0' && current_char_ != '*' &&
516  current_char_ != '/' && current_char_ != '\n') {
517  NextChar();
518  }
519 
520  if (TryConsume('\n')) {
521  if (content != NULL) StopRecording();
522 
523  // Consume leading whitespace and asterisk.
524  ConsumeZeroOrMore<WhitespaceNoNewline>();
525  if (TryConsume('*')) {
526  if (TryConsume('/')) {
527  // End of comment.
528  break;
529  }
530  }
531 
532  if (content != NULL) RecordTo(content);
533  } else if (TryConsume('*') && TryConsume('/')) {
534  // End of comment.
535  if (content != NULL) {
536  StopRecording();
537  // Strip trailing "*/".
538  content->erase(content->size() - 2);
539  }
540  break;
541  } else if (TryConsume('/') && current_char_ == '*') {
542  // Note: We didn't consume the '*' because if there is a '/' after it
543  // we want to interpret that as the end of the comment.
544  AddError(
545  "\"/*\" inside block comment. Block comments cannot be nested.");
546  } else if (current_char_ == '\0') {
547  AddError("End-of-file inside block comment.");
548  error_collector_->AddError(start_line, start_column,
549  " Comment started here.");
550  if (content != NULL) StopRecording();
551  break;
552  }
553  }
554 }
555 
556 Tokenizer::NextCommentStatus Tokenizer::TryConsumeCommentStart() {
557  if (comment_style_ == CPP_COMMENT_STYLE && TryConsume('/')) {
558  if (TryConsume('/')) {
559  return LINE_COMMENT;
560  } else if (TryConsume('*')) {
561  return BLOCK_COMMENT;
562  } else {
563  // Oops, it was just a slash. Return it.
564  current_.type = TYPE_SYMBOL;
565  current_.text = "/";
566  current_.line = line_;
567  current_.column = column_ - 1;
568  current_.end_column = column_;
569  return SLASH_NOT_COMMENT;
570  }
571  } else if (comment_style_ == SH_COMMENT_STYLE && TryConsume('#')) {
572  return LINE_COMMENT;
573  } else {
574  return NO_COMMENT;
575  }
576 }
577 
578 bool Tokenizer::TryConsumeWhitespace() {
579  if (report_newlines_) {
580  if (TryConsumeOne<WhitespaceNoNewline>()) {
581  ConsumeZeroOrMore<WhitespaceNoNewline>();
582  current_.type = TYPE_WHITESPACE;
583  return true;
584  }
585  return false;
586  }
587  if (TryConsumeOne<Whitespace>()) {
588  ConsumeZeroOrMore<Whitespace>();
589  current_.type = TYPE_WHITESPACE;
590  return report_whitespace_;
591  }
592  return false;
593 }
594 
595 bool Tokenizer::TryConsumeNewline() {
596  if (!report_whitespace_ || !report_newlines_) {
597  return false;
598  }
599  if (TryConsume('\n')) {
600  current_.type = TYPE_NEWLINE;
601  return true;
602  }
603  return false;
604 }
605 
606 // -------------------------------------------------------------------
607 
608 bool Tokenizer::Next() {
609  previous_ = current_;
610 
611  while (!read_error_) {
612  StartToken();
613  bool report_token = TryConsumeWhitespace() || TryConsumeNewline();
614  EndToken();
615  if (report_token) {
616  return true;
617  }
618 
619  switch (TryConsumeCommentStart()) {
620  case LINE_COMMENT:
621  ConsumeLineComment(NULL);
622  continue;
623  case BLOCK_COMMENT:
624  ConsumeBlockComment(NULL);
625  continue;
626  case SLASH_NOT_COMMENT:
627  return true;
628  case NO_COMMENT:
629  break;
630  }
631 
632  // Check for EOF before continuing.
633  if (read_error_) break;
634 
635  if (LookingAt<Unprintable>() || current_char_ == '\0') {
636  AddError("Invalid control characters encountered in text.");
637  NextChar();
638  // Skip more unprintable characters, too. But, remember that '\0' is
639  // also what current_char_ is set to after EOF / read error. We have
640  // to be careful not to go into an infinite loop of trying to consume
641  // it, so make sure to check read_error_ explicitly before consuming
642  // '\0'.
643  while (TryConsumeOne<Unprintable>() ||
644  (!read_error_ && TryConsume('\0'))) {
645  // Ignore.
646  }
647 
648  } else {
649  // Reading some sort of token.
650  StartToken();
651 
652  if (TryConsumeOne<Letter>()) {
653  ConsumeZeroOrMore<Alphanumeric>();
654  current_.type = TYPE_IDENTIFIER;
655  } else if (TryConsume('0')) {
656  current_.type = ConsumeNumber(true, false);
657  } else if (TryConsume('.')) {
658  // This could be the beginning of a floating-point number, or it could
659  // just be a '.' symbol.
660 
661  if (TryConsumeOne<Digit>()) {
662  // It's a floating-point number.
663  if (previous_.type == TYPE_IDENTIFIER &&
664  current_.line == previous_.line &&
665  current_.column == previous_.end_column) {
666  // We don't accept syntax like "blah.123".
667  error_collector_->AddError(
668  line_, column_ - 2,
669  "Need space between identifier and decimal point.");
670  }
671  current_.type = ConsumeNumber(false, true);
672  } else {
673  current_.type = TYPE_SYMBOL;
674  }
675  } else if (TryConsumeOne<Digit>()) {
676  current_.type = ConsumeNumber(false, false);
677  } else if (TryConsume('\"')) {
678  ConsumeString('\"');
679  current_.type = TYPE_STRING;
680  } else if (TryConsume('\'')) {
681  ConsumeString('\'');
682  current_.type = TYPE_STRING;
683  } else {
684  // Check if the high order bit is set.
685  if (current_char_ & 0x80) {
686  error_collector_->AddError(
687  line_, column_,
688  StringPrintf("Interpreting non ascii codepoint %d.",
689  static_cast<unsigned char>(current_char_)));
690  }
691  NextChar();
692  current_.type = TYPE_SYMBOL;
693  }
694 
695  EndToken();
696  return true;
697  }
698  }
699 
700  // EOF
701  current_.type = TYPE_END;
702  current_.text.clear();
703  current_.line = line_;
704  current_.column = column_;
705  current_.end_column = column_;
706  return false;
707 }
708 
709 namespace {
710 
711 // Helper class for collecting comments and putting them in the right places.
712 //
713 // This basically just buffers the most recent comment until it can be decided
714 // exactly where that comment should be placed. When Flush() is called, the
715 // current comment goes into either prev_trailing_comments or detached_comments.
716 // When the CommentCollector is destroyed, the last buffered comment goes into
717 // next_leading_comments.
718 class CommentCollector {
719  public:
720  CommentCollector(std::string* prev_trailing_comments,
721  std::vector<std::string>* detached_comments,
722  std::string* next_leading_comments)
723  : prev_trailing_comments_(prev_trailing_comments),
724  detached_comments_(detached_comments),
725  next_leading_comments_(next_leading_comments),
726  has_comment_(false),
727  is_line_comment_(false),
728  can_attach_to_prev_(true) {
729  if (prev_trailing_comments != NULL) prev_trailing_comments->clear();
730  if (detached_comments != NULL) detached_comments->clear();
731  if (next_leading_comments != NULL) next_leading_comments->clear();
732  }
733 
734  ~CommentCollector() {
735  // Whatever is in the buffer is a leading comment.
736  if (next_leading_comments_ != NULL && has_comment_) {
737  comment_buffer_.swap(*next_leading_comments_);
738  }
739  }
740 
741  // About to read a line comment. Get the comment buffer pointer in order to
742  // read into it.
743  std::string* GetBufferForLineComment() {
744  // We want to combine with previous line comments, but not block comments.
745  if (has_comment_ && !is_line_comment_) {
746  Flush();
747  }
748  has_comment_ = true;
749  is_line_comment_ = true;
750  return &comment_buffer_;
751  }
752 
753  // About to read a block comment. Get the comment buffer pointer in order to
754  // read into it.
755  std::string* GetBufferForBlockComment() {
756  if (has_comment_) {
757  Flush();
758  }
759  has_comment_ = true;
760  is_line_comment_ = false;
761  return &comment_buffer_;
762  }
763 
764  void ClearBuffer() {
765  comment_buffer_.clear();
766  has_comment_ = false;
767  }
768 
769  // Called once we know that the comment buffer is complete and is *not*
770  // connected to the next token.
771  void Flush() {
772  if (has_comment_) {
773  if (can_attach_to_prev_) {
774  if (prev_trailing_comments_ != NULL) {
775  prev_trailing_comments_->append(comment_buffer_);
776  }
777  can_attach_to_prev_ = false;
778  } else {
779  if (detached_comments_ != NULL) {
780  detached_comments_->push_back(comment_buffer_);
781  }
782  }
783  ClearBuffer();
784  }
785  }
786 
787  void DetachFromPrev() { can_attach_to_prev_ = false; }
788 
789  private:
790  std::string* prev_trailing_comments_;
791  std::vector<std::string>* detached_comments_;
792  std::string* next_leading_comments_;
793 
794  std::string comment_buffer_;
795 
796  // True if any comments were read into comment_buffer_. This can be true even
797  // if comment_buffer_ is empty, namely if the comment was "/**/".
798  bool has_comment_;
799 
800  // Is the comment in the comment buffer a line comment?
801  bool is_line_comment_;
802 
803  // Is it still possible that we could be reading a comment attached to the
804  // previous token?
805  bool can_attach_to_prev_;
806 };
807 
808 } // namespace
809 
810 bool Tokenizer::NextWithComments(std::string* prev_trailing_comments,
811  std::vector<std::string>* detached_comments,
812  std::string* next_leading_comments) {
813  CommentCollector collector(prev_trailing_comments, detached_comments,
814  next_leading_comments);
815 
816  if (current_.type == TYPE_START) {
817  // Ignore unicode byte order mark(BOM) if it appears at the file
818  // beginning. Only UTF-8 BOM (0xEF 0xBB 0xBF) is accepted.
819  if (TryConsume(static_cast<char>(0xEF))) {
820  if (!TryConsume(static_cast<char>(0xBB)) ||
821  !TryConsume(static_cast<char>(0xBF))) {
822  AddError(
823  "Proto file starts with 0xEF but not UTF-8 BOM. "
824  "Only UTF-8 is accepted for proto file.");
825  return false;
826  }
827  }
828  collector.DetachFromPrev();
829  } else {
830  // A comment appearing on the same line must be attached to the previous
831  // declaration.
832  ConsumeZeroOrMore<WhitespaceNoNewline>();
833  switch (TryConsumeCommentStart()) {
834  case LINE_COMMENT:
835  ConsumeLineComment(collector.GetBufferForLineComment());
836 
837  // Don't allow comments on subsequent lines to be attached to a trailing
838  // comment.
839  collector.Flush();
840  break;
841  case BLOCK_COMMENT:
842  ConsumeBlockComment(collector.GetBufferForBlockComment());
843 
844  ConsumeZeroOrMore<WhitespaceNoNewline>();
845  if (!TryConsume('\n')) {
846  // Oops, the next token is on the same line. If we recorded a comment
847  // we really have no idea which token it should be attached to.
848  collector.ClearBuffer();
849  return Next();
850  }
851 
852  // Don't allow comments on subsequent lines to be attached to a trailing
853  // comment.
854  collector.Flush();
855  break;
856  case SLASH_NOT_COMMENT:
857  return true;
858  case NO_COMMENT:
859  if (!TryConsume('\n')) {
860  // The next token is on the same line. There are no comments.
861  return Next();
862  }
863  break;
864  }
865  }
866 
867  // OK, we are now on the line *after* the previous token.
868  while (true) {
869  ConsumeZeroOrMore<WhitespaceNoNewline>();
870 
871  switch (TryConsumeCommentStart()) {
872  case LINE_COMMENT:
873  ConsumeLineComment(collector.GetBufferForLineComment());
874  break;
875  case BLOCK_COMMENT:
876  ConsumeBlockComment(collector.GetBufferForBlockComment());
877 
878  // Consume the rest of the line so that we don't interpret it as a
879  // blank line the next time around the loop.
880  ConsumeZeroOrMore<WhitespaceNoNewline>();
881  TryConsume('\n');
882  break;
883  case SLASH_NOT_COMMENT:
884  return true;
885  case NO_COMMENT:
886  if (TryConsume('\n')) {
887  // Completely blank line.
888  collector.Flush();
889  collector.DetachFromPrev();
890  } else {
891  bool result = Next();
892  if (!result || current_.text == "}" || current_.text == "]" ||
893  current_.text == ")") {
894  // It looks like we're at the end of a scope. In this case it
895  // makes no sense to attach a comment to the following token.
896  collector.Flush();
897  }
898  return result;
899  }
900  break;
901  }
902  }
903 }
904 
905 // -------------------------------------------------------------------
906 // Token-parsing helpers. Remember that these don't need to report
907 // errors since any errors should already have been reported while
908 // tokenizing. Also, these can assume that whatever text they
909 // are given is text that the tokenizer actually parsed as a token
910 // of the given type.
911 
912 bool Tokenizer::ParseInteger(const std::string& text, uint64_t max_value,
913  uint64_t* output) {
914  // Sadly, we can't just use strtoul() since it is only 32-bit and strtoull()
915  // is non-standard. I hate the C standard library. :(
916 
917  // return strtoull(text.c_str(), NULL, 0);
918 
919  const char* ptr = text.c_str();
920  int base = 10;
921  if (ptr[0] == '0') {
922  if (ptr[1] == 'x' || ptr[1] == 'X') {
923  // This is hex.
924  base = 16;
925  ptr += 2;
926  } else {
927  // This is octal.
928  base = 8;
929  }
930  }
931 
932  uint64_t result = 0;
933  for (; *ptr != '\0'; ptr++) {
934  int digit = DigitValue(*ptr);
935  if (digit < 0 || digit >= base) {
936  // The token provided by the Tokenizer is invalid, e.g. 099 is an invalid
937  // token, but the Tokenizer still thinks it's an integer.
938  return false;
939  }
940  if (static_cast<uint64_t>(digit) > max_value ||
941  result > (max_value - digit) / base) {
942  // Overflow.
943  return false;
944  }
945  result = result * base + digit;
946  }
947 
948  *output = result;
949  return true;
950 }
951 
952 double Tokenizer::ParseFloat(const std::string& text) {
953  const char* start = text.c_str();
954  char* end;
955  double result = NoLocaleStrtod(start, &end);
956 
957  // "1e" is not a valid float, but if the tokenizer reads it, it will
958  // report an error but still return it as a valid token. We need to
959  // accept anything the tokenizer could possibly return, error or not.
960  if (*end == 'e' || *end == 'E') {
961  ++end;
962  if (*end == '-' || *end == '+') ++end;
963  }
964 
965  // If the Tokenizer had allow_f_after_float_ enabled, the float may be
966  // suffixed with the letter 'f'.
967  if (*end == 'f' || *end == 'F') {
968  ++end;
969  }
970 
971  GOOGLE_LOG_IF(DFATAL,
972  static_cast<size_t>(end - start) != text.size() || *start == '-')
973  << " Tokenizer::ParseFloat() passed text that could not have been"
974  " tokenized as a float: "
975  << CEscape(text);
976  return result;
977 }

// Helper to append a Unicode code point to a string as UTF8, without bringing
// in any external dependencies.
static void AppendUTF8(uint32_t code_point, std::string* output) {
  uint32_t tmp = 0;
  int len = 0;
  if (code_point <= 0x7f) {
    tmp = code_point;
    len = 1;
  } else if (code_point <= 0x07ff) {
    tmp = 0x0000c080 | ((code_point & 0x07c0) << 2) | (code_point & 0x003f);
    len = 2;
  } else if (code_point <= 0xffff) {
    tmp = 0x00e08080 | ((code_point & 0xf000) << 4) |
          ((code_point & 0x0fc0) << 2) | (code_point & 0x003f);
    len = 3;
  } else if (code_point <= 0x10ffff) {
    tmp = 0xf0808080 | ((code_point & 0x1c0000) << 6) |
          ((code_point & 0x03f000) << 4) | ((code_point & 0x000fc0) << 2) |
          (code_point & 0x003f);
    len = 4;
  } else {
    // Unicode code points end at 0x10FFFF, so this is out-of-range.
    // ConsumeString permits hex values up to 0x1FFFFF, and FetchUnicodePoint
    // doesn't perform a range check.
    StringAppendF(output, "\\U%08x", code_point);
    return;
  }
  tmp = ghtonl(tmp);
  output->append(reinterpret_cast<const char*>(&tmp) + sizeof(tmp) - len, len);
}
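`AppendUTF8` packs all the encoded bytes into a single word and relies on `ghtonl` for byte order. A byte-at-a-time sketch of the same UTF-8 layout, which may be easier to follow (`EncodeUTF8` is a hypothetical helper, not part of this file, and it omits the out-of-range branch):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Emits the same bytes AppendUTF8 produces for in-range code points,
// without the ghtonl() word-packing trick.
static void EncodeUTF8(uint32_t cp, std::string* out) {
  if (cp <= 0x7f) {
    out->push_back(static_cast<char>(cp));
  } else if (cp <= 0x7ff) {
    out->push_back(static_cast<char>(0xc0 | (cp >> 6)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3f)));
  } else if (cp <= 0xffff) {
    out->push_back(static_cast<char>(0xe0 | (cp >> 12)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3f)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3f)));
  } else {  // cp <= 0x10ffff
    out->push_back(static_cast<char>(0xf0 | (cp >> 18)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3f)));
    out->push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3f)));
    out->push_back(static_cast<char>(0x80 | (cp & 0x3f)));
  }
}
```

Each continuation byte carries six payload bits behind a `10xxxxxx` prefix, which is exactly what the `0x3f` masks and the `<< 2`/`<< 4`/`<< 6` shifts in `AppendUTF8` arrange within its packed word.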

// Try to read <len> hex digits from ptr, and stuff the numeric result into
// *result. Returns true if that many digits were successfully consumed.
static bool ReadHexDigits(const char* ptr, int len, uint32_t* result) {
  *result = 0;
  if (len == 0) return false;
  for (const char* end = ptr + len; ptr < end; ++ptr) {
    if (*ptr == '\0') return false;
    *result = (*result << 4) + DigitValue(*ptr);
  }
  return true;
}

// Handling UTF-16 surrogate pairs. UTF-16 encodes code points in the range
// 0x10000...0x10ffff as a pair of numbers, a head surrogate followed by a
// trail surrogate. These numbers are in a reserved range of Unicode code
// points, so if we encounter such a pair we know how to parse it and convert
// it into a single code point.
static const uint32_t kMinHeadSurrogate = 0xd800;
static const uint32_t kMaxHeadSurrogate = 0xdc00;
static const uint32_t kMinTrailSurrogate = 0xdc00;
static const uint32_t kMaxTrailSurrogate = 0xe000;

static inline bool IsHeadSurrogate(uint32_t code_point) {
  return (code_point >= kMinHeadSurrogate) && (code_point < kMaxHeadSurrogate);
}

static inline bool IsTrailSurrogate(uint32_t code_point) {
  return (code_point >= kMinTrailSurrogate) &&
         (code_point < kMaxTrailSurrogate);
}

// Combine a head and trail surrogate into a single Unicode code point.
static uint32_t AssembleUTF16(uint32_t head_surrogate,
                              uint32_t trail_surrogate) {
  GOOGLE_DCHECK(IsHeadSurrogate(head_surrogate));
  GOOGLE_DCHECK(IsTrailSurrogate(trail_surrogate));
  return 0x10000 + (((head_surrogate - kMinHeadSurrogate) << 10) |
                    (trail_surrogate - kMinTrailSurrogate));
}
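The arithmetic in `AssembleUTF16` can be checked against a well-known pair: U+1F600 (😀) is encoded in UTF-16 as `D83D DE00`. A minimal sketch of the same formula with the constants inlined:

```cpp
#include <cassert>
#include <cstdint>

// Same math as AssembleUTF16: each surrogate contributes 10 bits of the
// code point, offset by 0x10000 (the start of the supplementary planes).
static uint32_t CombineSurrogates(uint32_t head, uint32_t trail) {
  return 0x10000 + (((head - 0xd800) << 10) | (trail - 0xdc00));
}
```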

// Convert the escape sequence parameter to a number of expected hex digits.
static inline int UnicodeLength(char key) {
  if (key == 'u') return 4;
  if (key == 'U') return 8;
  return 0;
}

// Given a pointer to the 'u' or 'U' starting a Unicode escape sequence,
// attempt to parse that sequence. On success, returns a pointer to the first
// char beyond that sequence, and fills in *code_point. On failure, returns
// ptr itself.
static const char* FetchUnicodePoint(const char* ptr, uint32_t* code_point) {
  const char* p = ptr;
  // Fetch the code point.
  const int len = UnicodeLength(*p++);
  if (!ReadHexDigits(p, len, code_point)) return ptr;
  p += len;

  // Check if the code point we read is a "head surrogate." If so, then we
  // expect it to be immediately followed by another code point which is a
  // valid "trail surrogate," and together they form a UTF-16 pair which
  // decodes into a single Unicode point. Trail surrogates may only use \u,
  // not \U.
  if (IsHeadSurrogate(*code_point) && *p == '\\' && *(p + 1) == 'u') {
    uint32_t trail_surrogate;
    if (ReadHexDigits(p + 2, 4, &trail_surrogate) &&
        IsTrailSurrogate(trail_surrogate)) {
      *code_point = AssembleUTF16(*code_point, trail_surrogate);
      p += 6;
    }
    // If this failed, then we just emit the head surrogate as a code point.
    // It's bogus, but so is the string.
  }

  return p;
}

// The text string must begin and end with single or double quote
// characters.
void Tokenizer::ParseStringAppend(const std::string& text,
                                  std::string* output) {
  // Reminder: text[0] is always a quote character. (If text is
  // empty, it's invalid, so we'll just return.)
  const size_t text_size = text.size();
  if (text_size == 0) {
    GOOGLE_LOG(DFATAL) << " Tokenizer::ParseStringAppend() passed text that could not"
                          " have been tokenized as a string: "
                       << CEscape(text);
    return;
  }

  // Reserve room for the new string. The branch is necessary because if
  // there is already space available, the reserve() call might
  // downsize the output.
  const size_t new_len = text_size + output->size();
  if (new_len > output->capacity()) {
    output->reserve(new_len);
  }

  // Loop through the string copying characters to "output" and
  // interpreting escape sequences. Note that any invalid escape
  // sequences or other errors were already reported while tokenizing.
  // In this case we do not need to produce valid results.
  for (const char* ptr = text.c_str() + 1; *ptr != '\0'; ptr++) {
    if (*ptr == '\\' && ptr[1] != '\0') {
      // An escape sequence.
      ++ptr;

      if (OctalDigit::InClass(*ptr)) {
        // An octal escape. May have one, two, or three digits.
        int code = DigitValue(*ptr);
        if (OctalDigit::InClass(ptr[1])) {
          ++ptr;
          code = code * 8 + DigitValue(*ptr);
        }
        if (OctalDigit::InClass(ptr[1])) {
          ++ptr;
          code = code * 8 + DigitValue(*ptr);
        }
        output->push_back(static_cast<char>(code));

      } else if (*ptr == 'x') {
        // A hex escape. May have zero, one, or two digits. (The zero case
        // will have been caught as an error earlier.)
        int code = 0;
        if (HexDigit::InClass(ptr[1])) {
          ++ptr;
          code = DigitValue(*ptr);
        }
        if (HexDigit::InClass(ptr[1])) {
          ++ptr;
          code = code * 16 + DigitValue(*ptr);
        }
        output->push_back(static_cast<char>(code));

      } else if (*ptr == 'u' || *ptr == 'U') {
        uint32_t unicode;
        const char* end = FetchUnicodePoint(ptr, &unicode);
        if (end == ptr) {
          // Failure: Just dump out what we saw, don't try to parse it.
          output->push_back(*ptr);
        } else {
          AppendUTF8(unicode, output);
          ptr = end - 1;  // Because we're about to ++ptr.
        }
      } else {
        // Some other escape code.
        output->push_back(TranslateEscape(*ptr));
      }

    } else if (*ptr == text[0] && ptr[1] == '\0') {
      // Ignore final quote matching the starting quote.
    } else {
      output->push_back(*ptr);
    }
  }
}
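The octal branch above accumulates up to three digits base 8, so `"\101"` decodes to `'A'` (0101 octal = 65). A small sketch of just that rule (`DecodeOctalEscape` is a hypothetical helper; it assumes its input contains only octal digits):

```cpp
#include <cassert>
#include <string>

// Decodes one octal escape body (the characters after the backslash),
// taking at most three octal digits, as the loop above does.
static char DecodeOctalEscape(const std::string& digits) {
  int code = 0;
  for (size_t i = 0; i < digits.size() && i < 3; ++i) {
    code = code * 8 + (digits[i] - '0');
  }
  return static_cast<char>(code);
}
```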

template <typename CharacterClass>
static bool AllInClass(const std::string& s) {
  for (const char character : s) {
    if (!CharacterClass::InClass(character)) return false;
  }
  return true;
}

bool Tokenizer::IsIdentifier(const std::string& text) {
  // Mirrors IDENTIFIER definition in Tokenizer::Next() above.
  if (text.size() == 0) return false;
  if (!Letter::InClass(text.at(0))) return false;
  if (!AllInClass<Alphanumeric>(text.substr(1))) return false;
  return true;
}
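`IsIdentifier` requires a letter first and alphanumerics after, in terms of the `Letter` and `Alphanumeric` character classes defined earlier in this file. An equivalent self-contained sketch, assuming those classes cover `[A-Za-z_]` and `[A-Za-z0-9_]` respectively:

```cpp
#include <string>

// Standalone approximation of IsIdentifier, assuming Letter matches
// [A-Za-z_] and Alphanumeric matches [A-Za-z0-9_].
static bool LooksLikeIdentifier(const std::string& text) {
  auto is_letter = [](char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
  };
  if (text.empty() || !is_letter(text[0])) return false;
  for (size_t i = 1; i < text.size(); ++i) {
    const char c = text[i];
    if (!is_letter(c) && !(c >= '0' && c <= '9')) return false;
  }
  return true;
}
```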

}  // namespace io
}  // namespace protobuf
}  // namespace google