Web Excursions 2022-01-01
How Telegram Messenger circumvents Google Translate's API
Telegram has released arguably their biggest update of the year this week.
The big new feature is Message Translations, which lets users translate the text of messages within the app.
What is interesting is how this is implemented in the official Android app.
To translate text on Android without paying huge Google Cloud fees or risking a leaked API key, Telegram found an obscure way of querying the Cloud Translate API directly at no cost to them.
the official API uses a versioned API path (e.g. /language/translate/v2) and human-readable query parameters, which importantly include the API key.
They use another path, and also seem to intentionally split up the request path with multiple string joins
(perhaps for obscurity, or to avoid detection in the Play Store review process?)
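A minimal Python sketch of the contrast described above. The first helper builds (but does not send) a request URL in the documented v2 style, with the key as a plain query parameter. The second shows the general trick of assembling a path from string fragments. The fragments, the placeholder key, and the helper names are invented for illustration; these notes do not record the path Telegram actually uses, so the fragments simply re-create the official path.

```python
from urllib.parse import urlencode

# Host and path of the documented Cloud Translation v2 endpoint.
OFFICIAL_HOST = "https://translation.googleapis.com"
OFFICIAL_PATH = "/language/translate/v2"


def build_official_url(text, target, api_key):
    """Build (but do not send) a request URL in the documented v2 style."""
    query = urlencode({"q": text, "target": target, "key": api_key})
    return f"{OFFICIAL_HOST}{OFFICIAL_PATH}?{query}"


def join_fragments(fragments):
    """Assemble a path from small pieces, so the full string never
    appears as a single literal in the source code."""
    return "".join(fragments)


# The API key is visible in the URL, which is why shipping it in an app is risky.
url = build_official_url("Hallo Welt", "en", "PLACEHOLDER_KEY")

# Hypothetical fragments, chosen only to demonstrate the joining technique.
hidden_path = join_fragments(["/lan", "guage/tr", "anslate", "/v2"])
```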
Telegram keeps an array of strings containing various User Agents, with comments indicating percentages
they randomly pull a user agent from this array and pass it to the request to Google
It seems like a classic example of user agent rotation,
a technique often used by web scrapers to avoid being rate limited / blacklisted by web services.
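User agent rotation as described above can be sketched in a few lines of Python. The user agent strings and the percentages in the comments are made up, and the draw is assumed to be uniform, since the notes only say that an agent is pulled at random.

```python
import random

# Hypothetical user agents; the percentage comments mimic the style
# described above and are invented values.
USER_AGENTS = [
    "ExampleBrowser/1.0 (Linux)",    # 50%
    "ExampleBrowser/2.0 (Windows)",  # 30%
    "ExampleBrowser/3.0 (macOS)",    # 20%
]


def pick_user_agent(rng=random):
    """Return one user agent drawn uniformly at random from the list."""
    return rng.choice(USER_AGENTS)


def build_headers(rng=random):
    """Headers for one outgoing request, with a freshly drawn User-Agent."""
    return {"User-Agent": pick_user_agent(rng)}
```

Calling build_headers() before each request makes consecutive requests look as if they came from different clients, which is what makes rate limiting by user agent harder.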
The case of the programs that were launched with impossible command line options
a lot of failures seem to be “impossible”,
but the fact that they’re happening proves that it’s possible,
and you just have to do some sleuthing and adopt a more creative mindset to figure them out.
One of the tools for investigating these types of failures is seeing what other programs are running at the time,
or what other programs crashed shortly before or after the failure occurred.
[In a reported case where the Start menu keeps crashing,] there was one specific third-party program running.
This program billed itself as a utility that boosts your system’s gaming performance by terminating all processes it deemed to be non-essential.
And when the game is over, it relaunches all the programs it terminated
one of the programs that this utility decided was not worthy of keeping around was the Start menu
the utility relaunched the program as a normal program with no command line arguments.
But the Start menu is a UWP program,
so it is supposed to be launched a special way,
with specific command line arguments, and
it is supposed to be run in a low integrity app container.
As a result, the Start menu found itself running in an unexpected environment, with an unexpected command line, and it realized that something was messed up and failed fast, leaving a nice corpse for Watson to analyze.
instead of terminating UWP apps, just minimize them, which causes them to save their state and suspend.
If the system is placed under memory pressure, the system will terminate suspended UWP apps automatically.
Professional programmers routinely use fuzzing to check for problems that could occur in the wild and might not be easy to anticipate.
fuzz mean[s] unstructured, random data.
the basic idea of fuzzing: You
automatically generate random input,
check to see if the programs fed with it then do unpredictable things, and
repeat these two steps very often and very quickly.
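The loop above can be sketched in Python as follows. The program under test, parse_percentage, is a hypothetical stand-in, and "doing unpredictable things" is simplified here to "raises an exception".

```python
import random


def random_input(rng, max_length=100):
    """Generate a string of random printable ASCII characters."""
    length = rng.randrange(max_length + 1)
    return "".join(chr(rng.randrange(32, 127)) for _ in range(length))


def parse_percentage(text):
    """Hypothetical program under test: expects input such as '42%'."""
    if text.endswith("%"):
        return int(text[:-1])  # raises ValueError on non-numeric input
    return None


def fuzz(target, trials=1000, seed=0):
    """Feed random input to target and collect the inputs that made it raise."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        data = random_input(rng)
        try:
            target(data)
        except Exception as error:
            failures.append((data, error))
    return failures
```

Running fuzz(parse_percentage, trials=5000) returns the random strings that happened to end in "%" without a valid number in front, each paired with the exception it triggered.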
fuzzers use various techniques to find errors.
Purely random input is easy to generate; it finds errors in input processing, such as buffer overflows.
Model-based fuzzers use grammars and other language models to generate valid and targeted input.
Evolution-based fuzzers mutate test input to find variants that cover as much code as possible. Constraint-based fuzzers can solve complex constraints in program code, but they take a long time to do so.
[Fuzzing as a hacking tool]
If the program receives its input via a web page,
attackers could, for example, enter a string of random characters into a form and thus attempt to disrupt the program or render it unusable.
it may be possible to design the input in such a way that the attacker gains control of the program or even the computer.
Today’s programs are protected against such attacks.
As a general rule, you should not trust any data that is under the control of a third party
In 1988, such mechanisms were uncommon, and what Miller [the inventor of fuzzing]’s students found was alarming:
They could crash more than a third of all Unix utilities within seconds by hitting them with random input.
In 1988, however, the Internet was still in its infancy, and every administrator knew the users on their machines personally.
In fact, Miller initially had trouble getting his findings published.
Model-based fuzzers use a specification of the input format to generate valid inputs a priori,
bypassing the numerous failed attempts with purely random strings.
Since the user can create and extend grammars, they allow the fuzzer to be more targeted
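A minimal sketch of a grammar-based generator, assuming a small hand-written grammar for arithmetic expressions. The grammar, the depth limit, and the function name are illustrative choices, not taken from any particular fuzzer.

```python
import random

# Hypothetical grammar for simple arithmetic expressions. Keys are
# nonterminals; each maps to a list of alternative expansions.
GRAMMAR = {
    "<expr>": [["<term>", "+", "<expr>"], ["<term>", "-", "<expr>"], ["<term>"]],
    "<term>": [["<factor>", "*", "<term>"], ["<factor>"]],
    "<factor>": [["(", "<expr>", ")"], ["<digit>"]],
    "<digit>": [[d] for d in "0123456789"],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand symbol into a string by picking random alternatives."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, allow only the shortest alternatives
        # so that the expansion is guaranteed to terminate.
        shortest = min(len(alt) for alt in alternatives)
        alternatives = [alt for alt in alternatives if len(alt) == shortest]
    chosen = rng.choice(alternatives)
    return "".join(generate(part, rng, depth + 1, max_depth) for part in chosen)
```

Every string produced by generate("<expr>", random.Random()) is a syntactically valid expression, so none of the test budget is wasted on input that the parser would reject. Adding a rule, for example a division operator under "<term>", steers the generator toward that feature.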
Evolution-based fuzz generators also aim to produce the most valid input possible in order to get deep into the program.
However, they achieve this goal not by using an input description such as a grammar,
but by systematically changing (mutating) existing input
fuzzers maintain a set (known as a population) of particularly promising input.
They start with a set of known and valid input, which they mutate, increasing the population.
The fuzzer now measures which places in the program the input reaches (coverage).
Input that reaches previously uncovered locations is considered particularly promising, is retained, and serves as the basis for further mutations. In contrast, input that does not reach new locations is dropped from the population (selection).
In this way, the population continues to evolve with respect to the goal of reaching as many program locations as possible and thus covering as many program behaviors as possible.
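The mutate, measure, select cycle can be sketched in Python using sys.settrace to record which lines of the program under test are executed. The nested-branch target function, the three mutation operators, and the round count are assumptions chosen to keep the example small; production fuzzers use much faster coverage instrumentation.

```python
import random
import sys


def target(text):
    """Hypothetical program under test with nested branches."""
    if len(text) > 0 and text[0] == "f":
        if len(text) > 1 and text[1] == "u":
            if len(text) > 2 and text[2] == "z":
                return "deep"
            return "mid"
        return "shallow"
    return "none"


def coverage_of(func, data):
    """Run func(data) and return the set of line numbers executed in func."""
    lines = set()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is code:
            if event == "line":
                lines.add(frame.f_lineno)
            return tracer  # keep tracing lines inside func
        return None  # ignore every other frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(data)
    except Exception:
        pass  # a real fuzzer would record the crash
    finally:
        sys.settrace(previous)
    return lines


def mutate(text, rng):
    """Apply one random mutation: insert, delete, or replace a character."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    pos = rng.randrange(len(text) + 1)
    choice = rng.randrange(3)
    if choice == 0 or not text:
        return text[:pos] + rng.choice(alphabet) + text[pos:]
    pos = min(pos, len(text) - 1)
    if choice == 1:
        return text[:pos] + text[pos + 1:]
    return text[:pos] + rng.choice(alphabet) + text[pos + 1:]


def evolve(func, seeds, rounds=20000, seed=0):
    """Grow the population with mutants that reach previously uncovered lines."""
    rng = random.Random(seed)
    population = list(seeds)
    covered = set()
    for data in population:
        covered |= coverage_of(func, data)
    for _ in range(rounds):
        candidate = mutate(rng.choice(population), rng)
        new_lines = coverage_of(func, candidate) - covered
        if new_lines:  # selection: only input that reaches new code is kept
            population.append(candidate)
            covered |= new_lines
    return population, covered
```

Starting from evolve(target, ["hello"]), the population can only grow in the order "f...", then "fu...", then "fuz...", because each step requires the coverage gained by the previous one. Purely random strings would need to hit all three characters at once.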
Evolution-based fuzzers resemble natural evolution – many failed tests, but effective in the long run.
Most importantly, they are frugal: Test input is often available or can be gleaned from concrete sequences; the coverage of a test is easy to measure.
Moreover, evolution-based fuzzers do not require further knowledge about input formats.
Their major drawback is their dependence on good starting input: If the initial input does not contain a certain feature, it is unlikely that the feature will ever be generated.
In constraint-based fuzzers, specialized constraint solvers are used – programs that automatically seek a solution for a given set of conditions
[Despite its speediness,] [i]f it is necessary to recover the original from an encrypted text, even a constraint solver very, very often cannot do more than guess the key.
Moreover, constraint solvers work slowly.
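To make "seeking a solution for a given set of conditions" concrete, here is a toy Python sketch. It uses exhaustive search over a small integer domain as a stand-in for a real constraint solver, which reasons symbolically instead of enumerating. The path condition and all names are hypothetical.

```python
from itertools import product


def solve(constraints, variables, domain):
    """Return the first assignment satisfying every constraint, or None.

    Exhaustive search: a toy stand-in for a real constraint solver.
    """
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(check(assignment) for check in constraints):
            return assignment
    return None


# Hypothetical path condition guarding a deeply nested branch:
#   if y == 3 * x + 7 and y > 100 and x % 2 == 1: ...
PATH_CONDITION = [
    lambda a: a["y"] == 3 * a["x"] + 7,
    lambda a: a["y"] > 100,
    lambda a: a["x"] % 2 == 1,
]

# Search x and y in 0..199 for values that drive execution into the branch.
solution = solve(PATH_CONDITION, ["x", "y"], range(200))
```

The search space here is 200 × 200 assignments and grows exponentially with the number of variables, which hints at why solving constraints is slow. For a condition such as "the decrypted text equals the original", enumerating candidate keys is effectively all that remains, matching the point about guessing the key.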