Tuesday, November 21, 2023

Understanding AWS SDK Java 2.x (Async vs Sync) HTTP Clients

November 21, 2023 amazon-web-services, aws-sdk, http, java, multithreading

Issue

The AWS SDK for Java 2.x offers several HTTP client implementations. Two of them are:

Apache-based sync HTTP client
AWS CRT-based async HTTP client

To understand the difference between the sync and async clients, I read through the comparison in the AWS documentation found here.

For example, when browsing their recommendations as to which to use, I found the following very peculiar:

The Apache-based sync HTTP client is recommended for "low latency over high throughput", while the AWS CRT-based async HTTP client is recommended for "high throughput over low latency".

In short, how on earth could the Apache-based sync HTTP client have lower latency than one of the async HTTP clients? I would expect latency to be dominated by network latency, for which it is completely irrelevant which client (sync or async) you use.

Also, how could an async client have higher throughput than a sync HTTP client? That doesn't make sense to me either. I would expect the opposite, because the sync client will busy-wait until the response is available, while the async client will run off and do something else in the meantime, and therefore might not be ready to handle the response when it arrives.

Below is more of my thinking behind the above.

My understanding is that high throughput and low latency are inversely correlated. If you increase one, you generally decrease the other, and vice versa. Of course, other factors could, for example, decrease throughput while holding latency steady. Nonetheless, I don't see how either client would affect network throughput or latency.

It's also a bit unclear to me what the difference would be between running the Apache-based sync HTTP client wrapped in a promise and directly using one of the async HTTP clients (for example, the Netty or CRT-based async clients). I believe I understand the differences, but let me state my understanding below.

They both use thread pools (at least the sync side does, if you're using the Apache-based sync client), and they both have to wait for responses.

I would guess that when you make a request with the Apache-based sync HTTP client, it typically draws a thread from the thread pool, uses that thread to create an HTTP connection, initiates the request, and then busy-waits for the response, continually consuming CPU time as the thread polls to see whether the response is available.

Meanwhile, the async HTTP client follows the same procedure, except that it doesn't busy-wait for a response. It uses that thread for some other purpose in the meantime or, if there is nothing else to do, the thread sleeps without consuming CPU time until the response comes in. The response then triggers the JVM, via some OS mechanism, to wake the sleeping thread to handle it.

Given this understanding, I don't see how the Apache-based sync HTTP client could have lower latency than the AWS CRT-based async HTTP client, nor how the AWS CRT-based async HTTP client could have higher throughput. It seems to me that the difference between the two mostly just determines how efficiently the program uses the CPU.

Solution

The AWS CRT-based S3 client, which is only available for S3, uses the multipart upload API and byte-range fetches. That is why it can handle higher throughput. The overhead of these mechanisms results in higher latency.
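
To make the byte-range idea concrete, here is a minimal sketch of how a large object could be split into ranges that are then fetched in parallel. This is not the CRT client's actual implementation. The class name ByteRangePlanner, the method planRanges, and the 8 MiB part size are assumptions chosen for illustration; only the "bytes=start-end" form of the HTTP Range header value is standard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: not part of the AWS SDK.
final class ByteRangePlanner {

    /**
     * Returns HTTP Range header values that together cover bytes
     * [0, objectSize) in chunks of at most partSize bytes.
     */
    static List<String> planRanges(long objectSize, long partSize) {
        if (objectSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and partSize > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += partSize) {
            // Range ends are inclusive, hence the -1.
            long end = Math.min(start + partSize, objectSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 20 MiB object split into 8 MiB parts (assumed sizes).
        planRanges(20L * 1024 * 1024, 8L * 1024 * 1024).forEach(System.out::println);
    }
}
```

Each range becomes its own request, so many parts can be in flight at once (good for throughput), but a small object pays for the extra planning and request setup (bad for latency).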

https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/crt-based-s3-client.html

The AWS CRT-based S3 client—built on top of the AWS Common Runtime (CRT)—is an alternative S3 asynchronous client. It transfers objects to and from Amazon Simple Storage Service (Amazon S3) with enhanced performance and reliability by automatically using Amazon S3's multipart upload API and byte-range fetches.

The AWS CRT-based S3 client improves transfer reliability in case there is a network failure. Reliability is improved by retrying individual failed parts of a file transfer without restarting the transfer from the beginning.

In addition, the AWS CRT-based S3 client offers enhanced connection pooling and Domain Name System (DNS) load balancing, which also improves throughput.

You can use the AWS CRT-based S3 client in place of the SDK's standard S3 asynchronous client and take advantage of its improved throughput right away.
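
As a rough sketch of what "in place of the SDK's standard S3 asynchronous client" looks like in code, the CRT-based client is created through S3AsyncClient.crtBuilder(). This assumes the SDK's s3 module and the aws-crt library are on the classpath. The region, target throughput, and part size below are placeholder values, not recommendations.

```java
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3AsyncClient;

final class CrtS3ClientSketch {
    public static void main(String[] args) {
        // crtBuilder() returns the CRT-based implementation of S3AsyncClient.
        // All values below are assumed placeholders for illustration.
        try (S3AsyncClient s3 = S3AsyncClient.crtBuilder()
                .region(Region.US_EAST_1)
                .targetThroughputInGbps(10.0)
                .minimumPartSizeInBytes(8L * 1024 * 1024)
                .build()) {
            // The client exposes the same S3AsyncClient interface as the
            // standard async client, so existing calling code keeps working.
            System.out.println("CRT-based S3 async client created");
        }
    }
}
```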

The Netty-based HTTP client and the AWS CRT-based HTTP client use nonblocking I/O, which lets a few threads handle many concurrent requests. With the Apache-based sync HTTP client, by contrast, you need as many threads as concurrent requests.
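
The thread-per-request point can be illustrated without the SDK. The sketch below is a simulation under stated assumptions, not how either client is implemented: a sleeping pool thread stands in for a blocking request, and a timer on a single thread stands in for nonblocking I/O. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation, for illustration only.
final class ThreadsVsRequests {

    /** Blocking model: every in-flight request parks one pool thread for the whole wait. */
    static long blockingMillis(int requests, int threads, long latencyMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(CompletableFuture.runAsync(() -> sleep(latencyMs), pool));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    /** Nonblocking model: a timer stands in for the network; no thread waits per request. */
    static long nonBlockingMillis(int requests, long latencyMs) {
        ScheduledExecutorService eventLoop = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                CompletableFuture<Void> response = new CompletableFuture<>();
                // The single thread only runs this callback when the "response" is due.
                eventLoop.schedule(() -> { response.complete(null); }, latencyMs, TimeUnit.MILLISECONDS);
                futures.add(response);
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            eventLoop.shutdown();
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking, 4 threads:    " + blockingMillis(20, 4, 50) + " ms");
        System.out.println("nonblocking, 1 thread:  " + nonBlockingMillis(20, 50) + " ms");
    }
}
```

With 20 requests of 50 ms each, the 4-thread blocking pool needs about five rounds, while the single-threaded nonblocking version finishes in roughly one request's latency. Wrapping a sync client in a promise behaves like the first method: the caller gets a future, but a thread is still held for every request in flight.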

https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/asynchronous.html

The AWS SDK for Java 2.x features truly nonblocking asynchronous clients that implement high concurrency across a few threads. The AWS SDK for Java 1.x has asynchronous clients that are wrappers around a thread pool and blocking synchronous clients that don’t provide the full benefit of nonblocking I/O.

Answered By - abler98

This answer, collected from Stack Overflow and tested by AndroidBugFix community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
