How to change user agent when downloading images on import

Images from one vendor cannot be imported via the import function. Here is a sample: https://cdn.vegis.ro/images/products/img_201611250935/2380/full/sonnentor-ceai-boboci-de-trandafir-bio-30gr-101821.jpg . Adding the image to a product manually works, and so does downloading it with wget when a user agent is set.

It seems the connection is not allowed (406 Not Acceptable). I tried manually with a plain wget command and got that error. How can I change the user agent used when downloading images from this vendor? When I use wget with a user agent, it works.

@CS-Cart_team can you please help me? Today I got a third vendor that I can’t work with because I can’t download images for the same reason.

By default, CS-Cart does not specify any user agent when it makes outgoing requests, such as downloading images.

Please try to add:

curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0');

before:

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 1);

in app/Tygh/Http.php.

P.S. I haven’t tested it.
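To try the same idea outside the store before editing a core file, a minimal standalone sketch could look like the following. The function names `build_download_options` and `download_with_user_agent` are hypothetical and are not part of CS-Cart; only the `CURLOPT_USERAGENT` option comes from the suggestion above. It assumes the PHP cURL extension is installed.

```php
<?php
// Hypothetical helper (not a CS-Cart function): collects the cURL options
// for a download that sends an explicit User-Agent header.
function build_download_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent, // the option suggested above
        CURLOPT_RETURNTRANSFER => true,       // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,       // follow redirects, if any
        CURLOPT_TIMEOUT        => 30,         // seconds; an arbitrary choice
    ];
}

// Hypothetical helper: performs the request and reports what happened.
function download_with_user_agent(string $url, string $userAgent): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_download_options($url, $userAgent));

    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $error  = curl_error($ch);

    return [
        'status' => $status,                        // HTTP status code, 0 if no response
        'body'   => $body === false ? null : $body, // null when the transfer failed
        'error'  => $error,                         // empty string on success
    ];
}

// Example call (performs a real network request, so it is left commented out):
// $result = download_with_user_agent($imageUrl, 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0');
// echo $result['status'], PHP_EOL;
```

If the status is 200 with the user agent set and 406 without it, that would suggest the missing user agent is what the vendor's server objects to.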

Works perfectly, thank you! Is there any problem if I leave it like this, with your added line?

The changes may be lost after an upgrade. As for other issues, the only one I can think of is that some servers may have advanced user agent checking and may not be fooled by this masquerade, in which case the requests will be rejected.

I have another issue: some vendor image links contain a semicolon, and because of that the import doesn’t download the product images. I think I need to surround the URL with single quotes; that seemed to work in my wget test. How should I alter app/Tygh/Http.php to achieve that?

@CS-Cart_team, any help here please?

Sorry, but I cannot help you there. It is much more complex than just a small change somewhere in the Http class, as these changes should only affect links to the images in the import and nothing else.
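Some background on the single quotes may help here. In a wget test, characters such as `;` or `(` are interpreted by the shell before wget ever sees the URL, which is why quoting fixes the command line. The `curl_setopt($ch, CURLOPT_URL, $url)` call shown earlier receives the URL as a plain PHP string with no shell involved, so adding quotes there is unlikely to be the fix. A minimal sketch of the shell side, assuming a Unix-like shell (the function name `build_wget_command` is hypothetical):

```php
<?php
// Hypothetical helper: builds a wget command line that is safe to paste
// into a Unix-like shell. escapeshellarg() wraps each argument in single
// quotes, so ';', '(' and ',' in the URL are not treated as shell syntax.
function build_wget_command(string $url, string $userAgent): string
{
    return 'wget --user-agent=' . escapeshellarg($userAgent)
         . ' ' . escapeshellarg($url);
}

// A made-up URL containing a semicolon, for illustration only.
echo build_wget_command('https://example.com/img/a;b.jpg', 'Mozilla/5.0'), PHP_EOL;
```

The quotes exist only for the shell; they are not part of the URL that wget requests, so a URL passed directly to PHP cURL should be passed without them.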

@CS-Cart_team This is a sample of the problematic image links. What options do I have to import them? https://img.modivo.cloud/product(6/2/2/4/62247c4b0894b8c0baa7058ff78deb7c503f454b_22_0197063252405_rz.jpg,jpg)/tenisi-vans-old-skool-stackform-vn0009pzchn1-chintz-rose-0000303792381.jpg

What’s interesting is that if I add the image to the product using the above link, it works; it fails only on import.

@CS-Cart_team I have a new issue with image links that have no extension. Is there any way to download these images from the feed file using import? When I upload the image on the product page, it works. Here is a sample: https://www.marionnaud.ro/medias/sys_master/prd-images/h6e/h7e/9237028110366/9237028110366

I imported it via import with almost no issues. When I opened this link for the second time, the Cloudflare server told me:

Sorry, you have been blocked

so I had to use a VPN to run the tests. Everything went well after that.

A cURL request to this image returns an error:

$ curl -v https://www.marionnaud.ro/medias/sys_master/prd-images/h6e/h7e/9237028110366/9237028110366
*   Trying 2.16.96.14:443...
* Connected to www.marionnaud.ro (2.16.96.14) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=NL; L=Renswoude; O=AS Watson (Europe) Holdings B.V.; CN=aswatson.eu
*  start date: Jun  5 00:00:00 2024 GMT
*  expire date: Nov  5 23:59:59 2024 GMT
*  subjectAltName: host "www.marionnaud.ro" matched cert's "www.marionnaud.ro"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=GeoTrust RSA CA 2018
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x5f5078fa3f70)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET /medias/sys_master/prd-images/h6e/h7e/9237028110366/9237028110366 HTTP/2
> Host: www.marionnaud.ro
> user-agent: curl/7.81.0
> accept: */*
> 
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
* stopped the pause stream!
* Connection #0 to host www.marionnaud.ro left intact
curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)

specifically:

curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)

This is a server-related error and unfortunately I don’t have any advice on it. I asked GPT about it and didn’t like the answer, so I won’t post it here :)

I can assume that this server uses some kind of advanced protection against bots, parsers and anything else that is not a real browser.
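One experiment that is sometimes tried for curl error 92 is to repeat the request over HTTP/1.1 with a browser-like user agent. This is only an assumption and has not been confirmed to work for this server, which may reject the request regardless. A minimal sketch, with the hypothetical function name `build_http11_options` and assuming the PHP cURL extension:

```php
<?php
// Hypothetical helper: cURL options that force HTTP/1.1 instead of HTTP/2
// and send a browser-like User-Agent. Whether the server accepts this is
// unknown; it only rules out the HTTP/2 layer as the cause of the error.
function build_http11_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_HTTP_VERSION   => CURL_HTTP_VERSION_1_1, // skip HTTP/2 negotiation
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true,                  // return the body as a string
        CURLOPT_TIMEOUT        => 30,                    // seconds; an arbitrary choice
    ];
}

// Example use (performs a real network request, so it is left commented out):
// $ch = curl_init();
// curl_setopt_array($ch, build_http11_options($imageUrl, 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0'));
// $body = curl_exec($ch);
// echo curl_getinfo($ch, CURLINFO_RESPONSE_CODE), ' ', curl_error($ch), PHP_EOL;
```

If the request still fails over HTTP/1.1, that would be consistent with the bot-protection explanation above rather than with an HTTP/2 problem.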
