30 April 2026

India’s Shashi Tharoor did not say Pakistan engaged in ‘better diplomacy’

Authentic clip shows Tharoor discussing Strait of Hormuz, Iran’s other sources of leverage

By Haseem uz Zaman

Claim: A video shows Indian politician Shashi Tharoor saying Pakistan has engaged in “much better diplomacy” as opposed to his country and calling Prime Minister Narendra Modi’s actions a “historical blunder”.

Fact: The video is a manipulated version of an authentic interview, in which Tharoor spoke of the US-Israel war against Iran, the destruction in the Islamic Republic, and the Strait of Hormuz tensions. He did not talk about Pakistan.

On 15 March 2026, Facebook page ‘Nova News’ posted (archive) a video of Indian politician and lawmaker Shashi Tharoor speaking to reporters, saying Pakistan engaged in “much better diplomacy” as opposed to his own country.

The video also shows Tharoor, a member of the Indian National Congress (INC) who represents Thiruvananthapuram, purportedly saying that his country’s prime minister, Narendra Modi, has made a “historical blunder”.

The following caption accompanies the viral clip:

“پاکستان نے موجودہ صورتحال میں بھارت سے کہیں زیادہ بہتر سفارتکاری کی ہے۔مودی نے بڑے بڑے بلنڈر کئے ہیں۔ششی تھرو
[Pakistan has engaged in much better diplomacy than India in the current situation. Modi has made big blunders. Shashi Tharoor]”

The full transcript of the Indian politician’s purported remarks is as follows:

“What Modi has done remains a historical blunder. You can’t make friends with declining powers and expect India to go ahead. The global south is a reality. China, Russia, and Pakistan are now more closer than ever. I feel so sorry to say this, but Pakistan is faring much better diplomatically than India, which has never happened in the past 70 years. I believe we are on our way down. This legacy built on the innocent lives of Gaza will not survive for Netanyahu and it won’t work for Prime Minister Modi.”

Fact or Fiction?

Soch Fact Check reverse-searched keyframes from the viral video and traced it back to this clip posted by the Press Trust of India (PTI) on 14 March 2026.

In the video, Tharoor says, “The Strait of Hormuz can only be opened through the end of the war because Iran has very few chokeholds on the rest of the world. At the moment, they’re very much on the back foot. You know, 13 American soldiers have been killed, 5,000 Iranians have been killed, according to media reports.

“The balance of the conflict has been entirely against Iran. Much more destruction in Iran, many more missiles have hit targets in Iran than Iranian drones and missiles have hit targets elsewhere. So Iran is on the receiving end.

“One of the areas they can do is they can make these things difficult and expensive for the world by tightly restricting channels of oil, channels of shipping, channels … airlines and so on. So that is the leverage they’ve got. They’re not going to surrender that, unless we can persuade those who are conducting this war to quickly call it off.

“That should be our objective in the interest of saving not only our economy but the entire region,” he adds.

Since Tharoor did not make the remarks attributed to him in the viral video, we suspected that it had been doctored using artificial intelligence (AI) tools.

Deepfake detectors

Therefore, we decided to test the video in various deepfake detection tools, such as Hiya Deepfake Voice Detector, DeepFake-O-Meter, and Global Online Deepfake Detection System (GODDS).

We tested three phrases from Tharoor’s alleged remarks in Hiya Deepfake Voice Detector:

“done remains a historical blunder. You can’t”
“but Pakistan is faring much better diplomatically than India”
“legacy built on the innocent lives of Gaza will not”

All three were rated “likely a deepfake”, with scores of 20, 31, and 5 out of 100, respectively, indicating that they lack authenticity.

Results from different detectors available on DeepFake-O-Meter

We used nine detectors in DeepFake-O-Meter, a tool developed by the University at Buffalo’s Media Forensics Lab (UB MDFL), which said the video was “likely AI-generated”. The scores were 100%, 99.9%, 99.7%, 98.9%, 76.1%, 69.4%, 67.1%, 57%, and 49.4%, respectively.

Soch Fact Check also tested the viral clip in GODDS, a tool developed by Northwestern University’s Security & AI Lab (NSAIL) that uses a combination of various models along with human analysis to provide a holistic summary of the results.

GODDS used 22 deepfake detection algorithms for the visual content and 70 for the audio component. Two trained analysts also examined the clip.

The predictive models returned the following results for the visual and audio content:

The video is likely to be fake with a probability above 0.5, according to two of the 22 predictive models; it is likely to be fake with a probability below 0.5, according to the 20 other predictive models.
The audio is likely to be fake with a probability above 0.5, according to 64 of the 70 predictive models; it is likely to be fake with a probability below 0.5, according to the six remaining predictive models.
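The breakdown above reduces to two simple proportions. The following is a minimal sketch that uses only the model counts reported above; the function name `share_flagging_fake` is hypothetical and is not part of GODDS.

```python
def share_flagging_fake(flagged: int, total: int) -> float:
    """Percentage of models whose fake-probability exceeded 0.5."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * flagged / total


# Counts taken from the GODDS results reported above.
visual_share = share_flagging_fake(2, 22)   # visual models above 0.5
audio_share = share_flagging_fake(64, 70)   # audio models above 0.5

print(f"visual: {visual_share:.1f}%  audio: {audio_share:.1f}%")
```

On these counts, roughly 9% of the visual models and 91% of the audio models returned a probability above 0.5, which suggests the strongest signal of manipulation lies in the audio track.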

According to GODDS’ human analysts, the video contains “several indicators” that show it may be digitally manipulated via AI. They said that as Tharoor speaks, “his teeth seem to change shape and often appear blurry (e.g., 0:01, 0:06, 0:07, 0:10, 0:11, 0:17, 0:33, etc.)” and that his “jaw seems to move in an exaggerated manner, almost appearing puppet-like” throughout.

Interestingly, they also noted that at times, the politician’s neck “intersects with the collar of his shirt as he speaks (e.g., 0:05-0:07, 0:11-0:15, 0:20, 0:31, etc.)” and his “voice seems to lack natural tonal and cadence variations characteristic of human voices”.

Markers of AI manipulation in the content at different timestamps highlighted by GODDS

Moreover, the analysts observed that there are no reputable reports regarding Tharoor’s purported remarks. Had he actually said what the video shows, it would have made headlines given his political status, they added.

Soch Fact Check recently debunked another deepfake that showed him calling out the Indian government for its “strategic failure” as Pakistan mediated a ceasefire for the US-Israel war on Iran.

Sound engineer’s analysis

We also sought a comment from Shaur Azher, a lecturer who teaches sound design and sound recording at the University of Karachi and the Shaheed Zulfikar Ali Bhutto Institute of Science and Technology (SZABIST). He also works as an audio engineer at our sister organisation, Soch Videos, and specialises in mixing and mastering audio.

Azher said that, for comparison purposes, Sample A is the audio from the viral video (the claim) and Sample B is the authentic audio clip uploaded by the Press Trust of India.

The “combined acoustic and computational evidence indicates that Sample A is not consistent with a natural, continuous human recording”, he explained. This, he added, indicates that it “has undergone significant artificial processing, reconstruction or synthesis prior to distribution.

“On the other hand, Sample B exhibits all the acoustic, dynamic, and environmental markers of an authentic field recording.”

He first provided a baseline technical assessment:

The frequency bands of Sample A are highly elevated in all regions, whereas Sample B has natural elevation across all frequency bands. Sample A also shows spectral cluttering from 4,000 to 15,000 Hertz (Hz). (It is important to note here that videos downloaded from X have a frequency limiter that caps frequencies at 15,000 Hz.)
The traffic noise in Sample A is really subtle and more aligned with that of Western countries whereas in Sample B, it seems very unpredictable — common in South Asian nations — as movement of cars and the sounds of bikes and horns are audible in the background.
In Sample B, Tharoor has a very distinctive voice quality; when his pitch goes lower, his voice seems to break a little, whereas in Sample A, his voice is slightly monotone.
The dynamics of Sample A have a True Peak (TP) of -1.9 decibels (dB), with a loudness of -14 Loudness Units relative to Full Scale (LUFS), whereas those of Sample B have a TP of -3.8 dB, with a loudness of -19.8 LUFS. The difference in dynamics shows these are two distinct audio recordings.
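The size of the gap in the loudness figures above can be worked out with simple arithmetic. The sketch below uses only the four reported values. The peak-to-loudness ratio (true peak minus loudness) is a common mastering measure that we are adding for illustration; the article does not say that Azher computed it.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


# Values reported in the baseline assessment above.
sample_a = {"true_peak_db": -1.9, "loudness_lufs": -14.0}
sample_b = {"true_peak_db": -3.8, "loudness_lufs": -19.8}

# How much louder Sample A is than Sample B, in loudness units (LU).
loudness_gap_lu = sample_a["loudness_lufs"] - sample_b["loudness_lufs"]

# Peak-to-loudness ratio: a smaller value means less dynamic headroom.
plr_a = sample_a["true_peak_db"] - sample_a["loudness_lufs"]
plr_b = sample_b["true_peak_db"] - sample_b["loudness_lufs"]

print(f"loudness gap: {loudness_gap_lu:.1f} LU "
      f"(amplitude ratio {db_to_amplitude_ratio(loudness_gap_lu):.2f})")
print(f"peak-to-loudness ratio: A={plr_a:.1f} dB, B={plr_b:.1f} dB")
```

On these figures, Sample A is 5.8 LU louder than Sample B and has a narrower gap between its peak and its overall loudness, which is what one would expect if the audio had been processed after recording.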

Spectrogram and LUFS of Samples A (top) and B (bottom), provided by Azher

Then, he provided the following observations to support his conclusion:

Environmental and room tone fingerprint

Sample A features a subtle, homogenous background hum. The lack of acoustic interaction between the voice and the background noise suggests the traffic was artificially layered underneath a dry, studio-rendered vocal track rather than being captured simultaneously through the same microphone capsule.

Sample B exhibits a complex, unpredictable noise floor characteristic of an authentic, unshielded recording environment (likely South Asian/Indian traffic patterns). The inclusion of horns, localised engine noise, and natural acoustic reflections grounds the subject in a real physical space.

Vocal characteristics and synthetic indicators

Breath signature comparison: Sample B contains audible natural respiratory gating, with inhalations and subsequent exhalations that directly correspond to the phrasing in the speech. Sample A lacks realistic respiratory mechanics; its pauses are digitally absolute rather than containing the physiological sounds of a human preparing to speak.

Phase coherence analysis: The visual spectral cluttering identified in Sample A from 4,000 to 15,000 Hz is a common artefact of neural vocoders (like HiFi-GAN or WaveGlow) used in AI voice generation. These vocoders often struggle to perfectly reconstruct phase alignment in higher frequencies, resulting in a slightly metallic or smeared high-end response compared to the natural harmonic decay in Sample B.

Sample A: 0.00195
Sample B: 0.00108
Sample A: Lower harmonic integrity and slightly more noise-like spectral behaviour
Higher spectral flatness indicates a more noise-like signal with less harmonic coherence
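For readers unfamiliar with the measure, spectral flatness is conventionally defined as the geometric mean of a signal's power spectrum divided by its arithmetic mean. The sketch below assumes that standard definition; the article does not state how the figures above were computed. The two test signals are synthetic illustrations, not the audio from either sample, and the function names are hypothetical.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """Naive DFT power spectrum, skipping the DC and Nyquist bins."""
    n = len(signal)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(signal))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean; eps guards against log(0)."""
    logs = [math.log(p + eps) for p in power]
    geometric_mean = math.exp(sum(logs) / len(logs))
    arithmetic_mean = sum(power) / len(power) + eps
    return geometric_mean / arithmetic_mean


n = 256
# A pure tone concentrates energy in one bin, so flatness is near 0.
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
# Gaussian noise spreads energy across bins, so flatness is much higher.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

flatness_tone = spectral_flatness(power_spectrum(tone))
flatness_noise = spectral_flatness(power_spectrum(noise))
print(flatness_tone < flatness_noise)
```

The value runs from near 0 for strongly harmonic sound to 1 for perfectly flat noise, which is why a higher figure is read as more noise-like.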

Micro-dynamics: The subject’s voice in Sample B breaks naturally as pitch lowers, showing organic vocal fry. Sample A maintains an unnatural, monotone energy level, lacking the emotional voice and micro variations expected in human speech.

Jitter and shimmer deviation

Human voices naturally exhibit the organic irregularity of vibrating vocal folds, as in Sample B. Sample A yields abnormally low jitter and shimmer because the neural network generates a mathematically perfect and overly stable waveform, resulting in the monotone quality observed.

Sample A: 0.0611
Sample B: 0.0519
Sample A: shows abnormal stability patterns relative to its delivery style
Sample B: shows the micro-instability that natural voices exhibit due to vocal fold vibration
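Jitter and shimmer are commonly measured as the average change between consecutive vocal cycles, relative to the overall average: jitter for cycle duration and shimmer for cycle amplitude. The sketch below assumes that common "local" definition; the article does not say which formula or tool produced the figures above, or whether they refer to jitter or shimmer. The input values are invented for illustration and are not measurements from either sample.

```python
def relative_perturbation(values):
    """Mean absolute change between neighbours, divided by the mean value.

    Applied to cycle durations this gives local jitter; applied to
    cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical cycle durations in milliseconds (illustrative only).
natural_periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]   # small organic wobble
stable_periods_ms = [8.0, 8.0, 8.0, 8.0, 8.0]    # perfectly regular

jitter_natural = relative_perturbation(natural_periods_ms)
jitter_stable = relative_perturbation(stable_periods_ms)
print(f"natural: {jitter_natural:.4f}  stable: {jitter_stable:.4f}")
```

A perfectly regular series scores zero, while a series with small natural wobble scores above zero; this is the sense in which an overly stable waveform is treated as a warning sign.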

Cepstral coefficient deviation testing (analysis of Mel Frequency Cepstral Coefficients (MFCC)):

Sample A shows significant deviation in the higher-order coefficients. The synthetic model used to generate Sample A failed to accurately recreate the complex vocal tract resonances of the speaker, substituting them with the generalised spectral clutter as identified above 4,000 Hz.

The computed mean MFCCs (13 coefficients) show the following:

Sample A shows reduced variation and smoother coefficient distribution
Sample B shows greater deviation in higher order coefficients
Sample A indicates smoothed or generalised spectral representation
Sample B reflects natural vocal tract complexity

Interestingly, we also found that Tharoor himself has warned of “an alarming number of deepfakes” targeting him in an X (formerly Twitter) post.

“There are an alarming number of deepfake videos circulating of me, with convincing-sounding AI generated voice-overs over genuine footage of old interviews, having ‘me’ saying things I have never said. Disappointed that so many on social media are believing these lies and issuing baseless comments attacking me for purported views that I have not expressed.

“One simple rule of thumb: if a statement (video or otherwise) doesn’t appear on my timeline nor on that of the purported interviewer/media source, it’s fake news. Thank you for your attention to this matter!”

Soch Fact Check, therefore, concludes that the viral video was doctored using AI tools.

Virality

Soch Fact Check found the claim circulating in six posts on Facebook, five on Instagram, and two on Threads.

It was also shared in three posts on X. Former Dawn journalist Sanaullah Khan also posted the video, which garnered over 42,800 views, but deleted it later.

Conclusion: The video is a manipulated version of an authentic interview and has been doctored using AI tools.

Background image in cover photo: ShashiTharoor

To appeal against our fact-check, please send an email to appeals@sochfactcheck.com