
Claim: Three videos show Pakistani soldiers criticising the armed forces’ actions, as well as Field Marshal Syed Asim Munir’s allegedly authoritarian leadership. The first shows an officer telling another that there was a need to take “a decisive step”. Two other clips depict senior officers sitting among colleagues while speaking on the phone and lamenting the military’s allegedly violent actions against the public. They admit to involvement in enforced disappearances, political engineering, and seizing the judiciary. They also reveal “inside” news about the army chief’s time in power coming to an end.
Fact: All three videos were generated using artificial intelligence tools. An analysis by Soch Fact Check, as well as observations by a media expert and a sound engineer, confirms that the clips were created synthetically. The “original” voices are not attributable to any particular individual either.
In July 2025, three different videos emerged on social media, showing Pakistani soldiers criticising Field Marshal Syed Asim Munir for his alleged misuse of authority and discussing apparent discontent among the armed forces (archived here, here, and here).
In the clips, the officers are talking about the need to take “a decisive step” against the army chief and revealing “inside” news that his time in power is coming to an end.
The soldiers are also heard lamenting the army’s alleged violent and retaliatory actions against the public and admitting to involvement in human and civil rights violations such as enforced disappearances, political engineering, and seizing control of the judiciary.
In the first video — which was posted on 17 July — a soldier tells his colleague: “Sir, we had sworn to protect the country, not safeguard a single individual’s power. If we remain silent over the Field Marshal’s additional authority, then history will not forgive us. The time has come, sir, to take a decisive step.”
The accompanying caption, which includes the hashtags “#PTIofficial,” “#PTI,” and “#ImranKhanPTI,” is as follows:
“فیلڈ مارشل کو ہٹانا اب ضروری ہوگیا ہے
[It has now become necessary to remove the Field Marshal.]”
The same text is also written on top of the video. However, the words “creative content” can be seen at the bottom.
In the second clip — which was shared on 19 July — an officer is heard speaking to someone over the phone, saying: “Sir, once upon a time, we used to proudly say that we are soldiers. Now we hide in shame because you gave us guns, turned us into butchers of the law, and made us criminals of the public. We are reaping what you sowed, sir. Sir, the ‘inside’ news is that your power is about to come to an end, sir. We have a confirmed report about it, sir.”
The accompanying caption, which includes the same hashtags as above, is as follows:
“ہم فوجی اب شرم کے مارے چھپے رہتے ہیں
[We soldiers now hide out of shame.]”
The text on top of the video reads, “پکی رپورٹ ہے سر آپ کا اقتدار ختم ہونے والا ہے۔ [It is confirmed that your power is about to end.]” The one on the bottom states, “Creative Content : Fictional AI Character. Voice is original. For educational use only.”
The third video — which was posted on 3 July — also shows a soldier talking on the phone, stating: “Sir, we wore the [Pakistan Army] uniform for honour, not to open fire on the public. Enforced disappearances, political engineering, and seizing the judiciary… these are not war achievements, sir; they are stains of embarrassment. It looks like your power is coming to an end, sir.”
On the top of this clip is the text, “آپ کا اقتدار اب ختم ہوتا ہوا نظر آ رہا ہے سر [It looks like your power is coming to an end, sir].” The same disclaimer as in the second video is also included in this one.
Field Marshal Asim Munir
Pakistan’s army chief was promoted to the rank of Field Marshal in May 2025 after what Prime Minister Shehbaz Sharif said was “his exemplary leadership during Operation Bunyan-um-Marsoos”, which marked the country’s response to India during the most severe military conflict between the two nuclear-armed nations in almost three decades.
Munir is only the second military officer in Pakistan’s history to hold the five-star rank. The first was the late General Ayub Khan, who “made himself a field marshal in 1965”, Reuters reported. The position, however, is a ceremonial title.
Al Jazeera cited critics of the development as saying that “the promotion ultimately boils down to the political calculations of the government and the military”.
The comments in the second and third videos — particularly that the Pakistan Army soldiers “open[ed] fire on the public” and that Munir turned them into “butchers of the law” and “criminals of the public” — are a reference to claims that the law enforcement agencies (LEAs) shot and killed protesters in Islamabad last year. The November 2024 demonstration was aimed at demanding the release of the incarcerated former Prime Minister and founder of the Pakistan Tehreek-e-Insaf (PTI), Imran Khan, who has been in jail since August 2023.
Other comments in the third video, specifically about “enforced disappearances, political engineering, and seizing the judiciary”, are a reference to accusations against the military of rights abuses and meddling in civilian affairs, such as the judiciary, over the years.
Fact or Fiction?
Soch Fact Check investigated these videos despite the “creative content” disclaimers because many social media users appear to believe they are real or that the voices are authentic.
We did not find any credible reports in reputable media outlets about these clips. Had such videos or recordings been leaked and made public, they would have made headlines.
We also observed multiple inconsistencies in all three videos, in line with content generated using artificial intelligence (AI) tools.
In all the clips, the men appear frozen from the neck down, with only their heads bobbing about rather than moving naturally and fluidly, and their faces look distorted, as if a head-focused filter had been applied.
Some individuals also seem to lack eyes in their sockets, and those who do have them carry hollow, lifeless expressions. The lips of the soldiers in the foreground are not in sync with what they are heard saying.
In the first video, the tags on the shirt of the person on the left both read “ARMY”, while those on the shirt of the person to the right read “BRIGIDER” and “PAKISTAN”. Such tags normally display the officer’s name and his affiliation with the Pakistan Army, written as “PAK ARMY”.
Moreover, “brigadier” is misspelt. Both the incorrect tags and the misspelling are hallmarks of AI-generated content.
Additionally, the hair of the man on the right is slightly cut off at one point, and the blurred area behind his head moves along with it. The shirts of the individuals on the right bear neither name tags nor the words “PAK ARMY”. The hand of the right-most man is also distorted.
In the second video, the stars on the collar of the soldier in the foreground — depicting the rank of an officer — appear to be crossed out or joined by a line. His hands and his phone also remain motionless while he is talking.
In the third video, the tags normally seen on Pakistan Army uniforms — containing the soldier’s name and the affiliation “PAK ARMY” — do not appear on the shirt of the man in the centre; in fact, the text is illegible. His phone and hands do not move naturally either. The Pakistani flag patch on the shoulders of the two men on the right shows a circle instead of a crescent and a five-pointed star.
The hand of the man second from left is distorted, while the arms of the second and third men on the right appear to be joined above their elbows.
Deepfake audio detection tools
Soch Fact Check also ran the videos through multiple AI detection tools.
The first tool we tested is DeepFake-O-Meter, developed by the University at Buffalo’s Media Forensics Lab (UB MDFL). Of the available audio/video detectors, we used 11; these check for face forensics, face forgery, and face-swapping, perform frame-level predictions, and analyse lip movements, among other things.
Their results are available below:
In order to understand the results better, we reached out to UB MDFL Director Dr Siwei Lyu, the principal investigator (PI) of the DeepFake-O-Meter and a professor at the university’s Department of Computer Science and Engineering.
“The three AI videos were generated with text-to-image and lip-synced using text-to-speech audio. The detection algorithms look for artifacts of the generation process. Some artifacts may be visible, e.g., only the heads/neck areas move while the rest of the body and background remain static — this unnatural separation is typical in AI-generated videos — as is the misspelling of normal words,” Dr Lyu explained.
“There are also invisible artifacts that cannot be perceived by the human eye but can be revealed through algorithms, much like how X-rays allow doctors to see beneath the surface; these hidden traces arise from the generative processes of AI models and can be used to analyze AI-generated videos,” he added.
Dr Lyu also provided visual examples of two glaring errors in the three videos:
- Spelling error (“BRIGIDER” instead of “BRIGADIER”) and blurred text (one example)
- Incorrect generation of national flags (three examples) — flag symbols vary in shape, alignment, and clarity across the three videos.

Visual examples of inconsistencies provided by UB MDFL Director Dr Siwei Lyu, who is also the DeepFake-O-Meter’s principal investigator.
The Hiya Deepfake Voice Detector revealed that there is a 97% chance all three videos were AI-generated, whereas the Zhuque AI Detection Assistant said the probabilities that the first, second, and third clips were synthetically created are 67.8%, 84.6%, and 31.4%, respectively.
According to Hive Moderation, the likelihoods that the first, second, and third videos were AI-generated are 100%, 100%, and 93%, respectively.
AI Speech Classifier by ElevenLabs showed a 2% probability of the audio being generated using its tools.
Deepware Scanner turned up a result for only the first video; two of its detection algorithms — Seferbekov and Ensemble — said the clip was a deepfake, with scores of 94% and 86%, respectively. For the other two, the scan failed.
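The tools above each return a probability that a clip is synthetic. As a hedged illustration — not the method any of these services actually uses — one simple way to combine several detectors’ outputs into a single verdict is a majority vote plus a mean confidence:

```python
# Hedged sketch: combining scores from several deepfake detectors into one
# verdict. The aggregation rule is illustrative, not any tool's real method;
# the numbers are the probabilities reported in this article for video one.

def aggregate(scores, threshold=0.5):
    """Majority vote plus mean confidence over detector probabilities."""
    votes = sum(1 for s in scores if s >= threshold)
    mean = sum(scores) / len(scores)
    verdict = "likely AI-generated" if votes > len(scores) / 2 else "inconclusive"
    return verdict, round(mean, 3)

# Hiya, Zhuque, Hive Moderation, and Deepware scores for the first video.
first_video = [0.97, 0.678, 1.00, 0.94]
print(aggregate(first_video))  # ('likely AI-generated', 0.897)
```

In practice, fact-checkers weigh such scores alongside visual inspection and expert analysis rather than relying on any single threshold.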
Sound engineer’s analysis
Soch Fact Check also sought a comment from Shaur Azher, a lecturer who teaches sound design and sound recording at the University of Karachi and the Shaheed Zulfikar Ali Bhutto Institute of Science and Technology (SZABIST). He also works as an audio engineer at our sister organisation, Soch Videos, and specialises in mixing and mastering audio.
Azher explained that based on his technical and spectral observations of the first and second videos, he has “high confidence that the vocal component of the submitted audio sample exhibits characteristics consistent with AI-generated dialogue, specifically produced through a neural text-to-speech (TTS) system”.
“The combination of flat prosody, lack of reverberation, spectral shaping, and artificial ambience insertion strongly supports this conclusion,” he added, explaining the elements — such as stress and intonation, frequency analysis, and environmental noise — that are indicative of synthetically created content.
He provided the following technical observations to back up his analysis of the first and second videos.
- Dialogue flow & prosody: The dialogue presents a synthetic delivery pattern, marked by a monotone vocal cadence with minimal natural inflection or dynamic emotional variation. Prosodic features such as stress, rhythm, and pitch variance are limited, indicating a lack of human speech spontaneity.
- Spatial characteristics: No environmental or room reverb is present in the recording. The vocal signal appears acoustically isolated, consistent with a signal generated without a physical recording space.
- Ambient sound injection: A constant artificial ambient sound, resembling air conditioning or HVAC [Heating, Ventilation, and Air Conditioning], is layered beneath the dialogue. This ambient layer appears digitally inserted and serves to simulate environmental realism or mask synthetic tonal purity.
- Spectral analysis: The signal exhibits significant spectral processing, particularly in the high-frequency range beyond 5,000 hertz (Hz). Gaps and discontinuities are observed in the upper spectrum, indicating possible frequency band suppression or aliasing artifacts commonly found in neural text-to-speech (TTS) outputs. Energy distribution is heavily concentrated between 20 Hz and 2,000 Hz in the first video and between 20 Hz and 3,100 Hz in the second, after which a progressive spectral roll-off occurs — another typical hallmark of synthetic voice rendering.
With regard to the neural TTS indicators, he provided the following list:
- Flat delivery: Absence of expressive dynamics, emotional tone or voice modulation.
- Lack of acoustic space: Absence of real-world spatial cues, such as room reflections or mic bleed.
- High-frequency spectral decay: Notable attenuation [or reducing amplitude] and inconsistencies in frequencies above 5,000 Hz.
- Excessive post-processing: Presence of noise shaping, band limiting, and potential anti-aliasing operations.
- Noise masking via ambient injection: Addition of artificial ambient layers to obscure or normalise the unnatural clarity of synthetic speech.
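The high-frequency decay Azher describes can be measured directly. As a minimal sketch of that kind of spectral check — using synthetic stand-in signals, not the actual video audio — the fraction of a signal’s energy above 5 kHz can be computed from its Fourier spectrum:

```python
# Hedged sketch of the spectral check described above: measure what fraction
# of a signal's energy lies above 5 kHz. Neural-TTS output often rolls off
# sharply there; real room recordings tend to retain more high-frequency
# energy. Both signals below are synthetic stand-ins for illustration only.
import numpy as np

def hf_energy_ratio(signal, sample_rate, cutoff_hz=5000):
    """Fraction of total spectral energy above cutoff_hz."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return spectrum[freqs > cutoff_hz].sum() / spectrum.sum()

sr = 16_000                                  # one second at 16 kHz
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
broadband = rng.standard_normal(sr)          # stand-in for a real recording
lowpassed = np.sin(2 * np.pi * 220 * t)      # single 220 Hz tone, i.e. a
                                             # heavily band-limited signal

print(hf_energy_ratio(broadband, sr))        # roughly 0.37 for white noise
print(hf_energy_ratio(lowpassed, sr))        # essentially 0.0
```

A real forensic analysis would look at spectrograms over time, prosody, and ambience as well; this ratio only captures the band-limiting symptom.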
As for the third video, Azher said the audio component “strongly exhibits characteristics consistent with AI-generated speech, most likely AI TTS rather than a natural human recording”.
The technical observations behind his analysis of the third video are as follows:
- Synthetic frequency distribution: The low-frequency range (20 Hz-1,600 Hz) exhibits significantly higher energy. Midrange frequencies gradually diminish up to 8,000 Hz. High frequencies fade out almost completely by 15,000 Hz, with no meaningful content beyond that point. In other words, bass frequencies dominate.
- Acoustic and tonal observations: Absence of natural room tone or environmental ambience. No perceptible room reverb or sense of spatial distance. Voice delivery remains unnaturally monotone with minimal pitch variation.
- Post-processing indicators: Evidence of compression and contouring [neutralising pitch] applied to maintain uniform loudness and flow. Dialogue pitch and pacing appear artificially stabilised.
Media expert’s observations
Soch Fact Check also spoke to Asad Baig, a former journalist who is now a media expert and founder of the not-for-profit Media Matters for Democracy.
Baig observed that none of the three videos contains any Coalition for Content Provenance and Authenticity (C2PA) metadata stamp, which he explained is a content verification standard adopted at the end of 2021 by many platforms.
“It’s timestamping for authentication; how was it [content] recorded, what device was used to record it, [and] if it was edited using Adobe,” he said, adding that there were separate standards for various products, such as Leica, Sony, Nikon, etc. “So it is a track record of where the audio or video came from, i.e. its origin.”
Baig explained that there was no issue or modification date in the metadata of any of the three videos. “If the audio was original or integrated [superimposed] over it, there would have been some integration date or C2PA data, so the audio is not original, technically speaking,” he said.
“The third point is that if you analyse the [Facebook] page sharing these videos, all three videos have a consistently heavy Pukhtoon accent; if a Pushto-speaking person talks in Urdu, that’s how they would sound,” he added.
The audios are “too clean”, he noted, adding that “nobody talks like that unless they’re talking on a television channel or recording themself”.
Two of the videos contain a disclaimer, stating that they are “creative content”, using “fictional AI character[s]”, and that the “voice is original”. However, the media expert said, “When it says the voice is original, it does not say that the voice is original of an X person; it’s the person editing these videos who is recording the voice [and] that means that the voice is original — which it is — and the content is AI-produced. So nowhere does it actually say that the voice is attributed to a certain person.
“For example, [if] there’s a leaked audio of Imran Khan, so the voice is attributed to Imran Khan. But, here, there’s no attribution, just the indication that the voice is original, which may be the original of the person who’s recording it.”
Baig concluded that according to metadata and visual analyses, there was a 100% probability that the videos and audio are both AI-generated. They’re “just completely fake”, he said. “There’s no C2PA for audio or video. There’s absolutely zero metadata. No timestamping, which is extremely strange and very aligned with how the AI-generation tools work.”
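The absence Baig points to can be probed crudely at the byte level: C2PA provenance manifests are embedded in media files inside JUMBF boxes labelled “c2pa”, so a file with no such marker carries no C2PA manifest at all. The sketch below is a rough heuristic for illustration, not a substitute for a real C2PA verifier:

```python
# Hedged, very crude heuristic inspired by Baig's metadata point: scan a
# file's raw bytes for the "c2pa" JUMBF label. Absence of the marker means
# no C2PA manifest is present; presence would still require cryptographic
# validation by a proper C2PA verifier to mean anything.
from pathlib import Path

def has_c2pa_marker(data: bytes) -> bool:
    """Return True if the raw bytes contain a 'c2pa' JUMBF label."""
    return b"c2pa" in data

def file_has_c2pa_marker(path: str) -> bool:
    return has_c2pa_marker(Path(path).read_bytes())

# Synthetic byte strings for illustration (not real media files):
print(has_c2pa_marker(b"\x00\x00\x00\x1cjumbc2pa...manifest..."))  # True
print(has_c2pa_marker(b"plain mp4 bytes, no provenance"))          # False
```

A positive match only says provenance data exists; verifying who signed it, and whether the file was edited afterwards, requires the full C2PA validation chain.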
Virality
Soch Fact Check found the videos circulating here, here, here, here, here, and here on Facebook.
The first video was also posted on X (formerly Twitter) here.
The second and third videos were also apparently shared here and here on TikTok, according to results from a Google search using relevant keywords; however, since @17._.armychief — the account that posted them — appears to have been deleted, they are neither accessible nor archived.
We also noticed that the three social media users who posted the videos on Facebook and X appear to be hardline supporters of Khan and his party, the PTI.
Conclusion: The three videos are AI-generated. An analysis by Soch Fact Check, as well as observations by a media expert and a sound engineer, confirms that the clips were created synthetically. The “original” voices are not attributable to any particular individual either.
Background image in cover photo: Soch Fact Check & Soch Videos
To appeal against our fact-check, please send an email to appeals@sochfactcheck.com