We present NeRV-Diffusion, an implicit latent diffusion model that generates neural network weights. The generated weights can be rearranged into a convolutional neural network that forms an implicit neural representation (INR) and decodes a video from time indices as input. Our framework consists of two stages: 1) a hypernetwork encoder that compresses raw videos from pixel space to parameter space, whose output parameters modulate the weights of the INR decoder to reconstruct the videos; the encoder and decoder are trained jointly as a variational autoencoder. 2) A denoising transformer that performs the diffusion process on the encoded parametric latent, mapping Gaussian noise to INR weights. Unlike conventional latent video diffusion models that operate on frame-wise feature maps, NeRV-Diffusion generates a video as a holistic neural network, which offers flexible temporal interpolation, high compression efficiency, and fast, frame-independent decoding. To obtain Gaussian-distributed neural network weights with high expressiveness, we reuse the bottleneck latent across all NeRV layers and redesign its weight assignment and input coordinates. Moreover, we introduce an inverse-SNR loss weight and employ scheduled sampling to train the denoising network effectively. Our model achieves superior video generation quality compared to previous implicit generative models, with a compact INR size and a smooth, interpolatable weight space.
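For illustration, the sketch below shows one common way an inverse-SNR weight can be attached to a DDPM-style noise-prediction objective, which up-weights high-noise timesteps. This is a minimal sketch under our own assumptions (linear beta schedule, SNR(t) = alpha_bar_t / (1 - alpha_bar_t), noise-prediction parameterization); the function names, shapes, and schedule are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def make_alpha_bar(num_steps: int = 1000,
                   beta_start: float = 1e-4,
                   beta_end: float = 2e-2) -> torch.Tensor:
    """Cumulative signal-retention coefficients for an assumed linear beta schedule."""
    betas = torch.linspace(beta_start, beta_end, num_steps)
    return torch.cumprod(1.0 - betas, dim=0)            # alpha_bar_t, shape [T]

def inverse_snr_loss(pred_noise: torch.Tensor,
                     true_noise: torch.Tensor,
                     t: torch.Tensor,
                     alpha_bar: torch.Tensor) -> torch.Tensor:
    """Noise-prediction MSE reweighted by 1 / SNR(t) per sample (illustrative)."""
    snr = alpha_bar[t] / (1.0 - alpha_bar[t])            # SNR(t), shape [B]
    per_sample = F.mse_loss(pred_noise, true_noise,
                            reduction="none").flatten(1).mean(dim=1)
    return (per_sample / snr).mean()                     # inverse-SNR weighting

# Hypothetical usage: weight a batch of noise-prediction errors on latent INR weights.
alpha_bar = make_alpha_bar()
t = torch.randint(0, 1000, (8,))                         # random diffusion timesteps
pred = torch.randn(8, 4096)                              # predicted noise (placeholder)
true = torch.randn(8, 4096)                              # ground-truth noise (placeholder)
loss = inverse_snr_loss(pred, true, t, alpha_bar)
```

Dividing by SNR(t) makes the loss larger at heavily noised timesteps, which in this sketch plays the role the abstract ascribes to the inverse-SNR weight; the exact schedule and parameterization used by the authors may differ.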