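I implemented a many-core SoC using the Mini16-CPU I designed last time.
Because each core uses few resources, I was able to fit 33 processor cores on the Terasic DE0-CV and 171 on the BeMicro-CVA9.
These counts are upper limits set by the number of multiplier blocks in each FPGA; logic resources still have room to spare.
This configuration uses 32-bit registers and a 32-bit data width, with almost all Mini16-CPU options enabled.
Using the minimal 16-bit Processor Element core instead, 501 cores fit on the BeMicro-CVA9.
SoC configuration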
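- Master core
A Mini16 core that serves as the control CPU.
- Compute cores: "Processor Element" (PE)
A block consisting of a Mini16 core, instruction memory, data memory, a memory for input data from the master core, and a FIFO for output data. The master core controls a large number of these blocks.
- "Harvester": transfers data from the Processor Elements
This block automatically transfers the data output by all PEs to the specified addresses. Each entry in a PE's output FIFO pairs a destination address with its data. The Harvester collects these entries from the PEs in round-robin order and writes each one to its destination address. At the destination, the data lands in ordinary contiguous memory, so the reader can access it quickly. In this configuration, the address bank routes each entry either to the frame buffer for video output or to the memory that returns results from the PEs to the master.
- UART interface
An interface for communicating with the FPGA circuit through a USB-serial converter (cable) connected to the PC. From the PC you can transfer programs and data to the master core and reset it. The master core can also send strings and other output to a terminal on the PC. Transferring programs and data to the PEs, and resetting them, is handled by the master core's program.
- VGA interface
Outputs the image data in the frame buffer as analog RGB.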
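A small software model can illustrate the Harvester's job. The Java sketch below is hypothetical and much simpler than the hardware: it has no clock and no FIFO flow control. It makes several assumptions:
- one entry is taken per PE per visit;
- PE data is 32 bits wide;
- the default mini16_soc.v widths apply, so the address bit at DEPTH_B_F = 17 selects the bank (0 for the PE-to-master memory, 1 for the frame buffer).

The packing follows mini16_pe.v, which pushes {address, data} into the output FIFO as one word.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical software model of the Harvester (not the project's RTL).
// Assumes 32-bit PE data and the default mini16_soc.v address widths.
class HarvesterModel {
    static final int DEPTH_B_F = 17; // max(VRAM_WIDTH_BITS + VRAM_HEIGHT_BITS, DEPTH_S2M)
    static final int DEPTH_S2M = 9;  // address width of the PE-to-master memory
    static final int FIFO_BANK_S2M = 0;

    // One output-FIFO entry: destination address plus data.
    static final class Entry {
        final int addr;
        final int data;
        Entry(int addr, int data) { this.addr = addr; this.data = data; }
    }

    // mini16_pe.v pushes {address, data} as one FIFO word; with 32-bit data
    // the address sits above bit 31.
    static long pack(int addr, int data) {
        return ((long) addr << 32) | (data & 0xFFFFFFFFL);
    }

    static Entry unpack(long word) {
        return new Entry((int) (word >>> 32), (int) word);
    }

    final int[] vram; // frame buffer
    final int[] s2m;  // PE-to-master result memory

    HarvesterModel(int vramWords, int s2mWords) {
        vram = new int[vramWords];
        s2m = new int[s2mWords];
    }

    // Visits the PE FIFOs in round-robin order, taking at most one entry per
    // visit, until every FIFO is empty. Returns the number of entries moved.
    int drain(List<Queue<Entry>> fifos) {
        int moved = 0;
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Queue<Entry> fifo : fifos) {
                Entry e = fifo.poll();
                if (e != null) {
                    write(e);
                    moved++;
                    progress = true;
                }
            }
        }
        return moved;
    }

    // Routes one entry by its address bank, as harvester_w_bank does in mini16_soc.v.
    void write(Entry e) {
        int bank = e.addr >>> DEPTH_B_F;
        if (bank == FIFO_BANK_S2M) {
            s2m[e.addr & ((1 << DEPTH_S2M) - 1)] = e.data;
        } else {
            vram[e.addr & ((1 << DEPTH_B_F) - 1)] = e.data;
        }
    }
}
```

In the hardware, a PE can also read its FIFO's item count (the default read bank in mini16_pe.v), so software can avoid overfilling the FIFO. This model has no such limit.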
Target boards
This project supports the following FPGA boards:
Terasic DE0-CV
BeMicro-CVA9
Kria KV260
I/O voltage jumper settings
● BeMicro CV A9
This project assumes that the BeMicro CV A9 board I/O voltage is set to 3.3V.
Referring to p.23 of the BeMicro CV A9 Hardware Reference Guide, check that pins 1 and 2 of the VCCIO select jumper (J11) are connected.
Synthesis and running
Download the source code: mini16_manycore.tar.gz
In a terminal, run:
tar xf mini16_manycore.tar.gz
Move into the directory for your board and run make.
cd mini16_manycore/<board_directory>
make
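Then open the project file in the vendor tool (Quartus for the DE0-CV and BeMicro-CVA9, Vivado for the KV260), synthesize the design, and program the board.
Project files: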
Terasic DE0-CV: mini16_manycore/de0-cv/DE0_CV_start.qpf
BeMicro-CVA9: mini16_manycore/bemicro_cva9/bemicro_cva9_start.qpf
Kria KV260: mini16_manycore/kv260/project_1/project_1.xpr
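For the DE0-CV and BeMicro-CVA9: the clock is set fairly high, so depending on the synthesis tool's random seed, timing may not be met. In that case, increase Assignments > Settings > Compiler Settings > Advanced Settings > Fitter Initial Placement Seed by 1 and try again several times. A "lucky" placement and routing usually turns up within about 10 attempts.
Simulation with the Verilog simulator Icarus Verilog
With Icarus Verilog you can develop and simulate without an FPGA board.
Install iverilog and gtkwave as described in "Using the Icarus Verilog compiler", then run: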
cd mini16_manycore/testbench
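make run
(for the 16-bit version, run make run16 instead)
This runs the simulation. Open the generated wave.vcd in gtkwave, then drag signals from the signal list on the left onto the waveform pane on the right to view their waveforms.
Connecting a Raspberry Pi or PC
You can connect to the FPGA over UART from a Raspberry Pi, or from a PC with a USB-serial cable, to transfer and run programs.
- UART pin assignments for each board in this project
- For Raspberry Pi 3
Connect as follows. Note that TXD and RXD are cross-connected.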
RPi RXD0 ---- FPGA UART_TXD
RPi TXD0 ---- FPGA UART_RXD
RPi GND ---- FPGA GND
Reference: Raspberry Pi pinout diagram
The wiring in the photo differs from the actual wiring (the specification has changed).
Configure the Raspberry Pi so that its UART pins can be used to talk to an external device.
In a terminal, run:
sudo raspi-config
Interfacing Options: Serial: Would you like a login shell to be accessible over serial?: No
Would you like the serial port hardware to be enabled?: Yes
Save the settings, close raspi-config, and then run:
sudo rnano /boot/config.txt
Append the following line to the end of the file and save it:
dtoverlay=pi3-miniuart-bt
The next setting lets you omit the device name in tools such as the SC1-CPU program transfer tool.
rnano ~/.bashrc
Append to the end of the file:
export UART_DEVICE=/dev/ttyAMA0
sudo reboot
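- For a PC
Connecting to a PC requires a separate USB-serial cable (for example, the FTDI TTL-232R-3V3). Be sure to use a 3.3V cable; one with a different voltage can, in the worst case, destroy the FPGA.
The FTDI TTL-232R-3V3 also has a 5V VCC pin, so take care not to connect it by mistake.
Connect the cable as follows.
Note that TXD and RXD are cross-connected.
Serial cable RXD ---- FPGA UART_TXD
Serial cable TXD ---- FPGA UART_RXD
Serial cable GND ---- FPGA GND
For the FTDI TTL-232R-3V3, set the udev permissions as follows. For other models, substitute their idVendor and idProduct values. To find them, connect the cable over USB and run lsusb; the IDs appear as "ID idVendor:idProduct".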
sudo rnano /etc/udev/rules.d/99-ft232.rules
KERNEL=="ttyUSB*", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", GROUP="plugdev", MODE="0666", SYMLINK+="ttyUART"
sudo udevadm control --reload-rules
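For adapters other than the FTDI cable, the two IDs come from the "ID idVendor:idProduct" field of lsusb. The Java sketch below is a hypothetical helper (not included in the project). It extracts that field from one line of lsusb output and fills in the same rule template as above. It assumes the usual lsusb line format, for example "Bus 001 Device 004: ID 0403:6001 ...".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a udev rule line from one line of lsusb output.
class UdevRuleFromLsusb {
    // lsusb prints the IDs as "ID <idVendor>:<idProduct>", 4 hex digits each.
    private static final Pattern ID =
        Pattern.compile("\\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\\b");

    static String rule(String lsusbLine) {
        Matcher m = ID.matcher(lsusbLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no ID field in: " + lsusbLine);
        }
        String vendor = m.group(1).toLowerCase();
        String product = m.group(2).toLowerCase();
        // Same template as the rule used for the FTDI TTL-232R-3V3.
        return "KERNEL==\"ttyUSB*\", ATTRS{idVendor}==\"" + vendor
            + "\", ATTRS{idProduct}==\"" + product
            + "\", GROUP=\"plugdev\", MODE=\"0666\", SYMLINK+=\"ttyUART\"";
    }
}
```

The KERNEL=="ttyUSB*" match only covers adapters that appear as /dev/ttyUSB*. CDC-ACM devices show up as /dev/ttyACM* and would need that pattern instead.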
rnano ~/.bashrc
Append to the end of the file:
export UART_DEVICE=/dev/ttyUART
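The host-side serial settings must match the SoC's UART. mini16_soc.v declares UART_CLK_HZ = 50000000 and UART_SCLK_HZ = 115200 as defaults, so one serial bit lasts about 434 system clocks. The Java sketch below does that arithmetic.

It rests on three assumptions:
- an 8N1 frame (1 start bit, 8 data bits, no parity, 1 stop bit), since the UART module itself is not listed here;
- rounding of the clocks-per-bit value to the nearest integer, which is an illustrative choice;
- the defaults above, although the board top levels may pass other values.

```java
// Illustrative UART bit-timing arithmetic (assumes 8N1 framing; names are hypothetical).
class UartTiming {
    // System clocks per serial bit, rounded to the nearest integer.
    static long clocksPerBit(long clkHz, long baud) {
        return (clkHz + baud / 2) / baud;
    }

    // Relative baud-rate error caused by rounding clocksPerBit.
    static double baudError(long clkHz, long baud) {
        double actual = (double) clkHz / clocksPerBit(clkHz, baud);
        return (actual - baud) / baud;
    }

    // Payload throughput upper bound: 10 bit times per byte (start + 8 data + stop).
    static double bytesPerSecond(long baud) {
        return baud / 10.0;
    }
}
```

With the defaults, the rounding error is well under 0.1%, far inside the tolerance a UART receiver normally needs.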
Connecting other I/O
Transferring and running programs over UART
On a Raspberry Pi or PC set up as described above, run:
cd mini16_manycore/<board_directory>
make run
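This builds the tools, compiles the program, and transfers and runs it.
Programming this CPU
mini16_manycore/asm contains a simple assembler that runs on Java.
Running it requires OpenJDK 8 or later.
Create a class that extends AsmLib. Write the initialization settings in init(), the program in program(), and the data in data(). Also edit AsmTop.java accordingly.
Running make in the mini16_manycore directory outputs the program binaries (default_code_mem.v, default_data_mem.v).
When using UART, run make run instead to transfer them after the build.
Example parallel program: drawing the Mandelbrot set
mini16_manycore/asm contains a demo program that draws the Mandelbrot set.
MasterProgram.java is the master core program, which controls the PEs.
PEProgram.java is the PE program, which computes the Mandelbrot set and draws it into the frame buffer.
BootProgram.java is a master core program that transfers the PE program to the PEs. When the UART interface is used, it runs first, followed by the MasterProgram.java program (see mini16_manycore/tools/Makefile).
If the PC is connected over UART, running make run in the board directory under mini16_manycore compiles, transfers, and runs all the programs.
In the 16-bit version, a program that fills the screen with colors runs instead of the Mandelbrot set.
By default, frames are drawn without waiting for vertical sync, so flickering stripes are visible.
In asm/MasterProgram.java:
changing private int DEBUG = 0; to 1 adds a wait and prints the frame count over UART;
changing private int WAIT_VSYNC = 0; to 1 waits for vertical sync before drawing the next frame, which removes the stripes.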
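The VSYNC flag used for this wait is one of the master core's I/O registers. The address-decoding localparams in mini16_soc.v build an I/O register address the same way the UART TX write check does: the bank number is shifted left by DEPTH_B_M_R (reads) or DEPTH_B_M_W (writes), and the register index is added.

The Java sketch below evaluates those formulas with the module's default parameter values. Two caveats apply:
- each board's top-level file may pass different values;
- how MasterProgram.java actually addresses these registers is not shown here.

The class itself is hypothetical.

```java
// Hypothetical helper that evaluates the master-core address-map localparams
// of mini16_soc.v using its default parameter values (boards may override them).
class MasterIoMap {
    static final int DEPTH_M_D = 11, DEPTH_P_I = 10, DEPTH_M2S = 8;
    static final int DEPTH_S2M = 9, DEPTH_U2M = 11, DEPTH_IO_REG = 5;
    static final int MASTER_W_BANK_IO_REG = 1, MASTER_R_BANK_IO_REG = 1;
    // Register indices, as declared in mini16_soc.v
    static final int IO_REG_R_UART_BUSY = 0, IO_REG_R_VGA_VSYNC = 1, IO_REG_R_VGA_VCOUNT = 2;
    static final int IO_REG_W_RESET_PE = 0, IO_REG_W_LED = 1, IO_REG_W_UART = 2;

    // DEPTH_B_M_R = max(DEPTH_M_D, max(DEPTH_IO_REG, max(DEPTH_U2M, DEPTH_S2M)))
    static int depthBMR() {
        return Math.max(DEPTH_M_D, Math.max(DEPTH_IO_REG, Math.max(DEPTH_U2M, DEPTH_S2M)));
    }

    // DEPTH_V_M2S = max(DEPTH_P_I, DEPTH_M2S) + 1
    // DEPTH_B_M_W = max(DEPTH_V_M2S, max(DEPTH_M_D, DEPTH_IO_REG))
    static int depthBMW() {
        int depthVM2S = Math.max(DEPTH_P_I, DEPTH_M2S) + 1;
        return Math.max(depthVM2S, Math.max(DEPTH_M_D, DEPTH_IO_REG));
    }

    // Address the master reads to get I/O register ioReg.
    static int readAddr(int ioReg) {
        return (MASTER_R_BANK_IO_REG << depthBMR()) + ioReg;
    }

    // Address the master writes to set I/O register ioReg.
    static int writeAddr(int ioReg) {
        return (MASTER_W_BANK_IO_REG << depthBMW()) + ioReg;
    }
}
```

With the defaults, both I/O banks start at 2048 (0x800). The VSYNC flag is read from 2049, and a UART TX byte is written to 2050.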
Source code
These source files are released under the BSD 2-Clause License.
For the complete source code, download mini16_manycore.tar.gz or see https://github.com/miya4649/mini16_manycore.
mini16_cpu.v : CPU core
/*
Copyright (c) 2018-2019, miya
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
module mini16_cpu
#(
parameter WIDTH_I = 16,
parameter WIDTH_D = 16,
parameter DEPTH_I = 8,
parameter DEPTH_D = 8,
parameter DEPTH_REG = 5,
parameter REGFILE_RAM_TYPE = "auto",
parameter ENABLE_MVIL = 1'b0,
parameter ENABLE_MUL = 1'b0,
parameter ENABLE_MULTI_BIT_SHIFT = 1'b0,
parameter ENABLE_MVC = 1'b0,
parameter ENABLE_WA = 1'b0,
parameter ENABLE_INT = 1'b0,
parameter FULL_PIPELINED_ALU = 1'b0
)
(
input clk,
input reset,
input soft_reset,
output reg [DEPTH_I-1:0] mem_i_r_addr,
input [WIDTH_I-1:0] mem_i_r_data,
output reg [DEPTH_D-1:0] mem_d_r_addr,
input [WIDTH_D-1:0] mem_d_r_data,
output reg [DEPTH_D-1:0] mem_d_w_addr,
output reg [WIDTH_D-1:0] mem_d_w_data,
output reg mem_d_we
);
localparam TRUE = 1'b1;
localparam FALSE = 1'b0;
localparam ONE = 1'd1;
localparam ZERO = 1'd0;
localparam FFFF = {WIDTH_D{1'b1}};
localparam SHIFT_BITS = $clog2(WIDTH_D);
localparam BL_OFFSET = 1'd1;
localparam DEPTH_OPERAND = 5;
// opcode
localparam I_NOP = 5'h00; // 5'b00000;
localparam I_ST = 5'h01; // 5'b00001;
localparam I_MVC = 5'h02; // 5'b00010;
localparam I_BA = 5'h04; // 5'b00100;
localparam I_BC = 5'h05; // 5'b00101;
localparam I_WA = 5'h06; // 5'b00110;
localparam I_BL = 5'h07; // 5'b00111;
localparam I_ADD = 5'h08; // 5'b01000;
localparam I_SUB = 5'h09; // 5'b01001;
localparam I_AND = 5'h0a; // 5'b01010;
localparam I_OR = 5'h0b; // 5'b01011;
localparam I_XOR = 5'h0c; // 5'b01100;
localparam I_MUL = 5'h0d; // 5'b01101;
localparam I_MV = 5'h10; // 5'b10000;
localparam I_MVIL = 5'h11; // 5'b10001;
localparam I_LD = 5'h17; // 5'b10111;
localparam I_SR = 5'h18; // 5'b11000;
localparam I_SL = 5'h19; // 5'b11001;
localparam I_SRA = 5'h1a; // 5'b11010;
localparam I_CNZ = 5'h1c; // 5'b11100;
localparam I_CNM = 5'h1d; // 5'b11101;
// special register
localparam SP_REG_CP = 0;
localparam SP_REG_MVIL = 1;
// debug
`ifdef DEBUG
reg [DEPTH_I-1:0] mem_i_r_addr_d1;
reg [DEPTH_I-1:0] mem_i_r_addr_s1;
always @(posedge clk)
begin
mem_i_r_addr_d1 <= mem_i_r_addr;
mem_i_r_addr_s1 <= mem_i_r_addr_d1;
end
`endif
// stage 1 fetch
reg [WIDTH_I-1:0] inst_s1;
wire [DEPTH_OPERAND-1:0] reg_d_s1;
wire [DEPTH_OPERAND-1:0] reg_a_s1;
wire [4:0] op_s1;
wire is_im_s1;
assign reg_d_s1 = inst_s1[15:11];
assign reg_a_s1 = inst_s1[10:6];
assign is_im_s1 = inst_s1[5];
assign op_s1 = inst_s1[4:0];
generate
if (ENABLE_WA == TRUE)
begin
always @(posedge clk)
begin
if (reset == TRUE)
begin
inst_s1 <= ZERO;
end
else
begin
if (wait_en_s2 == TRUE)
begin
inst_s1 <= ZERO;
end
else
begin
inst_s1 <= mem_i_r_data;
end
end
end
end
else
begin
always @(posedge clk)
begin
if (reset == TRUE)
begin
inst_s1 <= ZERO;
end
else
begin
inst_s1 <= mem_i_r_data;
end
end
end
endgenerate
// stage 2 wait counter
wire wait_en_s2;
reg [4:0] wait_count_m1;
reg [9:0] wait_counter_s2;
generate
if (ENABLE_WA == TRUE)
begin
assign wait_en_s2 = (wait_counter_s2 == ZERO) ? FALSE : TRUE;
always @(posedge clk)
begin
if (reset == TRUE)
begin
wait_counter_s2 <= ZERO;
wait_count_m1 <= ZERO;
end
else
begin
if (op_s1 == I_WA)
begin
wait_counter_s2 <= reg_a_s1;
wait_count_m1 <= reg_a_s1 - ONE;
end
else
begin
if (wait_en_s2 == TRUE)
begin
wait_counter_s2 <= wait_counter_s2 - ONE;
end
end
end
end
end
endgenerate
// stage 2 set reg read addr
reg [DEPTH_REG-1:0] reg_addr_a_s2;
reg [DEPTH_REG-1:0] reg_addr_b_s2;
generate
if (ENABLE_MVC == TRUE)
begin
always @(posedge clk)
begin
reg_addr_b_s2 <= reg_a_s1[DEPTH_REG-1:0];
if (op_s1 == I_MVC)
begin
reg_addr_a_s2 <= SP_REG_CP;
end
else
begin
reg_addr_a_s2 <= reg_d_s1[DEPTH_REG-1:0];
end
end
end
endgenerate
// stage 2 delay
reg [4:0] op_s2;
reg is_im_s2;
reg [DEPTH_OPERAND-1:0] reg_d_s2;
reg [DEPTH_OPERAND-1:0] reg_a_s2;
always @(posedge clk)
begin
op_s2 <= op_s1;
is_im_s2 <= is_im_s1;
reg_d_s2 <= reg_d_s1;
reg_a_s2 <= reg_a_s1;
end
// stage 3 set dest reg addr
reg [DEPTH_REG-1:0] reg_addr_d_s3;
always @(posedge clk)
begin
if (reset == TRUE)
begin
reg_addr_d_s3 <= ZERO;
end
else
begin
if ((ENABLE_MVIL == TRUE) && (op_s2 == I_MVIL))
begin
reg_addr_d_s3 <= SP_REG_MVIL;
end
else
begin
reg_addr_d_s3 <= reg_d_s2[DEPTH_REG-1:0];
end
end
end
// stage 3 delay
reg [4:0] op_s3;
reg is_im_s3;
reg [DEPTH_OPERAND-1:0] reg_a_s3;
always @(posedge clk)
begin
op_s3 <= op_s2;
is_im_s3 <= is_im_s2;
reg_a_s3 <= reg_a_s2;
end
reg [DEPTH_OPERAND-1:0] reg_d_s3;
generate
if (ENABLE_MVIL == TRUE)
begin
always @(posedge clk)
begin
reg_d_s3 <= reg_d_s2;
end
end
endgenerate
// stage 4 fetch reg_data
wire [WIDTH_D-1:0] reg_data_a_s_s3;
wire [WIDTH_D-1:0] reg_data_b_s_s3;
reg [WIDTH_D-1:0] reg_data_a_s4;
reg [WIDTH_D-1:0] reg_data_b_s4;
always @(posedge clk)
begin
reg_data_a_s4 <= reg_data_a_s_s3;
if (reset == TRUE)
begin
reg_data_b_s4 <= ZERO;
end
else
begin
if ((ENABLE_MVIL == TRUE) && (op_s3 == I_MVIL))
begin
reg_data_b_s4 <= {reg_d_s3, reg_a_s3, is_im_s3};
end
else if (is_im_s3 == TRUE)
begin
reg_data_b_s4 <= $signed(reg_a_s3);
end
else
begin
reg_data_b_s4 <= reg_data_b_s_s3;
end
end
end
// stage 4 load address
always @(posedge clk)
begin
if (reset == TRUE)
begin
mem_d_r_addr <= ZERO;
end
else
begin
if (op_s3 == I_LD)
begin
mem_d_r_addr <= reg_data_b_s_s3;
end
end
end
// stage 4 delay
reg [4:0] op_s4;
reg [DEPTH_REG-1:0] reg_addr_d_s4;
always @(posedge clk)
begin
op_s4 <= op_s3;
reg_addr_d_s4 <= reg_addr_d_s3;
end
// stage 5 execute store
always @(posedge clk)
begin
if (reset == TRUE)
begin
mem_d_w_addr <= ZERO;
mem_d_w_data <= ZERO;
mem_d_we <= FALSE;
end
else
begin
case (op_s4)
I_ST:
begin
mem_d_w_addr <= reg_data_a_s4;
mem_d_w_data <= reg_data_b_s4;
mem_d_we <= TRUE;
end
default:
begin
mem_d_w_addr <= ZERO;
mem_d_w_data <= ZERO;
mem_d_we <= FALSE;
end
endcase
end
end
// stage 5 calc BL address
reg [DEPTH_I-1:0] bl_addr_s5;
always @(posedge clk)
begin
bl_addr_s5 <= mem_i_r_addr + BL_OFFSET;
end
// stage 5 execute branch
wire cond_true_s4;
assign cond_true_s4 = (reg_data_a_s4 != ZERO) ? TRUE : FALSE;
always @(posedge clk)
begin
if (reset == TRUE)
begin
mem_i_r_addr <= ZERO;
end
else
begin
// branch
if ((ENABLE_INT == TRUE) && (soft_reset == TRUE))
begin
mem_i_r_addr <= ZERO;
end
else if ((op_s4 == I_BA) || (op_s4 == I_BL) || ((op_s4 == I_BC) && (cond_true_s4)))
begin
mem_i_r_addr <= reg_data_b_s4;
end
else if ((ENABLE_WA == TRUE) && (op_s4 == I_WA))
begin
mem_i_r_addr <= mem_i_r_addr - wait_count_m1;
end
else
begin
mem_i_r_addr <= mem_i_r_addr + ONE;
end
end
end
// stage 5 delay
reg [4:0] op_s5;
reg [DEPTH_REG-1:0] reg_addr_d_s5;
reg [WIDTH_D-1:0] reg_data_a_s5;
reg [WIDTH_D-1:0] reg_data_b_s5;
always @(posedge clk)
begin
op_s5 <= op_s4;
reg_addr_d_s5 <= reg_addr_d_s4;
reg_data_a_s5 <= reg_data_a_s4;
reg_data_b_s5 <= reg_data_b_s4;
end
reg cond_true_s5;
generate
if (ENABLE_MVC == TRUE)
begin
always @(posedge clk)
begin
cond_true_s5 <= cond_true_s4;
end
end
endgenerate
// stage 6 compare
reg flag_cnz_s6;
reg flag_cnm_s6;
always @(posedge clk)
begin
if (reg_data_b_s5 == ZERO)
begin
flag_cnz_s6 <= FALSE;
end
else
begin
flag_cnz_s6 <= TRUE;
end
if (reg_data_b_s5[WIDTH_D-1] == 1'b0)
begin
flag_cnm_s6 <= TRUE;
end
else
begin
flag_cnm_s6 <= FALSE;
end
end
// stage 6 reg we
reg reg_we_s6;
wire stage6_reg_we_cond;
generate
if (ENABLE_MVC == TRUE)
begin
assign stage6_reg_we_cond = ((op_s5[4:3] != 2'b00) || (op_s5 == I_BL) || ((op_s5 == I_MVC) && (cond_true_s5 == TRUE)));
end
else
begin
assign stage6_reg_we_cond = ((op_s5[4:3] != 2'b00) || (op_s5 == I_BL));
end
endgenerate
always @(posedge clk)
begin
if (stage6_reg_we_cond)
begin
reg_we_s6 <= TRUE;
end
else
begin
reg_we_s6 <= FALSE;
end
end
// stage 6 delay
reg [4:0] op_s6;
reg [DEPTH_REG-1:0] reg_addr_d_s6;
reg [WIDTH_D-1:0] reg_data_a_s6;
reg [WIDTH_D-1:0] reg_data_b_s6;
reg [DEPTH_I-1:0] bl_addr_s6;
always @(posedge clk)
begin
op_s6 <= op_s5;
reg_addr_d_s6 <= reg_addr_d_s5;
reg_data_a_s6 <= reg_data_a_s5;
reg_data_b_s6 <= reg_data_b_s5;
bl_addr_s6 <= bl_addr_s5;
end
// stage 6 pre-execute
reg [WIDTH_D-1:0] reg_data_add_s6;
reg [WIDTH_D-1:0] reg_data_sub_s6;
reg [WIDTH_D-1:0] reg_data_and_s6;
reg [WIDTH_D-1:0] reg_data_or_s6;
reg [WIDTH_D-1:0] reg_data_xor_s6;
generate
if (FULL_PIPELINED_ALU == TRUE)
begin
always @(posedge clk)
begin
reg_data_add_s6 <= reg_data_a_s5 + reg_data_b_s5;
reg_data_sub_s6 <= reg_data_a_s5 - reg_data_b_s5;
reg_data_and_s6 <= reg_data_a_s5 & reg_data_b_s5;
reg_data_or_s6 <= reg_data_a_s5 | reg_data_b_s5;
reg_data_xor_s6 <= reg_data_a_s5 ^ reg_data_b_s5;
end
end
endgenerate
// stage 7 execute
reg [WIDTH_D-1:0] reg_data_w_s7;
always @(posedge clk)
begin
case (op_s6)
I_ADD:
begin
if (FULL_PIPELINED_ALU == TRUE)
begin
reg_data_w_s7 <= reg_data_add_s6;
end
else
begin
reg_data_w_s7 <= reg_data_a_s6 + reg_data_b_s6;
end
end
I_SUB:
begin
if (FULL_PIPELINED_ALU == TRUE)
begin
reg_data_w_s7 <= reg_data_sub_s6;
end
else
begin
reg_data_w_s7 <= reg_data_a_s6 - reg_data_b_s6;
end
end
I_AND:
begin
if (FULL_PIPELINED_ALU == TRUE)
begin
reg_data_w_s7 <= reg_data_and_s6;
end
else
begin
reg_data_w_s7 <= reg_data_a_s6 & reg_data_b_s6;
end
end
I_OR:
begin
if (FULL_PIPELINED_ALU == TRUE)
begin
reg_data_w_s7 <= reg_data_or_s6;
end
else
begin
reg_data_w_s7 <= reg_data_a_s6 | reg_data_b_s6;
end
end
I_XOR:
begin
if (FULL_PIPELINED_ALU == TRUE)
begin
reg_data_w_s7 <= reg_data_xor_s6;
end
else
begin
reg_data_w_s7 <= reg_data_a_s6 ^ reg_data_b_s6;
end
end
I_SR:
begin
reg_data_w_s7 <= sr_result_s6;
end
I_SL:
begin
reg_data_w_s7 <= sl_result_s6;
end
I_SRA:
begin
reg_data_w_s7 <= sra_result_s6;
end
I_CNZ:
begin
reg_data_w_s7 <= {WIDTH_D{flag_cnz_s6}};
end
I_CNM:
begin
reg_data_w_s7 <= {WIDTH_D{flag_cnm_s6}};
end
I_BL:
begin
reg_data_w_s7 <= bl_addr_s6;
end
I_MUL:
begin
if (ENABLE_MUL == TRUE)
begin
reg_data_w_s7 <= mul_result_s6;
end
else
begin
reg_data_w_s7 <= reg_data_b_s6;
end
end
I_LD:
begin
reg_data_w_s7 <= mem_d_r_data;
end
// I_MV, I_MVIL
default:
begin
reg_data_w_s7 <= reg_data_b_s6;
end
endcase
end
// stage 7 delay
reg [DEPTH_REG-1:0] reg_addr_d_s7;
reg reg_we_s7;
always @(posedge clk)
begin
reg_addr_d_s7 <= reg_addr_d_s6;
reg_we_s7 <= reg_we_s6;
end
wire [DEPTH_REG-1:0] reg_file_addr_r_a;
wire [DEPTH_REG-1:0] reg_file_addr_r_b;
generate
if (ENABLE_MVC == TRUE)
begin
assign reg_file_addr_r_a = reg_addr_a_s2;
assign reg_file_addr_r_b = reg_addr_b_s2;
end
else
begin
assign reg_file_addr_r_a = reg_d_s2[DEPTH_REG-1:0];
assign reg_file_addr_r_b = reg_a_s2[DEPTH_REG-1:0];
end
endgenerate
r2w1_port_ram
#(
.DATA_WIDTH (WIDTH_D),
.ADDR_WIDTH (DEPTH_REG),
.RAM_TYPE (REGFILE_RAM_TYPE)
)
reg_file
(
.clk (clk),
.addr_r_a (reg_file_addr_r_a),
.addr_r_b (reg_file_addr_r_b),
.addr_w (reg_addr_d_s7),
.data_in (reg_data_w_s7),
.we (reg_we_s7),
.data_out_a (reg_data_a_s_s3),
.data_out_b (reg_data_b_s_s3)
);
wire [WIDTH_D-1:0] mul_result_s6;
generate
if (ENABLE_MUL == TRUE)
begin
delayed_mul
#(
.WIDTH_D (WIDTH_D)
)
delayed_mul_0
(
.clk (clk),
.a (reg_data_a_s4),
.b (reg_data_b_s4),
.out (mul_result_s6)
);
end
endgenerate
wire [WIDTH_D-1:0] sr_result_s6;
wire [WIDTH_D-1:0] sl_result_s6;
wire [WIDTH_D-1:0] sra_result_s6;
reg [WIDTH_D-1:0] sr_result_s6_reg;
reg [WIDTH_D-1:0] sl_result_s6_reg;
reg [WIDTH_D-1:0] sra_result_s6_reg;
generate
if (ENABLE_MULTI_BIT_SHIFT == TRUE)
begin
delayed_sr
#(
.WIDTH_D (WIDTH_D),
.SHIFT_BITS (SHIFT_BITS)
)
delayed_sr_0
(
.clk (clk),
.a (reg_data_a_s4),
.b (reg_data_b_s4[SHIFT_BITS-1:0]),
.out (sr_result_s6)
);
delayed_sl
#(
.WIDTH_D (WIDTH_D),
.SHIFT_BITS (SHIFT_BITS)
)
delayed_sl_0
(
.clk (clk),
.a (reg_data_a_s4),
.b (reg_data_b_s4[SHIFT_BITS-1:0]),
.out (sl_result_s6)
);
delayed_sra
#(
.WIDTH_D (WIDTH_D),
.SHIFT_BITS (SHIFT_BITS)
)
delayed_sra_0
(
.clk (clk),
.a (reg_data_a_s4),
.b (reg_data_b_s4[SHIFT_BITS-1:0]),
.out (sra_result_s6)
);
end
else
begin
always @(posedge clk)
begin
sr_result_s6_reg <= {1'b0, reg_data_a_s5[WIDTH_D-1:1]};
sl_result_s6_reg <= {reg_data_a_s5[WIDTH_D-2:0], 1'b0};
sra_result_s6_reg <= {reg_data_a_s5[WIDTH_D-1], reg_data_a_s5[WIDTH_D-1:1]};
end
assign sr_result_s6 = sr_result_s6_reg;
assign sl_result_s6 = sl_result_s6_reg;
assign sra_result_s6 = sra_result_s6_reg;
end
endgenerate
endmodule
module delayed_mul
#(
parameter WIDTH_D = 16
)
(
input clk,
input signed [WIDTH_D-1:0] a,
input signed [WIDTH_D-1:0] b,
output reg signed [WIDTH_D-1:0] out
);
reg signed [WIDTH_D-1:0] sa;
reg signed [WIDTH_D-1:0] sb;
always @(posedge clk)
begin
sa <= a;
sb <= b;
out <= sa * sb;
end
endmodule
module delayed_sr
#(
parameter WIDTH_D = 16,
parameter SHIFT_BITS = 4
)
(
input clk,
input [WIDTH_D-1:0] a,
input [SHIFT_BITS-1:0] b,
output reg [WIDTH_D-1:0] out
);
reg [WIDTH_D-1:0] sa;
reg [SHIFT_BITS-1:0] sb;
always @(posedge clk)
begin
sa <= a;
sb <= b;
out <= sa >> sb;
end
endmodule
module delayed_sl
#(
parameter WIDTH_D = 16,
parameter SHIFT_BITS = 4
)
(
input clk,
input [WIDTH_D-1:0] a,
input [SHIFT_BITS-1:0] b,
output reg [WIDTH_D-1:0] out
);
reg [WIDTH_D-1:0] sa;
reg [SHIFT_BITS-1:0] sb;
always @(posedge clk)
begin
sa <= a;
sb <= b;
out <= sa << sb;
end
endmodule
module delayed_sra
#(
parameter WIDTH_D = 16,
parameter SHIFT_BITS = 4
)
(
input clk,
input [WIDTH_D-1:0] a,
input [SHIFT_BITS-1:0] b,
output reg [WIDTH_D-1:0] out
);
reg signed [WIDTH_D-1:0] sa;
reg [SHIFT_BITS-1:0] sb;
always @(posedge clk)
begin
sa <= a;
sb <= b;
out <= sa >>> sb;
end
endmodule
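The Java reference model below summarizes the instruction format and ALU behavior implemented in mini16_cpu.v. It is a sketch, not part of the project, and has these limits:
- it assumes 32-bit data with ENABLE_MVIL and ENABLE_MULTI_BIT_SHIFT on;
- it executes one instruction at a time, so the pipeline and its hazards are not modeled;
- it leaves out branches, LD/ST, MVC and WA.

Note the two-operand form: operand A is the value of register reg_d, which is also the destination.

```java
// Hypothetical reference model of Mini16-CPU decoding and ALU results, written
// from mini16_cpu.v. Assumes WIDTH_D = 32 with ENABLE_MVIL and
// ENABLE_MULTI_BIT_SHIFT on; runs one instruction at a time (no pipeline,
// no hazards) and omits branches, LD/ST, MVC and WA.
class Mini16Model {
    static final int I_ADD = 0x08, I_SUB = 0x09, I_AND = 0x0a, I_OR = 0x0b,
        I_XOR = 0x0c, I_MUL = 0x0d, I_MV = 0x10, I_MVIL = 0x11, I_LD = 0x17,
        I_SR = 0x18, I_SL = 0x19, I_SRA = 0x1a, I_CNZ = 0x1c, I_CNM = 0x1d;
    static final int SP_REG_MVIL = 1;

    // Stage 1 fields: reg_d = [15:11], reg_a = [10:6], is_im = [5], op = [4:0].
    static int regD(int inst) { return (inst >>> 11) & 0x1f; }
    static int regA(int inst) { return (inst >>> 6) & 0x1f; }
    static boolean isIm(int inst) { return ((inst >>> 5) & 1) != 0; }
    static int op(int inst) { return inst & 0x1f; }

    static int encode(int opcode, int d, int a, boolean imm) {
        return ((d & 0x1f) << 11) | ((a & 0x1f) << 6) | ((imm ? 1 : 0) << 5) | (opcode & 0x1f);
    }

    // Stage 4 operand B: MVIL packs {reg_d, reg_a, is_im} (11 bits, zero-extended);
    // otherwise an immediate is reg_a sign-extended from 5 bits.
    static int operandB(int inst, int[] regs) {
        if (op(inst) == I_MVIL) {
            return (inst >>> 5) & 0x7ff;
        }
        if (isIm(inst)) {
            return (regA(inst) << 27) >> 27;
        }
        return regs[regA(inst)];
    }

    // Stage 7 result; a is the value of register reg_d, b is operand B.
    static int alu(int opcode, int a, int b) {
        switch (opcode) {
            case I_ADD: return a + b;
            case I_SUB: return a - b;
            case I_AND: return a & b;
            case I_OR:  return a | b;
            case I_XOR: return a ^ b;
            case I_MUL: return a * b;               // low 32 bits of the product
            case I_SR:  return a >>> (b & 31);      // shift amount = b[SHIFT_BITS-1:0]
            case I_SL:  return a << (b & 31);
            case I_SRA: return a >> (b & 31);
            case I_CNZ: return (b != 0) ? -1 : 0;   // all ones if b is non-zero
            case I_CNM: return (b >= 0) ? -1 : 0;   // all ones if b's sign bit is 0
            case I_MV:
            case I_MVIL: return b;
            default: throw new IllegalArgumentException("not modeled: op " + opcode);
        }
    }

    // Executes one ALU-type instruction; MVIL always writes SP_REG_MVIL (r1).
    static void step(int inst, int[] regs) {
        int opcode = op(inst);
        int dest = (opcode == I_MVIL) ? SP_REG_MVIL : regD(inst);
        regs[dest] = alu(opcode, regs[regD(inst)], operandB(inst, regs));
    }
}
```

For example, ADD with reg_d = 2 and reg_a = 3 computes r2 = r2 + r3. Setting is_im turns reg_a into a signed immediate in the range -16 to 15.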
mini16_pe.v : Processor Element
/*
Copyright (c) 2019, miya
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
module mini16_pe
#(
parameter WIDTH_D = 16,
parameter DEPTH_I = 8,
parameter DEPTH_D = 8,
parameter DEPTH_M2S = 8,
parameter DEPTH_FIFO = 7,
parameter CORE_ID = 0,
parameter MASTER_W_BANK_BC = 63,
parameter DEPTH_V_F = 16,
parameter DEPTH_B_F = 15,
parameter DEPTH_V_M = 17,
parameter DEPTH_B_M = 11,
parameter DEPTH_V_S_R = 10,
parameter DEPTH_B_S_R = 8,
parameter DEPTH_V_S_W = 9,
parameter DEPTH_B_S_W = 8,
parameter DEPTH_V_M2S = 9,
parameter DEPTH_B_M2S = 8,
parameter FIFO_RAM_TYPE = "auto",
parameter REGFILE_RAM_TYPE = "auto",
parameter M2S_RAM_TYPE = "auto",
parameter DEPTH_REG = 5,
parameter ENABLE_MVIL = 1'b1,
parameter ENABLE_MUL = 1'b1,
parameter ENABLE_MULTI_BIT_SHIFT = 1'b1,
parameter ENABLE_MVC = 1'b1,
parameter ENABLE_WA = 1'b1
)
(
input clk,
input reset,
input soft_reset,
input fifo_req_r,
output fifo_valid,
output [WIDTH_D+DEPTH_V_F-1:0] fifo_r_data,
input [DEPTH_V_M-1:0] addr_i,
input [WIDTH_D-1:0] data_i,
input we_i
);
localparam WIDTH_I = 16;
localparam TRUE = 1'b1;
localparam FALSE = 1'b0;
localparam ONE = 1'd1;
localparam ZERO = 1'd0;
localparam FFFF = {WIDTH_D{1'b1}};
wire [DEPTH_I-1:0] cpu_i_r_addr;
wire [WIDTH_I-1:0] cpu_i_r_data;
wire [DEPTH_V_S_W-1:0] cpu_d_r_addr;
reg [WIDTH_D-1:0] cpu_d_r_data;
wire [DEPTH_V_S_W-1:0] cpu_d_w_addr;
wire [WIDTH_D-1:0] cpu_d_w_data;
wire cpu_d_we;
wire [DEPTH_V_S_W-DEPTH_B_S_W-1:0] cpu_d_w_bank;
wire [DEPTH_V_S_R-DEPTH_B_S_R-1:0] cpu_d_r_bank;
// cpu data write
reg [DEPTH_D-1:0] mem_d_w_addr;
reg [WIDTH_D-1:0] mem_d_w_data;
reg mem_d_we;
assign cpu_d_w_bank = cpu_d_w_addr[DEPTH_V_S_W-1:DEPTH_B_S_W];
always @(posedge clk)
begin
mem_d_w_addr <= cpu_d_w_addr[DEPTH_D-1:0];
mem_d_w_data <= cpu_d_w_data;
s2mfifo_data_w <= {cpu_d_w_addr[DEPTH_V_F-1:0], cpu_d_w_data};
if (cpu_d_we == TRUE)
begin
case (cpu_d_w_bank)
0:
begin
// mem_d
mem_d_we <= TRUE;
s2mfifo_we <= FALSE;
end
default:
begin
// fifo
mem_d_we <= FALSE;
s2mfifo_we <= TRUE;
end
endcase
end
else
begin
mem_d_we <= FALSE;
s2mfifo_we <= FALSE;
end
end
// cpu data read
wire [DEPTH_D-1:0] mem_d_r_addr;
wire [WIDTH_D-1:0] mem_d_r_data;
wire [WIDTH_D-1:0] shared_m2s_r_data;
assign mem_d_r_addr = cpu_d_r_addr[DEPTH_D-1:0];
assign cpu_d_r_bank = cpu_d_r_addr[DEPTH_V_S_R-1:DEPTH_B_S_R];
always @(posedge clk)
begin
case (cpu_d_r_bank)
// mem_d
0: cpu_d_r_data <= mem_d_r_data;
// shared_m2s
1: cpu_d_r_data <= shared_m2s_r_data;
// register
default: cpu_d_r_data <= s2mfifo_item_count;
endcase
end
// data from master
reg shared_m2s_we;
reg mem_i_we;
reg [DEPTH_V_M-1:0] addr_i_d1;
reg [WIDTH_D-1:0] data_i_d1;
reg [DEPTH_V_M-1:0] addr_i_d2;
reg [WIDTH_D-1:0] data_i_d2;
reg we_i_d1;
wire [DEPTH_V_M-DEPTH_B_M-1:0] core_bank;
wire [DEPTH_V_M2S-DEPTH_B_M2S-1:0] m2s_bank;
assign core_bank = addr_i_d1[DEPTH_V_M-1:DEPTH_B_M];
assign m2s_bank = addr_i_d1[DEPTH_V_M2S-1:DEPTH_B_M2S];
always @(posedge clk)
begin
addr_i_d1 <= addr_i;
data_i_d1 <= data_i;
addr_i_d2 <= addr_i_d1;
data_i_d2 <= data_i_d1;
we_i_d1 <= we_i;
end
always @(posedge clk)
begin
if ((we_i_d1 == TRUE) && ((core_bank == CORE_ID) || (core_bank == MASTER_W_BANK_BC)))
begin
case (m2s_bank)
0:
begin
shared_m2s_we <= TRUE;
mem_i_we <= FALSE;
end
default:
begin
shared_m2s_we <= FALSE;
mem_i_we <= TRUE;
end
endcase
end
else
begin
shared_m2s_we <= FALSE;
mem_i_we <= FALSE;
end
end
mini16_cpu
#(
.WIDTH_I (WIDTH_I),
.WIDTH_D (WIDTH_D),
.DEPTH_I (DEPTH_I),
.DEPTH_D (DEPTH_V_S_W),
.DEPTH_REG (DEPTH_REG),
.ENABLE_MVIL (ENABLE_MVIL),
.ENABLE_MUL (ENABLE_MUL),
.ENABLE_MULTI_BIT_SHIFT (ENABLE_MULTI_BIT_SHIFT),
.ENABLE_MVC (ENABLE_MVC),
.ENABLE_WA (ENABLE_WA),
.ENABLE_INT (TRUE),
.FULL_PIPELINED_ALU (FALSE),
.REGFILE_RAM_TYPE (REGFILE_RAM_TYPE)
)
mini16_cpu_0
(
.clk (clk),
.reset (reset),
.soft_reset (soft_reset),
.mem_i_r_addr (cpu_i_r_addr),
.mem_i_r_data (cpu_i_r_data),
.mem_d_r_addr (cpu_d_r_addr),
.mem_d_r_data (cpu_d_r_data),
.mem_d_w_addr (cpu_d_w_addr),
.mem_d_w_data (cpu_d_w_data),
.mem_d_we (cpu_d_we)
);
default_pe_code_mem
#(
.DATA_WIDTH (WIDTH_I),
.ADDR_WIDTH (DEPTH_I)
)
mem_i
(
.clk (clk),
.addr_r (cpu_i_r_addr),
.addr_w (addr_i_d2[DEPTH_I-1:0]),
.data_in (data_i_d2[WIDTH_I-1:0]),
.we (mem_i_we),
.data_out (cpu_i_r_data)
);
default_pe_data_mem
#(
.DATA_WIDTH (WIDTH_D),
.ADDR_WIDTH (DEPTH_D)
)
mem_d
(
.clk (clk),
.addr_r (mem_d_r_addr),
.addr_w (mem_d_w_addr),
.data_in (mem_d_w_data),
.we (mem_d_we),
.data_out (mem_d_r_data)
);
rw_port_ram
#(
.DATA_WIDTH (WIDTH_D),
.ADDR_WIDTH (DEPTH_M2S),
.RAM_TYPE (M2S_RAM_TYPE)
)
shared_m2s
(
.clk (clk),
.addr_r (mem_d_r_addr[DEPTH_M2S-1:0]),
.addr_w (addr_i_d2[DEPTH_M2S-1:0]),
.data_in (data_i_d2),
.we (shared_m2s_we),
.data_out (shared_m2s_r_data)
);
reg s2mfifo_we;
reg [WIDTH_D+DEPTH_V_F-1:0] s2mfifo_data_w;
wire [DEPTH_FIFO-1:0] s2mfifo_item_count;
fifo
#(
.WIDTH (WIDTH_D+DEPTH_V_F),
.DEPTH_IN_BITS (DEPTH_FIFO),
.MAX_ITEMS (((1 << DEPTH_FIFO) - 7)),
.RAM_TYPE (FIFO_RAM_TYPE)
)
s2mfifo
(
.clk (clk),
.reset (reset),
.req_r (fifo_req_r),
.we (s2mfifo_we),
.data_w (s2mfifo_data_w),
.data_r (fifo_r_data),
.valid_r (fifo_valid),
.full (),
.item_count (s2mfifo_item_count),
.empty ()
);
endmodule
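The bank decoding in mini16_pe.v can be summarized in a few lines. The Java sketch below is hypothetical (not part of the project). It assumes the PE parameters take the values obtained by evaluating the localparams of mini16_soc.v with its default parameters (CORES = 32), for example DEPTH_B_S_W = max(DEPTH_V_F, DEPTH_P_D) = 18. It also assumes the PE's DEPTH_B_M is the SoC's DEPTH_B_M_W. The PE instantiation that passes these values is not in this excerpt, so treat the numbers as assumptions.

```java
// Hypothetical model of the address decoding in mini16_pe.v.
// Widths below are assumed values (mini16_soc.v localparams at their defaults).
class PeAddressMap {
    enum Target { MEM_D, S2M_FIFO, SHARED_M2S, FIFO_ITEM_COUNT, MEM_I, NOT_THIS_PE }

    static final int DEPTH_V_S_W = 19, DEPTH_B_S_W = 18;
    static final int DEPTH_V_S_R = 10, DEPTH_B_S_R = 8;
    static final int DEPTH_V_M = 17, DEPTH_B_M = 11;   // DEPTH_B_M assumed = DEPTH_B_M_W
    static final int DEPTH_V_M2S = 11, DEPTH_B_M2S = 10;
    static final int MASTER_W_BANK_BC = 63;            // (1 << CORE_BITS) - 1, CORE_BITS = 6

    // addr[hi-1:lo], like a Verilog part-select.
    static int bits(int addr, int hi, int lo) {
        return (addr >>> lo) & ((1 << (hi - lo)) - 1);
    }

    // PE store (cpu_d_w_bank): bank 0 -> data memory, any other bank -> output FIFO.
    static Target cpuWrite(int addr) {
        return bits(addr, DEPTH_V_S_W, DEPTH_B_S_W) == 0 ? Target.MEM_D : Target.S2M_FIFO;
    }

    // PE load (cpu_d_r_bank): 0 -> data memory, 1 -> shared_m2s, else FIFO item count.
    static Target cpuRead(int addr) {
        switch (bits(addr, DEPTH_V_S_R, DEPTH_B_S_R)) {
            case 0: return Target.MEM_D;
            case 1: return Target.SHARED_M2S;
            default: return Target.FIFO_ITEM_COUNT;
        }
    }

    // Master write as seen by the PE with the given CORE_ID (core_bank / m2s_bank).
    static Target masterWrite(int addr, int coreId) {
        int coreBank = bits(addr, DEPTH_V_M, DEPTH_B_M);
        if (coreBank != coreId && coreBank != MASTER_W_BANK_BC) {
            return Target.NOT_THIS_PE;
        }
        return bits(addr, DEPTH_V_M2S, DEPTH_B_M2S) == 0 ? Target.SHARED_M2S : Target.MEM_I;
    }
}
```

Under these assumptions, a PE store below 2^18 goes to local data memory and anything above goes out through the Harvester. The master addresses an individual PE, or all of them via bank 63, through address bits 16 to 11.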
mini16_soc.v : SoC
/*
Copyright (c) 2019, miya
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
module mini16_soc
#(
parameter CORES = 32,
parameter UART_CLK_HZ = 50000000,
parameter UART_SCLK_HZ = 115200,
parameter WIDTH_M_D = 32,
parameter WIDTH_P_D = 32,
parameter DEPTH_M_I = 11,
parameter DEPTH_M_D = 11,
parameter DEPTH_P_I = 10,
parameter DEPTH_P_D = 8,
parameter DEPTH_M2S = 8,
parameter DEPTH_FIFO = 4,
parameter DEPTH_S2M = 9,
parameter DEPTH_U2M = 11,
parameter VRAM_BPP = 3,
parameter VRAM_BPC = 1,
parameter VRAM_WIDTH_BITS = 8,
parameter VRAM_HEIGHT_BITS = 9,
parameter MASTER_REGFILE_RAM_TYPE = "auto",
parameter PE_REGFILE_RAM_TYPE = "auto",
parameter PE_FIFO_RAM_TYPE = "distributed",
parameter PE_M2S_RAM_TYPE = "auto",
parameter PE_DEPTH_REG = 5,
parameter PE_ENABLE_MVIL = 1'b1,
parameter PE_ENABLE_MUL = 1'b1,
parameter PE_ENABLE_MULTI_BIT_SHIFT = 1'b1,
parameter PE_ENABLE_MVC = 1'b1,
parameter PE_ENABLE_WA = 1'b1
)
(
input clk,
input reset,
`ifdef USE_UART
input uart_rxd,
output uart_txd,
`endif
`ifdef USE_VGA
input clkv,
input resetv,
output vga_hs,
output vga_vs,
output vga_de,
output vga_r,
output vga_g,
output vga_b,
`endif
output [15:0] led
);
// instruction width
localparam WIDTH_I = 16;
// register file depth
localparam DEPTH_REG = 5;
// I/O register depth
localparam DEPTH_IO_REG = 5;
localparam DEPTH_VRAM = (VRAM_WIDTH_BITS + VRAM_HEIGHT_BITS);
// UART I/O addr depth
localparam DEPTH_B_U = max(DEPTH_M_I, DEPTH_U2M);
// UART I/O Virtual memory depth
localparam DEPTH_V_U = (DEPTH_B_U + 2);
localparam CORE_BITS = $clog2(CORES + 6);
localparam DEPTH_B_F = max(DEPTH_VRAM, DEPTH_S2M);
localparam DEPTH_B_M2S = max(DEPTH_P_I, DEPTH_M2S);
localparam DEPTH_V_M2S = (DEPTH_B_M2S + 1);
// Master write addr depth
localparam DEPTH_B_M_W = max(DEPTH_V_M2S, max(DEPTH_M_D, DEPTH_IO_REG));
// Master read addr depth
localparam DEPTH_B_M_R = max(DEPTH_M_D, max(DEPTH_IO_REG, max(DEPTH_U2M, DEPTH_S2M)));
// Master virtual memory write depth
localparam DEPTH_V_M_W = (DEPTH_B_M_W + CORE_BITS);
// Master virtual memory read depth
localparam DEPTH_V_M_R = (DEPTH_B_M_R + 2);
localparam DEPTH_V_F = (DEPTH_B_F + 1);
localparam DEPTH_V_M = max(DEPTH_V_M_W, DEPTH_V_M_R);
localparam DEPTH_B_S_R = max(DEPTH_P_D, DEPTH_M2S);
localparam DEPTH_V_S_R = (DEPTH_B_S_R + 2);
localparam DEPTH_B_S_W = max(DEPTH_V_F, DEPTH_P_D);
localparam DEPTH_V_S_W = (DEPTH_B_S_W + 1);
localparam PE_ID_START = 4;
localparam MASTER_W_BANK_BC = ((1 << CORE_BITS) - 1);
localparam MASTER_W_BANK_MEM_D = 0;
localparam MASTER_W_BANK_IO_REG = 1;
localparam MASTER_R_BANK_MEM_D = 0;
localparam MASTER_R_BANK_IO_REG = 1;
localparam MASTER_R_BANK_U2M = 2;
localparam MASTER_R_BANK_S2M = 3;
localparam UART_IO_ADDR_RESET = ((1 << DEPTH_B_U) + 0);
localparam UART_BANK_MEM_I = 0;
localparam UART_BANK_U2M = 2;
localparam FIFO_BANK_S2M = 0;
localparam FIFO_BANK_VRAM = 1;
localparam IO_REG_R_UART_BUSY = 0;
localparam IO_REG_R_VGA_VSYNC = 1;
localparam IO_REG_R_VGA_VCOUNT = 2;
localparam IO_REG_W_RESET_PE = 0;
localparam IO_REG_W_LED = 1;
localparam IO_REG_W_UART = 2;
localparam IO_REG_W_SPRITE_X = 3;
localparam IO_REG_W_SPRITE_Y = 4;
localparam IO_REG_W_SPRITE_SCALE = 5;
localparam TRUE = 1'b1;
localparam FALSE = 1'b0;
localparam ONE = 1'd1;
localparam ZERO = 1'd0;
function integer max (input integer a1, input integer a2);
begin
if (a1 > a2)
begin
max = a1;
end
else
begin
max = a2;
end
end
endfunction
// LED
assign led = io_reg_w[IO_REG_W_LED];
// Master IO reg
reg [WIDTH_M_D-1:0] io_reg_r[0:((1 << DEPTH_IO_REG) - 1)];
reg [WIDTH_M_D-1:0] io_reg_w[0:((1 << DEPTH_IO_REG) - 1)];
// Master read
wire [DEPTH_V_M_R-DEPTH_B_M_R-1:0] master_d_r_bank;
assign master_d_r_bank = master_d_r_addr[DEPTH_V_M_R-1:DEPTH_B_M_R];
always @(posedge clk)
begin
case (master_d_r_bank)
MASTER_R_BANK_MEM_D:
begin
master_d_r_data <= master_mem_d_r_data;
end
MASTER_R_BANK_IO_REG:
begin
master_d_r_data <= io_reg_r[master_d_r_addr[DEPTH_IO_REG-1:0]];
end
`ifdef USE_UART
MASTER_R_BANK_U2M:
begin
master_d_r_data <= u2m_r_data;
end
`endif
default:
begin
master_d_r_data <= {{(WIDTH_M_D-WIDTH_P_D){1'b0}}, s2m_r_data};
end
endcase
end
// Master mem_d write
reg [DEPTH_V_M_W-1:0] master_d_w_addr_d1;
reg [WIDTH_M_D-1:0] master_d_w_data_d1;
reg master_d_we_d1;
always @(posedge clk)
begin
master_d_w_addr_d1 <= master_d_w_addr;
master_d_w_data_d1 <= master_d_w_data;
master_d_we_d1 <= master_d_we;
end
always @(posedge clk)
begin
if (reset == TRUE)
begin
master_mem_d_we <= FALSE;
end
else
begin
if ((master_d_we == TRUE) && (master_d_w_bank == MASTER_W_BANK_MEM_D))
begin
master_mem_d_we <= TRUE;
end
else
begin
master_mem_d_we <= FALSE;
end
end
end
// Master IO reg read
always @(posedge clk)
begin
`ifdef USE_UART
io_reg_r[IO_REG_R_UART_BUSY] <= uart_io_busy;
`endif
`ifdef USE_VGA
io_reg_r[IO_REG_R_VGA_VSYNC] <= vga_vsync;
io_reg_r[IO_REG_R_VGA_VCOUNT] <= vga_vcount;
`endif
end
// Master IO reg write
wire [WIDTH_M_D-1:0] io_reg_w_data;
wire [DEPTH_IO_REG-1:0] io_reg_w_addr;
reg io_reg_we;
assign io_reg_w_data = master_d_w_data_d1;
assign io_reg_w_addr = master_d_w_addr_d1[DEPTH_IO_REG-1:0];
always @(posedge clk)
begin
if ((master_d_we == TRUE) && (master_d_w_bank == MASTER_W_BANK_IO_REG))
begin
io_reg_we <= TRUE;
end
else
begin
io_reg_we <= FALSE;
end
if (io_reg_we == TRUE)
begin
io_reg_w[io_reg_w_addr] <= io_reg_w_data;
end
end
`ifdef USE_UART
// Master IO reg write: UART TX we
always @(posedge clk)
begin
if (reset == TRUE)
begin
uart_io_tx_we <= FALSE;
end
else
begin
if ((master_d_we == TRUE) && (master_d_w_addr == ((MASTER_W_BANK_IO_REG << DEPTH_B_M_W) + IO_REG_W_UART)))
begin
uart_io_tx_we <= TRUE;
end
else
begin
uart_io_tx_we <= FALSE;
end
end
end
`endif
// harvester
reg [DEPTH_V_F-1:0] s2m_w_addr;
reg [WIDTH_P_D-1:0] s2m_w_data;
reg s2m_we;
reg vram_we;
wire [DEPTH_V_F-DEPTH_B_F-1:0] harvester_w_bank;
assign harvester_w_bank = harvester_w_addr[DEPTH_V_F-1:DEPTH_B_F];
always @(posedge clk)
begin
s2m_w_addr <= harvester_w_addr;
s2m_w_data <= harvester_w_data;
if (harvester_we == TRUE)
begin
if (harvester_w_bank == FIFO_BANK_S2M)
begin
s2m_we <= TRUE;
vram_we <= FALSE;
end
else
begin
s2m_we <= FALSE;
vram_we <= TRUE;
end
end
else
begin
s2m_we <= FALSE;
vram_we <= FALSE;
end
end
wire harvester_r_valid [0:CORES-1];
wire [WIDTH_P_D+DEPTH_V_F-1:0] harvester_r_data [0:CORES-1];
wire [CORES-1:0] harvester_r_req;
wire [DEPTH_V_F-1:0] harvester_w_addr;
wire [WIDTH_P_D-1:0] harvester_w_data;
wire harvester_we;
wire [CORE_BITS-1:0] harvester_cs;
harvester
#(
.CORE_BITS (CORE_BITS),
.CORES (CORES),
.WIDTH (WIDTH_P_D),
.DEPTH (DEPTH_V_F)
)
harvester_0
(
.clk (clk),
.reset (reset),
.cs (harvester_cs),
.r_data (harvester_r_data[harvester_cs]),
.r_valid (harvester_r_valid[harvester_cs]),
.r_req (harvester_r_req),
.w_addr (harvester_w_addr),
.w_data (harvester_w_data),
.we (harvester_we)
);
wire [WIDTH_P_D-1:0] s2m_r_data;
rw_port_ram
#(
.DATA_WIDTH (WIDTH_P_D),
.ADDR_WIDTH (DEPTH_S2M)
)
shared_s2m
(
.clk (clk),
.addr_r (master_d_r_addr[DEPTH_S2M-1:0]),
.addr_w (s2m_w_addr[DEPTH_S2M-1:0]),
.data_in (s2m_w_data),
.we (s2m_we),
.data_out (s2m_r_data)
);
`ifdef USE_UART
// UART IO: write to mem_i
reg uart_io_tx_we;
wire uart_io_busy;
wire [31:0] uart_io_rx_addr;
wire [31:0] uart_io_rx_data;
reg [31:0] uart_io_rx_addr_d1;
reg [31:0] uart_io_rx_data_d1;
wire uart_io_rx_we;
reg master_mem_i_we;
wire [DEPTH_V_U-DEPTH_B_U-1:0] uart_io_rx_bank;
assign uart_io_rx_bank = uart_io_rx_addr[DEPTH_V_U-1:DEPTH_B_U];
always @(posedge clk)
begin
uart_io_rx_addr_d1 <= uart_io_rx_addr;
uart_io_rx_data_d1 <= uart_io_rx_data;
end
always @(posedge clk)
begin
if (reset == TRUE)
begin
master_mem_i_we <= FALSE;
end
else
begin
if ((uart_io_rx_we == TRUE) && (uart_io_rx_bank == UART_BANK_MEM_I))
begin
master_mem_i_we <= TRUE;
end
else
begin
master_mem_i_we <= FALSE;
end
end
end
// u2m write
always @(posedge clk)
begin
if (reset == TRUE)
begin
u2m_we <= FALSE;
end
else
begin
if ((uart_io_rx_we == TRUE) && (uart_io_rx_bank == UART_BANK_U2M))
begin
u2m_we <= TRUE;
end
else
begin
u2m_we <= FALSE;
end
end
end
// UART IO: reset master
reg reset_master;
always @(posedge clk)
begin
if (reset == TRUE)
begin
reset_master <= FALSE;
end
else
begin
if ((uart_io_rx_we == TRUE) && (uart_io_rx_addr == UART_IO_ADDR_RESET))
begin
reset_master <= uart_io_rx_data[0];
end
end
end
uart_io
#(
.CLK_HZ (UART_CLK_HZ),
.SCLK_HZ (UART_SCLK_HZ)
)
uart_io_0
(
.clk (clk),
.reset (reset),
.uart_rxd (uart_rxd),
.tx_data (io_reg_w[IO_REG_W_UART][7:0]),
.tx_we (uart_io_tx_we),
.uart_txd (uart_txd),
.uart_busy (uart_io_busy),
.rx_addr (uart_io_rx_addr),
.rx_data (uart_io_rx_data),
.rx_we (uart_io_rx_we)
);
`endif
`ifdef USE_VGA
// sprite
localparam SPRITE_BPP = 3;
wire [SPRITE_BPP-1:0] color_all;
// vga
wire vga_vsync;
wire [WIDTH_M_D-1:0] vga_vcount;
wire [32-1:0] ext_vga_count_h;
wire [32-1:0] ext_vga_count_v;
sprite
#(
.SPRITE_WIDTH_BITS (VRAM_WIDTH_BITS),
.SPRITE_HEIGHT_BITS (VRAM_HEIGHT_BITS),
.BPP (SPRITE_BPP)
)
sprite_0
(
.clk (clk),
.reset (reset),
.bitmap_length (),
.bitmap_address (s2m_w_addr[DEPTH_VRAM-1:0]),
.bitmap_din (s2m_w_data[VRAM_BPP-1:0]),
.bitmap_dout (),
.bitmap_we (vram_we),
.bitmap_oe (FALSE),
.x (io_reg_w[IO_REG_W_SPRITE_X]),
.y (io_reg_w[IO_REG_W_SPRITE_Y]),
.scale (io_reg_w[IO_REG_W_SPRITE_SCALE]),
.ext_clkv (clkv),
.ext_resetv (resetv),
.ext_color (color_all),
.ext_count_h (ext_vga_count_h),
.ext_count_v (ext_vga_count_v)
);
vga_iface
#(
.BPP (VRAM_BPP),
.BPC (VRAM_BPC)
)
vga_iface_0
(
.clk (clk),
.reset (reset),
.vsync (vga_vsync),
.vcount (vga_vcount),
.ext_clkv (clkv),
.ext_resetv (resetv),
.ext_color (color_all),
.ext_vga_hs (vga_hs),
.ext_vga_vs (vga_vs),
.ext_vga_de (vga_de),
.ext_vga_r (vga_r),
.ext_vga_g (vga_g),
.ext_vga_b (vga_b),
.ext_count_h (ext_vga_count_h),
.ext_count_v (ext_vga_count_v)
);
`endif
// Master core
wire [DEPTH_V_M_W-1:0] master_d_w_addr;
wire [WIDTH_M_D-1:0] master_d_w_data;
wire master_d_we;
wire [DEPTH_M_I-1:0] master_i_r_addr;
wire [WIDTH_I-1:0] master_i_r_data;
wire [DEPTH_V_M_R-1:0] master_d_r_addr;
reg [WIDTH_M_D-1:0] master_d_r_data;
wire [DEPTH_V_M_W-DEPTH_B_M_W-1:0] master_d_w_bank;
assign master_d_w_bank = master_d_w_addr[DEPTH_V_M_W-1:DEPTH_B_M_W];
mini16_cpu
#(
.WIDTH_I (WIDTH_I),
.WIDTH_D (WIDTH_M_D),
.DEPTH_I (DEPTH_M_I),
.DEPTH_D (DEPTH_V_M),
.DEPTH_REG (DEPTH_REG),
.ENABLE_MVIL (TRUE),
.ENABLE_MUL (TRUE),
.ENABLE_MULTI_BIT_SHIFT (TRUE),
.ENABLE_MVC (TRUE),
.ENABLE_WA (TRUE),
.ENABLE_INT (TRUE),
.FULL_PIPELINED_ALU (FALSE),
.REGFILE_RAM_TYPE (MASTER_REGFILE_RAM_TYPE)
)
mini16_cpu_master
(
.clk (clk),
`ifdef USE_UART
.soft_reset (reset_master),
`else
.soft_reset (FALSE),
`endif
.reset (reset),
.mem_i_r_addr (master_i_r_addr),
.mem_i_r_data (master_i_r_data),
.mem_d_r_addr (master_d_r_addr),
.mem_d_r_data (master_d_r_data),
.mem_d_w_addr (master_d_w_addr),
.mem_d_w_data (master_d_w_data),
.mem_d_we (master_d_we)
);
default_master_code_mem
#(
.DATA_WIDTH (WIDTH_I),
.ADDR_WIDTH (DEPTH_M_I)
)
master_mem_i
(
.clk (clk),
.addr_r (master_i_r_addr),
`ifdef USE_UART
.addr_w (uart_io_rx_addr_d1[DEPTH_M_I-1:0]),
.data_in (uart_io_rx_data_d1[WIDTH_I-1:0]),
.we (master_mem_i_we),
`else
.addr_w ({DEPTH_M_I{1'b0}}),
.data_in ({WIDTH_I{1'b0}}),
.we (FALSE),
`endif
.data_out (master_i_r_data)
);
wire [WIDTH_M_D-1:0] master_mem_d_r_data;
reg master_mem_d_we;
default_master_data_mem
#(
.DATA_WIDTH (WIDTH_M_D),
.ADDR_WIDTH (DEPTH_M_D)
)
master_mem_d
(
.clk (clk),
.addr_r (master_d_r_addr[DEPTH_M_D-1:0]),
.addr_w (master_d_w_addr_d1[DEPTH_M_D-1:0]),
.data_in (master_d_w_data_d1),
.we (master_mem_d_we),
.data_out (master_mem_d_r_data)
);
`ifdef USE_UART
reg u2m_we;
wire [WIDTH_M_D-1:0] u2m_r_data;
rw_port_ram
#(
.DATA_WIDTH (WIDTH_M_D),
.ADDR_WIDTH (DEPTH_U2M)
)
shared_u2m
(
.clk (clk),
.addr_r (master_d_r_addr[DEPTH_U2M-1:0]),
.addr_w (uart_io_rx_addr_d1[DEPTH_U2M-1:0]),
.data_in (uart_io_rx_data_d1[WIDTH_M_D-1:0]),
.we (u2m_we),
.data_out (u2m_r_data)
);
`endif
generate
genvar i;
for (i = 0; i < CORES; i = i + 1)
begin: mini16_pe_gen
mini16_pe
#(
.WIDTH_D (WIDTH_P_D),
.DEPTH_I (DEPTH_P_I),
.DEPTH_D (DEPTH_P_D),
.DEPTH_M2S (DEPTH_M2S),
.DEPTH_FIFO (DEPTH_FIFO),
.CORE_ID (i + PE_ID_START),
.MASTER_W_BANK_BC (MASTER_W_BANK_BC),
.DEPTH_V_F (DEPTH_V_F),
.DEPTH_B_F (DEPTH_B_F),
.DEPTH_V_M (DEPTH_V_M),
.DEPTH_B_M (DEPTH_B_M_W),
.DEPTH_V_S_R (DEPTH_V_S_R),
.DEPTH_B_S_R (DEPTH_B_S_R),
.DEPTH_V_S_W (DEPTH_V_S_W),
.DEPTH_B_S_W (DEPTH_B_S_W),
.DEPTH_V_M2S (DEPTH_V_M2S),
.DEPTH_B_M2S (DEPTH_B_M2S),
.FIFO_RAM_TYPE (PE_FIFO_RAM_TYPE),
.REGFILE_RAM_TYPE (PE_REGFILE_RAM_TYPE),
.M2S_RAM_TYPE (PE_M2S_RAM_TYPE),
.DEPTH_REG (PE_DEPTH_REG),
.ENABLE_MVIL (PE_ENABLE_MVIL),
.ENABLE_MUL (PE_ENABLE_MUL),
.ENABLE_MULTI_BIT_SHIFT (PE_ENABLE_MULTI_BIT_SHIFT),
.ENABLE_MVC (PE_ENABLE_MVC),
.ENABLE_WA (PE_ENABLE_WA)
)
mini16_pe_0
(
.clk (clk),
.reset (reset),
.soft_reset (io_reg_w[IO_REG_W_RESET_PE][0]),
.fifo_req_r (harvester_r_req[i]),
.fifo_valid (harvester_r_valid[i]),
.fifo_r_data (harvester_r_data[i]),
.addr_i (master_d_w_addr_d1),
.data_i (master_d_w_data_d1),
.we_i (master_d_we_d1)
);
end
endgenerate
endmodule
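The top-level module routes every master access by address bank. The upper bits of master_d_r_addr / master_d_w_addr select the bank (MASTER_R_BANK_MEM_D, MASTER_R_BANK_IO_REG, and so on), and the lower DEPTH_B_* bits give the offset inside that bank. MasterProgram builds these addresses as (bank << DEPTH_B_M_W) + offset. The Java sketch below models that split in software. The class name is hypothetical, and the widths and bank numbers in main() are made-up values for illustration, not the ones in the project's parameter definitions.

```java
// Hypothetical software model of the bank/offset split used on the master data bus.
// The widths and bank numbers in main() are illustrative assumptions only.
public class BankDecode
{
    // Same construction as MasterProgram: (bank << depthB) + offset
    static int makeAddr(int bank, int offset, int depthB)
    {
        return (bank << depthB) + offset;
    }

    // Equivalent of addr[DEPTH_V-1:DEPTH_B] in the Verilog
    static int bankOf(int addr, int depthV, int depthB)
    {
        int bankMask = (1 << (depthV - depthB)) - 1;
        return (addr >>> depthB) & bankMask;
    }

    // Equivalent of addr[DEPTH_B-1:0] in the Verilog
    static int offsetOf(int addr, int depthB)
    {
        return addr & ((1 << depthB) - 1);
    }

    public static void main(String[] args)
    {
        final int depthV = 16;    // assumed total address width
        final int depthB = 12;    // assumed offset width inside a bank
        final int bankIoReg = 1;  // assumed bank number of the IO registers
        final int ioRegLed = 2;   // assumed index of the LED register
        int addr = makeAddr(bankIoReg, ioRegLed, depthB);
        // prints "addr=0x1002 bank=1 offset=2"
        System.out.println("addr=0x" + Integer.toHexString(addr)
            + " bank=" + bankOf(addr, depthV, depthB)
            + " offset=" + offsetOf(addr, depthB));
    }
}
```

In the hardware, the same split is only a wire slice (master_d_r_bank). The registered case statement on that slice selects which memory's read data reaches the master core.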
harvester.v : Data transfer from the PEs
/*
Copyright (c) 2019, miya
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
module harvester
#(
parameter CORE_BITS = 8,
parameter CORES = 32,
parameter WIDTH = 32,
parameter DEPTH = 8
)
(
input clk,
input reset,
output [CORE_BITS-1:0] cs,
input [WIDTH+DEPTH-1:0] r_data,
input r_valid,
output reg [CORES-1:0] r_req,
output [DEPTH-1:0] w_addr,
output [WIDTH-1:0] w_data,
output reg we
);
localparam TRUE = 1'b1;
localparam FALSE = 1'b0;
localparam ONE = 1'd1;
localparam ZERO = 1'd0;
// fifo to s2m core select
reg [CORE_BITS-1:0] core;
reg [CORE_BITS-1:0] core_d1;
reg [CORE_BITS-1:0] core_d2;
reg [CORE_BITS-1:0] core_d3;
always @(posedge clk)
begin
core_d1 <= core;
core_d2 <= core_d1;
core_d3 <= core_d2;
if (reset == TRUE)
begin
core <= ZERO;
end
else
begin
if (core == CORES - 1)
begin
core <= ZERO;
end
else
begin
core <= core + ONE;
end
end
end
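assign cs = core_d3;
// fetch pipeline registers (declared before they are used below)
reg [WIDTH+DEPTH-1:0] harvester_r_data_fetch;
reg [WIDTH+DEPTH-1:0] harvester_r_data_fetch_d1;
reg r_valid_d1;
// unpack the fetched FIFO entry: {w_addr, w_data}
assign w_addr = harvester_r_data_fetch_d1[WIDTH+DEPTH-1:WIDTH];
assign w_data = harvester_r_data_fetch_d1[WIDTH-1:0];
always @(posedge clk)
begin
// raise the read request of the currently selected core and drop the previous one
r_req[core] <= TRUE;
r_req[core_d1] <= FALSE;
// r_valid and r_data each pass through two registers, so we stays aligned with w_addr/w_data
r_valid_d1 <= r_valid;
we <= r_valid_d1;
harvester_r_data_fetch <= r_data;
harvester_r_data_fetch_d1 <= harvester_r_data_fetch;
end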
endmodule
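The Harvester visits the PEs in round-robin order, with core wrapping from CORES - 1 back to 0. Each FIFO entry carries the destination address in the upper DEPTH bits and the data in the lower WIDTH bits. Below is a simplified Java model of that behavior. It ignores the RTL's pipeline latency (core_d1 to core_d3 and the two fetch registers). It also writes every entry into a single array, whereas the top-level module additionally routes writes by bank to either the S2M memory or the VRAM. The class and method names are hypothetical, and DEPTH = 16 is an assumed value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Simplified model of harvester.v: round-robin polling of per-PE FIFOs (no pipeline timing).
public class HarvesterModel
{
    static final int WIDTH = 32; // data width (WIDTH_P_D in the 32-bit configuration)
    static final int DEPTH = 16; // assumed destination address width (DEPTH_V_F)

    // Pack {addr, data} the same way as r_data[WIDTH+DEPTH-1:0]
    static long pack(int addr, int data)
    {
        return ((long) addr << WIDTH) | (data & 0xFFFFFFFFL);
    }

    static int unpackAddr(long entry)
    {
        return (int) (entry >>> WIDTH) & ((1 << DEPTH) - 1);
    }

    static int unpackData(long entry)
    {
        return (int) entry;
    }

    // Visit cores 0..N-1 cyclically, pop at most one entry per visit, write it to mem[addr].
    // Returns the number of visits (one visit corresponds to one "cycle" of the model).
    static int drain(List<ArrayDeque<Long>> fifos, int[] mem)
    {
        int core = 0;
        int visits = 0;
        int remaining = 0;
        for (ArrayDeque<Long> f : fifos)
        {
            remaining += f.size();
        }
        while (remaining > 0)
        {
            ArrayDeque<Long> fifo = fifos.get(core);
            if (!fifo.isEmpty())                     // corresponds to r_valid
            {
                long entry = fifo.poll();            // corresponds to r_req
                mem[unpackAddr(entry)] = unpackData(entry); // we, w_addr, w_data
                remaining--;
            }
            core = (core == fifos.size() - 1) ? 0 : core + 1; // same wrap as harvester.v
            visits++;
        }
        return visits;
    }

    public static void main(String[] args)
    {
        List<ArrayDeque<Long>> fifos = new ArrayList<>();
        for (int i = 0; i < 3; i++)
        {
            fifos.add(new ArrayDeque<>());
        }
        fifos.get(0).add(pack(10, 100)); // core 0 has two entries
        fifos.get(0).add(pack(11, 101));
        fifos.get(2).add(pack(20, 200)); // core 1 is idle, core 2 has one entry
        int[] mem = new int[1 << DEPTH];
        int visits = drain(fifos, mem);
        // prints "visits=4 mem[10]=100 mem[11]=101 mem[20]=200"
        System.out.println("visits=" + visits + " mem[10]=" + mem[10]
            + " mem[11]=" + mem[11] + " mem[20]=" + mem[20]);
    }
}
```

In this model, the second entry of core 0 has to wait for a full round before it is written. Round-robin polling gives every PE an equal share of the single write port, at the cost of up to CORES visits of latency per entry. The PE-side FIFO has to absorb that latency.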
asm/MasterProgram.java : Mandelbrot set demo: program for the master core
/*
Copyright (c) 2019, miya
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
import java.lang.Math;
public class MasterProgram extends AsmLib
{
private int DEBUG = 0;
private int WAIT_VSYNC = 0;
private int VGA_HEIGHT_BITS = 9;
private int M2S_BC_ADDR_H;
private int M2S_BC_ADDR_SHIFT;
private int S2M_ADDR_H;
private int S2M_ADDR_SHIFT;
private int U2M_ADDR_H;
private int U2M_ADDR_SHIFT;
private int IO_REG_W_ADDR_H;
private int IO_REG_W_ADDR_SHIFT;
private int IO_REG_R_ADDR_H;
private int IO_REG_R_ADDR_SHIFT;
private void f_get_m2s_bc_addr()
{
// output: R3:m2s_bc_addr
int m2s_bc_addr = 3;
// m2s_bc_addr = M2S_BC_ADDR_H << M2S_BC_ADDR_SHIFT;
label("f_get_m2s_bc_addr");
lib_set_im(m2s_bc_addr, M2S_BC_ADDR_H);
as_sli(m2s_bc_addr, M2S_BC_ADDR_SHIFT);
lib_return();
}
private void f_get_m2s_core_addr()
{
// input: R3:core id(0-(N-1))
// output: R3:m2s_core_addr
int core_id = 3;
int m2s_core_addr = 3;
int tmp0 = SP_REG_MVIL;
// m2s_core_addr = ((core_id + PE_ID_START) << DEPTH_B_M_W) + (M2S_BANK_M2S << DEPTH_B_M2S);
label("f_get_m2s_core_addr");
as_addi(core_id, PE_ID_START);
lib_wait_dep_pre();
as_mvi(tmp0, M2S_BANK_M2S);
lib_wait_dep_post();
as_sli(m2s_core_addr, DEPTH_B_M_W);
lib_wait_dep_pre();
as_sli(tmp0, DEPTH_B_M2S);
lib_wait_dep_post();
as_add(m2s_core_addr, tmp0);
lib_return();
}
private void f_get_s2m_addr()
{
// output: R3:s2m_addr
int s2m_addr = 3;
// s2m_addr = S2M_ADDR_H << S2M_ADDR_SHIFT;
label("f_get_s2m_addr");
lib_set_im(s2m_addr, S2M_ADDR_H);
as_sli(s2m_addr, S2M_ADDR_SHIFT);
lib_return();
}
private void f_get_io_reg_w_addr()
{
// input: R3: device reg num
// output: R3:io_reg_w_addr
int io_reg_w_addr = 3;
int tmp0 = LREG0;
// io_reg_w_addr = (IO_REG_W_ADDR_H << IO_REG_W_ADDR_SHIFT) + R3;
label("f_get_io_reg_w_addr");
lib_set_im(tmp0, IO_REG_W_ADDR_H);
lib_wait_dep_pre();
as_sli(tmp0, IO_REG_W_ADDR_SHIFT);
lib_wait_dep_post();
as_add(io_reg_w_addr, tmp0);
lib_return();
}
private void f_get_io_reg_r_addr()
{
// input: R3: device reg num
// output: R3:io_reg_r_addr
int io_reg_r_addr = 3;
int tmp0 = LREG0;
// io_reg_r_addr = (IO_REG_R_ADDR_H << IO_REG_R_ADDR_SHIFT) + R3;
label("f_get_io_reg_r_addr");
lib_set_im(tmp0, IO_REG_R_ADDR_H);
lib_wait_dep_pre();
as_sli(tmp0, IO_REG_R_ADDR_SHIFT);
lib_wait_dep_post();
as_add(io_reg_r_addr, tmp0);
lib_return();
}
private void f_get_u2m_addr()
{
// output: R3:u2m_addr
int u2m_addr = 3;
// u2m_addr = U2M_ADDR_H << U2M_ADDR_SHIFT;
label("f_get_u2m_addr");
lib_set_im(u2m_addr, U2M_ADDR_H);
as_sli(u2m_addr, U2M_ADDR_SHIFT);
lib_return();
}
private void example_led()
{
/*
led_addr = (MASTER_W_BANK_IO_REG << DEPTH_B_M_W) + IO_REG_W_LED;
counter = 0;
shift = 18;
do
{
led = counter >> shift;
mem[led_addr] = led;
counter++;
} while (1);
*/
int led_addr = 3;
int counter = 4;
int shift = 5;
int led = 6;
as_nop();
lib_init_stack();
lib_set_im(R3, IO_REG_W_LED);
lib_call("f_get_io_reg_w_addr");
as_mvi(counter, 0);
lib_set_im(shift, 18);
// f_get_io_reg_w_addr has already returned the full address in R3 (led_addr):
// (MASTER_W_BANK_IO_REG << DEPTH_B_M_W) + IO_REG_W_LED, so it must not be shifted again
lib_wait_dependency();
label("example_led_L_0");
as_mv(led, counter);
lib_wait_dep_pre();
as_addi(counter, 1);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sr(led, shift);
lib_wait_dep_post();
as_st(led_addr, led);
lib_ba("example_led_L_0");
// link library
f_get_io_reg_w_addr();
}
private void example_helloworld()
{
as_nop();
lib_call("f_get_u2m_data");
lib_init_stack();
lib_wait_dep_pre();
as_mvi(R4, MASTER_R_BANK_U2M);
lib_wait_dep_post();
as_sli(R4, DEPTH_B_M_R);
lib_set_im(R3, addr_abs("d_helloworld"));
as_add(R3, R4);
lib_call("f_uart_print");
lib_call("f_halt");
// link library
f_uart_char();
f_uart_print();
f_halt();
f_get_u2m_data();
}
private void example_helloworld_data()
{
label("d_helloworld");
string_data32("Hello, world!\n");
}
private void f_reset_pe()
{
/*
addr_reset = MASTER_W_BANK_IO_REG;
addr_reset <<= DEPTH_B_M_W;
addr_reset += IO_REG_W_RESET_PE;
mem[addr_reset] = 1;
mem[addr_reset] = 0;
*/
int addr_reset = LREG0;
label("f_reset_pe");
lib_wait_dep_pre();
as_mvi(addr_reset, MASTER_W_BANK_IO_REG);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sli(addr_reset, DEPTH_B_M_W);
lib_wait_dep_post();
lib_wait_dep_pre();
as_addi(addr_reset, IO_REG_W_RESET_PE);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sti(addr_reset, 1);
lib_wait_dep_post();
as_sti(addr_reset, 0);
lib_return();
}
// copy data from U2M to MEM_D
// call before lib_init_stack()
public void f_get_u2m_data()
{
int addr_dst = LREG0;
int addr_src = LREG1;
int size = LREG2;
int data = LREG3;
label("f_get_u2m_data");
as_mvi(size, 1);
lib_wait_dep_pre();
as_mvi(addr_src, U2M_ADDR_H);
lib_wait_dep_post();
as_sli(addr_src, U2M_ADDR_SHIFT);
as_mvi(addr_dst, 0);
lib_wait_dep_pre();
as_sli(size, DEPTH_M_D);
lib_wait_dep_post();
label("f_get_u2m_data_L_0");
as_ld(data, addr_src);
as_subi(size, 1);
lib_wait_dep_pre();
as_addi(addr_src, 1);
lib_wait_dep_post();
as_st(addr_dst, data);
as_cnz(SP_REG_CP, size);
as_addi(addr_dst, 1);
lib_bc("f_get_u2m_data_L_0");
lib_return();
}
public void f_reset_vga()
{
/*
addr_ioreg = MASTER_W_BANK_IO_REG;
addr_ioreg <<= DEPTH_B_M_W;
addr_sp_x = addr_ioreg;
addr_sp_y = addr_ioreg;
addr_sp_s = addr_ioreg;
addr_sp_x += IO_REG_W_SPRITE_X;
addr_sp_y += IO_REG_W_SPRITE_Y;
addr_sp_s += IO_REG_W_SPRITE_SCALE;
mem[addr_sp_x] = 64;
mem[addr_sp_y] = 0;
mem[addr_sp_s] = (WIDTH_P_D == 32) ? 7 : 5;
*/
int addr_ioreg = LREG0;
int addr_sp_x = LREG1;
int addr_sp_y = LREG2;
int addr_sp_s = LREG3;
int x = LREG5;
label("f_reset_vga");
lib_wait_dep_pre();
as_mvi(addr_ioreg, MASTER_W_BANK_IO_REG);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sli(addr_ioreg, DEPTH_B_M_W);
lib_wait_dep_post();
as_mv(addr_sp_x, addr_ioreg);
as_mv(addr_sp_y, addr_ioreg);
lib_wait_dep_pre();
as_mv(addr_sp_s, addr_ioreg);
lib_wait_dep_post();
as_addi(addr_sp_x, IO_REG_W_SPRITE_X);
as_addi(addr_sp_y, IO_REG_W_SPRITE_Y);
as_addi(addr_sp_s, IO_REG_W_SPRITE_SCALE);
lib_set_im(x, 64);
as_st(addr_sp_x, x);
as_sti(addr_sp_y, 0);
if (WIDTH_P_D == 32)
{
as_sti(addr_sp_s, 7);
}
else
{
as_sti(addr_sp_s, 5);
}
lib_return();
}
private void f_init_core_id()
{
/*
depends: f_get_m2s_core_addr()
*/
int addr_core_id = LREG0;
int next_core_offset = LREG1;
int i = LREG2;
int cores = LREG3;
int addr_cores = LREG4;
int para = LREG5;
/*
R3 = cores - 1;
lib_call("f_get_m2s_core_addr");
addr_core_id = R3;
addr_cores = R3 + 1;
next_core_offset = 1 << DEPTH_B_M_W;
i = CORES;
para = PARALLEL;
do
{
i--;
mem[addr_core_id] = i;
mem[addr_cores] = para;
addr_core_id -= next_core_offset;
addr_cores -= next_core_offset;
} while (i != 0);
*/
label("f_init_core_id");
lib_push(SP_REG_LINK);
lib_push(R3);
lib_set_im(cores, CORES);
lib_set_im(para, PARALLEL);
lib_set_im(R3, CORES - 1); // cores - 1
lib_call("f_get_m2s_core_addr");
as_mv(addr_core_id, R3);
as_mv(addr_cores, R3);
as_mvi(next_core_offset, 1);
lib_wait_dep_pre();
as_mv(i, cores);
lib_wait_dep_post();
as_sli(next_core_offset, DEPTH_B_M_W);
as_addi(addr_cores, 1);
label("f_init_core_id_L_0");
lib_wait_dep_pre();
as_subi(i, 1);
lib_wait_dep_post();
as_st(addr_core_id, i);
as_st(addr_cores, para);
as_sub(addr_core_id, next_core_offset);
as_sub(addr_cores, next_core_offset);
as_cnz(SP_REG_CP, i);
lib_bc("f_init_core_id_L_0");
lib_pop(R3);
lib_pop(SP_REG_LINK);
lib_return();
}
private void m_vga_flip(int reg_task_id)
{
int task_id = reg_task_id;
int addr_sp_y = LREG0;
int tmp0 = LREG1;
int page = LREG2;
/*
addr_sp_y = (MASTER_W_BANK_IO_REG << DEPTH_B_M_W) + IO_REG_W_SPRITE_Y;
page = -(((task_id & 1) ^ 1) << VGA_HEIGHT_BITS);
*addr_sp_y = page;
*/
lib_wait_dep_pre();
as_mvi(addr_sp_y, MASTER_W_BANK_IO_REG);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sli(addr_sp_y, DEPTH_B_M_W);
lib_wait_dep_post();
lib_wait_dep_pre();
as_addi(addr_sp_y, IO_REG_W_SPRITE_Y);
lib_wait_dep_post();
lib_wait_dep_pre();
as_mv(tmp0, task_id);
lib_wait_dep_post();
lib_wait_dep_pre();
as_andi(tmp0, 1);
lib_wait_dep_post();
lib_wait_dep_pre();
as_xori(tmp0, 1);
lib_wait_dep_post();
as_mvi(page, 0);
lib_wait_dep_pre();
// vga_height = 1 << VGA_HEIGHT_BITS
as_sli(tmp0, VGA_HEIGHT_BITS);
lib_wait_dep_post();
lib_wait_dep_pre();
// sp_y = 0(page0), -vga_height(page1)
as_sub(page, tmp0);
lib_wait_dep_post();
as_st(addr_sp_y, page);
}
private void m_wait_vsync()
{
/*
addr_vsync = (MASTER_R_BANK_IO_REG << DEPTH_B_M_R) + IO_REG_R_VGA_VSYNC;
vsync_pre = 0;
do
{
vsync = mem[addr_vsync];
vsync_start = ((vsync == 0) && (vsync_pre == 1));
vsync_pre = vsync;
} while (!vsync_start);
(!vsync_start = ((vsync == 1) || (vsync_pre == 0)))
*/
int addr_vsync = LREG0;
int vsync = LREG1;
int vsync_start = LREG2;
int vsync_pre = LREG3;
lib_wait_dep_pre();
as_mvi(addr_vsync, MASTER_R_BANK_IO_REG);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sli(addr_vsync, DEPTH_B_M_R);
lib_wait_dep_post();
as_mvi(vsync_pre, 0);
lib_wait_dep_pre();
as_addi(addr_vsync, IO_REG_R_VGA_VSYNC);
lib_wait_dep_post();
label("m_wait_vsync_L_0");
lib_wait_dep_pre();
as_ld(vsync, addr_vsync);
lib_wait_dep_post();
as_cnz(vsync_start, vsync);
as_cnz(SP_REG_CP, vsync_pre);
lib_wait_dep_pre();
as_mv(vsync_pre, vsync);
lib_wait_dep_post();
lib_wait_dep_pre();
as_xori(SP_REG_CP, -1);
lib_wait_dep_post();
as_or(SP_REG_CP, vsync_start);
lib_bc("m_wait_vsync_L_0");
}
private void m_init_mandel_param()
{
/*
PE m2s memory map:
3: scale
4: cx
5: cy
*/
int addr_m2s_root = 3;
int addr_scale = LREG0;
int addr_cx = LREG1;
int addr_cy = LREG2;
int scale = LREG3;
int cx = LREG4;
int cy = LREG5;
as_mv(addr_scale, addr_m2s_root);
as_mv(addr_cx, addr_m2s_root);
as_mv(addr_cy, addr_m2s_root);
lib_ld(scale, "d_mandel_scale");
as_addi(addr_scale, 3);
as_addi(addr_cx, 4);
as_addi(addr_cy, 5);
lib_ld(cx, "d_mandel_cx");
lib_ld(cy, "d_mandel_cy");
lib_wait_dep_pre();
as_st(addr_scale, scale);
lib_wait_dep_post();
as_st(addr_cx, cx);
as_st(addr_cy, cy);
}
private void m_update_mandel_param()
{
int addr_m2s_root = 3;
int addr_scale = LREG0;
int scale = LREG1;
int scale_mask = LREG2;
/*
addr_scale = addr_m2s_root + 3;
scale -= 1;
if (scale == 0)
{
scale = 256;
}
m2s[addr_scale] = scale;
mem["d_mandel_scale"] = scale;
*/
as_mv(addr_scale, addr_m2s_root);
lib_ld(scale, "d_mandel_scale");
lib_wait_dep_pre();
as_addi(addr_scale, 3);
lib_wait_dep_post();
lib_wait_dep_pre();
as_subi(scale, 1);
lib_wait_dep_post();
lib_wait_dep_pre();
as_cnz(SP_REG_CP, scale);
lib_wait_dep_post();
lib_wait_dep_pre();
as_mvil(256);
lib_wait_dep_post();
lib_wait_dep_pre();
as_xori(SP_REG_CP, -1);
lib_wait_dep_post();
as_mvc(scale, SP_REG_MVIL);
as_st(addr_scale, scale);
lib_st("d_mandel_scale", scale);
}
private void master_thread_manager()
{
/*
PE m2s memory map:
0: core_id
1: parallel
2: task_id
user parameters
3: scale
4: cx
5: cy
s2m memory map:
0 - PARALLEL-1: Incremented task_id from PE
*/
int addr_m2s_root = 3;
int addr_s2m_root = 4;
int addr_task_id = 5;
int addr_s2m = 6;
int task_id = 7;
int pe_ack = 8;
int i = 9;
/*
if (ENABLE_UART == 1)
{
lib_call("f_get_u2m_data");
}
f_reset_vga();
f_init_core_id();
addr_m2s_root = M2S_BC_ADDR_H << M2S_BC_ADDR_SHIFT;
addr_task_id = addr_m2s_root + 2;
addr_s2m_root = MASTER_R_BANK_S2M << DEPTH_B_M_R;
m_init_mandel_param();
task_id = 0;
mem[addr_task_id] = task_id;
f_reset_pe();
do
{
i = PARALLEL;
task_id++;
addr_s2m = addr_s2m_root;
do
{
i--;
do
{
pe_ack = mem[addr_s2m] - task_id;
if (WIDTH_P_D < 32) pe_ack &= 1;
} while (pe_ack != 0);
addr_s2m++;
} while (i != 0);
m_update_mandel_param();
if (WAIT_VSYNC == 1) m_wait_vsync();
m_vga_flip(task_id);
mem[addr_task_id] = task_id;
} while (1);
*/
as_nop();
lib_init_stack();
if (ENABLE_UART == 1)
{
lib_call("f_get_u2m_data");
}
lib_call("f_reset_vga");
lib_call("f_init_core_id");
lib_set_im(addr_m2s_root, M2S_BC_ADDR_H);
as_mvi(task_id, 0);
lib_wait_dep_pre();
as_mvi(addr_s2m_root, MASTER_R_BANK_S2M);
lib_wait_dep_post();
as_sli(addr_s2m_root, DEPTH_B_M_R);
lib_wait_dep_pre();
as_sli(addr_m2s_root, M2S_BC_ADDR_SHIFT);
lib_wait_dep_post();
m_init_mandel_param();
lib_wait_dep_pre();
as_mv(addr_task_id, addr_m2s_root);
lib_wait_dep_post();
lib_wait_dep_pre();
as_addi(addr_task_id, 2);
lib_wait_dep_post();
as_st(addr_task_id, task_id);
lib_call("f_reset_pe");
label("master_thread_manager_L_0");
lib_set_im(i, PARALLEL);
as_addi(task_id, 1);
lib_wait_dep_pre();
as_mv(addr_s2m, addr_s2m_root);
lib_wait_dep_post();
label("master_thread_manager_L_1");
as_subi(i, 1);
label("master_thread_manager_L_2");
lib_wait_dep_pre();
as_ld(pe_ack, addr_s2m);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sub(pe_ack, task_id);
lib_wait_dep_post();
if (WIDTH_P_D < 32)
{
lib_wait_dep_pre();
as_andi(pe_ack, 1);
lib_wait_dep_post();
}
as_cnz(SP_REG_CP, pe_ack);
lib_bc("master_thread_manager_L_2");
as_addi(addr_s2m, 1);
as_cnz(SP_REG_CP, i);
lib_bc("master_thread_manager_L_1");
m_update_mandel_param();
if (DEBUG == 1)
{
lib_push(R3);
as_mv(R3, task_id);
lib_call("f_uart_hex_word_ln");
lib_wait_dep_pre();
as_mvi(R3, 1);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sli(R3, 15);
lib_wait_dep_post();
as_sli(R3, 6);
lib_call("f_wait");
lib_pop(R3);
}
if (WAIT_VSYNC == 1)
{
m_wait_vsync();
}
m_vga_flip(task_id);
as_st(addr_task_id, task_id);
lib_ba("master_thread_manager_L_0");
lib_call("f_halt");
// link library
f_halt();
f_init_core_id();
f_reset_pe();
f_get_m2s_core_addr();
f_reset_vga();
if (ENABLE_UART == 1)
{
f_get_u2m_data();
}
if (DEBUG == 1)
{
f_uart_char();
f_uart_hex();
f_uart_hex_word();
f_uart_hex_word_ln();
f_wait();
}
}
@Override
public void init(String[] args)
{
super.init(args);
M2S_BC_ADDR_SHIFT = DEPTH_B_M2S;
M2S_BC_ADDR_H = ((MASTER_W_BANK_BC << DEPTH_B_M_W) + (M2S_BANK_M2S << DEPTH_B_M2S)) >>> M2S_BC_ADDR_SHIFT;
S2M_ADDR_H = MASTER_R_BANK_S2M;
S2M_ADDR_SHIFT = DEPTH_B_M_R;
U2M_ADDR_H = MASTER_R_BANK_U2M;
U2M_ADDR_SHIFT = DEPTH_B_M_R;
IO_REG_W_ADDR_H = MASTER_W_BANK_IO_REG;
IO_REG_W_ADDR_SHIFT = DEPTH_B_M_W;
IO_REG_R_ADDR_H = MASTER_R_BANK_IO_REG;
IO_REG_R_ADDR_SHIFT = DEPTH_B_M_R;
}
@Override
public void program()
{
set_filename("default_master_code");
set_rom_width(WIDTH_I);
set_rom_depth(DEPTH_M_I);
//example_led();
//example_helloworld();
master_thread_manager();
}
@Override
public void data()
{
set_filename("default_master_data");
set_rom_width(WIDTH_M_D);
set_rom_depth(DEPTH_M_D);
label("d_rand");
dat(0xfc720c27);
label("d_mandel_scale");
dat(256);
label("d_mandel_cx");
dat(161 << 6);
label("d_mandel_cy");
dat(49 << 6);
example_helloworld_data();
}
}
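The assembler-style listing above can be hard to follow. Here is a plain-Java restatement of four pieces of MasterProgram's control logic:

- the page-flip offset from m_vga_flip
- the scale update from m_update_mandel_param
- the vsync falling-edge detection from m_wait_vsync
- the PE completion check from master_thread_manager

This is a sketch that mirrors the pseudocode comments, not the generated Mini16 machine code. The class and method names are hypothetical, and VGA_HEIGHT_BITS = 9 is copied from MasterProgram.

```java
// Plain-Java restatement of parts of MasterProgram's control flow (sketch only).
public class MasterLogicSketch
{
    static final int VGA_HEIGHT_BITS = 9; // same value as in MasterProgram

    // m_vga_flip: sprite Y is 0 for odd task_id and -(1 << VGA_HEIGHT_BITS) for even task_id
    static int flipPageY(int taskId)
    {
        return -(((taskId & 1) ^ 1) << VGA_HEIGHT_BITS);
    }

    // m_update_mandel_param: decrement scale and wrap 0 back to 256
    static int nextScale(int scale)
    {
        scale -= 1;
        return (scale == 0) ? 256 : scale;
    }

    // m_wait_vsync: index of the first sample where vsync falls from non-zero to 0, or -1
    static int firstVsyncFall(int[] samples)
    {
        int pre = 0;
        for (int i = 0; i < samples.length; i++)
        {
            if (samples[i] == 0 && pre != 0)
            {
                return i;
            }
            pre = samples[i];
        }
        return -1;
    }

    // master_thread_manager: true once every PE has written taskId back to S2M;
    // with PEs narrower than 32 bits only the LSB of the difference is checked
    static boolean allAcked(int[] s2m, int taskId, boolean narrowPe)
    {
        for (int ack : s2m)
        {
            int diff = ack - taskId;
            if (narrowPe)
            {
                diff &= 1;
            }
            if (diff != 0)
            {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args)
    {
        System.out.println(flipPageY(1) + " " + flipPageY(2));         // 0 -512
        System.out.println(nextScale(256) + " " + nextScale(1));       // 255 256
        System.out.println(firstVsyncFall(new int[] {0, 1, 1, 0, 1})); // 3
        System.out.println(allAcked(new int[] {5, 5, 5}, 5, false));   // true
        System.out.println(allAcked(new int[] {5, 4, 5}, 5, false));   // false
    }
}
```

In the actual main loop these steps run in this order:

1. Wait until all PARALLEL entries acknowledge the new task_id.
2. Run m_update_mandel_param.
3. Optionally run m_wait_vsync (when WAIT_VSYNC == 1).
4. Run m_vga_flip.
5. Broadcast task_id to the PEs.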
asm/PEProgram.java : Mandelbrot set demo: program for the PEs
/*
Copyright (c) 2019, miya
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
import java.lang.Math;
public class PEProgram extends AsmLib
{
private int FIFO_ADDR;
private int VRAM_ADDR_H;
private int VRAM_ADDR_SHIFT;
private int M2S_ADDR_H;
private int M2S_ADDR_SHIFT;
private int ITEM_COUNT_ADDR_H;
private int ITEM_COUNT_ADDR_SHIFT;
private int S2M_ADDR_H;
private int S2M_ADDR_SHIFT;
private int IMAGE_WIDTH_BITS;
private int IMAGE_HEIGHT_BITS;
private int IMAGE_WIDTH_HALF_BITS;
private int IMAGE_HEIGHT_HALF_BITS;
private int IMAGE_WIDTH;
private int IMAGE_HEIGHT;
private int IMAGE_WIDTH_HALF;
private int IMAGE_HEIGHT_HALF;
private void m_mandel_core()
{
int x = 11;
int y = 12;
int scale = 13;
int count = 14;
int cx = 15;
int cy = 16;
int a = 17;
int b = 18;
int aa = 19;
int bb = 20;
int c = 21;
int x1 = 22;
int y1 = 23;
int cmask = 24;
int max_c = 25;
int pc = 26;
int tmp1 = 27;
int tmp2 = 28;
int tmp3 = 29;
// const
int FIXED_BITS = 13;
int FIXED_BITS_M1 = 12;
int MAX_C = 4;
/*
a = 0;
b = 0;
aa = 0;
bb = 0;
scale = 256;
count = 100;
cmask = 252;
max_c = MAX_C << FIXED_BITS;
x1 = ((x - IMAGE_WIDTH_HALF) * scale) + cx;
y1 = ((y - IMAGE_HEIGHT_HALF) * scale) + cy;
do
{
pc = c;
b = ((a * b) >> FIXED_BITS_M1) - y1;
a = aa - bb - x1;
aa = (a * a) >> FIXED_BITS;
bb = (b * b) >> FIXED_BITS;
c = aa + bb;
count--;
x1 += scale;
pc -= c;
pc >>= 5;
limit = (c <= max_c) && (count >= 0) && (pc != 0);
} while (limit);
as_mvi(a, 0);
as_mvi(b, 0);
as_mvi(aa, 0);
as_mvi(bb, 0);
as_mv(x1, x);
as_mv(y1, y);
lib_set_im(count, 100);
lib_set_im(tmp1, IMAGE_WIDTH_HALF);
lib_set_im(tmp2, IMAGE_HEIGHT_HALF);
as_mvi(max_c, 4);
as_sli(max_c, FIXED_BITS);
as_sub(x1, tmp1);
as_sub(y1, tmp2);
as_mul(x1, scale);
as_mul(y1, scale);
as_add(x1, cx);
as_add(y1, cy);
label("m_mandel_L_0");
as_mv(pc, c);
as_mul(b, a);
as_srai(b, FIXED_BITS_M1);
as_sub(b, y1);
as_mv(a, aa);
as_sub(a, bb);
as_sub(a, x1);
as_mv(aa, a);
as_mul(aa, a);
as_sri(aa, FIXED_BITS);
as_mv(bb, b);
as_mul(bb, b);
as_sri(bb, FIXED_BITS);
as_mv(c, aa);
as_add(c, bb);
as_subi(count, 1);
as_add(x1, scale);
as_mv(tmp1, max_c);
as_sub(pc, c);
as_sub(tmp1, c);
as_sri(pc, 5);
as_cnm(SP_REG_CP, tmp1);
as_cnm(tmp2, count);
as_cnz(tmp3, pc);
as_and(SP_REG_CP, tmp2);
as_and(SP_REG_CP, tmp3);
lib_bc("m_mandel_L_0");
*/
as_mvi(a, 0);
as_mvi(b, 0);
as_mvi(aa, 0);
as_mvi(bb, 0);
as_mv(x1, x);
as_mv(y1, y);
lib_set_im(count, 100);
lib_set_im(tmp1, IMAGE_WIDTH_HALF);
lib_set_im(tmp2, IMAGE_HEIGHT_HALF);
lib_wait_dep_pre();
as_mvi(max_c, 4);
lib_wait_dep_post();
as_sli(max_c, FIXED_BITS);
as_sub(x1, tmp1);
lib_wait_dep_pre();
as_sub(y1, tmp2);
lib_wait_dep_post();
as_mul(x1, scale);
lib_wait_dep_pre();
as_mul(y1, scale);
lib_wait_dep_post();
as_add(x1, cx);
as_add(y1, cy);
label("m_mandel_core_L_0");
as_mv(pc, c);
lib_wait_dep_pre();
as_mul(b, a);
lib_wait_dep_post();
lib_wait_dep_pre();
as_srai(b, FIXED_BITS_M1);
lib_wait_dep_post();
as_sub(b, y1);
lib_wait_dep_pre();
as_mv(a, aa);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sub(a, bb);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sub(a, x1);
lib_wait_dep_post();
lib_wait_dep_pre();
as_mv(aa, a);
lib_wait_dep_post();
lib_wait_dep_pre();
as_mul(aa, a);
lib_wait_dep_post();
as_sri(aa, FIXED_BITS);
lib_wait_dep_pre();
as_mv(bb, b);
lib_wait_dep_post();
lib_wait_dep_pre();
as_mul(bb, b);
lib_wait_dep_post();
as_sri(bb, FIXED_BITS);
lib_wait_dep_pre();
as_mv(c, aa);
lib_wait_dep_post();
as_add(c, bb);
as_subi(count, 1);
as_add(x1, scale);
lib_wait_dep_pre();
as_mv(tmp1, max_c);
lib_wait_dep_post();
as_sub(pc, c);
lib_wait_dep_pre();
as_sub(tmp1, c);
lib_wait_dep_post();
as_sri(pc, 5);
as_cnm(SP_REG_CP, tmp1);
lib_wait_dep_pre();
as_cnm(tmp2, count);
lib_wait_dep_post();
as_cnz(tmp3, pc);
lib_wait_dep_pre();
as_and(SP_REG_CP, tmp2);
lib_wait_dep_post();
as_and(SP_REG_CP, tmp3);
lib_bc("m_mandel_core_L_0");
}
private void m_fill_vram()
{
int MAX_ITEM = 3;
int item_count_addr = 3;
int task_id = 4;
int my_core_id = 7;
int parallel = 8;
int vram_addr = 9;
int i = 10;
int page = 11;
int item_count = 11;
int tmp0 = 12;
/*
lib_push(vram_addr);
page = task_id & 1;
i = (1 << (IMAGE_WIDTH_BITS + IMAGE_HEIGHT_BITS)) - 1 - my_core_id;
vram_addr += (page << (IMAGE_WIDTH_BITS + IMAGE_HEIGHT_BITS)) + i;
item_count_addr = ITEM_COUNT_ADDR_H << ITEM_COUNT_ADDR_SHIFT;
do
{
// fifo full check
do
{
item_count = mem[item_count_addr];
item_count -= MAX_ITEM;
} while (item_count >= 0);
mem[vram_addr] = task_id;
vram_addr -= parallel;
i -= parallel;
} while (i >=0);
lib_pop(vram_addr);
*/
lib_push(vram_addr);
as_mvi(i, 1);
as_mv(page, task_id);
lib_set_im(tmp0, IMAGE_WIDTH_BITS + IMAGE_HEIGHT_BITS);
lib_sl(i, tmp0);
lib_wait_dep_pre();
as_andi(page, 1);
lib_wait_dep_post();
as_subi(i, 1);
lib_wait_dep_pre();
lib_sl(page, tmp0);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sub(i, my_core_id);
lib_wait_dep_post();
lib_wait_dep_pre();
as_add(page, i);
lib_wait_dep_post();
as_add(vram_addr, page);
lib_wait_dep_pre();
as_mvi(item_count_addr, ITEM_COUNT_ADDR_H);
lib_wait_dep_post();
lib_sli(item_count_addr, ITEM_COUNT_ADDR_SHIFT);
lib_wait_dependency();
label("m_fill_vram_L_0");
lib_wait_dep_pre();
as_ld(item_count, item_count_addr);
lib_wait_dep_post();
lib_wait_dep_pre();
as_subi(item_count, MAX_ITEM);
lib_wait_dep_post();
as_cnm(SP_REG_CP, item_count);
lib_bc("m_fill_vram_L_0");
as_st(vram_addr, task_id);
as_sub(vram_addr, parallel);
lib_wait_dep_pre();
as_sub(i, parallel);
lib_wait_dep_post();
as_cnm(SP_REG_CP, i);
lib_bc("m_fill_vram_L_0");
lib_pop(vram_addr);
}
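// Reference model (hypothetical helper added for illustration; not called by
// the generator) of the store pattern of m_fill_vram(). It returns the VRAM
// offsets, relative to the base in vram_addr, that one PE writes, in write order.
// Bit 0 of task_id selects one of two frame pages (double buffering), and each
// PE starts at the last pixel minus its core id and steps back by `parallel`,
// so the active PEs interleave over the frame. The FIFO-full wait is not modeled.
// Assumes coreId < parallel <= pixel count (the real do-while always stores at
// least once, which only matters if that assumption is violated).
interface FillVramReference
{
static int[] offsets(int widthBits, int heightBits, int taskId, int coreId, int parallel)
{
int bits = widthBits + heightBits;
int page = taskId & 1;
java.util.List<Integer> out = new java.util.ArrayList<>();
// i = (1 << bits) - 1 - my_core_id, then i -= parallel while i >= 0
for (int i = (1 << bits) - 1 - coreId; i >= 0; i -= parallel)
{
out.add((page << bits) + i);
}
return out.stream().mapToInt(Integer::intValue).toArray();
}
}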
private void m_mandel()
{
int task_id = 4;
int m2s_addr = 5;
int my_core_id = 7;
int parallel = 8;
int vram_addr = 9;
int i = 10;
int page = 11;
int x = 11;
int y = 12;
int scale = 13;
int count = 14;
int cx = 15;
int cy = 16;
// temp
int tmp0 = 17;
int param_addr = 17;
/*
lib_push_regs(4, 6); // push R4-R9
// get param
scale = mem[m2s_addr + 1];
cx = mem[m2s_addr + 2];
cy = mem[m2s_addr + 3];
page = task_id & 1;
vram_addr += (page << (IMAGE_WIDTH_BITS + IMAGE_HEIGHT_BITS)) + (1 << (IMAGE_WIDTH_BITS + IMAGE_HEIGHT_BITS)) - 1 - my_core_id;
i = (1 << (IMAGE_WIDTH_BITS + IMAGE_HEIGHT_BITS)) - 1 - my_core_id;
do
{
x = i & ((1 << IMAGE_WIDTH_BITS) - 1);
y = i >> IMAGE_WIDTH_BITS;
m_mandel_core();
mem[vram_addr] = count;
vram_addr -= parallel;
i -= parallel;
} while (i >= 0);
lib_pop_regs(4, 6);
*/
lib_push_regs(4, 6);
// get param
lib_wait_dep_pre();
as_mv(param_addr, m2s_addr);
lib_wait_dep_post();
lib_wait_dep_pre();
as_addi(param_addr, 1);
lib_wait_dep_post();
as_ld(scale, param_addr);
lib_wait_dep_pre();
as_addi(param_addr, 1);
lib_wait_dep_post();
as_ld(cx, param_addr);
lib_wait_dep_pre();
as_addi(param_addr, 1);
lib_wait_dep_post();
as_ld(cy, param_addr);
as_mvi(i, 1);
as_mv(page, task_id);
as_mvi(tmp0, 1);
lib_mvil(IMAGE_WIDTH_BITS + IMAGE_HEIGHT_BITS);
as_sl(i, SP_REG_MVIL);
as_sl(tmp0, SP_REG_MVIL);
lib_wait_dep_pre();
as_andi(page, 1);
lib_wait_dep_post();
as_subi(i, 1);
lib_wait_dep_pre();
as_sl(page, SP_REG_MVIL);
lib_wait_dep_post();
as_sub(i, my_core_id);
lib_wait_dep_pre();
as_add(page, tmp0);
lib_wait_dep_post();
lib_wait_dep_pre();
as_subi(page, 1);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sub(page, my_core_id);
lib_wait_dep_post();
as_add(vram_addr, page);
label("m_mandel_L_0");
as_mv(x, i);
as_mv(y, i);
lib_set_im(tmp0, (1 << IMAGE_WIDTH_BITS) - 1);
lib_wait_dep_pre();
as_sri(y, IMAGE_WIDTH_BITS);
lib_wait_dep_post();
lib_wait_dep_pre();
as_and(x, tmp0);
lib_wait_dep_post();
m_mandel_core();
as_st(vram_addr, count);
as_sub(vram_addr, parallel);
lib_wait_dep_pre();
as_sub(i, parallel);
lib_wait_dep_post();
as_cnm(SP_REG_CP, i);
lib_bc("m_mandel_L_0");
lib_pop_regs(4, 6);
}
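// Reference model (hypothetical helper added for illustration; not called by
// the generator) of how m_mandel() turns the linear pixel index i into screen
// coordinates before calling m_mandel_core(): the low IMAGE_WIDTH_BITS bits
// are x and the remaining high bits are y, i.e. the frame is stored row-major.
interface MandelPixelReference
{
// x = i & ((1 << IMAGE_WIDTH_BITS) - 1)
static int x(int i, int widthBits)
{
return i & ((1 << widthBits) - 1);
}
// y = i >> IMAGE_WIDTH_BITS
static int y(int i, int widthBits)
{
return i >> widthBits;
}
// Inverse mapping: rebuild the linear index from (x, y).
static int index(int x, int y, int widthBits)
{
return (y << widthBits) | x;
}
}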
private void pe_thread_manager()
{
int task_id = 4;
int m2s_addr = 5;
int s2m_addr = 6;
int my_core_id = 7;
int parallel = 8;
int vram_addr = 9;
// temp
int master_task_id = 10;
int diff = 11;
/*
as_nop();
lib_init_stack();
m2s_addr = m_get_m2s_addr();
vram_addr = m_get_vram_addr();
s2m_addr = m_get_s2m_addr();
my_core_id = mem[m2s_addr];
s2m_addr += my_core_id;
m2s_addr++;
parallel = mem[m2s_addr];
m2s_addr++;
task_id = mem[m2s_addr];
if (my_core_id >= parallel) goto "pe_thread_manager_L_end";
do
{
task_id++;
mem[s2m_addr] = task_id;
do
{
master_task_id = mem[m2s_addr];
diff = master_task_id - task_id;
} while (diff != 0);
m_mandel();
} while (1);
*/
as_nop();
lib_init_stack();
// build the M2S, VRAM and S2M base addresses as (*_ADDR_H << *_ADDR_SHIFT)
as_mvi(m2s_addr, M2S_ADDR_H);
as_mvi(s2m_addr, S2M_ADDR_H);
lib_wait_dep_pre();
as_mvi(vram_addr, VRAM_ADDR_H);
lib_wait_dep_post();
lib_sli(m2s_addr, M2S_ADDR_SHIFT);
lib_sli(s2m_addr, S2M_ADDR_SHIFT);
lib_sli(vram_addr, VRAM_ADDR_SHIFT);
lib_wait_dependency();
as_ld(my_core_id, m2s_addr);
lib_wait_dep_pre();
as_addi(m2s_addr, 1);
lib_wait_dep_post();
as_mv(diff, my_core_id);
as_add(s2m_addr, my_core_id);
as_ld(parallel, m2s_addr);
lib_wait_dep_pre();
as_addi(m2s_addr, 1);
lib_wait_dep_post();
as_ld(task_id, m2s_addr);
lib_wait_dep_pre();
as_sub(diff, parallel);
lib_wait_dep_post();
as_cnm(SP_REG_CP, diff);
lib_bc("pe_thread_manager_L_end");
label("pe_thread_manager_L_0");
lib_wait_dep_pre();
as_addi(task_id, 1);
lib_wait_dep_post();
as_st(s2m_addr, task_id);
label("pe_thread_manager_L_1");
lib_wait_dep_pre();
as_ld(master_task_id, m2s_addr);
lib_wait_dep_post();
lib_wait_dep_pre();
as_mv(diff, master_task_id);
lib_wait_dep_post();
lib_wait_dep_pre();
as_sub(diff, task_id);
lib_wait_dep_post();
as_cnz(SP_REG_CP, diff);
lib_bc("pe_thread_manager_L_1");
if (WIDTH_P_D == 32)
{
m_mandel();
}
else
{
m_fill_vram();
}
lib_ba("pe_thread_manager_L_0");
label("pe_thread_manager_L_end");
lib_call("f_halt");
// link: emit the f_halt subroutine called above
f_halt();
}
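// Concurrency sketch (hypothetical helper added for illustration; not called by
// the generator) of the task handshake in pe_thread_manager(). Each PE reads the
// current task id from M2S, then repeatedly increments it, reports it in its own
// S2M slot, spins until M2S holds the same id, and runs one task. The master side
// used here (wait until every PE has reported task n, then publish n) is an
// ASSUMPTION: the master program is not part of this file. AtomicInteger and
// AtomicIntegerArray stand in for the shared M2S/S2M memories.
interface TaskHandshakeSketch
{
static int run(int parallel, int tasks)
{
java.util.concurrent.atomic.AtomicInteger m2s = new java.util.concurrent.atomic.AtomicInteger(0);
java.util.concurrent.atomic.AtomicIntegerArray s2m = new java.util.concurrent.atomic.AtomicIntegerArray(parallel);
java.util.concurrent.atomic.AtomicInteger done = new java.util.concurrent.atomic.AtomicInteger(0);
Thread[] pes = new Thread[parallel];
for (int core = 0; core < parallel; core++)
{
final int myCoreId = core;
pes[core] = new Thread(() ->
{
int taskId = m2s.get(); // task_id = mem[m2s_addr]
for (int t = 0; t < tasks; t++)
{
taskId++;
s2m.set(myCoreId, taskId); // mem[s2m_addr] = task_id
while (m2s.get() != taskId) // spin until the master publishes this id
{
Thread.onSpinWait();
}
done.incrementAndGet(); // stands in for m_mandel() / m_fill_vram()
}
});
pes[core].start();
}
// Master side (assumed): publish task n once every PE has reported n.
for (int n = 1; n <= tasks; n++)
{
for (int core = 0; core < parallel; core++)
{
while (s2m.get(core) != n)
{
Thread.onSpinWait();
}
}
m2s.set(n);
}
for (Thread pe : pes)
{
try
{
pe.join();
}
catch (InterruptedException e)
{
Thread.currentThread().interrupt();
throw new IllegalStateException(e);
}
}
return done.get(); // parallel * tasks when every handshake completes
}
}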
@Override
public void init(String[] args)
{
super.init(args);
DEPTH_REG = opts.getIntValue("pe_depth_reg");
REGS = (1 << DEPTH_REG);
SP_REG_STACK_POINTER = (REGS - 1);
STACK_ADDRESS = ((1 << DEPTH_P_D) - 1);
ENABLE_MVIL = opts.getIntValue("pe_enable_mvil");
ENABLE_MUL = opts.getIntValue("pe_enable_mul");
ENABLE_MVC = opts.getIntValue("pe_enable_mvc");
ENABLE_WA = opts.getIntValue("pe_enable_wa");
ENABLE_MULTI_BIT_SHIFT = opts.getIntValue("pe_enable_multi_bit_shift");
LREG0 = opts.getIntValue("lreg_start") + 0;
LREG1 = opts.getIntValue("lreg_start") + 1;
LREG2 = opts.getIntValue("lreg_start") + 2;
LREG3 = opts.getIntValue("lreg_start") + 3;
LREG4 = opts.getIntValue("lreg_start") + 4;
LREG5 = opts.getIntValue("lreg_start") + 5;
LREG6 = opts.getIntValue("lreg_start") + 6;
FIFO_ADDR = (PE_W_BANK_FIFO << DEPTH_B_S_W);
VRAM_ADDR_SHIFT = DEPTH_B_S_W - 3;
VRAM_ADDR_H = ((FIFO_ADDR + (FIFO_BANK_VRAM << DEPTH_B_F)) >>> VRAM_ADDR_SHIFT);
M2S_ADDR_H = PE_R_BANK_M2S;
M2S_ADDR_SHIFT = DEPTH_B_S_R;
ITEM_COUNT_ADDR_H = PE_R_BANK_ITEM_COUNT;
ITEM_COUNT_ADDR_SHIFT = DEPTH_B_S_R;
S2M_ADDR_SHIFT = DEPTH_B_S_W - 3;
S2M_ADDR_H = ((FIFO_ADDR + (FIFO_BANK_S2M << DEPTH_B_F)) >>> S2M_ADDR_SHIFT);
IMAGE_WIDTH_BITS = opts.getIntValue("image_width_bits");
IMAGE_HEIGHT_BITS = opts.getIntValue("image_height_bits");
IMAGE_WIDTH_HALF_BITS = (IMAGE_WIDTH_BITS - 1);
IMAGE_HEIGHT_HALF_BITS = (IMAGE_HEIGHT_BITS - 1);
IMAGE_WIDTH = (1 << IMAGE_WIDTH_BITS);
IMAGE_HEIGHT = (1 << IMAGE_HEIGHT_BITS);
IMAGE_WIDTH_HALF = (1 << IMAGE_WIDTH_HALF_BITS);
IMAGE_HEIGHT_HALF = (1 << IMAGE_HEIGHT_HALF_BITS);
}
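// Sketch (hypothetical helper added for illustration; not called by the
// generator) of the address encoding prepared in init(): each base address is
// kept as a small high part (*_ADDR_H) plus a shift (*_ADDR_SHIFT), and the PE
// rebuilds it with as_mvi(reg, *_ADDR_H) followed by lib_sli(reg, *_ADDR_SHIFT),
// presumably because an immediate load cannot hold the full address. The split
// is lossless only if the low `shift` bits of the address are zero, which
// high() checks. The example values in any caller are illustrative only.
interface AddrSplitReference
{
// Mirrors e.g. VRAM_ADDR_H = addr >>> VRAM_ADDR_SHIFT
static int high(int addr, int shift)
{
if ((addr & ((1 << shift) - 1)) != 0)
{
throw new IllegalArgumentException("low address bits would be lost");
}
return addr >>> shift;
}
// Mirrors as_mvi(reg, high) followed by lib_sli(reg, shift)
static int rebuild(int high, int shift)
{
return high << shift;
}
}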
@Override
public void program()
{
set_filename("default_pe_code");
set_rom_width(WIDTH_I);
set_rom_depth(DEPTH_P_I);
pe_thread_manager();
// link: emit f_lib_sl, required only when multi-bit shifts are disabled
if (ENABLE_MULTI_BIT_SHIFT != 1)
{
f_lib_sl();
}
}
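// Sketch (hypothetical helper added for illustration; not called by the
// generator) of a variable left shift built only from one-bit shifts. This is
// ASSUMED to be the job of f_lib_sl, which program() links in when
// ENABLE_MULTI_BIT_SHIFT != 1; the real routine is PE assembly and may differ.
// The loop uses a 32-bit int, so shifting by 32 or more yields 0, unlike Java's
// `<<`, which masks the shift distance to its low 5 bits.
interface SoftShiftReference
{
static int sl(int value, int amount)
{
// one single-bit shift per iteration
for (int n = 0; n < amount; n++)
{
value <<= 1;
}
return value;
}
}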
@Override
public void data()
{
set_filename("default_pe_data");
set_rom_width(WIDTH_P_D);
set_rom_depth(DEPTH_P_D);
label("d_rand");
dat(0xfc720c27);
}
}